Oracle interview question

Describe 3 different optimisations applied to LLM inference.

Interview answer

Anonymous

Jul 7, 2025

KV caching, speculative decoding, operator fusion
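
To make the first of these concrete, here is a minimal sketch of KV caching for a single attention head, in pure Python. It is an illustration under stated assumptions, not how any particular inference engine does it: the helper names (`matvec`, `attend`, `decode_without_cache`, `decode_with_cache`) and the toy weight matrices are hypothetical, and real systems work on batched tensors on an accelerator. The idea is that during autoregressive decoding the keys and values of earlier tokens never change, so they can be stored once and reused instead of being recomputed at every step.

```python
import math

def matvec(x, W):
    """Row vector x (length n) times matrix W (n x m) -> list of length m."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]

def decode_without_cache(X, Wq, Wk, Wv):
    """Baseline: re-project keys and values for the whole prefix at every step."""
    outs = []
    for t in range(1, len(X) + 1):
        keys = [matvec(x, Wk) for x in X[:t]]    # O(t) projections per step
        values = [matvec(x, Wv) for x in X[:t]]
        outs.append(attend(matvec(X[t - 1], Wq), keys, values))
    return outs

def decode_with_cache(X, Wq, Wk, Wv):
    """KV caching: project each token once and append it to the cache."""
    k_cache, v_cache, outs = [], [], []
    for x in X:
        k_cache.append(matvec(x, Wk))            # one projection per step
        v_cache.append(matvec(x, Wv))
        outs.append(attend(matvec(x, Wq), k_cache, v_cache))
    return outs
```

Both functions return the same attention outputs; the cached version does one key and one value projection per generated token instead of re-projecting the entire prefix, at the cost of memory that grows with sequence length. Speculative decoding and operator fusion target different bottlenecks (the number of sequential forward passes of the large model, and memory traffic between kernels, respectively).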