Pergunta de entrevista da empresa Google

How would you optimize a large- scale model for real-time inference while maintaining a strict latency budget