NVIDIA interview question

Tell me how you can conserve GPU memory when running inference on LLMs.
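
A rough way to reason about this question is to estimate where the memory goes. The sketch below is only back-of-the-envelope arithmetic, not a profiler: it counts weight storage and the key/value cache and ignores activations and framework overhead. The function names and the model shape (7B parameters, 32 layers, 128-dimensional heads) are hypothetical values chosen for illustration, not taken from the question.

```python
def weight_bytes(n_params: int, bits_per_param: int) -> int:
    """Memory needed to hold the model weights at a given precision."""
    return n_params * bits_per_param // 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bits_per_value: int = 16) -> int:
    """Memory needed for the attention key/value cache.

    The leading factor of 2 accounts for one key tensor and one value
    tensor per layer.
    """
    values = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size
    return values * bits_per_value // 8


GIB = 1024 ** 3

# Hypothetical model shape, for illustration only.
N_PARAMS = 7_000_000_000
N_LAYERS, HEAD_DIM, SEQ_LEN, BATCH = 32, 128, 4096, 1

# Lever 1: lower-precision weights (16-bit vs. 8-bit vs. 4-bit).
for bits in (16, 8, 4):
    print(f"weights at {bits}-bit: {weight_bytes(N_PARAMS, bits) / GIB:.2f} GiB")

# Lever 2: fewer key/value heads (grouped-query attention) shrinks the cache.
for kv_heads in (32, 8):
    size = kv_cache_bytes(N_LAYERS, kv_heads, HEAD_DIM, SEQ_LEN, BATCH)
    print(f"KV cache with {kv_heads} KV heads: {size / GIB:.2f} GiB")
```

Under these assumptions the estimate points at the usual levers: quantizing weights, quantizing or shrinking the KV cache, and limiting batch size or context length, since the cache grows linearly with both.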