ML Basics Questions:

* How to evaluate an LLM-based system, both offline and online
* How to reduce model latency by decreasing parameters or using other techniques without significant information loss
* What additional approaches can be used to improve model serving latency
* How the teacher–student (knowledge distillation) approach works
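
As a starting point for the teacher–student question, here is a minimal sketch of one commonly used distillation loss: a weighted sum of a soft-target term (KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature) and a hard-label cross-entropy term. The function names and the `temperature` and `alpha` defaults are illustrative assumptions, not values taken from these notes, and real training code would compute this over batches in a deep learning framework.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: distillation loss for a single example.

    alpha weights the soft-target term; (1 - alpha) weights the
    hard-label term. Both defaults are assumptions for illustration.
    """
    # Soft-target term: KL(teacher || student) on softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = (temperature ** 2) * kl

    # Hard-label term: cross-entropy against the true class at T = 1.
    hard = -math.log(softmax(student_logits)[true_label])

    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [3.5, 1.2, 0.4]
    print(distillation_loss(student, teacher, true_label=0))
```

A higher temperature flattens the teacher distribution so the student also learns from the relative probabilities of the non-top classes; the squared-temperature factor keeps the soft term on a scale comparable to the hard term as the temperature changes.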