Evals (Model Evaluation)

Task‑specific tests that measure quality, robustness, and safety of model outputs on real workloads. Strong evals guide model, prompt, and guardrail choices.