Quantization

Reducing the numerical precision of model weights and/or activations (e.g., FP16 → INT8) to lower the memory footprint and speed up inference, often with minimal quality loss.
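
As a rough illustration (a minimal sketch, not any particular library's API), symmetric per-tensor INT8 quantization maps each float to an 8-bit integer via a single scale factor; dequantizing recovers an approximation of the original values:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    scale = max(scale, 1e-12)  # avoid division by zero for all-zero tensors
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the INT8 codes."""
    return q.astype(np.float32) * scale

# Example: quantize FP32 weights and check the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.max(np.abs(w - w_hat)))
```

Production schemes typically refine this idea (per-channel scales, asymmetric zero points, or calibration on sample data) to keep accuracy close to the full-precision model.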