Context Caching

A technique that reuses the attention key-value (KV) states already computed for a prompt prefix when later requests repeat that prefix, so only the new tokens need to be processed. This reduces latency and cost for long or iterative prompts.
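
The idea can be illustrated with a toy sketch. This is not any real library's or provider's API: `PrefixCache`, `toy_kv`, and the whitespace "tokenizer" are hypothetical names made up for this example, and the per-token state is a placeholder tuple rather than real attention tensors. The sketch only shows the bookkeeping: find the longest prefix whose states are already stored, then compute states for the remaining tokens.

```python
from typing import Dict, List, Tuple

State = Tuple[str, int]  # placeholder for a real per-token key/value state


def toy_kv(token: str, position: int) -> State:
    """Stand-in for the expensive per-token computation a model would do."""
    return (token, position)


class PrefixCache:
    """Hypothetical cache mapping a token prefix to its computed states."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, ...], List[State]] = {}
        self.tokens_computed = 0  # counts how much work was actually done

    def encode(self, tokens: List[str]) -> List[State]:
        states: List[State] = []
        start = 0
        # Look for the longest stored prefix of this token sequence.
        for end in range(len(tokens), 0, -1):
            cached = self._store.get(tuple(tokens[:end]))
            if cached is not None:
                states = list(cached)  # reuse, do not recompute
                start = end
                break
        # Compute states only for the tokens after the cached prefix.
        for i in range(start, len(tokens)):
            states.append(toy_kv(tokens[i], i))
            self.tokens_computed += 1
        # Store the full sequence so later requests can reuse it.
        self._store[tuple(tokens)] = list(states)
        return states


cache = PrefixCache()
system = "you are a helpful assistant".split()   # shared 5-token prefix
cache.encode(system)                              # computes 5 tokens
cache.encode(system + "what is rust".split())     # computes 3 new tokens
last = cache.encode(system + "explain go".split())  # computes 2 new tokens
print(cache.tokens_computed)  # 10, versus 20 with no reuse
```

In this sketch the shared prefix is stored by encoding it once up front. Real systems differ in how prefixes are matched, how long entries live, and how cached reads are priced, so those details are deliberately left out.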