Multi‑Query Attention (MQA)

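An attention variant in which all query heads share a single key/value head instead of each head having its own. This shrinks the key/value cache by a factor equal to the number of heads, which lowers memory use and speeds up autoregressive decoding.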
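As a rough illustration of the sharing described above, here is a minimal pure-Python sketch of one decoding step. The function names (`multi_query_attention`, `kv_cache_elements`) and the toy inputs are hypothetical, chosen for this example only; real implementations use batched tensor operations and learned projections.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_attention(q_heads, keys, values):
    """One decoding step of multi-query attention (illustrative only).

    q_heads: [n_heads][d]   one query vector per head
    keys:    [seq_len][d]   a single key head shared by all query heads
    values:  [seq_len][d_v] a single value head shared by all query heads
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in q_heads:
        # Every head scores against the SAME keys.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Every head mixes the SAME values.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


def kv_cache_elements(n_layers, seq_len, n_kv_heads, head_dim):
    # Factor 2: one cached tensor for keys and one for values.
    return 2 * n_layers * seq_len * n_kv_heads * head_dim


# Toy inputs: two query heads, two cached positions.
q_heads = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
out = multi_query_attention(q_heads, keys, values)

# Hypothetical model shape: 32 layers, 32 heads, head_dim 128, 4096 tokens.
mha_cache = kv_cache_elements(32, 4096, n_kv_heads=32, head_dim=128)
mqa_cache = kv_cache_elements(32, 4096, n_kv_heads=1, head_dim=128)
```

With these assumed shapes, the MQA cache holds 32 times fewer elements than the multi-head cache, because only the number of key/value heads changes.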

Related terms

GQA, FlashAttention