Multi-Head Attention

An attention mechanism that runs several attention operations ("heads") in parallel. Queries, keys, and values are linearly projected into a lower-dimensional space for each head, each head computes scaled dot-product attention independently, and the head outputs are concatenated and projected back to the model dimension. This lets the model attend to information from different representation subspaces and positions simultaneously.
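
A minimal sketch of the idea in PyTorch; names such as `d_model` and `num_heads` are illustrative assumptions, not part of the original entry:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Separate projections for queries, keys, and values, plus an output projection.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, _ = x.shape

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_head)
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.w_q(x))
        k = split_heads(self.w_k(x))
        v = split_heads(self.w_v(x))

        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / (self.d_head ** 0.5)
        weights = F.softmax(scores, dim=-1)
        context = weights @ v  # (batch, num_heads, seq_len, d_head)

        # Concatenate the heads and apply the output projection.
        context = context.transpose(1, 2).contiguous().view(batch, seq_len, -1)
        return self.w_o(context)


# Toy usage: 8 heads over a 64-dimensional model.
attn = MultiHeadAttention(d_model=64, num_heads=8)
out = attn(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

Because the per-head dimension is `d_model / num_heads`, the total computation is comparable to a single full-width attention head while still giving each head its own learned projection.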