Speculative Decoding

Inference speedup technique in which a small draft model proposes several tokens ahead and a larger target model verifies them in parallel, keeping the tokens it accepts and discarding the rest.
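
The propose-then-verify loop can be sketched in a few lines. This is a minimal illustration, not a production implementation: it assumes greedy decoding, and the names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical stand-ins in which a "model" is just a function from a token list to the next token. A real system would score all drafted positions in one batched forward pass of the target model and, when sampling, would use a probabilistic accept/reject rule rather than an exact match.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to the next token.
Model = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: Model, target: Model, k: int) -> List[int]:
    """Run one draft-and-verify round and return the newly accepted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position in order.
    #    (Looped here for clarity; in practice this is one parallel pass.)
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every proposal matched, so the target adds one more token.
    accepted.append(target(ctx))
    return accepted


def generate(prefix: List[int], draft: Model, target: Model, k: int, max_new: int) -> List[int]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]


def toy_target(ctx: List[int]) -> int:
    # Toy "large" model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Toy "small" model: agrees with the target except after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

With these toy models, `generate([0], toy_draft, toy_target, k=3, max_new=8)` returns `[0, 1, 2, 3, 4, 5, 6, 7, 8]`, the same sequence the target would produce on its own. Under greedy decoding the draft model changes only how many target calls are needed, never the output.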

Related terms