vLLM

A high-throughput inference and serving engine for large language models (LLMs). vLLM uses PagedAttention to manage the attention key-value cache in fixed-size paged blocks, reducing memory fragmentation and enabling efficient batched serving.
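
To make the description concrete, here is a minimal sketch of offline batch inference using vLLM's Python API (`LLM` and `SamplingParams`); the model name and prompts are illustrative, not part of the original text.

```python
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "The fastest way to sort a list in Python is",
]

# Sampling settings applied to every prompt in the batch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Loading the model also allocates the paged KV cache on the GPU.
llm = LLM(model="facebook/opt-125m")

# generate() batches all prompts together; PagedAttention stores each
# sequence's KV cache in non-contiguous blocks, so many requests can
# share GPU memory and throughput stays high.
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```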