TensorRT‑LLM

TensorRT‑LLM is NVIDIA's open‑source library for optimizing large language model (LLM) inference on NVIDIA GPUs. It compiles transformer models into optimized TensorRT engines and pairs them with runtime features such as in‑flight batching, paged KV caching, and quantization to deliver low latency and high throughput.
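As a rough sketch of how the library is used, the high‑level `LLM` Python API loads a model, builds or loads an optimized engine, and serves generation requests. The model ID below is purely illustrative, and the exact `SamplingParams` fields may vary across TensorRT‑LLM versions; this is a sketch under those assumptions, not a verbatim quickstart.

```python
# Sketch of the high-level TensorRT-LLM Python API.
# Assumes a CUDA-capable GPU and the tensorrt_llm package are available;
# the model ID is an illustrative Hugging Face identifier, not a recommendation.
from tensorrt_llm import LLM, SamplingParams


def main():
    # Compiles (or loads a cached) optimized TensorRT engine for the model.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Decoding controls; field names may differ slightly between releases.
    params = SamplingParams(temperature=0.8, max_tokens=64)

    # generate() batches the prompts and runs them through the engine.
    for output in llm.generate(["What is TensorRT-LLM?"], params):
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```

Engine compilation happens once up front; subsequent `generate` calls reuse the optimized engine, which is where the latency and throughput gains come from.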