Latency (AI Systems)

The time between sending a request and receiving a response. Optimizations span model choice, prompt size, retrieval efficiency, batching, and parallelization.