Continuous batching from first principles

omnivore inference

Highlights

One of the most impactful optimizations is continuous batching, which attempts to maximize performance by processing multiple conversations in parallel and swapping each one out for a waiting conversation as soon as it is done.
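
To make the highlight concrete, here is a minimal sketch of the scheduling idea as a toy simulation. It is not taken from the article or from any inference library: `Request`, `continuous_batching`, `static_batching`, and `max_batch_size` are hypothetical names, and each conversation is reduced to a count of decode steps it still needs (assumed to be at least 1). Real engines also have to handle prefill, KV-cache memory, and padding, which this ignores.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical conversation that needs `remaining` more decode steps."""
    name: str
    remaining: int


def continuous_batching(requests, max_batch_size):
    """Simulate decode steps; return (steps_taken, completion_order)."""
    waiting = deque(Request(name, steps) for name, steps in requests)
    running = []
    finished = []
    steps = 0
    while waiting or running:
        # Refill free slots from the queue before every step.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        # One decode step: every running request emits one token.
        for req in running:
            req.remaining -= 1
        steps += 1
        # Swap finished requests out so their slots free up immediately.
        finished.extend(r.name for r in running if r.remaining <= 0)
        running = [r for r in running if r.remaining > 0]
    return steps, finished


def static_batching(requests, max_batch_size):
    """Baseline: each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(requests), max_batch_size):
        batch = requests[i:i + max_batch_size]
        steps += max(n for _, n in batch)
    return steps


if __name__ == "__main__":
    demo = [("a", 2), ("b", 5), ("c", 1), ("d", 3)]
    print(continuous_batching(demo, max_batch_size=2))
    print(static_batching(demo, max_batch_size=2))
```

With these four made-up requests and two slots, the continuous scheduler needs fewer decode steps than the static baseline, because the slot freed by a short conversation is reused while the long one is still generating.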