LLM Router with Fallback Chains and Circuit Breakers
How to build a production LLM gateway that routes across Ollama, Groq, Gemini, and OpenRouter with automatic failover, circuit breakers, and a unified OpenAI-compatible API.
Running one LLM provider in production is a single point of failure. Rate limits, outages, and model deprecations happen. This is the exact router we run in our AI infrastructure: a unified gateway that tries providers in order and fails over automatically.

## Architecture

Each provider speaks slightly different JSON. The router normalises everything to an OpenAI-compatible format, so callers don't know or care which backend answered.
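As a minimal sketch, the chain is just an ordered list of OpenAI-compatible endpoints. The `Provider` dataclass, base URLs, model names, and environment variable names below are illustrative, not our exact production config:

```python
# A minimal sketch of the chain: an ordered list of OpenAI-compatible
# endpoints. Base URLs, model names, and env var names are illustrative.
import os
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    base_url: str  # an OpenAI-compatible /v1 root
    model: str
    api_key: str | None = None

# Ordered by cost: local first, paid safety net last.
PROVIDER_CHAIN = [
    Provider("ollama", "http://localhost:11434/v1", "llama3.1"),
    Provider("groq", "https://api.groq.com/openai/v1",
             "llama-3.1-8b-instant", os.getenv("GROQ_API_KEY")),
    Provider("gemini",
             "https://generativelanguage.googleapis.com/v1beta/openai",
             "gemini-2.0-flash", os.getenv("GEMINI_API_KEY")),
    Provider("openrouter", "https://openrouter.ai/api/v1",
             "meta-llama/llama-3.1-8b-instruct",
             os.getenv("OPENROUTER_API_KEY")),
]
```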
## Circuit Breaker Implementation

Key pattern: always instantiate at module level, never per-request. The breaker's state must persist across calls.
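A minimal version of that breaker might look like the sketch below; the three-failure threshold and 60-second cooldown are placeholder numbers, not tuned values:

```python
# A minimal circuit breaker. Threshold and cooldown are placeholders.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 60.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        """True if closed, or open but past cooldown (half-open probe)."""
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

# Module level, one breaker per provider: state must survive across requests.
BREAKERS = {p.name: CircuitBreaker() for p in PROVIDER_CHAIN}
```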
## Provider Normalisation

Each provider has quirks. Groq adds extra fields. Ollama uses a different finish_reason. The router strips all of it:
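A sketch of that stripping, assuming a generic OpenAI-style payload; the exact key set and finish_reason mapping are illustrative:

```python
# A sketch of the normalisation step, assuming a generic OpenAI-style
# payload. The key set and finish_reason mapping are illustrative.
OPENAI_KEYS = {"id", "object", "created", "model", "choices", "usage"}

def normalise(raw: dict) -> dict:
    # Drop provider-specific extras (e.g. Groq's timing metadata) by
    # keeping only the standard top-level fields.
    out = {k: v for k, v in raw.items() if k in OPENAI_KEYS}
    for choice in out.get("choices", []):
        # Map non-standard finish reasons onto the OpenAI vocabulary.
        if choice.get("finish_reason") not in {"stop", "length", "tool_calls"}:
            choice["finish_reason"] = "stop"
    return out
```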
## Fallback Chain
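Putting the chain, the breakers, and the normaliser together, the core loop might look like this. The httpx client, the 30-second timeout, and the AllProvidersDown error are assumptions for illustration:

```python
# The fallback loop, tying together the chain, the breakers, and the
# normaliser above. httpx, the timeout, and the error type are assumptions.
import logging
import httpx

log = logging.getLogger("llm-router")

class AllProvidersDown(Exception):
    pass

def chat_completion(payload: dict) -> dict:
    for provider in PROVIDER_CHAIN:
        breaker = BREAKERS[provider.name]
        if not breaker.allow():
            continue  # breaker is open: skip without wasting a network call
        headers = {}
        if provider.api_key:
            headers["Authorization"] = f"Bearer {provider.api_key}"
        try:
            resp = httpx.post(
                f"{provider.base_url}/chat/completions",
                json={**payload, "model": provider.model},
                headers=headers,
                timeout=30.0,
            )
            resp.raise_for_status()
        except httpx.HTTPError:
            breaker.record_failure()
            log.warning("provider %s failed, falling back", provider.name)
            continue
        breaker.record_success()
        # Record who answered: Groq answering instead of Ollama means
        # Ollama was down.
        log.info("request served by %s", provider.name)
        return normalise(resp.json())
    raise AllProvidersDown("every provider failed or has an open breaker")
```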
## Cost Discipline

Ollama runs locally for $0. Groq and Gemini both have free tiers. OpenRouter is the paid safety net that almost never fires. This stack runs our entire AI infrastructure for ~$0/month in most months.

The router also logs which provider answered each request (the `log.info` call in the loop above). That log is essential for debugging a drop in response quality: Groq answering instead of Ollama means Ollama was down.
## Exposing as a Unified API

All internal services (brain, categoriser, deep reader) call this one endpoint. Swapping a provider means changing one URL in settings, nothing else.
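A sketch of the frontend, using FastAPI for illustration; any server exposing POST /v1/chat/completions works the same way:

```python
# A sketch of the unified frontend. FastAPI is an assumption.
from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.post("/v1/chat/completions")
def completions(payload: dict) -> dict:
    # Plain def (not async) so FastAPI runs this in a threadpool and the
    # blocking httpx call inside chat_completion() can't stall the loop.
    try:
        return chat_completion(payload)
    except AllProvidersDown as exc:
        raise HTTPException(status_code=503, detail=str(exc))
```

Because the route mirrors the OpenAI API shape, existing OpenAI SDK clients only need their base URL pointed at the router.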
---

## Our Stack Runs on This

Every AI system we build (voice pipelines, RAG, agents) routes through this gateway. It's why we can promise reliability: if one provider goes down, the system keeps working. No single point of failure.

This is the kind of infrastructure we bring to client projects. Not a wrapper around one API, but a resilient, cost-conscious system designed to keep running.

See our AI infrastructure services → https://appopoleis.com/services

Let's talk about your project → https://appopoleis.com/contact

---

## Related Reading

- RAG with pgvector and Supabase: From Zero to Semantic Search → https://appopoleis.com/blog/rag-pgvector-supabase-sem
Appopoleis Team