LLM Gateways: The Missing Infrastructure Layer for Production AI
Your application doesn't need another wrapper. It needs a control plane.
The Problem: LLM Calls Are Not Just API Calls
When you first integrate an LLM into your application, it feels simple. You install the OpenAI SDK, pass a prompt, get a response. Ship it.
Then reality hits.
You want to try Claude for summarization because it's cheaper. Now you have two SDKs, two authentication flows, two response formats. A teammate adds Mistral for a classification task. Someone else wants to experiment with Llama running on Bedrock. Suddenly, your "simple" integration has become a tangle of provider-specific code scattered across your codebase.
But the API sprawl is only the surface. The deeper problems creep in once LLM calls sit on the critical path of your system:
Cost blindness. Token-based pricing is unpredictable. A single runaway loop can burn through your monthly budget in minutes. Without centralized tracking, you have no idea which team, feature, or prompt is responsible for the spend.
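A back-of-the-envelope cost model makes the unpredictability concrete. The sketch below estimates per-request spend from token counts; the per-million-token prices are illustrative placeholders, not any provider's actual rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Estimate the dollar cost of one LLM call.

    Prices are expressed per million tokens, the convention most
    providers use on their pricing pages.
    """
    return (input_tokens / 1_000_000) * price_in_per_mtok \
         + (output_tokens / 1_000_000) * price_out_per_mtok

# Illustrative prices only; check your provider's pricing page.
cost = estimate_cost(input_tokens=3_000, output_tokens=1_200,
                     price_in_per_mtok=2.50, price_out_per_mtok=10.00)
print(f"${cost:.4f}")  # a loop retrying 1,000 times costs 1,000x this
```

Multiply that by per-team request volume and you have exactly the attribution a gateway automates, and exactly the number a runaway loop multiplies.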
Fragile reliability. LLM providers have outages. Rate limits get hit. Latency spikes. Without retries, fallbacks, and circuit breaking, your application is only as reliable as the weakest provider endpoint you depend on.
Zero observability. Traditional APM tools can tell you an HTTP call took 3 seconds. They can't tell you that it consumed 4,200 tokens, cost $0.12, and the response quality degraded because you hit a rate limit and silently fell back to a weaker model.
Governance gaps. As LLM usage grows across teams, you need to answer questions like: who has access to which models? Are we leaking PII in prompts? Is anyone using unapproved providers? Without a centralized layer, these questions are nearly impossible to answer.
This is the problem space LLM gateways were built to solve.
What Is an LLM Gateway?
An LLM gateway is a centralized service that sits between your applications and LLM providers. Instead of your code talking directly to OpenAI, Anthropic, or a self-hosted model, requests flow through the gateway. The gateway then handles routing, authentication, retries, caching, spend tracking, and observability — all without your application needing to know the details.
Think of it as what an API gateway (like Kong or Nginx) does for microservices, but purpose-built for the unique demands of LLM traffic: token-based billing, streaming responses, semantic caching, prompt-level security, and multi-model routing.
From the application's perspective, there's one stable interface. From the platform team's perspective, there's one place to observe, control, and govern all LLM usage across the organization.
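The core pattern is small enough to sketch. The toy gateway below (hypothetical names; real gateways add auth, streaming, caching, and metrics) shows the two ideas that matter: one stable call signature for the application, and provider routing with fallback hidden behind it.

```python
from typing import Callable

# A "provider" is anything that maps a prompt to a completion string.
Provider = Callable[[str], str]

class ToyGateway:
    """Minimal gateway pattern: one interface in front, an ordered
    list of providers (primary plus fallbacks) behind it."""

    def __init__(self) -> None:
        self.routes: dict[str, list[Provider]] = {}

    def register(self, model: str, providers: list[Provider]) -> None:
        self.routes[model] = providers

    def complete(self, model: str, prompt: str) -> str:
        errors = []
        for provider in self.routes[model]:
            try:
                return provider(prompt)      # first success wins
            except Exception as exc:         # real gateways classify errors
                errors.append(exc)           # and apply backoff / circuit breaking
        raise RuntimeError(f"all providers failed: {errors}")

# Usage: the second provider transparently covers an outage in the first.
def flaky(prompt: str) -> str:
    raise TimeoutError("provider outage")

gw = ToyGateway()
gw.register("summarize", [flaky, lambda p: f"summary of: {p}"])
print(gw.complete("summarize", "long doc"))  # prints "summary of: long doc"
```

The application only ever calls complete(); which provider answered, and how many failed first, is the gateway's business.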
The Contenders: LiteLLM, Portkey, OpenRouter, and Kong AI Gateway
Not all gateways solve the same problems in the same way. Let's break down four popular options and what makes each one distinct.
1. LiteLLM — The Open-Source Swiss Army Knife
What it is: An open-source proxy and SDK that provides a unified, OpenAI-compatible API across 100+ LLM providers.
Why choose it:
- Full self-hosted control. You deploy it on your own infrastructure. Your data never leaves your network. For teams in regulated industries or with strict data residency requirements, this matters enormously.
- Provider breadth. LiteLLM supports virtually every major provider — OpenAI, Anthropic, Bedrock, Vertex, Mistral, Ollama, and many more — behind a single completion() call.
- Cost tracking built in. Automatic spend tracking across providers, with the ability to log costs to S3, GCS, or your data warehouse. You can set budgets per team or per API key.
- Open source (MIT). The core is free. You can inspect the code, contribute, and customize. Enterprise features like SSO, JWT auth, and audit logging are available as paid add-ons.
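A minimal proxy configuration gives a feel for the model. The sketch below follows the general shape of LiteLLM's documented config.yaml (verify exact field names against the current docs); the model names and environment variables are placeholders.

```yaml
# config.yaml for the LiteLLM proxy (illustrative sketch)
model_list:
  - model_name: gpt-4o                 # the alias your developers request
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY   # admin key for the proxy
```

Per-team budgets and virtual API keys are typically provisioned against the running proxy rather than in this file, so the config stays a stable description of which models exist.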
The trade-off: LiteLLM is infrastructure you operate. You're responsible for scaling, availability, and monitoring. Observability is basic out of the box — you'll likely want to pair it with something like Langfuse for deeper tracing and evaluation. There's no native enterprise governance (RBAC, workspaces, approval workflows) without additional tooling.
Best for: Platform teams that want maximum control and flexibility, are comfortable managing infrastructure, and need to give internal developers unified access to many LLMs with cost guardrails.
2. Portkey — The Production Control Plane
What it is: A managed AI gateway and observability platform designed for production GenAI workloads, supporting 1,600+ models.
Why choose it:
- Enterprise-grade out of the box. Portkey ships with features many teams would otherwise spend months building: RBAC, workspaces, audit trails, SSO/SCIM, and data residency controls.
- Deep observability. Detailed logs, latency metrics, token and cost analytics — broken down by app, team, or model. This is not an afterthought; it's core to the product.
- Guardrails and security. Request and response filters, jailbreak detection, PII redaction, and policy-based enforcement are built in. If compliance is a first-class concern, Portkey addresses it natively.
- Prompt management. Reusable templates, variable substitution, versioning, and environment promotion (dev → staging → prod) for prompts.
- Reliability primitives. Automatic retries, fallbacks with exponential backoff, and configurable routing across providers.
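Routing behavior in Portkey is expressed as a declarative config attached to requests. The fragment below is a sketch based on Portkey's config schema (field names may differ across versions; the model IDs are illustrative): try OpenAI first, fall back to Anthropic, retrying up to three times.

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "provider": "openai",    "override_params": { "model": "gpt-4o" } },
    { "provider": "anthropic", "override_params": { "model": "claude-3-5-sonnet-20241022" } }
  ],
  "retry": { "attempts": 3 }
}
```

Because the policy lives in config rather than application code, changing the fallback chain doesn't require a deploy.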
The trade-off: Portkey introduces a managed layer into your architecture. It's more opinionated than LiteLLM, which means less customization but faster time to production. Advanced governance features sit in higher-tier paid plans. For lightweight prototyping, it can feel heavier than needed.
Best for: Product and engineering teams building production LLM applications that need reliability, cost control, and compliance without building the platform layer themselves.
3. OpenRouter — The Model Marketplace
What it is: A developer-focused gateway that provides a single API for accessing 280+ models across providers, abstracting billing and credentials behind a unified endpoint.
Why choose it:
- Zero infrastructure. There's nothing to deploy. Point your OpenAI SDK to OpenRouter's base URL, and you immediately have access to models from OpenAI, Anthropic, Mistral, Meta, Google, and dozens of open-source providers.
- Effortless experimentation. Want to compare GPT-4o against Claude Sonnet against Llama 3? Change the model string in your request. No new accounts, no new API keys, no provider-specific SDKs.
- Automatic failover. Requests can be transparently routed around provider outages to maintain availability.
- Unified billing. One account, one bill, regardless of how many providers you use.
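Because OpenRouter follows the OpenAI chat-completions schema, comparing models really is a one-string change. The sketch below builds the request payload with the standard library only; the base URL is OpenRouter's documented endpoint, and the model IDs are illustrative.

```python
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Standard OpenAI-style chat payload; only the model string varies."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Comparing providers is just a different model string, same payload shape.
for model in ("openai/gpt-4o",
              "anthropic/claude-3.5-sonnet",
              "meta-llama/llama-3-70b-instruct"):
    payload = build_chat_request(model, "Summarize this ticket.")
    body = json.dumps(payload)  # POST this to OPENROUTER_URL with your API key
```

The loop above is the whole "evaluation harness" many teams start with: same prompt, three model strings, one endpoint.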
The trade-off: OpenRouter adds a 5% markup on requests — that's the cost of convenience. Observability is limited; there's no deep tracing, token-level debugging, or per-team cost attribution. Governance and access controls are minimal, making it difficult to use as an internal platform for large organizations. You're also trusting a third party with your prompts and API traffic.
Best for: Individual developers and small teams in the experimentation and prototyping phase who prioritize model flexibility and speed of iteration over infrastructure control or enterprise governance.
4. Kong AI Gateway — The Enterprise API Gateway, Extended
What it is: AI-specific capabilities built into Kong Gateway (the widely deployed open-source API gateway), delivered as a suite of plugins.
Why choose it:
- Leverage existing infrastructure. If your organization already runs Kong for API management, adding AI gateway capabilities is an incremental step — not a new tool. All 1,000+ existing Kong plugins (auth, rate limiting, transformations, logging) work alongside AI traffic.
- Semantic intelligence. Kong's AI plugins go beyond basic proxying. Semantic caching reduces redundant LLM calls. Semantic routing dispatches requests to the best model based on prompt content. A prompt guard enforces topic-level allow/deny lists.
- Security and compliance. PII sanitization across 18 languages, integration with AWS Bedrock Guardrails, Azure AI Content Safety, and Google Cloud Model Armor. Prompt injection detection. These are production-grade security features.
- MCP and A2A support. Kong has moved aggressively into supporting Model Context Protocol and Agent-to-Agent workflows, making it a strong choice for teams building agentic systems.
- Deployment flexibility. Self-hosted, Kubernetes-native (via Kong Ingress Controller), hybrid, or managed through Kong Konnect.
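In Kong, AI routing is just another plugin in declarative config. The fragment below is modeled on Kong's ai-proxy plugin documentation (verify field names against your Kong version; the service URL and key are placeholders): a route that proxies chat requests to OpenAI while every other Kong plugin on the route still applies.

```yaml
_format_version: "3.0"
services:
  - name: llm-service
    url: https://localhost:32000        # placeholder; ai-proxy overrides the upstream
    routes:
      - name: chat-route
        paths: ["/chat"]
    plugins:
      - name: ai-proxy
        config:
          route_type: llm/v1/chat
          auth:
            header_name: Authorization
            header_value: "Bearer <OPENAI_API_KEY>"   # inject via your secrets tooling
          model:
            provider: openai
            name: gpt-4o
```

Stacking rate limiting, auth, or the Langfuse tracing plugin onto the same route is additive configuration, not new code.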
The trade-off: Kong is a general-purpose API gateway with AI capabilities bolted on through plugins. For teams that don't already use Kong, the operational overhead of running a full Kong deployment may be excessive. Advanced AI-specific features (token-based rate limiting, advanced analytics) are locked behind enterprise tiers. The per-service licensing model can get expensive as you add model endpoints.
Best for: Enterprise teams that already use Kong Gateway and want to extend it to govern LLM traffic alongside existing API infrastructure, especially in regulated environments.
How They Compare at a Glance
| Dimension | LiteLLM | Portkey | OpenRouter | Kong AI Gateway |
|---|---|---|---|---|
| Deployment | Self-hosted (open source) | Managed SaaS | Managed SaaS | Self-hosted / Managed (Konnect) |
| Provider support | 100+ | 1,600+ | 280+ | Major providers via plugins |
| Cost model | Free (OSS) / Enterprise paid | Starts ~$49/mo | 5% markup on usage | Free (OSS) / Enterprise licensed |
| Observability | Basic (needs Langfuse) | Deep, native | Limited | Metrics via plugins + OTEL |
| Governance | Minimal native | Strong (RBAC, SSO, audit) | Minimal | Strong (enterprise tier) |
| Security | Basic | Guardrails, PII, jailbreak detection | Basic | PII, prompt guard, RAG, Bedrock/Azure guardrails |
| Best fit | Platform teams, self-hosters | Production AI teams | Prototyping, experimentation | Enterprises with existing Kong |
Where Langfuse Fits: The Observability Layer
Here's the critical insight: a gateway routes your requests, but Langfuse helps you understand them.
Langfuse is an open-source LLM observability platform that provides tracing, monitoring, evaluation, and debugging for LLM applications. It's not a gateway — it doesn't route traffic. Instead, it ingests telemetry from your gateway (and your application code) and gives you deep visibility into what's happening across your LLM stack.
The good news is that Langfuse integrates natively with all four gateways discussed above.
LiteLLM + Langfuse
This is one of the most popular pairings in the ecosystem. LiteLLM supports Langfuse as a callback target via OpenTelemetry. Set your Langfuse credentials as environment variables, add litellm.callbacks = ["langfuse_otel"], and every LLM call flowing through LiteLLM is automatically traced in Langfuse — with token counts, latencies, costs, and model metadata. You can also send logs from the LiteLLM Proxy directly, meaning every request from every team member gets captured without any SDK changes on their side.
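On the proxy side, the wiring is a config fragment rather than application code. The sketch below follows LiteLLM's documented callback settings (key values are placeholders; confirm field names against the current LiteLLM docs):

```yaml
# LiteLLM proxy config.yaml (illustrative sketch)
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-<your-key>"
  LANGFUSE_SECRET_KEY: "sk-lf-<your-key>"
```

With this in place, every team's traffic through the proxy is traced centrally; individual developers change nothing.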
Portkey + Langfuse
Portkey's API is OpenAI-compatible, so you can use Langfuse's OpenAI SDK wrapper (from langfuse.openai import OpenAI) and point it at Portkey's gateway URL. Every request gets dual visibility: Portkey's native analytics for routing and reliability, plus Langfuse's tracing for prompt-level debugging and evaluation. This pairing gives you the best of both worlds — Portkey for traffic control, Langfuse for deep observability.
OpenRouter + Langfuse
OpenRouter supports a "Broadcast" feature that automatically sends traces to Langfuse without any code changes. You connect your Langfuse API keys in your OpenRouter settings, and all requests are traced. For teams that want more control — custom metadata, nested tracing, session grouping — you can use Langfuse's OpenAI SDK wrapper since OpenRouter follows the OpenAI API schema.
Kong AI Gateway + Langfuse
Kong integrates with Langfuse through an ai-tracing plugin. Once configured with your Langfuse API keys, the plugin automatically captures every AI request proxied through Kong and creates traces in Langfuse. You can enrich traces with user IDs, session IDs, and organization metadata via HTTP headers. Because it's a Kong plugin, it works alongside all other Kong capabilities (auth, rate limiting, logging) with zero application code changes.
Why This Matters
The gateway gives you the operational layer: routing, fallbacks, cost controls, security. Langfuse gives you the intelligence layer: understanding prompt quality, debugging regressions, evaluating model outputs, tracking experiments over time. Together, they form a complete platform for running LLMs in production.
Without Langfuse (or equivalent observability), you can route and control your LLM traffic, but you're flying blind on quality. Did that model switch degrade user experience? Is the new prompt template actually better? Which conversations are hitting guardrails? These are the questions Langfuse answers.
So, Which Should You Pick?
There's no single right answer. The choice depends on where you are in your journey:
You're experimenting and iterating fast → Start with OpenRouter. Zero setup, instant access to hundreds of models. Pair with Langfuse (via Broadcast) to start building observability habits early.
You're building an internal LLM platform for your team → Go with LiteLLM. Self-host it, configure budgets and access per team, and integrate Langfuse for the observability LiteLLM doesn't natively provide. This is the stack companies like Lemonade and RocketMoney have adopted.
You're shipping a production AI product and need reliability + compliance → Choose Portkey. Its managed approach means less operational burden, and its native guardrails, RBAC, and prompt management will save you months of build time. Add Langfuse for deeper tracing and evaluation workflows.
You already run Kong and need to govern LLM traffic alongside APIs → Extend with Kong AI Gateway. You get enterprise security, semantic routing, and MCP support without introducing a new tool. The Langfuse plugin gives you AI-specific observability on top of Kong's existing monitoring.
The LLM gateway space is maturing rapidly. The teams that invest in this infrastructure layer now — routing, observability, governance — will be the ones that can move fastest as models improve and use cases multiply. The gateway handles the plumbing. Langfuse ensures you can see what's flowing through it. Together, they turn LLM usage from a series of isolated API calls into a managed, observable, improvable system.
Start with the gateway that matches your constraints today. Add Langfuse from day one. Iterate from there.