Tuesday, 24 February 2026

 

LLM Gateways: The Missing Infrastructure Layer for Production AI

Your application doesn't need another wrapper. It needs a control plane.


The Problem: LLM Calls Are Not Just API Calls

When you first integrate an LLM into your application, it feels simple. You install the OpenAI SDK, pass a prompt, get a response. Ship it.

Then reality hits.

You want to try Claude for summarization because it's cheaper. Now you have two SDKs, two authentication flows, two response formats. A teammate adds Mistral for a classification task. Someone else wants to experiment with Llama running on Bedrock. Suddenly, your "simple" integration has become a tangle of provider-specific code scattered across your codebase.

But the API sprawl is only the surface. The deeper problems creep in once LLM calls sit on the critical path of your system:

Cost blindness. Token-based pricing is unpredictable. A single runaway loop can burn through your monthly budget in minutes. Without centralized tracking, you have no idea which team, feature, or prompt is responsible for the spend.

Fragile reliability. LLM providers have outages. Rate limits get hit. Latency spikes. Without retries, fallbacks, and circuit breaking, your application is only as reliable as the weakest provider endpoint you depend on.

Zero observability. Traditional APM tools can tell you an HTTP call took 3 seconds. They can't tell you that it consumed 4,200 tokens, cost $0.12, and the response quality degraded because you hit a rate limit and silently fell back to a weaker model.

Governance gaps. As LLM usage grows across teams, you need to answer questions like: who has access to which models? Are we leaking PII in prompts? Is anyone using unapproved providers? Without a centralized layer, these questions are nearly impossible to answer.
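
To make the cost-blindness problem concrete, here is a toy sketch of the per-team spend accounting a gateway centralizes for you. The prices, model names, and team names are made up for illustration; real pricing varies by model and changes over time.

```python
from collections import defaultdict

# Hypothetical (input, output) prices per 1K tokens; not real price sheets.
PRICE_PER_1K = {
    "gpt-4o": (0.0025, 0.010),
    "claude-3-5-sonnet": (0.003, 0.015),
}

class SpendTracker:
    """Attribute token spend to a team so nobody is cost-blind."""

    def __init__(self):
        self.spend = defaultdict(float)

    def record(self, team: str, model: str, tokens_in: int, tokens_out: int) -> float:
        p_in, p_out = PRICE_PER_1K[model]
        cost = tokens_in / 1000 * p_in + tokens_out / 1000 * p_out
        self.spend[team] += cost
        return cost

tracker = SpendTracker()
tracker.record("search-team", "gpt-4o", 3000, 1200)        # 0.0075 + 0.012
tracker.record("support-bot", "claude-3-5-sonnet", 1000, 500)
```

A gateway does exactly this bookkeeping on every request, which is what makes per-team budgets and alerts possible at all.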

This is the problem space LLM gateways were built to solve.


What Is an LLM Gateway?

An LLM gateway is a centralized service that sits between your applications and LLM providers. Instead of your code talking directly to OpenAI, Anthropic, or a self-hosted model, requests flow through the gateway. The gateway then handles routing, authentication, retries, caching, spend tracking, and observability — all without your application needing to know the details.

Think of it as what an API gateway (like Kong or Nginx) does for microservices, but purpose-built for the unique demands of LLM traffic: token-based billing, streaming responses, semantic caching, prompt-level security, and multi-model routing.

From the application's perspective, there's one stable interface. From the platform team's perspective, there's one place to observe, control, and govern all LLM usage across the organization.
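
The "one stable interface" idea can be sketched in a few lines: the application always speaks the OpenAI-style chat format, and only the base URL decides whether a request hits a provider directly or flows through the gateway. The URL below is a hypothetical internal endpoint, not a real service.

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://llm-gateway.internal.example.com/v1"  # hypothetical

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Same request shape regardless of which provider ultimately serves it."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("claude-3-5-sonnet", "Summarize this ticket ...")
# urllib.request.urlopen(req) would send it; omitted since the URL is fictional.
```

Swapping the model, or the entire provider behind it, changes nothing about this call site. That is the whole point.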


The Contenders: LiteLLM, Portkey, OpenRouter, and Kong AI Gateway

Not all gateways solve the same problems in the same way. Let's break down four popular options and what makes each one distinct.

1. LiteLLM — The Open-Source Swiss Army Knife

What it is: An open-source proxy and SDK that provides a unified, OpenAI-compatible API across 100+ LLM providers.

Why choose it:

  • Full self-hosted control. You deploy it on your own infrastructure. Your data never leaves your network. For teams in regulated industries or with strict data residency requirements, this matters enormously.
  • Provider breadth. LiteLLM supports virtually every major provider — OpenAI, Anthropic, Bedrock, Vertex, Mistral, Ollama, and many more — behind a single completion() call.
  • Cost tracking built in. Automatic spend tracking across providers, with the ability to log costs to S3, GCS, or your data warehouse. You can set budgets per team or per API key.
  • Open source (MIT). The core is free. You can inspect the code, contribute, and customize. Enterprise features like SSO, JWT auth, and audit logging are available as paid add-ons.
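
As a sketch of what "a single completion() call" looks like in practice, assuming the litellm package is installed and provider API keys are set as environment variables; the model strings are illustrative and may differ in your account.

```python
def ask(model: str, prompt: str) -> str:
    """One call shape, any provider LiteLLM supports."""
    # Imported lazily so this sketch can be loaded without litellm installed.
    import litellm
    response = litellm.completion(
        model=model,  # e.g. "gpt-4o", "anthropic/claude-3-5-sonnet", "ollama/llama3"
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Swapping providers is just a different model string:
# ask("gpt-4o", "Classify this support email ...")
# ask("anthropic/claude-3-5-sonnet", "Classify this support email ...")
```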

The trade-off: LiteLLM is infrastructure you operate. You're responsible for scaling, availability, and monitoring. Observability is basic out of the box — you'll likely want to pair it with something like Langfuse for deeper tracing and evaluation. There's no native enterprise governance (RBAC, workspaces, approval workflows) without additional tooling.

Best for: Platform teams that want maximum control and flexibility, are comfortable managing infrastructure, and need to give internal developers unified access to many LLMs with cost guardrails.


2. Portkey — The Production Control Plane

What it is: A managed AI gateway and observability platform designed for production GenAI workloads, supporting 1,600+ models.

Why choose it:

  • Enterprise-grade out of the box. Portkey ships with features many teams would otherwise spend months building: RBAC, workspaces, audit trails, SSO/SCIM, and data residency controls.
  • Deep observability. Detailed logs, latency metrics, token and cost analytics — broken down by app, team, or model. This is not an afterthought; it's core to the product.
  • Guardrails and security. Request and response filters, jailbreak detection, PII redaction, and policy-based enforcement are built in. If compliance is a first-class concern, Portkey addresses it natively.
  • Prompt management. Reusable templates, variable substitution, versioning, and environment promotion (dev → staging → prod) for prompts.
  • Reliability primitives. Automatic retries, fallbacks with exponential backoff, and configurable routing across providers.
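
Those reliability primitives are worth internalizing. Here is a minimal pure-Python sketch of what "retry with exponential backoff, then fall back" means; Portkey expresses this declaratively in its gateway config, so this is an illustration of the behavior, not its API.

```python
import time

def call_with_fallbacks(providers, request, max_retries=2, base_delay=0.1):
    """Try each provider in order; retry transient failures with exponential backoff."""
    last_error = None
    for provider in providers:
        for attempt in range(max_retries + 1):
            try:
                return provider(request)
            except Exception as exc:  # production code would catch transient errors only
                last_error = exc
                time.sleep(base_delay * (2 ** attempt))
        # This provider exhausted its retries: fall through to the next one.
    raise RuntimeError(f"all providers failed: {last_error}")

def flaky(req):
    raise TimeoutError("upstream timeout")

def stable(req):
    return f"handled: {req}"

# With flaky first, the call retries it, then falls back to stable:
# call_with_fallbacks([flaky, stable], "summarize ticket 42", base_delay=0.0)
```

Moving this logic out of every application and into the gateway is what keeps call sites simple.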

The trade-off: Portkey introduces a managed layer into your architecture. It's more opinionated than LiteLLM, which means less customization but faster time to production. Advanced governance features sit in higher-tier paid plans. For lightweight prototyping, it can feel heavier than needed.

Best for: Product and engineering teams building production LLM applications that need reliability, cost control, and compliance without building the platform layer themselves.


3. OpenRouter — The Model Marketplace

What it is: A developer-focused gateway that provides a single API for accessing 280+ models across providers, abstracting billing and credentials behind a unified endpoint.

Why choose it:

  • Zero infrastructure. There's nothing to deploy. Point your OpenAI SDK to OpenRouter's base URL, and you immediately have access to models from OpenAI, Anthropic, Mistral, Meta, Google, and dozens of open-source providers.
  • Effortless experimentation. Want to compare GPT-4o against Claude Sonnet against Llama 3? Change the model string in your request. No new accounts, no new API keys, no provider-specific SDKs.
  • Automatic failover. Requests can be transparently routed around provider outages to maintain availability.
  • Unified billing. One account, one bill, regardless of how many providers you use.
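
Because OpenRouter follows the OpenAI API schema, "change the model string" really is the whole migration. A sketch, assuming the openai package is installed and an OPENROUTER_API_KEY is set; the model IDs below are illustrative, so check OpenRouter's catalog for current ones.

```python
import os

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"

MODELS_TO_COMPARE = [
    "openai/gpt-4o",
    "anthropic/claude-3.5-sonnet",
    "meta-llama/llama-3-70b-instruct",
]

def compare(prompt: str) -> dict:
    """Run one prompt against several models through a single endpoint."""
    # Imported lazily so this sketch loads without the openai package installed.
    from openai import OpenAI
    client = OpenAI(
        base_url=OPENROUTER_BASE_URL,
        api_key=os.environ["OPENROUTER_API_KEY"],
    )
    results = {}
    for model in MODELS_TO_COMPARE:
        response = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        results[model] = response.choices[0].message.content
    return results
```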

The trade-off: OpenRouter adds a 5% markup on requests — that's the cost of convenience. Observability is limited; there's no deep tracing, token-level debugging, or per-team cost attribution. Governance and access controls are minimal, making it difficult to use as an internal platform for large organizations. You're also trusting a third party with your prompts and API traffic.

Best for: Individual developers and small teams in the experimentation and prototyping phase who prioritize model flexibility and speed of iteration over infrastructure control or enterprise governance.


4. Kong AI Gateway — The Enterprise API Gateway, Extended

What it is: AI-specific capabilities built into Kong Gateway (the widely-deployed open-source API gateway), delivered as a suite of plugins.

Why choose it:

  • Leverage existing infrastructure. If your organization already runs Kong for API management, adding AI gateway capabilities is an incremental step — not a new tool. All 1,000+ existing Kong plugins (auth, rate limiting, transformations, logging) work alongside AI traffic.
  • Semantic intelligence. Kong's AI plugins go beyond basic proxying. Semantic caching reduces redundant LLM calls. Semantic routing dispatches requests to the best model based on prompt content. A prompt guard enforces topic-level allow/deny lists.
  • Security and compliance. PII sanitization across 18 languages, integration with AWS Bedrock Guardrails, Azure AI Content Safety, and Google Cloud Model Armor. Prompt injection detection. These are production-grade security features.
  • MCP and A2A support. Kong has moved aggressively into supporting Model Context Protocol and Agent-to-Agent workflows, making it a strong choice for teams building agentic systems.
  • Deployment flexibility. Self-hosted, Kubernetes-native (via Kong Ingress Controller), hybrid, or managed through Kong Konnect.
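
To see why semantic caching matters, here is a toy pure-Python sketch of the idea: serve a cached answer when a new prompt is close enough, by embedding similarity, to one already answered. This is an illustration of the concept, not Kong's implementation, and the embedding function here is a deliberately silly stand-in for a real one.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class SemanticCache:
    """Return a cached response when a prompt is semantically near a prior one."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # function: prompt -> vector
        self.threshold = threshold
        self.entries = []           # list of (vector, response)

    def get(self, prompt):
        vector = self.embed(prompt)
        for stored_vector, response in self.entries:
            if cosine(vector, stored_vector) >= self.threshold:
                return response     # cache hit: the LLM call is skipped entirely
        return None

    def put(self, prompt, response):
        self.entries.append((self.embed(prompt), response))

# Toy embedding: counts of two keywords plus a bias term. Real systems use model embeddings.
embed = lambda s: [s.count("refund"), s.count("order"), 1]
cache = SemanticCache(embed)
cache.put("where is my order refund", "Refunds take 5 days.")
```

A paraphrased question like "refund status for my order" now hits the cache, while an unrelated prompt misses it; at scale this eliminates a large share of redundant LLM spend.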

The trade-off: Kong is a general-purpose API gateway with AI capabilities bolted on through plugins. For teams that don't already use Kong, the operational overhead of running a full Kong deployment may be excessive. Advanced AI-specific features (token-based rate limiting, advanced analytics) are locked behind enterprise tiers. The per-service licensing model can get expensive as you add model endpoints.

Best for: Enterprise teams that already use Kong Gateway and want to extend it to govern LLM traffic alongside existing API infrastructure, especially in regulated environments.


How They Compare at a Glance

| Dimension | LiteLLM | Portkey | OpenRouter | Kong AI Gateway |
|---|---|---|---|---|
| Deployment | Self-hosted (open source) | Managed SaaS | Managed SaaS | Self-hosted / Managed (Konnect) |
| Provider support | 100+ | 1,600+ | 280+ | Major providers via plugins |
| Cost model | Free (OSS) / Enterprise paid | Starts ~$49/mo | 5% markup on usage | Free (OSS) / Enterprise licensed |
| Observability | Basic (needs Langfuse) | Deep, native | Limited | Metrics via plugins + OTEL |
| Governance | Minimal native | Strong (RBAC, SSO, audit) | Minimal | Strong (enterprise tier) |
| Security | Basic | Guardrails, PII, jailbreak detection | Basic | PII, prompt guard, RAG, Bedrock/Azure guardrails |
| Best fit | Platform teams, self-hosters | Production AI teams | Prototyping, experimentation | Enterprises with existing Kong |


Where Langfuse Fits: The Observability Layer

Here's the critical insight: a gateway routes your requests, but Langfuse helps you understand them.

Langfuse is an open-source LLM observability platform that provides tracing, monitoring, evaluation, and debugging for LLM applications. It's not a gateway — it doesn't route traffic. Instead, it ingests telemetry from your gateway (and your application code) and gives you deep visibility into what's happening across your LLM stack.

The good news is that Langfuse integrates natively with all four gateways discussed above.

LiteLLM + Langfuse

This is one of the most popular pairings in the ecosystem. LiteLLM supports Langfuse as a callback target via OpenTelemetry. Set your Langfuse credentials as environment variables, add litellm.callbacks = ["langfuse_otel"], and every LLM call flowing through LiteLLM is automatically traced in Langfuse — with token counts, latencies, costs, and model metadata. You can also send logs from the LiteLLM Proxy directly, meaning every request from every team member gets captured without any SDK changes on their side.
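
A minimal sketch of that wiring. The credential values are placeholders; the environment variable names (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST) are Langfuse's standard ones, and litellm must be installed for the traced calls to run.

```python
import os

# Placeholder credentials; use your real Langfuse project keys.
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-placeholder"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-placeholder"
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

def enable_tracing():
    # Imported lazily so the sketch loads without litellm installed.
    import litellm
    litellm.callbacks = ["langfuse_otel"]  # every completion() call is now traced
    return litellm
```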

Portkey + Langfuse

Portkey's API is OpenAI-compatible, so you can use Langfuse's OpenAI SDK wrapper (from langfuse.openai import OpenAI) and point it at Portkey's gateway URL. Every request gets dual visibility: Portkey's native analytics for routing and reliability, plus Langfuse's tracing for prompt-level debugging and evaluation. This pairing gives you the best of both worlds — Portkey for traffic control, Langfuse for deep observability.
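
A sketch of this pairing, assuming the langfuse package is installed. The Portkey base URL and the environment variable name are assumptions used to illustrate the pattern; check Portkey's documentation for the exact values.

```python
import os

def portkey_client():
    # Imported lazily so the sketch loads without langfuse installed.
    from langfuse.openai import OpenAI  # Langfuse's drop-in OpenAI wrapper
    return OpenAI(
        base_url="https://api.portkey.ai/v1",           # assumed Portkey gateway URL
        api_key=os.environ.get("PORTKEY_API_KEY", ""),  # hypothetical env var name
    )

# client = portkey_client()
# client.chat.completions.create(model="gpt-4o", messages=[...])  # traced in Langfuse
```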

OpenRouter + Langfuse

OpenRouter supports a "Broadcast" feature that automatically sends traces to Langfuse without any code changes. You connect your Langfuse API keys in your OpenRouter settings, and all requests are traced. For teams that want more control — custom metadata, nested tracing, session grouping — you can use Langfuse's OpenAI SDK wrapper since OpenRouter follows the OpenAI API schema.

Kong AI Gateway + Langfuse

Kong integrates with Langfuse through an ai-tracing plugin. Once configured with your Langfuse API keys, the plugin automatically captures every AI request proxied through Kong and creates traces in Langfuse. You can enrich traces with user IDs, session IDs, and organization metadata via HTTP headers. Because it's a Kong plugin, it works alongside all other Kong capabilities (auth, rate limiting, logging) with zero application code changes.

Why This Matters

The gateway gives you the operational layer: routing, fallbacks, cost controls, security. Langfuse gives you the intelligence layer: understanding prompt quality, debugging regressions, evaluating model outputs, tracking experiments over time. Together, they form a complete platform for running LLMs in production.

Without Langfuse (or equivalent observability), you can route and control your LLM traffic, but you're flying blind on quality. Did that model switch degrade user experience? Is the new prompt template actually better? Which conversations are hitting guardrails? These are the questions Langfuse answers.


So, Which Should You Pick?

There's no single right answer. The choice depends on where you are in your journey:

You're experimenting and iterating fast → Start with OpenRouter. Zero setup, instant access to hundreds of models. Pair with Langfuse (via Broadcast) to start building observability habits early.

You're building an internal LLM platform for your team → Go with LiteLLM. Self-host it, configure budgets and access per team, and integrate Langfuse for the observability LiteLLM doesn't natively provide. This is the stack companies like Lemonade and RocketMoney have adopted.

You're shipping a production AI product and need reliability + compliance → Choose Portkey. Its managed approach means less operational burden, and its native guardrails, RBAC, and prompt management will save you months of build time. Add Langfuse for deeper tracing and evaluation workflows.

You already run Kong and need to govern LLM traffic alongside APIs → Extend with Kong AI Gateway. You get enterprise security, semantic routing, and MCP support without introducing a new tool. The Langfuse plugin gives you AI-specific observability on top of Kong's existing monitoring.

The LLM gateway space is maturing rapidly. The teams that invest in this infrastructure layer now — routing, observability, governance — will be the ones that can move fastest as models improve and use cases multiply. The gateway handles the plumbing. Langfuse ensures you can see what's flowing through it. Together, they turn LLM usage from a series of isolated API calls into a managed, observable, improvable system.

Start with the gateway that matches your constraints today. Add Langfuse from day one. Iterate from there.

Wednesday, 12 July 2017

Raspberry Pi setup

Prerequisite:

1) Raspberry Pi device with data cable.
2) Card reader
3) Micro SD card (see the compatibility documentation here)
4) A monitor with an HDMI interface
5) HDMI cable
6) USB keyboard
7) USB Mouse
8) Ethernet Cable [Optional]

Follow these steps:

1) Format the SD card:
    The recommended way to format the card is with SD Card Formatter, which you can download from here.

2) Say we want to load the RASPBIAN image onto the SD card; download the image from here.

3) Now it's time to load the image onto the SD card.

4) Open a macOS Terminal window and run this command: diskutil list

5) Identify your removable drive's device address; it will probably look like one of the entries below:



/dev/disk0
   #:                  TYPE NAME                SIZE       IDENTIFIER
   0:   GUID_partition_scheme                  *500.3 GB   disk0

/dev/disk2
   #:                  TYPE NAME                SIZE       IDENTIFIER
   0:            Macintosh HD                  *378.1 GB   disk2
                 Logical Volume on disk0s2
                 DAB8DF9E23-A4RR-420D-00R1-FRT67WE
                 Unlocked Encrypted

/dev/disk3
   #:                  TYPE NAME                SIZE       IDENTIFIER
   0:  FDisk_partition_scheme                  *7.9 GB     disk3
   1:              DOS_FAT_32 AJ                7.9 GB     disk3s1

Note that your removable drive must be DOS_FAT_32 formatted. In this example, /dev/disk3 is the drive address of an 8GB SD card.

6) Unmount your SD card via command: diskutil unmountDisk <drive address>

7) When successful, you should see a message similar to this one:

Unmount of all volumes on <drive address> was successful

8) You can now copy/load the image onto SD card, using the following command:

sudo dd bs=32m if=<image file path> of=<drive address>

When the copy finishes you will see output like the following; while dd is running, you can press Ctrl+T to print a progress line in the same records in/out format:

3420+1 records in
3420+1 records out
286665744 bytes transferred in 524.761215 secs (5069924 bytes/sec)

9) Now you can eject your removable drive. You are ready to install RASPBIAN image on your device.

In Part Two we will learn how to use SSH and VNC as a virtual desktop on macOS.


Friday, 9 June 2017

Raspberry Pi: SSH and VNC: A Virtual Desktop on your Mac

In the previous tutorial we learnt how to set up the Raspberry Pi.

Now it's time to use your laptop as a display for the Raspberry Pi. The steps below use a MacBook.

1) Open a Terminal on the Raspberry Pi.
   Enable SSH and VNC either via sudo raspi-config or from Preferences => Raspberry Pi Configuration => Interfaces.

2) Run ifconfig and note the Pi's IP address.

3) Open the Mac Terminal.
  Connect via SSH: ssh pi@192.168.1.236
  password: raspberry (the default)

  You will now see that you are connected: pi@raspberrypi:~ $

4) Now update the package lists and install xrdp and the TightVNC server:
   sudo apt-get update
   sudo apt-get install xrdp
   sudo apt-get install tightvncserver

5) cd .config

6) mkdir autostart
   nano autostart/tightvnc.desktop

   [Desktop Entry]
   Type=Application
   Name=TightVNC
   Exec=vncviewer :1
   StartupNotify=false

  Save the file with the lines above.

7) cd /home/pi

8) Run tightvncserver

9) Download the Chicken VNC client on macOS and connect with:
    host: 192.168.1.236
    Display: 1
    Password: raspberry (the default)

Your Mac now acts as the display for your Raspberry Pi.

Cheers!

Saturday, 29 April 2017

Why did Katappa kill Bahubali?

For those who just can't wait:

Background:

Bahubali rules the Mahishmati kingdom, and Bhallaldeva and Bijjaladeva are jealous of his standing.
They plan to kill the Rajmata in a dramatic way. Veer Bahadur Kumar Verma (who is very close to Bahubali and his wife Devsena) overhears this plot. Bijjaladeva slaps his father and tells him not to dig into the matter, then puts on an emotional act in front of Veer Bahadur to provoke him into killing Bijjaladeva's own son (Bhallaldeva), so that they can tell the Rajmata that Veer Bahadur came to kill Bhallaldeva on Bahubali's orders.
So below is the actual story:

Katappa: Is this happening with your consent? I beg you, queen mother. Ask the king to take back his command. Even integrity can take a back seat. Bahubali has no mean bone in his body. The son you raised, who grew up drinking your milk. The dharma you taught him runs in his blood.

Rajmata: Bahubali has to die.

Katappa: No, I cannot do it. (He hands his sword to the Rajmata as punishment for his refusal.) Please sever my head.

Rajmata: Will you kill him, or shall I finish this task?

Katappa: No queen mother. Your hands should not be stained by that sin. I will kill him.

Bijjaladeva: Can we trust this dog, Bhalla?

And so Katappa killed Bahubali.

For the rest, you will need to watch the actual scene in the film :)

Cheers!
Enjoy



Tuesday, 14 February 2017

ionic2 based app using Angular2 TypeScript

This tutorial demonstrates how to create an Ionic 2 based app using Angular 2 and TypeScript.

Here is the demo example, with complete documentation on how to install and run the project.

Happy Coding!

Monday, 13 February 2017

Mobile and Web Client sample code with OAuth2.0 | Symfony 2 RESTful API Project with FOSUserBundle, FOSRestBundle, FOSOAuthServerBundle

OAuth is an open standard for authorization. It gives client applications secure, delegated access to server resources on behalf of the resource owner.

OAuth 2.0 focuses on client-developer simplicity while providing specific authorization flows for web applications, desktop applications, and mobile devices.


Here are the complete guides and demo examples for the mobile and web clients:

 AJOAuth2 - iOS

MSOAuth2 - Android

authOauth - Symfony2

The sample code above shows, on each platform, the process of authenticating against an OAuth 2 provider.


Cheers!!
~ Nerd Team


Friday, 3 February 2017

AFNetworking - How to handle HTTP status error codes and messages in failure block?

AFNetworking stores the raw response body in the error's userInfo under AFNetworkingOperationFailingURLResponseDataErrorKey. In the failure block, read the HTTP status code from the task's response, then parse that data as JSON:

failure:^(NSURLSessionDataTask * _Nullable task, NSError * _Nonnull error) {
    NSLog(@"Failure: %@", error);

    // The HTTP status code lives on the task's response.
    NSHTTPURLResponse *httpResponse = (NSHTTPURLResponse *)task.response;
    NSLog(@"Status code: %zd", httpResponse.statusCode);

    // AFNetworking keeps the raw error response body in the error's userInfo.
    NSData *errorData = error.userInfo[AFNetworkingOperationFailingURLResponseDataErrorKey];
    if (!errorData)
        return;

    id errorJson = [NSJSONSerialization JSONObjectWithData:errorData options:0 error:nil];
    if (![errorJson isKindOfClass:[NSDictionary class]]) {
        NSLog(@"Expected a dictionary, got %@", NSStringFromClass([errorJson class]));
        return;
    }

    NSDictionary *errorJsonDict = (NSDictionary *)errorJson;
    NSLog(@"%@", errorJsonDict.description);
}