LLM and AI Agent Applications with Langchain and Langgraph – Part 29: Model Agnostic Patterns and LLM API Gateway


Author(s): michaelczarnecki

Originally published on Towards AI.

Hello! In this part we move from experiments and prototypes to the real world: production deployment.

Because the truth is: creating a working notebook or proof of concept is only the beginning. The real challenges begin when your application must serve hundreds or thousands of users, run reliably 24/7 and still stay within budget.

Let’s start with the first foundation: a model-agnostic approach.

Model-agnostic from day one

Many teams building AI applications quickly lock themselves into a single provider – only OpenAI, or only Anthropic. This is understandable: choosing and focusing on one API is faster. But it is a big risk in the long run. If the provider raises prices, restricts access, or changes licensing terms – your entire application may grind to a halt.

So it’s worth building a model-agnostic gateway layer from the beginning.

In practice, this means that your code does not talk directly to a specific model. Instead, it talks to an abstraction:

  • “Give me a chat-class LLM”, or
  • “Give me an embedding generator”

And only the gateway decides whether, under the hood, it should call GPT-5, Claude 4.5 Sonnet, or a local LLaMA running on your own infrastructure.
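A minimal sketch of what such an abstraction layer can look like. The registry keys ("chat-default", "chat-local") and the get_chat_llm helper are illustrative names invented for this sketch, not a LangChain API – only the gateway layer knows which concrete provider and model back each logical name:

```python
# Hypothetical registry: logical capability name -> (provider, model_name).
# Application code never mentions a concrete model; swapping providers
# means editing this table, not the call sites.
MODEL_REGISTRY = {
    "chat-default": ("openai", "gpt-4o-mini"),
    "chat-local": ("ollama", "llama3"),
}

def get_chat_llm(kind: str) -> tuple[str, str]:
    """Resolve a logical model name to a (provider, model_name) pair."""
    if kind not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model kind: {kind}")
    return MODEL_REGISTRY[kind]

print(get_chat_llm("chat-default"))  # -> ('openai', 'gpt-4o-mini')
```

In a real gateway, the returned pair would then be mapped to a ChatOpenAI (or other) instance, as the FastAPI example later in this article does.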

API Gateway + Routing + Fallback

The second foundation is an API gateway.

Imagine you expose a single endpoint, POST /v1/chat, where users send requests. In a header such as X-Model, the client specifies which model should be used.

The gateway can run multiple models in parallel – and it can also implement fallback logic: if the primary model does not respond within a certain time, you automatically switch to a backup model, for example an open-source model running locally.

This pattern not only improves reliability – it also opens the door to experimentation.

Without changing the entire system, you can route 1% of the traffic to the new model and see how it performs compared to the previous model.
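A canary split like this can be a few lines in the gateway. The function and model names below are hypothetical; the key design choice is deterministic per-user bucketing, so the same user always sees the same model during the experiment:

```python
import random

def pick_model(user_id: str, canary_share: float = 0.01) -> str:
    """Route roughly canary_share of users to the candidate model."""
    # Seeding with the user id gives a stable, deterministic bucket
    bucket = random.Random(user_id).random()
    return "candidate-model" if bucket < canary_share else "stable-model"

# The same user always lands on the same model
assert pick_model("user-42") == pick_model("user-42")
```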

Monitoring and cost control

The third foundation – often neglected – is monitoring and cost control.

In a prototype it is enough to say “it works”. In production you will face harder questions:

  • How much does it cost per day?
  • What is our hallucination rate?
  • How often do we reject output?

This is where tools like LangSmith help – but a simple internal logging system can work too.

We measure latency (because users don’t want to wait 30 seconds), we measure cost, and we measure quality – for example: how many answers were rejected by guardrails or evaluation.

And we can set very simple but effective alerts:

  • If daily cost exceeds $50 → send a notification,
  • If the average response time goes above 5 seconds → trigger a second alert.

With this, you have real visibility into what is happening inside the system.
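The two alerts above can be a simple threshold check run against aggregated metrics. This is a minimal sketch with invented function and parameter names, assuming you already collect daily cost and average latency somewhere:

```python
def check_alerts(daily_cost_usd: float, avg_latency_s: float,
                 cost_limit: float = 50.0, latency_limit: float = 5.0) -> list:
    """Return the list of alert messages that should be sent."""
    alerts = []
    if daily_cost_usd > cost_limit:
        alerts.append(f"cost alert: ${daily_cost_usd:.2f} exceeds ${cost_limit:.2f}/day")
    if avg_latency_s > latency_limit:
        alerts.append(f"latency alert: {avg_latency_s:.1f}s exceeds {latency_limit:.1f}s")
    return alerts

print(check_alerts(62.30, 2.1))  # only the cost alert fires
```

In practice you would call this from a scheduled job and forward the messages to Slack, e-mail, or your incident tooling.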

These three elements – a model-agnostic layer, an API gateway, and monitoring – are not “nice-to-haves”. They are the foundation. If you treat them seriously, your application will not only run in production, but will also remain resilient to changes in the market and the technology.

Now let’s move on to the code.

Install libraries and load environment variables

!pip install -U langchain langchain-openai langgraph fastapi uvicorn
from dotenv import load_dotenv
load_dotenv()

Human in the loop

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command

@tool
def risky_operation(secret: str) -> str:
    """Perform a risky operation that requires manual approval."""
    return f"Executed risky operation with: {secret}"

tools = [risky_operation]
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

hitl = HumanInTheLoopMiddleware(
    interrupt_on={
        "risky_operation": {"allowed_decisions": ["approve", "edit", "reject"]}
    },
    description_prefix="Manual approval required for risky operation:"
)

checkpointer = MemorySaver()
agent = create_agent(
    model=model,
    tools=tools,
    middleware=[hitl],
    checkpointer=checkpointer,
    debug=True
)

config = {"configurable": {"thread_id": "hitl-demo-1"}}

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Please run the risky operation with secret code $%45654@."}]},
    config=config,
)

Output:

(values) {'messages': (HumanMessage(content='Please run the risky operation with secret code $%45654@.', additional_kwargs={}, response_metadata={}, id='589244c7-9860-48fa-b68a-eca595510a73'))}
(updates) {'model': {'messages': (AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 60, 'total_tokens': 79, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaJj7md4CRaAN2mcI1ju8uek8BJti', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--35ad04bd-5d01-4649-a64c-d8c583ffe3aa-0', tool_calls=({'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}), usage_metadata={'input_tokens': 60, 'output_tokens': 19, 'total_tokens': 79, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}))}}
(values) {'messages': (HumanMessage(content='Please run the risky operation with secret code $%45654@.', additional_kwargs={}, response_metadata={}, id='589244c7-9860-48fa-b68a-eca595510a73'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 60, 'total_tokens': 79, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaJj7md4CRaAN2mcI1ju8uek8BJti', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--35ad04bd-5d01-4649-a64c-d8c583ffe3aa-0', tool_calls=({'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}), usage_metadata={'input_tokens': 60, 'output_tokens': 19, 'total_tokens': 79, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}))}
(updates) {'__interrupt__': (Interrupt(value={'action_requests': ({'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'description': "Manual approval required for risky operation:nnTool: risky_operationnArgs: {'secret': '$%45654@'}"}), 'review_configs': ({'action_name': 'risky_operation', 'allowed_decisions': ('approve', 'edit', 'reject')})}, id='a3abdfe342bd7c8be8b1b586ee9f8815'),)}

Handle interruptions:

if "__interrupt__" in result:
    print("Interrupt detected!")
    decisions = [{"type": "approve"}]

    result = agent.invoke(
        Command(resume={"decisions": decisions}),
        config=config,
    )

Output:

(values) {'messages': (HumanMessage(content='Please run the risky operation with secret code $%45654@.', additional_kwargs={}, response_metadata={}, id='589244c7-9860-48fa-b68a-eca595510a73'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 60, 'total_tokens': 79, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaJj7md4CRaAN2mcI1ju8uek8BJti', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--35ad04bd-5d01-4649-a64c-d8c583ffe3aa-0', tool_calls=({'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}), usage_metadata={'input_tokens': 60, 'output_tokens': 19, 'total_tokens': 79, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content='Executed risky operation with: $%45654@', name='risky_operation', id='13109032-38fb-4d94-920c-90026acc41f3', tool_call_id='call_dK786IhVaO3Z4VssPOI1cM6y'))}

Model Agnostic API Gateway

To run the model-agnostic API gateway example:

1. Place the code below in a file app.py:

# app.py

from fastapi import FastAPI, Header
from pydantic import BaseModel
from langchain_core.runnables import RunnableLambda
from langchain_core.messages import AIMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

from langchain_openai import ChatOpenAI

class ChatRequest(BaseModel):
    message: str

class ChatResponse(BaseModel):
    provider: str
    model: str
    answer: str

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{message}")
])

def build_model(x_model: str):
    """
    x_model format:
    - 'openai:gpt-4o-mini'
    """
    if ":" in x_model:
        provider, model_name = x_model.split(":", 1)
    else:
        provider, model_name = "openai", x_model

    provider = provider.lower().strip()

    if provider == "openai":
        return provider, model_name, ChatOpenAI(model=model_name, temperature=0)

    # if provider == "anthropic":  # support for another LLM API provider
    #     return provider, model_name, ChatAnthropic(model=model_name, temperature=0)

    def _unknown(inputs: dict):
        return AIMessage(content=f"(unknown provider) Echo: {inputs.get('message', '')}")
    return "unknown", x_model, RunnableLambda(_unknown)

app = FastAPI(title="Model-Agnostic LangChain Gateway")

@app.post("/chat", response_model=ChatResponse)
def chat_endpoint(
    req: ChatRequest,
    x_model: str = Header(default="openai:gpt-4o-mini", alias="X-Model"),
):
    provider, model_name, model = build_model(x_model)
    chain = prompt | model | StrOutputParser()
    answer: str = chain.invoke({"message": req.message})
    return ChatResponse(provider=provider, model=model_name, answer=answer)

2. Start the server:

uvicorn app:app --reload

3. Send a request:

curl -X POST 'http://127.0.0.1:8000/chat' \
  -H 'Content-Type: application/json' \
  -H 'X-Model: openai:gpt-5-mini' \
  -d '{"message":"List 3 advantages of Python."}'

curl -X POST 'http://127.0.0.1:8000/chat' \
  -H 'Content-Type: application/json' \
  -H 'X-Model: openai:gpt-4o-mini' \
  -d '{"message":"List 3 advantages of Python."}'

The future of GenAI

This brings us to the second part of this episode: the future of GenAI.

What will this industry look like in the next few years? No one has a crystal ball – but some trends are already pretty clear.

Trend #1: Multimodality

Models like GPT-5 or Claude 4.5 can already analyze images, audio, and video. Soon this will be standard.

When you build an application, you have to assume that users will not send only text. They will upload screenshots, photos of documents, and audio recordings. Your architecture should be ready for that.

Trend #2: Agentic Workflows

Classic APIs and linear workflows are not enough when a process is complex and dynamic.

Instead of hardcoding conditions in traditional code, we will declare state graphs of agents – researchers, critics, experts – and let the system iterate based on state and quality signals.

By keeping these trends in mind, we can prepare our applications for the next generation of even more capable AI models.

That’s all in this chapter dedicated to model-agnostic patterns, LLM API gateways, and future AI trends.

See the next chapter

See the previous chapter

See the full code from this article in the GitHub repository

Published via Towards AI
