
# Introduction
Python decorators are purpose-built solutions that help simplify complex software logic in a variety of applications, including LLM-based ones. Working with LLMs often means dealing with unpredictable, slow, and expensive third-party APIs, and decorators can make this task much cleaner by, for example, wrapping API calls with customized logic.
Let’s take a look at five useful Python decorators that will help you customize your LLM-based applications without any extra burden.
The attached examples illustrate the syntax and approach for using each decorator. Some are shown without an actual LLM call, but they are code fragments designed to eventually become part of larger applications.
# 1. In-Memory Caching
This solution comes from Python's functools standard library, and it is useful for expensive tasks like LLM calls. If we wrap an LLM API call like the function defined below in an LRU (Least Recently Used) cache decorator, we add a caching mechanism that prevents redundant requests with the same input during the same execution or session. This is a great way to reduce latency.
This example shows its use:
```python
from functools import lru_cache
import time

@lru_cache(maxsize=100)
def summarize_text(text: str) -> str:
    print("Sending text to LLM...")
    time.sleep(1)  # A simulation of network delay
    return f"Summary of {len(text)} characters."

print(summarize_text("The quick brown fox."))  # Takes one second
print(summarize_text("The quick brown fox."))  # Instant
```
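To verify that the cache is actually doing its job, `functools.lru_cache` also exposes `cache_info()` and `cache_clear()` on the wrapped function. A minimal sketch (the simulated delay is dropped so only the caching behavior is shown):

```python
from functools import lru_cache

@lru_cache(maxsize=100)
def summarize_text(text: str) -> str:
    return f"Summary of {len(text)} characters."

summarize_text("The quick brown fox.")  # Miss: the result is computed
summarize_text("The quick brown fox.")  # Hit: served from the cache

info = summarize_text.cache_info()
print(info.hits, info.misses)  # 1 1

summarize_text.cache_clear()  # e.g. after switching models or prompt templates
```

Checking `cache_info()` is handy when tuning `maxsize`: a low hit count may mean your prompts vary too much for in-memory caching to pay off.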
# 2. Persistent on-disk caching
Speaking of caching, the external library diskcache takes this a step further by implementing a persistent cache on disk via a local SQLite database: very useful for storing the results of time-consuming operations such as LLM API calls, so results can be retrieved immediately in subsequent calls, even across separate runs. Consider this decorator pattern when in-memory caching is not sufficient because the script or application may be restarted.
```python
import time
from diskcache import Cache

# Creates a lightweight local SQLite-backed cache directory
cache = Cache(".local_llm_cache")

@cache.memoize(expire=86400)  # Cached for 24 hours
def fetch_llm_response(prompt: str) -> str:
    print("Calling expensive LLM API...")  # Replace this with an actual LLM API call
    time.sleep(2)  # API latency simulation
    return f"Response to: {prompt}"

print(fetch_llm_response("What is quantum computing?"))  # First call hits the "API"
print(fetch_llm_response("What is quantum computing?"))  # Instant load from disk happens here!
```
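If you are curious what `cache.memoize` is doing under the hood, the core idea can be sketched with nothing but the standard library. This is not diskcache's implementation, just a minimal illustration of the same pattern using `shelve` (the function name `disk_memoize` and the fake LLM function are made up for the demo):

```python
import functools
import os
import shelve
import tempfile

def disk_memoize(db_path: str):
    """Minimal persistent-cache decorator backed by the stdlib shelve module."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(prompt: str) -> str:
            key = f"{fn.__name__}:{prompt}"
            with shelve.open(db_path) as db:
                if key not in db:
                    db[key] = fn(prompt)  # Only computed on a cache miss
                return db[key]
        return wrapper
    return decorator

db_path = os.path.join(tempfile.mkdtemp(), "llm_cache")
call_count = 0

@disk_memoize(db_path)
def fake_llm(prompt: str) -> str:
    global call_count
    call_count += 1  # Tracks how often the "API" is really called
    return f"Response to: {prompt}"

print(fake_llm("What is quantum computing?"))  # Computed, then stored on disk
print(fake_llm("What is quantum computing?"))  # Read back from disk; no recomputation
```

diskcache adds the pieces this sketch lacks: expiration, size limits, thread- and process-safety, which is why it is the better choice in real applications.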
# 3. Network-resilient apps
Since LLM API calls can fail due to transient errors such as timeouts and "502 Bad Gateway" responses, the @retry decorator from the network-resilience library tenacity can help handle these common network failures gracefully.
The example below implements this resilient behavior by simulating a 70% chance of failure on each attempt. Run it a few times, and sooner or later all four attempts will fail and the error will propagate: completely expected and intended!
```python
import random
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type

class RateLimitError(Exception):
    pass

# Retries up to 4 attempts, waiting 2, 4, and 8 seconds between them
@retry(
    wait=wait_exponential(multiplier=2, min=2, max=10),
    stop=stop_after_attempt(4),
    retry=retry_if_exception_type(RateLimitError),
)
def call_flaky_llm_api(prompt: str):
    print("Attempting to call API...")
    if random.random() < 0.7:  # Simulating a 70% chance of API failure
        raise RateLimitError("Rate limit exceeded! Backing off.")
    return "Text has been successfully generated!"

print(call_flaky_llm_api("Write a haiku"))
```
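To make the retry logic itself less of a black box, here is a hand-rolled sketch of the exponential backoff that tenacity automates for you. The decorator name `retry_with_backoff` and the deterministic `flaky` function are invented for the demo (it uses tiny delays so it runs fast); in practice, prefer tenacity, which also handles jitter, logging hooks, and async functions:

```python
import functools
import time

def retry_with_backoff(max_attempts: int = 4, base_delay: float = 2.0,
                       max_delay: float = 10.0, retry_on=(Exception,)):
    """Retry a function with exponential backoff: base_delay * 2^(attempt - 1)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # Out of attempts: let the error propagate
                    delay = min(base_delay * 2 ** (attempt - 1), max_delay)
                    time.sleep(delay)
        return wrapper
    return decorator

attempts = []

@retry_with_backoff(max_attempts=4, base_delay=0.01)  # Tiny delays for the demo
def flaky(prompt: str) -> str:
    attempts.append(prompt)
    if len(attempts) < 3:
        raise TimeoutError("Transient failure")  # Fails on the first two attempts
    return "Text has been successfully generated!"

result = flaky("Write a haiku")
print(result)  # Succeeds on the third attempt
```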
# 4. Client-Side Throttling
This pair of stacked decorators from the ratelimit library controls the frequency of calls to a (usually highly demanded) function: useful for staying within provider-imposed limits when using external APIs. The example below enforces a limit of three calls per 10-second window; without such client-side throttling, the provider may reject requests when too many are sent in a short time.
```python
from ratelimit import limits, sleep_and_retry
import time

# Strictly enforces a 3-call limit per 10-second window
@sleep_and_retry
@limits(calls=3, period=10)
def generate_text(prompt: str) -> str:
    print(f"({time.strftime('%X')}) Processing: {prompt}")
    return f"Processed: {prompt}"

# The first 3 calls print immediately; the 4th pauses, thereby respecting the limit
for i in range(5):
    generate_text(f"Prompt {i}")
```
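The idea behind the `sleep_and_retry` + `limits` combination can also be sketched from scratch as a sliding-window limiter that sleeps instead of raising. The `throttle` decorator below is a simplified stand-in written for this illustration (not the ratelimit library's implementation, and not thread-safe), using a short window so the demo finishes quickly:

```python
import collections
import functools
import time

def throttle(calls: int, period: float):
    """Minimal sliding-window rate limiter: sleeps when the window is full."""
    timestamps = collections.deque()

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Drop timestamps that have aged out of the window
            while timestamps and now - timestamps[0] >= period:
                timestamps.popleft()
            if len(timestamps) >= calls:
                # Window is full: wait until the oldest call expires
                time.sleep(period - (now - timestamps[0]))
            timestamps.append(time.monotonic())
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@throttle(calls=3, period=0.5)  # 3 calls per half-second window for the demo
def generate_text(prompt: str) -> str:
    return f"Processed: {prompt}"

start = time.monotonic()
results = [generate_text(f"Prompt {i}") for i in range(4)]
elapsed = time.monotonic() - start
print(f"4 calls took {elapsed:.2f}s")  # The 4th call waited for the window to open
```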
# 5. Structured Output Binding
The fifth decorator on the list comes from the magentic library, used in conjunction with Pydantic, and provides an efficient mechanism for interacting with an LLM through its API and receiving structured responses. This is important for reliably making the LLM return formatted data such as JSON objects. The decorator handles the underlying system prompts and Pydantic-based parsing, optimizing token usage and helping maintain a cleaner codebase.
To try this example, you will need an OpenAI API key.
```python
# IMPORTANT: an OPENAI_API_KEY environment variable is required to run this example
from magentic import prompt
from pydantic import BaseModel

class CapitalInfo(BaseModel):
    capital: str
    population: int

# The decorator maps the prompt template to the Pydantic return type
@prompt("What is the capital and population of {country}?")
def get_capital_info(country: str) -> CapitalInfo:
    ...  # No function body needed here!

info = get_capital_info("France")
print(f"Capital: {info.capital}, Population: {info.population}")
```
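If you cannot run the live example, the parsing step that magentic and Pydantic automate can be illustrated offline with only the standard library. This is a simplified sketch, not magentic's actual mechanism: the `parse_capital_info` helper and the simulated JSON response are invented for the demo, and real LLM output would additionally need error handling for malformed JSON:

```python
import json
from dataclasses import dataclass

@dataclass
class CapitalInfo:
    capital: str
    population: int

def parse_capital_info(raw_json: str) -> CapitalInfo:
    """Validate a raw LLM response and bind it to a typed object."""
    data = json.loads(raw_json)
    return CapitalInfo(capital=str(data["capital"]), population=int(data["population"]))

# Simulated model output; a real structured-output call would return a string like this
simulated_response = '{"capital": "Paris", "population": 2102650}'
info = parse_capital_info(simulated_response)
print(f"Capital: {info.capital}, Population: {info.population}")
```

Binding raw responses to typed objects like this is exactly why structured-output decorators are valuable: downstream code works with attributes, not fragile string parsing.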
# Wrapping Up
In this article, we have featured five Python decorators from different libraries that are particularly valuable in LLM-based applications, whether to simplify logic, make processes more efficient, or improve network resilience.
Ivan Palomares Carrascosa is a leader, author, speaker, and consultant in AI, machine learning, deep learning, and LLMs. He trains and guides others in using AI in the real world.
