Author(s): Shreyash Shukla
Originally published on Towards AI.
The “Floating Brain” Problem
In our previous articles, we discussed how to give the agent knowledge (graphs), vision (shapes), and empathy (user context). But even an ideal agent suffers from a fundamental flaw inherent in the Large Language Model architecture: it exists in a timeless void.
An LLM has no internal clock. It doesn’t know that “today” is Tuesday. Furthermore, as the conversation progresses, it suffers from “context drift.”
Research from Stanford (Liu et al., 2023) has quantified a behavior known as the “Lost in the Middle” effect. As the context window fills with conversation history (SQL queries, data tables, user chit-chat), the model’s attention mechanism begins to degrade. It prioritizes the very beginning (the system prompt) and the very end (the user’s latest question), but important instructions in between, such as negative constraints (“Don’t use the eng_rate metric”), are often ignored (“Lost in the Middle: How Language Models Use Long Contexts”, arXiv).
We call this “Instruction Decay.” It manifests in two ways:
- Temporal failure: A user asks, “How’s the performance this week?” Lacking any grounding for “this week,” the agent hallucinates the date range or falls back to its training-data timestamp.
- Compliance failure: In turn #1, the agent remembers to “exclude returns and allowances.” By turn #20, after processing 50,000 tokens of other data, that specific instruction is “washed out” by recency bias (“Following Instructions in LLMs”, MIT Press).
We can’t trust a static system prompt to hold the line permanently. We need a Runtime Interceptor: a “just-in-time” layer that injects the current time and non-negotiable business rules at the very end of the prompt, milliseconds before the model generates any response.
Temporal Grounding (The “When”)
For LLMs, “time” is a frozen concept. The model knows that World War II ended in 1945, but it doesn’t know whether “last quarter” refers to Q4 2023 or Q1 2024, because its internal state is frozen at its training cutoff.
When a user asks, “Show me sales for the last 7 days,” a standard agent will either:
- Hallucinate: guess a date range based on patterns observed during training.
- Fail: refuse to answer due to the ambiguity.
To fix this, we need to anchor the agent in the present moment. We do this by injecting an accurate, timezone-aware timestamp into the prompt context at the exact moment of inference.
We can use a before_model_callback to generate a dynamic header. Using the datetime module, we prepend a timestamp to each user turn to ensure that “today” aligns with the user’s definition of “today.”
import datetime
import pytz

def get_temporal_context():
    # 1. Define the corporate timezone (critical for data alignment).
    # If your DB is in UTC, this must be UTC. If PST, use PST.
    target_tz = pytz.timezone("America/Los_Angeles")

    # 2. Get the specific "now".
    current_time = datetime.datetime.now(target_tz).strftime(
        "%Y-%m-%d %H:%M:%S %Z"
    )

    # 3. Format as a system directive.
    return f"(Timestamp): The current date and time is {current_time}."
When the user types “How’s the performance?”, the callback silently appends this string. The LLM receives: (Timestamp): The current date and time is 2025-10-27 14:30:00 PDT. User: How’s the performance?
The model now has the variables it needs to calculate relative dates. It interprets “the last 7 days” as the 7-day window ending October 27, resolving the ambiguity without a single follow-up question.
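To make the relative-date arithmetic concrete, here is a minimal sketch of how a deterministic layer (or the model itself) can turn “last 7 days” into an explicit range once “now” is known. The resolve_last_n_days helper is an illustration of the article’s temporal-logic rule (“the complete 7-day period ending yesterday”), not code from the interceptor itself:

```python
import datetime

def resolve_last_n_days(now: datetime.datetime, n: int):
    """Return (start_date, end_date) for the n complete days ending yesterday."""
    end = now.date() - datetime.timedelta(days=1)    # yesterday
    start = end - datetime.timedelta(days=n - 1)     # n days total, inclusive
    return start, end

# With the injected timestamp 2025-10-27 14:30 PDT:
now = datetime.datetime(2025, 10, 27, 14, 30)
start, end = resolve_last_n_days(now, 7)
print(start, end)  # 2025-10-20 2025-10-26
```

Because the range is computed from the injected clock rather than guessed, the same question asked tomorrow yields a different (and correct) window.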

Critical Reinforcement (The “What”)
In an ideal world, an LLM would read your system prompt once and remember it forever. In reality, LLMs suffer from a “recency effect”: they pay disproportionate attention to the last few user messages and often “forget” the 10,000 tokens buried in history.
This is dangerous for enterprise data. If the model forgets to “exclude internal test accounts” because it got distracted by a long conversation about revenue, the resulting SQL is technically valid but business-wrong.
The Solution: The “Sticky Note” Pattern
We treat the before_model_callback as a mechanism for slapping a “sticky note” onto the user’s prompt just before it enters the model’s brain. It is active governance.
The Engineering: The Reinforcement Block
We construct a string block that contains only the most important, high-failure-rate instructions. We append this block to the end of the prompt, ensuring it is the freshest context the model sees.
What Goes Inside? (The Universal Pattern)
We focus on two categories of failure modes common to all LLMs (GPT-4, Claude, Gemini, Llama):
- Hard business logic (the “must”): rules that, if missed, lead to compliance or accuracy issues.
- Example: “Mandatory: When calculating ‘Active Users’, you must exclude any user_IDs starting with ‘test_’.”
- Why here? Because test data looks exactly like real data. The model will include it unless explicitly warned at runtime.
- Syntactic hallucination (the “how”): complex technical syntax where models often fail, especially with nested data types.
- Example (JSON/NoSQL): “Reminder: When querying the properties column, do not assume that keys exist. Always use JSON_EXTRACT(properties, '$.region') instead of properties.region.”
- Solution: we inject a snippet that forces safe, explicitly verbose syntax to prevent runtime errors.
The resulting prompt structure the LLM sees is:
- (System Prompt): the normal personality
- (Conversation History): the last 20 turns
- (User Input): “Show me the numbers.”
- (Runtime Injection): “Timestamp: 10:00 AM. Rules: exclude ‘test_%’. Syntax: use JSON_EXTRACT.”
By physically placing the rules after the user input, we measurably increase the probability of compliance.
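The ordering above can be sketched as a simple message assembly. This is a framework-neutral illustration (build_prompt and the message dicts are my own names, not an API from this series):

```python
def build_prompt(system, history, user_input, runtime_injection):
    """Assemble messages so the runtime injection is the freshest context."""
    return (
        [{"role": "system", "content": system}]
        + history
        + [{"role": "user", "content": user_input}]
        # Placed last: closest to generation, so it gets the most attention.
        + [{"role": "system", "content": runtime_injection}]
    )

msgs = build_prompt(
    system="You are a data advisor.",
    history=[{"role": "user", "content": "hi"}],
    user_input="Show me the numbers.",
    runtime_injection="Timestamp: 10:00 AM. Rules: exclude 'test_%'.",
)
print([m["role"] for m in msgs])  # ['system', 'user', 'user', 'system']
```

The key design choice is that the governance block travels with every turn, so it never drifts into the “lost in the middle” zone.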

Code Artifact (Implementation)
We have theory (recency bias) and strategy (sticky notes). Now we need the code.
The following Python class shows how to implement the Runtime Interceptor. It uses a standard callback pattern to intercept the llm_request object, generate the reinforcement block, and insert it into the prompt stream before the model sees it.
Code walkthrough: the ImprovePrompt class performs three important functions:
- Timestamping: it calculates the current time in the business’s primary timezone (for example, PST/EST), ensuring the model knows “when” it is.
- Rule aggregation: it defines a block of non-negotiable rules (syntax and logic).
- Prompt surgery: it physically attaches these rules to the latest user message, maximizing their weight in the model’s attention mechanism.
Implementation
import datetime
import pytz

# Assumption: you are using a standard framework where requests are mutable
# objects (e.g., Google ADK).
from google.genai import types  # or your specific framework's types

class ImprovePrompt:
    """
    A runtime callback that injects Temporal Context and Business Rules
    into the prompt milliseconds before inference.
    """

    def before_model_callback(self, callback_context, llm_request) -> None:
        """
        Intercepts the LLM request to attach the 'Sticky Note' of context.
        """
        # 1. TEMPORAL GROUNDING
        # We enforce a specific timezone to align with database ETL cycles.
        target_tz = pytz.timezone("America/Los_Angeles")
        current_time = datetime.datetime.now(target_tz).strftime(
            "%Y-%m-%d %H:%M:%S %Z"
        )
        timestamp_prefix = f"(System Clock): Current Time is {current_time}."

        # 2. CRITICAL REINFORCEMENT BLOCK
        # These are rules that the model is prone to "forgetting" in long chats.
        reinforcement_prompt = """
        +-------------------------------------------------------+
        |             RUNTIME GOVERNANCE PROTOCOLS              |
        +-------------------------------------------------------+
        (1) DATA EXCLUSION RULES (MANDATORY):
            - IF querying 'Active Users', you MUST exclude 'test_%' IDs.
            - IF querying 'Revenue', you MUST exclude 'Trial' SKUs.
        (2) SYNTAX SAFETY (NO HALLUCINATIONS):
            - JSON Fields: Do NOT use dot notation (data.id).
              ALWAYS use safe extraction: JSON_EXTRACT(data, '$.id').
            - Date Literals: ALWAYS cast strings to dates
              (e.g., CAST('2023-01-01' AS DATE)).
        (3) TEMPORAL LOGIC:
            - "Today" is defined by the (System Clock) above.
            - "Last Week" means the complete 7-day period ending yesterday.
        """

        # 3. PROMPT SURGERY
        # We locate the user's latest input and attach our block.
        if not llm_request.contents:
            return None

        # Access the raw text parts of the latest turn.
        current_parts = list(llm_request.contents[-1].parts)

        # Create the new injection part.
        injection_text = (
            f"\n\n--- INTERNAL SYSTEM INJECTION ---\n"
            f"{timestamp_prefix}\n"
            f"{reinforcement_prompt}\n"
            f"-------------------------------------\n"
        )

        # Prepend or append?
        # We prepend to the latest turn so it frames the user's request immediately.
        # (Some architectures prefer appending; both work depending on attention tuning.)
        injection_part = types.Part(text=injection_text)

        # Modify the request in-place.
        llm_request.contents[-1].parts = [injection_part] + current_parts
        return None

The Outcome
With this roughly 50-line class, every single conversation turn, whether it’s the first or the 100th, is grounded in exactly the same reality. The model effectively “wakes up” with a refreshed memory of the rules every time you speak.
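As a sanity check, the interception logic can be exercised without any real framework. The sketch below substitutes minimal Part/Content/LlmRequest stand-ins for the framework types (these mocks are illustrative assumptions, not actual ADK objects) and verifies that the sticky note lands in front of the latest user turn:

```python
# Minimal stand-ins for framework types (illustrative only).
class Part:
    def __init__(self, text):
        self.text = text

class Content:
    def __init__(self, parts):
        self.parts = parts

class LlmRequest:
    def __init__(self, contents):
        self.contents = contents

def inject_runtime_context(llm_request, rules):
    """Prepend a timestamped 'sticky note' to the latest turn's parts."""
    if not llm_request.contents:
        return
    clock = "2025-10-27 14:30"  # fixed for a deterministic demo
    note = f"(System Clock): {clock}.\n{rules}"
    latest = llm_request.contents[-1]
    latest.parts = [Part(note)] + list(latest.parts)

req = LlmRequest([Content([Part("How's the performance?")])])
inject_runtime_context(req, "Rules: exclude 'test_%' IDs.")
print(req.contents[-1].parts[0].text.startswith("(System Clock)"))  # True
```

Testing the surgery on mock objects like this keeps the governance logic verifiable even as the underlying framework types evolve.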
Building the Complete System
This article is part of the Cognitive Agent Architecture series, where we walk through the engineering required to go from a basic chatbot to a secure, deterministic enterprise advisor.
To see the full roadmap, including the Semantic Graph (Brain), Gap Analysis (Discernment), and the Sub-Agent Ecosystem (Organization), see the master index below:
Cognitive Agent Architecture: From Chatbot to Enterprise Advisor
Published via Towards AI
