Designing for non-deterministic dependencies – O’Reilly

For most of the history of software engineering, we have built systems based on a simple and comfortable assumption: given similar inputs, a program will produce similar outputs. When something goes wrong, it is usually due to a bug, misconfiguration, or a dependency that was not behaving as advertised. Our tools, testing strategies, and even our mental models have evolved around the expectation of determinism.

AI quietly shatters that assumption.

As large language models and AI services make their way into production systems, they often arrive in familiar shapes: an API endpoint, a request payload, a response body. Latency, retries, and timeouts all seem manageable. Architecturally, it feels natural to treat these systems like libraries or external services.

In practice, that familiarity is deceptive. AI systems behave less like deterministic components and more like non-deterministic collaborators. The same prompt can produce different outputs, small changes in context can lead to inconsistent changes in results, and even retries can change behavior in ways that are difficult to reason about. These characteristics are not bugs; they are inherent in how these systems work. The real problem is that our architectures often pretend otherwise. Instead of asking how to integrate AI as yet another dependency, we need to ask how to design systems around components that do not guarantee stable output. Framing AI as a non-deterministic dependency proves far more useful than treating it like a smart API.

One of the first places where this mismatch shows up is in retries. In deterministic systems, retries are generally safe: if a request fails due to a transient problem, retrying increases the chance of success without changing the result. With AI systems, retries do not repeat the same computation. They produce new outputs. Retrying may resolve a problem, but it can just as easily introduce a different one. In some cases, a retry silently amplifies failure rather than reducing it, all while appearing successful.
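One way to make this concrete is a retry wrapper that treats each attempt as a fresh sample to be validated, rather than a repeat of the same computation. This is a minimal sketch; `call_model` is a hypothetical stand-in for a real model call, and the digit check is just an illustrative acceptance criterion.

```python
import random

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a non-deterministic model call:
    # the same prompt can yield a different output on every attempt.
    return random.choice(["42", "forty-two", "I cannot answer that"])

def retry_until_acceptable(prompt: str, is_acceptable, max_attempts: int = 3):
    """Retry the call, but validate every attempt -- a retry is a new
    sample from the model, not a re-execution of the same result."""
    for _ in range(max_attempts):
        output = call_model(prompt)
        if is_acceptable(output):
            return output
    return None  # surface the failure instead of passing along a bad output

result = retry_until_acceptable("What is 6 * 7?", lambda s: s.isdigit())
```

The key design choice is that the wrapper never assumes a retry converges toward the previous answer; it either returns an output that passed an explicit check, or it returns nothing and lets the caller decide what to do.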

Testing reveals similar breakdowns. Our current testing strategies rely on repetition. Unit tests assert exact outputs. Integration tests verify known behaviors. With AI in the loop, those strategies quickly lose their effectiveness. You can test whether a response is syntactically valid or conforms to certain constraints, but asserting that it is “correct” becomes far more subjective. As models evolve over time, matters become more complicated still. A test that passed yesterday may fail tomorrow without any code changes, leaving teams unsure whether the system has regressed or simply changed.
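In practice this pushes tests toward asserting properties rather than exact strings: valid syntax, required fields, bounded size. The sketch below assumes a hypothetical `summarize` function whose wording varies between runs; the properties hold even though the exact output does not.

```python
import json
import random

def summarize(text: str) -> str:
    # Hypothetical model call: returns a JSON summary whose wording
    # varies from run to run, even for identical input.
    phrasing = random.choice(["Brief", "Short", "Concise"])
    return json.dumps({"summary": f"{phrasing} note on: {text[:20]}"})

def check_summary_properties(text: str) -> bool:
    """Assert properties, not exact outputs: the response must be valid
    JSON, contain the required key, and stay within a length bound.
    These constraints survive run-to-run variation in wording."""
    out = summarize(text)
    data = json.loads(out)                 # syntactically valid JSON
    assert "summary" in data               # schema constraint
    assert len(data["summary"]) < 200      # size constraint
    return True
```

Tests written this way stay green across wording changes, while still failing loudly when the model breaks the contract that the rest of the system actually depends on.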

Observability presents a more subtle challenge. Traditional monitoring excels at detecting loud failures: error rates spike, latency climbs, requests fail outright. AI-related failures are often quiet. The system responds. Downstream services continue. Dashboards stay green. Yet the output is incomplete, confusing, or wrong in context. These “acceptable but wrong” results are far more harmful than hard errors because they gradually erode trust and are difficult to detect automatically.

Once teams accept non-determinism as a first-order concern, design priorities begin to change. Rather than trying to eliminate variability, the focus moves toward controlling it. This often means isolating AI-powered functionality behind clear boundaries, limiting where AI outputs can influence critical reasoning, and introducing explicit validation or review points where ambiguity matters. The goal is not to force deterministic behavior from the underlying probabilistic system, but to prevent that variability from leaking into parts of the system that are not designed to handle it.
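One common shape for such a boundary is a validation gate: raw model output may be messy or ambiguous, but only a well-typed, checked value is allowed to cross into the deterministic core. This is a sketch under assumed requirements; the `Invoice` type and field names are illustrative, not from the article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Invoice:
    vendor: str
    amount_cents: int  # only validated, typed values cross the boundary

def extract_invoice(raw_model_output: dict) -> Invoice:
    """Validation gate at the AI boundary: accept only output that
    satisfies the contract; reject anything ambiguous here, so the
    variability never leaks into downstream logic."""
    vendor = raw_model_output.get("vendor")
    amount = raw_model_output.get("amount_cents")
    if not isinstance(vendor, str) or not vendor.strip():
        raise ValueError("vendor missing or empty")
    if not isinstance(amount, int) or amount < 0:
        raise ValueError("amount_cents must be a non-negative integer")
    return Invoice(vendor=vendor.strip(), amount_cents=amount)
```

Everything downstream of `extract_invoice` can then be written, tested, and reasoned about as ordinary deterministic code.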

This shift also changes how we think about correctness. Instead of asking whether an output is correct, teams often need to ask whether it is acceptable for a given context. That reframing may be uncomfortable, especially for engineers accustomed to precise specifications, but it reflects reality more accurately. Acceptability can be controlled, measured, and improved over time, even if it cannot be completely guaranteed.

Alongside this shift, observability itself needs to evolve. Infrastructure-level metrics are still necessary, but they are no longer sufficient. Teams need visibility into the outputs themselves: how they change over time, how they vary across contexts, and how those variations correlate with downstream outcomes. This doesn’t mean logging everything, but it does mean designing signals that surface degradation before users notice it. If anyone is paying attention, qualitative degradation is often visible long before traditional alerts fire.
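A minimal version of output-focused observability is a rolling window over the outputs themselves, tracking signals like refusal rate and mean length that drift when quality degrades even while request metrics stay healthy. The refusal heuristic here is an assumed example, not a general-purpose detector.

```python
from collections import deque

class OutputMonitor:
    """Track output-level signals over a rolling window -- qualitative
    degradation shows up here while infrastructure dashboards stay green."""

    def __init__(self, window: int = 100):
        self.outputs = deque(maxlen=window)

    def record(self, output: str) -> None:
        self.outputs.append(output)

    def refusal_rate(self) -> float:
        # Hypothetical heuristic: count outputs that read as refusals.
        if not self.outputs:
            return 0.0
        refusals = sum(1 for o in self.outputs if o.lower().startswith("i cannot"))
        return refusals / len(self.outputs)

    def mean_length(self) -> float:
        # Sudden drift in output length is a cheap early-warning signal.
        if not self.outputs:
            return 0.0
        return sum(len(o) for o in self.outputs) / len(self.outputs)
```

An alert on “refusal rate doubled over the last window” fires long before any error-rate alarm, because from the infrastructure’s point of view nothing failed.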

The hardest lesson teams learn is that AI systems do not provide guarantees the way traditional software does. What they offer instead is probabilities. In response, successful systems rely less on guarantees and more on guardrails. Guardrails constrain behavior, limit the blast radius, and provide escape routes when things go wrong. They don’t promise correctness, but they make failure survivable. Fallback paths, conservative defaults, and human-in-the-loop workflows become architectural features rather than afterthoughts.
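These guardrails compose naturally in code: validate the model’s answer against a closed set, and fall back to a conservative default, such as a human review queue, whenever the call fails or the answer is off-script. This sketch assumes a hypothetical ticket-classification task; the model is injected as a parameter so the fallback paths are easy to see.

```python
def classify_ticket(text: str, model) -> str:
    """Guardrail pattern: constrain the model's answer to a closed set
    and route anything else -- including outright failures -- to a
    conservative default instead of crashing or guessing."""
    allowed = {"billing", "technical", "account"}
    try:
        label = model(f"Classify this support ticket: {text}").strip().lower()
    except Exception:
        return "needs_human_review"  # escape route, not a crash
    return label if label in allowed else "needs_human_review"

# A well-behaved response passes the guardrail...
print(classify_ticket("I was charged twice", lambda prompt: "Billing"))
# ...while an off-script response routes to a human instead.
print(classify_ticket("???", lambda prompt: "no idea, sorry"))
```

The conservative default is the architectural feature: the system’s worst case is a slower, human-assisted path, not a wrong answer acted on automatically.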

For architects and senior engineers, this represents a subtle but significant shift in responsibility. The challenge is not choosing the right model or crafting the right prompt. It is reshaping expectations within engineering teams and across the organization. This often means pushing back on the idea that AI can simply replace deterministic reasoning, and being explicit about where uncertainty exists and how the system handles it.

If I were starting over today, there are a few things I would do first. I would explicitly document where non-determinism exists in the system and how it is managed, rather than letting it remain implicit. I would invest early in output-focused observability, even if the signals seem imperfect at first. And I would spend more time helping teams unlearn assumptions that are no longer valid, because the hardest bugs to fix are rooted in old mental models.

AI is not just another dependency. It challenges some of the most deeply rooted assumptions in software engineering. Treating it as a non-deterministic dependency does not solve every problem, but it provides a far more honest basis for system design. It encourages architectures that expect variation, tolerate ambiguity, and fail gracefully.

This shift in thinking may be the most significant architectural change brought about by AI, not because the technology is magical, but because it forces us to confront the limitations of the determinism we have relied on for decades.
