ZDNET Highlights
- AI is proving better than expected at finding old, obscure bugs.
- Unfortunately, AI is also good at finding bugs for hackers to exploit.
- In short, AI is not yet ready to replace programmers or security professionals.
In a recent LinkedIn post, Microsoft Azure CTO Mark Russinovich said he used Anthropic's new AI model, Claude Opus 4.6, to read and analyze assembly code written in 1986 for the Apple II's 6502 processor.
Also: Why AI is both a curse and a boon for open-source software, according to developers
Claude didn't just interpret the code; it performed what Russinovich called a "security audit," surfacing subtle logic errors, including a case where a routine failed to check the carry flag after an arithmetic operation.
This is a classic bug that had lain hidden, dormant, for decades.
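To see why a missed carry check matters, here is a minimal sketch, in Python rather than the original 6502 assembly, of the bug class Russinovich describes: a 16-bit addition built from two 8-bit halves, where forgetting to propagate the carry out of the low byte silently corrupts the result. (The function names and values are illustrative, not from the original code.)

```python
def add16_buggy(a: int, b: int) -> int:
    """16-bit add composed from two 8-bit adds, but the carry
    out of the low byte is never propagated into the high byte."""
    lo = (a & 0xFF) + (b & 0xFF)
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF)  # BUG: carry from `lo` ignored
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

def add16_fixed(a: int, b: int) -> int:
    """Same addition, but the carry from the low byte is checked,
    the software equivalent of testing the CPU's carry flag."""
    lo = (a & 0xFF) + (b & 0xFF)
    carry = lo >> 8  # 1 if the low-byte add overflowed 8 bits
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry
    return ((hi & 0xFF) << 8) | (lo & 0xFF)
```

For most inputs the two functions agree, which is exactly why such a bug can sit dormant for decades: it only misbehaves when the low-byte addition overflows, e.g. `add16_buggy(0x00FF, 0x0001)` yields `0x0000` instead of `0x0100`.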
Good news and bad news
Russinovich’s experiment is surprising because the code is older than today’s languages, frameworks, and security checklists. However, the AI was able to reason about low-level control flow and CPU flags to pinpoint actual faults. For experienced developers, it’s a reminder that long-running codebases may still contain bugs that traditional tools and developers have learned to live with.
Also: 7 AI coding techniques I use to ship real, reliable products faster
Yet despite the progress, some experts believe the experiment raises concerns.
As veteran go-to-market engineer Matthew Trifiro put it: "Oh my God, am I seeing this right? The attack surface just expanded to include every compiled binary ever shipped. When AI can reverse-engineer 40-year-old code on obscure architectures, current security-through-obscurity approaches are essentially useless."
Trifiro has a point. On the one hand, AI will help us find bugs so that we can fix them. That's the good news. On the other hand, and here's the bad news, AI can also be used to break into programs that are still in use but no longer patched or supported.
As Adedeji Olowe, founder of Lendsqr, said: "This is scarier than we're telling you. Billions of legacy microcontrollers exist globally, many of which are likely running fragile or poorly audited firmware."
Also: Is Perplexity's new Computer a secure version of OpenClaw? Here's how it works
He continued: “The real implication is that bad actors can send models like Opus to systematically find and exploit vulnerabilities, while leaving many of these systems effectively unpatched.”
LLMs complement traditional detection tools
Traditional static analysis tools such as SpotBugs, CodeQL, and Snyk Code scan source code for patterns associated with bugs and vulnerabilities. These tools excel at catching well-understood issues, such as null-pointer dereferences, common injection patterns, and API misuse, and they do so at scale across large Java and other-language codebases.
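The "well-understood issues" these tools catch are pattern-shaped. A minimal, hypothetical Python sketch of one such pattern, untrusted input concatenated into a SQL string, the classic injection sink that tools like CodeQL and Snyk Code are built to flag, alongside the parameterized fix they typically suggest:

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Pattern-based analyzers flag this: untrusted input concatenated
    # directly into a SQL query string (a classic injection sink).
    query = "SELECT id FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # The well-understood fix: a parameterized query, so the input
    # is bound as data rather than spliced into the SQL text.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Fed a hostile input like `"' OR '1'='1"`, the unsafe version matches every row while the safe version correctly matches none, which is precisely the behavioral difference the static pattern is a proxy for.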
Now it's clear that large language models (LLMs) can complement those detector tools. In a 2025 head-to-head study, LLMs such as GPT-4.1, Mistral Large, and DeepSeek v3 were as good as industry-standard static analyzers at finding bugs across many open-source projects.
Also: This new Claude code review tool uses AI agents to check your pull requests for bugs - here's how
How do these models do it? Instead of asking, "Does this line match a known bug pattern?", an LLM reasons about what the code is trying to do and where that intent breaks down. Combined with traditional analyzers, this approach is a powerful pairing.
For example, Anthropic's Claude Opus 4.6 is helping clean up Firefox's open-source code. According to Mozilla, Anthropic's Frontier Red Team found more high-severity bugs in Firefox in just two weeks than people typically report in two months. Mozilla announced: "It is clear evidence that large-scale, AI-assisted analysis is a powerful new addition to the security engineer's toolbox."
Anthropic isn't the only organization using AI engines to find bugs in code. The Black Duck Signal product, for example, connects multiple LLMs, Model Context Protocol (MCP) servers, and AI agents to autonomously analyze code in real time, detect vulnerabilities, and propose fixes.
Also: I used Claude Code to vibe code a Mac app in 8 hours, but it was more work than magic
Meanwhile, security consultancies such as NCC Group are experimenting with LLM-powered plugins for software reverse-engineering tools such as Ghidra to help discover security issues, including potential buffer overflows and other memory-safety problems that can be difficult for people to spot.
AI passing security checks
These successes don't mean we're ready to hand over our security checks to AI. Far from it.
Also: I tried to save $1,200 by vibe coding for free - and I immediately regretted it
Researchers have found that LLM-powered bug detection is not a drop-in replacement for mature static analysis pipelines. Studies comparing AI coding agents to human developers show that AI can be prolific, but it also introduces security flaws at high rates, including insecure password handling and insecure object references.
CodeRabbit found that "there are some bugs that humans make more often and some that AI makes more often. For example, humans produce more typos and hard-to-test code than AI. But overall, AI created 1.7 times more bugs than humans. Code-generation tools promise speed but undercut it with the errors they make. It's not just minor bugs: AI introduced 1.3 to 1.7 times more critical and major issues."
Also: Turning on AI? 5 Security Tips Your Business Can’t Go Wrong With – And Why
You can also ask Daniel Stenberg, creator of the popular open-source data-transfer program curl. He has complained, loudly and legitimately, that his project has been flooded with fake, AI-written security reports that bury maintainers in needless busywork.
The moral of the story
In the right hands, AI makes a great assistant, but it is not ready to replace a senior programmer or security researcher. Maybe someday, but not today. So use AI carefully alongside your existing tools, and your programs will be much more secure than they are now.
As for legacy code, that is a real concern. I expect people will turn away from firmware-powered devices out of genuine fear that they will soon be compromised.