Auto-Reviewing Claude Code


A well-crafted system prompt will increase the quality of code produced by your coding assistant. It does make a difference. If you provide guidelines in your system prompt for writing code and tests, the coding assistant will follow the guidelines.

However, that depends on your definition of “will follow”. If your definition is “will often follow”, that’s accurate. If your definition is “will always obey”, or even “will obey most of the time”, that’s wrong (unless you’ve found a way to make these models reliable, which I haven’t; please let me know).

Coding agents ignore instructions in the system prompt on a regular basis. As soon as the context window fills up and compaction kicks in, all bets are off.

Even with the latest Opus 4.5 model, I haven’t noticed any major improvement here. So if we can’t trust models to follow system prompts, we need to invest in feedback loops.

I’ll show you how I’m using Claude Code hooks to apply automated code review to all AI-generated code, so that the code quality is higher before it reaches the human in the loop.

You can find a code example that demonstrates the concepts discussed in this post on my GitHub.

Auto code review for fast, semantic feedback

When I talk about auto code review in this post, I’m describing a fast feedback mechanism aimed at catching common code quality issues. It runs when Claude has finished editing, so it needs to be fast and efficient.

By contrast, I also use coding assistants for detailed code reviews when reviewing PRs. That involves many subagents and takes quite a bit more time. I’m not talking about that here.

The purpose of auto code review is to reinforce your system prompt, project documentation, and on-demand skills: the things Claude might have overlooked. It’s one part of a multidimensional approach.

Wherever possible, I recommend using your lint and test rules to ensure quality, and leaving auto code review for more semantic issues that tools can’t check for.

If you want to set a maximum length or a maximum level of indentation for your files, use your lint tool. If you want to enforce minimum test coverage, use your testing framework.

Semantic Code Review

Semantic code review looks at how well the code is designed. For example, nomenclature: Does the code accurately describe the business concepts it represents?

AI will often default to names like “helpers” and “utils”. But AI is also good at understanding nuance and finding better names if you challenge it, and it can do this job quickly. That makes it a good example of a semantic rule.

You can ban certain words like “helper” and “utils” with lint tools. (I recommend doing so.) But that won’t catch everything.
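To make the nomenclature point concrete, here is a contrived Python sketch (the names are invented for illustration, not taken from any real review) of the kind of renaming the review subagent should push for:

```python
# Typical AI default: a grab-bag "utils" module that hides the business concept.
# utils.py
def calc(amount, rate):
    return amount * rate


# Better: the module and function are named after the domain concept they represent.
# pricing.py
def apply_discount(order_total, discount_rate):
    """Return the discount to subtract from an order total."""
    return order_total * discount_rate
```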

Another example is logic leaking out of the domain model. When a use case or application service queries an entity and then makes a decision based on its state, it’s highly likely that your domain logic is leaking into the application layer. That’s not so easy to catch with lint tools, but it’s worth paying attention to.

[Screenshot: domain logic leaking into the application layer]
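Here is a contrived sketch of that leak (the Order example is mine, invented for illustration): in the first service, the application layer inspects the entity’s state and makes the business decision itself; in the second, the decision lives inside the domain model where it belongs.

```python
class OrderCannotBeCancelled(Exception):
    pass


class Order:
    def __init__(self, order_id, status):
        self.id = order_id
        self.status = status

    def cancel(self):
        # The business rule lives inside the domain model.
        if self.status in ("SHIPPED", "DELIVERED"):
            raise OrderCannotBeCancelled(self.id)
        self.status = "CANCELLED"


# Leaky version: the application service queries the entity and decides for itself.
class CancelOrderLeaky:
    def execute(self, order):
        if order.status in ("SHIPPED", "DELIVERED"):  # domain rule in the application layer
            raise OrderCannotBeCancelled(order.id)
        order.status = "CANCELLED"


# Better: the application service just orchestrates and delegates the decision.
class CancelOrder:
    def execute(self, order):
        order.cancel()
```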

Another example is the default fallback value. When Claude encounters an undefined value where a value is expected, it will fall back to a default. It seems to hate throwing exceptions or challenging the type signature and asking, “Should we allow undefined here?” It wants to keep the code running no matter what, and no matter how firmly the system prompt tells it not to.

[Screenshot: default fallback value]
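A contrived sketch of the pattern (the config example is invented): the first lookup quietly papers over a missing value, while the second makes the absence explicit.

```python
config = {}  # imagine "timeout_seconds" was never set

# Claude's tendency: silently fall back to a default so the code keeps running.
timeout = config.get("timeout_seconds", 30)

# Often better: surface the problem (or deliberately widen the type signature instead).
timeout = config.get("timeout_seconds")
if timeout is None:
    # Running this sketch raises, which is the point: the misconfiguration is surfaced.
    raise ValueError("timeout_seconds is required in the configuration")
```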

You can catch some of this with lint rules, but it’s very subtle and depends on the context. Sometimes falling back to a default value is the right thing to do.

Building an Auto Code Review with Claude Code Hooks

If you’re using Claude Code and want to create an auto code review for checks that you can’t easily define with lint or testing tools, one solution is to configure a script that runs on the Stop hook.

The Stop hook fires when Claude finishes working and hands control back to the user to make decisions. This is where you can trigger a subagent to review the modified files.

To trigger the subagent, you need to return an error status code that blocks the main agent and forces it to read the output.

[Screenshot: the Stop hook triggering the review subagent]
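As a rough sketch of that mechanism (this is not the exact plugin from my repo; the paths and wording are assumptions), a Stop hook script registered as a command hook under "Stop" in .claude/settings.json can block by printing its instructions to stderr and exiting with code 2, which Claude Code feeds back to the main agent:

```python
#!/usr/bin/env python3
"""Sketch of a Stop hook script: block Claude from finishing and ask it to run a review.

Assumptions: the script is registered as a command hook under "Stop" in
.claude/settings.json, and a PostToolUse hook (shown later) has been logging
modified files to .claude/modified-files.log.
"""
import json
import sys


def main() -> None:
    event = json.load(sys.stdin)  # Claude Code passes the hook payload as JSON on stdin

    # If we're already continuing because of a previous Stop-hook block,
    # let Claude finish this time to avoid an infinite review loop.
    if event.get("stop_hook_active"):
        sys.exit(0)

    # Exit code 2 blocks the stop and feeds stderr back to the main agent.
    print(
        "Before finishing: launch the code-review subagent on the files listed in "
        ".claude/modified-files.log and address any findings.",
        file=sys.stderr,
    )
    sys.exit(2)


if __name__ == "__main__":
    main()
```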

It’s generally considered best practice to use a dedicated review subagent with a very critical mindset. Asking the main agent to mark its own homework is clearly not a good approach, and it will also use up your context window.

The solution I use is available on GitHub. You can install it as a plugin in your repo and customize the code review instructions, or simply use it as inspiration for your own solution. Any feedback is greatly appreciated.

In the example above, you can see that the review took 52 seconds. That’s probably faster than me reviewing and providing feedback myself, but this is not always the case; sometimes it can take a few minutes.

If you’re sitting blocked waiting for a review, it may be slower than doing it yourself. But if you’re not blocked and working on something else (or watching TV), this saves you time because the end result will be of higher quality and require less of your time to review and fix.

Scanning for updated files

I want my auto code review to only look at files that have been modified since the last pull request. But Claude does not provide this information in the context of the Stop hook.

I can find all modified or untracked files using Git, but that’s not enough.
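For reference, the Git-only version is just a few lines around git status --porcelain (a sketch, not part of my plugin):

```python
import subprocess


def changed_files() -> list[str]:
    """Return modified, staged, and untracked paths from `git status --porcelain`."""
    output = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Each line is "XY <path>"; strip the two status characters and the space.
    return [line[3:] for line in output.splitlines() if line]


print(changed_files())
```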

What I do instead is hook into PostToolUse and keep a log of each modified file.

[Screenshot: PostToolUse hook logging modified files]

When the Stop hook is triggered, the review finds the files modified since the last review, and the subagent is asked to review only those. If there are no modified files, the code review doesn’t run.
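Here is a minimal sketch of that PostToolUse side (the log file path is an assumption, mirroring the Stop hook sketch above): it reads the tool payload from stdin and appends the edited file’s path to a log that the Stop hook later points the review subagent at.

```python
#!/usr/bin/env python3
"""Sketch of a PostToolUse hook that logs every file Claude writes or edits."""
import json
import sys
from pathlib import Path

LOG_FILE = Path(".claude/modified-files.log")  # assumed location, shared with the Stop hook


def main() -> None:
    event = json.load(sys.stdin)                               # hook payload arrives as JSON on stdin
    file_path = event.get("tool_input", {}).get("file_path")   # present for Write/Edit tools

    if file_path:
        LOG_FILE.parent.mkdir(parents=True, exist_ok=True)
        with LOG_FILE.open("a") as log:
            log.write(file_path + "\n")

    sys.exit(0)  # never block here; this hook only records what changed


if __name__ == "__main__":
    main()
```

In .claude/settings.json, a matcher that restricts this hook to the Write and Edit tools keeps it from firing on unrelated tool calls.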

Challenges with the Stop hook

Unfortunately, the Stop hook is not 100% reliable for this use case, for a couple of reasons. First, Claude may pause to ask a question, for example to clarify your requirements. You don’t want the auto review to kick in here; it should wait until you’ve responded to Claude and it has actually finished.

The second reason is that Claude can commit changes before the Stop hook fires, so by the time the subagent reviews them, the changes are already committed to Git.

This may not be a problem, and if it is, there are simple ways to resolve it. These are just extra things to keep in mind and set up.

The ideal solution would be for Anthropic (or other tool vendors) to provide hooks at a higher level of abstraction, more aligned with the software development workflow rather than low-level file modification operations.

What I would really like is a CodeReadyForReview hook that provides all the files Claude has modified. Then we could plug in our custom solutions.

If you have a better approach, let me know

I don’t know if I’m not looking in the right places or the information isn’t out there, but I feel like this solution is solving a problem that should already be solved.

I would be really grateful if you could share any advice that helps improve code quality before a human in the loop reviews it.

Until then, I will continue to use this auto code review solution. When you’re giving the AI some autonomy to implement tasks and then reviewing what it produces, this is a useful pattern that can save you time and reduce the frustration of having to repeat the same feedback to the AI.
