Using AI to do your taxes is unlikely to yield great results

Tax season, that dreaded time of year, is upon us. But if you were hoping that new AI technology could help you with laborious paperwork filing — and maybe find a way to save you a few bucks — think again.

After testing four major AI chatbots, the New York Times found that all of them struggled to select and fill out the correct forms and got key calculations wrong. Overall, the bots miscalculated the amount of tax owed to the IRS by an average of more than $2,000.

“The problem with taxes is all the little details that matter, and it won’t get every little detail right,” Benedict Evans, an analyst who writes a technology newsletter, told the NYT.

“These models get dramatically better every six months,” he added. “But they still give you what is roughly the right answer, and it’s not what you want.”

AI can be useful for processing and summarizing large amounts of information, but it struggles with accuracy in almost every domain. Chatbots often fabricate factual claims, even when asked to summarize a single document. AI programming assistants introduce errors into your code. Image generators produce strange visual artifacts and anomalies.

It is the same story with arithmetic. Combine this with byzantine tax law and all its accompanying, highly specialized forms, and you have a recipe for, if not disaster, a taxing and expensive back-and-forth with the IRS.

To test the AI models (OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok), the NYT asked them to work through a series of tax scenarios described in training materials from the tax service TaxSlayer. Only after the models were supplied with highly specific instructions, such as where each piece of information should go on each IRS document, did the AI begin to perform better.

You could argue that this defeats the point of using an automated tool in the first place. Your average person uses overpriced tax software precisely because they don’t know the ins and outs of the process. Software like TurboTax or TaxAct “are procedural, following ‘if-then’ logic built to mathematical precision,” Erik Brynjolfsson, a senior fellow at the Stanford Institute for Human-Centered AI, told the NYT, whereas large language models are prediction engines that “can be superhuman at many tasks yet fail at some tasks that seem simple to humans.”

A prime example of how LLMs can screw up your tax homework? TurboTax’s own experiments with the technology. When the tax software company deployed its “Intuit Assist” chatbot to answer tax questions, it would return irrelevant answers. When the answers were on topic, they were often wrong.
