A new study digs into why modern AI models stumble over multi-digit multiplication and what kind of training finally makes the task click.
Artificial intelligence systems can write software and reason through complex problems. Yet even basic arithmetic can expose surprising weaknesses.
Jagged abilities
The paper, posted to arXiv by University of Chicago researchers Xiaoyan Bai and Chenhao Tan with collaborators from MIT, Harvard, the University of Waterloo, and Google DeepMind, looks at what the authors call AI’s “jagged frontier.”
That term describes how models can excel at advanced reasoning while failing at tasks most people learn in elementary school, such as multiplying two four-digit numbers.
According to the researchers, the problem lies not in scale alone but in how models handle information across multiple steps.
Where training breaks
The standard recipe for improving large language models is straightforward fine-tuning, scaled up with more data, deeper networks, or longer training.
But when the team trained models ranging from two to twelve layers this way, all scored below 1% accuracy on four-digit multiplication. The models consistently got stuck in what the researchers describe as a local optimum, unable to move beyond shallow pattern matching.
Multiplication requires keeping track of partial products and carry values. Without a way to store and retrieve this information, the models fail no matter how much data they see.
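To make that bookkeeping concrete, here is a short Python sketch of schoolbook long multiplication (illustrative only, not the researchers' code), showing the partial products and carry values that have to be held across steps:

```python
def long_multiply(a: str, b: str) -> int:
    """Schoolbook multiplication of two digit strings, tracking the
    partial products and carries the article refers to."""
    result = 0
    for i, db in enumerate(reversed(b)):       # one partial product per digit of b
        carry, partial = 0, 0
        for j, da in enumerate(reversed(a)):
            prod = int(da) * int(db) + carry   # digit product plus incoming carry
            carry, digit = divmod(prod, 10)
            partial += digit * 10 ** j
        partial += carry * 10 ** len(a)        # leftover carry extends the partial product
        result += partial * 10 ** i            # shift and accumulate
    return result

assert long_multiply("1234", "5678") == 1234 * 5678
```

Each digit of the final answer depends on values computed several steps earlier, which is exactly the kind of long-range dependency the tested models failed to learn.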
A different approach
The breakthrough came from a method called Implicit Chain of Thought, or ICoT. Instead of training on question-answer pairs alone, ICoT starts from explicit step-by-step reasoning and gradually removes those steps over the course of training, forcing the model to internalize the process.
Using ICoT, the researchers achieved 100% accuracy on the same multiplication task. They found that the model learned to encode intermediate values in its internal states, something standard models never did.
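As a rough illustration of the idea (the token format and stage schedule here are assumptions, not the paper's setup), ICoT can be pictured as a curriculum that trims more of the written-out reasoning at each stage:

```python
# Minimal sketch of an ICoT-style curriculum: at each stage, drop more of the
# explicit reasoning tokens from the training target, so the model must carry
# the dropped steps in its hidden states instead of writing them out.

def icot_target(question: list, reasoning_steps: list, answer: list, stage: int) -> list:
    """Build the training target for a given curriculum stage."""
    return question + reasoning_steps[stage:] + answer   # drop the first `stage` steps

question = ["12*34="]
steps = ["12*4=48", "12*30=360", "48+360="]   # hypothetical worked steps
answer = ["408"]

for stage in range(len(steps) + 1):
    print(icot_target(question, steps, answer, stage))
# Stage 0 keeps the full chain of thought; the final stage trains directly on
# question -> answer with no explicit steps left in the target.
```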
The ICoT-trained model also organized its attention over time, computing digit pairs early and retrieving them later to assemble the final answer.
An unexpected structure
Inside the successful model, the team found surprisingly elegant representations. Digits were encoded as wave-like Fourier patterns rather than simple symbols.
Operations such as multiplication emerged as geometric processes, including Minkowski sums, which the researchers did not explicitly design. These structures arose naturally as the model learned how to carry out arithmetic efficiently.
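A Fourier-style digit code can be sketched in a few lines; the frequencies below are arbitrary choices for illustration, not the features the team actually recovered from the model:

```python
import numpy as np

# Illustrative sketch of a wave-like digit code: each digit 0-9 is mapped to
# sines and cosines at a few frequencies, giving a smooth, periodic
# representation rather than ten unrelated symbols.
def fourier_digit_code(d: int, freqs=(1, 2, 3)) -> np.ndarray:
    angle = 2 * np.pi * d / 10
    return np.concatenate([[np.cos(k * angle), np.sin(k * angle)] for k in freqs])

codes = np.stack([fourier_digit_code(d) for d in range(10)])
print(codes.shape)   # (10, 6): one periodic feature vector per digit
```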
A simple fix
Building on these insights, the team added a modest change to standard training: an extra objective that teaches models to track running sums.
With that single addition, even a small two-layer model reached 99% accuracy without explicit step-by-step supervision. Its internal mechanisms began to resemble those seen in ICoT-trained models.
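In spirit, the change amounts to adding a second prediction head and a second loss term. The sketch below uses PyTorch with dummy data and made-up dimensions to show the shape of the idea, not the paper's actual training setup:

```python
import torch
import torch.nn as nn

hidden, vocab = 64, 16   # toy sizes, chosen only for illustration

backbone = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, hidden), nn.ReLU())
lm_head = nn.Linear(hidden, vocab)   # standard next-token prediction
sum_head = nn.Linear(hidden, 1)      # auxiliary head: predict the running sum

tokens = torch.randint(0, vocab, (8, 12))       # dummy batch of digit tokens
next_tokens = torch.randint(0, vocab, (8, 12))  # dummy next-token targets
running_sums = torch.randn(8, 12)               # dummy running-sum targets

states = backbone(tokens)
lm_loss = nn.functional.cross_entropy(lm_head(states).flatten(0, 1), next_tokens.flatten())
aux_loss = nn.functional.mse_loss(sum_head(states).squeeze(-1), running_sums)
loss = lm_loss + 0.1 * aux_loss   # the extra term nudges the model to track sums
print(loss.item())
```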
Bigger implications
The authors argue that the findings go beyond arithmetic. Long-range dependencies appear in many language and reasoning tasks, not just math.
“As AI is increasingly integrated into critical decision-making, it’s essential to understand its unique ways of learning and thinking,” Tan said. “Our research is trying to chart that terrain.”
Sources: University of Chicago