Reasoning models are supposed to be able to draw conclusions, analyze problems, and carry out multi-step thinking processes. But a new paper from Apple raises doubts about the actual reasoning competence of modern AI models.
Large Reasoning Models (LRMs) differ from other AI models such as Large Language Models (LLMs) primarily in their ability to reason. Instead of merely recognizing patterns, this type of model is meant to draw logical inferences.
Reasoning models should therefore be able to solve multi-step tasks, which is exactly where classic models typically fail, for example when intermediate steps are required.
LRMs are designed to reason the way humans think. But the models still have serious weaknesses, as a new paper by Apple researchers shows.
Doubts about the thinking abilities of reasoning models
Despite impressive progress on benchmarks for logical thinking, central questions about the abilities and limits of modern LRMs have so far remained open, according to the Apple researchers. Current evaluations of reasoning models, including those from OpenAI, Google, and Anthropic, rely mainly on mathematical and coding benchmarks that score only the final answer. This approach neglects the analysis of the thinking process itself and is susceptible to data contamination.
To examine these thinking processes, the researchers used controlled puzzle environments. Among other models, they took a close look at OpenAI's o3-mini, DeepSeek-R1, and Claude 3.7, comparing each model's "normal" mode with its reasoning mode.
These environments allowed them to vary the complexity of a task while keeping its logical structure constant. That way, they could not only evaluate the final result but also analyze the models' internal chains of thought.
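The article does not reproduce the test harness, but the core idea is a puzzle whose difficulty scales with a single parameter while the rules stay fixed. The following is a minimal sketch in Python, assuming a Tower of Hanoi-style puzzle; the puzzle choice and function names are illustrative, not taken from the paper:

```python
# Minimal sketch of a controlled puzzle environment (assumption: Tower of
# Hanoi as the puzzle; the paper's actual harness may differ). Difficulty
# scales with one knob (the number of disks) while the rules stay identical,
# so a model's answer can be checked move by move, not just as a final string.

def verify_hanoi(n_disks, moves):
    """Check a proposed move sequence; moves are (src, dst) peg indices."""
    pegs = [list(range(n_disks, 0, -1)), [], []]  # peg 0 holds disks n..1
    for src, dst in moves:
        if not pegs[src]:
            return False  # illegal: moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # illegal: larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n_disks, 0, -1))  # all disks on the goal peg

# The optimal solution length grows as 2**n - 1, so complexity can be raised
# arbitrarily without changing a single rule.
for n in range(1, 6):
    print(n, "disks -> minimum", 2**n - 1, "moves")
```

Because every move can be validated against the rules, the entire chain of thought can be scored, not only the final answer.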
How did the respective models perform?
The results are sobering: beyond a certain complexity, the models' performance collapses completely. The researchers also found a surprising scaling effect: as tasks grow more difficult, the models' reasoning effort initially increases, but then drops off again, even though sufficient computing capacity remains available.
The researchers divide their results into three performance regimes. On simple tasks, classic AI models sometimes perform better. At medium complexity, LRMs benefit from their additional reasoning steps. At high complexity, however, both types of model suffer a "complete breakdown".
The analysis also showed that LRMs often do not apply explicit algorithms and therefore reason inconsistently on logical tasks. The researchers were likewise able to uncover structural weaknesses in the models' thinking behavior.
Even when the researchers provided the models with the correct solution algorithm, they failed to apply it. "Despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning abilities beyond a certain level of difficulty," the publication states.
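To make concrete what such a "correct solution algorithm" can look like, here is the classic recursive Tower of Hanoi procedure, again a hedged sketch assuming Hanoi-style puzzles; the exact algorithm text the researchers supplied is not quoted in this article:

```python
# Hedged sketch: the classic recursive Tower of Hanoi procedure, the kind of
# compact, explicit algorithm a model could be handed alongside the task.
# (Assumption: Hanoi-style puzzles; the actual prompt text is not in this article.)

def solve_hanoi(n, src=0, aux=1, dst=2, moves=None):
    """Return the optimal move list (src, dst) for n disks."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    solve_hanoi(n - 1, src, dst, aux, moves)  # park n-1 disks on the spare peg
    moves.append((src, dst))                  # move the largest disk to the goal
    solve_hanoi(n - 1, aux, src, dst, moves)  # restack n-1 disks on top of it
    return moves

print(len(solve_hanoi(5)))  # 31 == 2**5 - 1, the known optimum
```

A procedure this short fully specifies the solution, which makes the reported failure to simply execute it all the more striking.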
Also interesting:
- Confidential profile: How to hide apps on Android
- Apple Sidecar: How to use your iPad as a second screen
- Google: How to deactivate "AI Overviews"
- Nuclear power plants will not be able to satisfy AI's hunger for energy
The post Study: Are reasoning models less powerful than expected? first appeared on Basic Thinking. Follow us on Google News and Flipboard, or subscribe to our UPDATE newsletter.