Can AI systems reason?
In recent weeks the largest players in the generative AI space announced early versions of AI agents: software that can act as a personal assistant, undertaking tasks on our behalf. We set the goal; the agent then works autonomously to figure out the best course of action, making decisions along the way.
Already able to complete tasks ranging from data entry, documentation and filing to scheduling and web research, AI agents will soon be capable of entire workflows. Where AI agents can be put to work, a person's role will shift from delivering repetitive activities to strategic direction, guidance and quality assurance.
These technologies will automate certain tasks, but we still need to provide and specify the data to be used, as well as refine their outputs through prompting, such as requesting certain details or providing critique and corrections. However, our role as fact-checker may soon become less crucial with AI technologies that can reason: hybrid AI systems that review and corroborate their answers as part of their generative process.
Tina
An advance made by a mathematics-solving AI system shows how future AI could reason, overcoming the lack of reliability that plagues current generative AI tools.
The team at Google's research laboratory DeepMind says that its generative AI system can now surpass the level of the average gold medallist at the International Mathematical Olympiad, a prestigious competition for gifted high-school students. Reaching this level of performance requires the AI system to check its own reasoning and logic in order to generate correct answers.
AlphaGeometry2 is an upgraded system that builds on the progress made last year by DeepMind's AlphaGeometry. Both sit at the forefront of research exploring how generative AI systems might have validation of their outputs built in. These systems combine several components, including language models and other specialised AI systems that undertake reasoning and check for logical consistency.
How it works
AlphaGeometry works by combining two main parts. The first is a specialised language model that understands and writes mathematics. The second is a symbolic problem-solving engine that applies clear rules written by humans, rather than learning from examples as most AI does. The team taught the AI to write maths in a precise, formal language, like a strict mathematical code. This allows the system to automatically check whether the AI's answers make sense and catch mistakes or made-up solutions, which are common problems with AI chatbots.
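The division of labour described above follows a general "propose, then verify" pattern: a generative model suggests candidate steps, and a rule-based checker accepts only those it can prove. The sketch below is a deliberately toy illustration of that pattern; every function name is a hypothetical stand-in, not DeepMind's implementation.

```python
# Toy sketch of the "propose, then verify" pattern behind AlphaGeometry.
# All names below are hypothetical illustrations, not a real API.

def propose_steps(problem):
    """Stand-in for the language model: suggests candidate constructions."""
    return ["construct midpoint M of AB", "draw circle through A, B, C"]

def verify(problem, step):
    """Stand-in for the symbolic engine: accepts only steps it can prove.
    Here it trivially accepts one kind of step."""
    return "midpoint" in step

def solve(problem, max_rounds=3):
    proof = []
    for _ in range(max_rounds):
        for step in propose_steps(problem):
            # Hallucinated or unprovable steps are rejected at this gate.
            if verify(problem, step) and step not in proof:
                proof.append(step)
    return proof

print(solve("prove AM = MB"))  # → ['construct midpoint M of AB']
```

The key design point is that the verifier, not the language model, has the final say: a suggestion only enters the proof once the rule-based component has confirmed it.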
AlphaGeometry2 (AG2) improves on AG1's single search method by running multiple searches at the same time, each configured differently. These searches share useful information with one another as they work, and each search uses multiple language models to improve reliability. This approach is called the Shared Knowledge Ensemble of Search Trees (SKEST).

The AlphaGeometry2 search algorithm employs several different search trees, which can share facts they have proved via a special knowledge-sharing mechanism. Source: Chervonyi et al. 2025
When a search attempts a solution step, it either succeeds and completes the proof, or it fails but saves any proven facts to a shared database. These shared facts exclude temporary construction steps and only include information relevant to the main problem, making them useful to all other ongoing searches.
AI misinterpretation and hallucination
The inability to check responses for logic is one of the reasons generative AI is unreliable. For example, recent research by the BBC found that four major artificial intelligence (AI) chatbots inaccurately summarise news stories. In the study, ChatGPT, Copilot, Gemini and Perplexity were asked to summarise 100 news stories, and journalists with expertise in each article's topic then rated the quality of the answers from the different AI assistants:
“It found 51% of all AI answers to questions about the news were judged to have significant issues of some form. Additionally, 19% of AI answers which cited BBC content introduced factual errors, such as incorrect factual statements, numbers and dates.”
(Rahman-Jones 2025)
ChatGPT alone has 300 million weekly users who rely on it to discover and create content, from summaries, drafts of text and quotes to finding sources.

AI capability strengths (green) and key problems to be solved (orange) Source: Supermind.design
Why the ability to reason matters
Once AI systems can undertake reasoning to confirm that findings are correct (as far as can be objectively verified) before presenting them to the user, they will be able to outperform humans in more subject areas, provided they have been trained on sufficient data. In the case of AlphaGeometry2 the subject was maths. For other subject areas, however, a key obstacle remains: accessing or creating quality information and understandings (digitised as datasets) to train specialised AI systems that can check their answers.

References
Chervonyi et al. 2025. Gold-medallist Performance in Solving Olympiad Geometry with AlphaGeometry2. arXiv pre-print research paper documenting work conducted at Google DeepMind.
Rahman-Jones, I. 2025. AI chatbots unable to accurately summarise news, BBC finds.
Related
Edaith’s Skills Brief. 2024. Generative AI validation.
Edaith News. 2025. New AI agents that can do tasks for you.