AI's grammar vs facts problem

A discussion paper recently published by the Institute of Labor Economics (IZA) in Germany draws a parallel between large language models and the autoregressive forecasting models used in economics. Both excel at pattern recognition and prediction, yet both fail when confronted with structural breaks or genuinely novel situations.

The research identifies a critical distinction: grammar operates through predictable, repeating patterns that AI can master and generalise, whilst factual accuracy depends on specific knowledge that resists pattern-based learning. LLMs navigate grammatical structures with remarkable consistency because these patterns recur across contexts, allowing generalisation from limited examples. Truth, however, cannot be inferred from surrounding patterns: knowing Marie Curie's actual birthplace demands access to that particular fact, not mastery of biographical sentence structures.
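
To make the asymmetry concrete, here is a minimal sketch, assuming a toy bigram (next-word) model and three invented biographical sentences; the corpus and code are illustrative stand-ins, not anything from the paper. The model generalises the "X was born in Y" template perfectly, but the binding between a particular person and a particular place is exactly the kind of specific fact its pattern statistics fail to preserve.

```python
# A toy bigram model: it learns which word follows which, nothing more.
# The corpus below is three invented biographical sentences.
import random
from collections import defaultdict

corpus = [
    "marie curie was born in warsaw .",
    "albert einstein was born in ulm .",
    "ada lovelace was born in london .",
]

# Count word-to-next-word transitions across the whole corpus.
transitions = defaultdict(list)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        transitions[current].append(nxt)

def complete(prompt: str, max_words: int = 10) -> str:
    """Extend the prompt one sampled word at a time."""
    words = prompt.split()
    while words[-1] in transitions and len(words) < max_words:
        words.append(random.choice(transitions[words[-1]]))
    return " ".join(words)

random.seed(2)
# The grammatical template generalises: any name slots into it.
print(complete("albert"))
# The fact does not: after "in", the model has seen warsaw, ulm and
# london equally often, so it is just as likely to complete the
# sentence with ulm or london as with warsaw. Fluent grammar,
# coin-flip facts.
print(complete("marie curie was born in"))
```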

Self-Consuming Generative Models Go MAD (Alemohammad et al., 2023): training generative artificial intelligence (AI) models on synthetic data progressively amplifies artefacts, a failure mode the authors call Model Autophagy Disorder.

This distinction has implications for workplace applications. The research warns of model collapse: as AI systems are increasingly trained on synthetic content, they begin amplifying their own statistical artefacts. As organisations replace entry-level knowledge workers with AI agents, they risk creating a feedback loop that degrades both the quality of training data for future models and the pipeline of human expertise needed to maintain accuracy standards.
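
As an illustration of that feedback loop, here is a minimal sketch of a self-consuming training loop, assuming the simplest possible "model", a fitted normal distribution; the sample size and generation count are arbitrary choices for illustration, not figures from either paper.

```python
# A toy self-consuming loop: each generation's 'model' is a normal
# distribution fitted only to samples drawn from its predecessor.
import random
import statistics

random.seed(0)

mean, stdev = 0.0, 1.0   # generation 0 stands in for real, human-made data
n_samples = 50           # no fresh real data enters the loop after this point

for generation in range(1, 21):
    # Sample synthetic data from the current model...
    synthetic = [random.gauss(mean, stdev) for _ in range(n_samples)]
    # ...and fit the next generation to that synthetic data alone.
    mean = statistics.fmean(synthetic)
    stdev = statistics.pstdev(synthetic)
    if generation % 5 == 0:
        print(f"generation {generation:2d}: mean={mean:+.3f}, stdev={stdev:.3f}")

# Each refit inherits its predecessor's sampling error (plus the slight
# downward bias of the maximum-likelihood spread estimate), so the
# fitted distribution drifts from the original and its tails erode:
# a statistical caricature of the degraded data pipeline above.
```

In any single run the numbers wander, but with no fresh real data entering the loop the fitted distribution steadily loses the detail of the original, which is the "autophagy" the MAD paper names.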

Source: Askitas, N. (2025). Notes on a world with generative AI. IZA Discussion Paper No. 18070. Institute of Labor Economics.
