Can AI Help Solve the Math Crisis?
US math scores have been declining for over a decade—does AI offer possible solutions?
I've been digging into the evidence base on using AI in math instruction. In case you haven’t noticed, math scores in the U.S. are in rough shape. Over the past decade, math achievement has steadily declined. The pandemic made things worse: learning losses in math were more severe than those in reading. Although some students have rebounded, many, particularly those who were already behind, have not.
Because math is cumulative, missing foundational concepts can derail a student’s entire academic trajectory. We're focusing on this issue in our upcoming State of the American Student report. In the meantime, I wanted to share some of what I’ve been learning about the role AI might play in helping students catch up.
Can LLMs Do Math?
LLMs (large language models) like Claude, GPT-4, and Gemini have a reputation for being hit-or-miss with math—and that’s fair. They’re built to process language, not numbers, so their reasoning is textual rather than symbolic or computational.
That said, there’s more nuance. LLMs are quite good at:
Algebraic manipulation
Solving word problems using arithmetic logic
Generating symbolic derivations when trained on examples
But they struggle with:
Deep symbolic reasoning and formal proofs
Step-by-step precision (can "hallucinate" math steps)
Geometry and graph interpretation (limited visual-spatial reasoning)
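That last weakness, hallucinated steps, is the easiest to guard against, because each arithmetic step a model writes out can be checked with ordinary code. Here is a minimal, purely illustrative sketch (the `verify_step` helper and the sample steps are hypothetical, not from any specific tutoring tool):

```python
import re

def verify_step(step: str) -> bool:
    """Check a single 'a op b = c' arithmetic step a model produced."""
    m = re.fullmatch(r"\s*(-?\d+)\s*([+\-*/])\s*(-?\d+)\s*=\s*(-?\d+)\s*", step)
    if not m:
        return False  # unparseable step: flag for human review
    a, op, b, c = int(m[1]), m[2], int(m[3]), int(m[4])
    results = {"+": a + b, "-": a - b, "*": a * b,
               "/": a / b if b else None}
    return results[op] == c

# A model's worked solution, with one hallucinated step at the end:
steps = ["12 * 7 = 84", "84 + 9 = 93", "93 - 5 = 89"]
print([verify_step(s) for s in steps])  # [True, True, False]
```

The point is not the code itself but the division of labor: the model supplies the explanation, and deterministic checking catches the slip (93 − 5 is 88, not 89) before a student internalizes it.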
These models can do math problems, but their performance varies widely by benchmark:
AP Calculus AB: LLMs like GPT-4 and Claude 3 score around 4 out of 5 with the right prompts.
GSM8K (word problems, grade-school level): near 100% accuracy
MATH dataset (12,500+ high school competition-level problems): around 50–55% accuracy, even for the best models
Can LLMs Teach Math?
While LLMs aren't perfect solvers, they are already proving useful as instructional aids. They can:
Explain concepts clearly, emphasizing process over product.
Help teachers generate problem sets, lesson plans, and formative assessments.
Analyze student answers to detect misconceptions (here’s some interesting research on this front).
Support students by offering alternative explanations they may not have grasped in class (I know many students who turn to LLMs to get different or better explanations about math concepts when they are confused about the teacher’s explanation).
Platforms like Khanmigo (Khan Academy + GPT-4) and Edia show promise in augmenting instruction, and new tools are under development to provide integrated support for math teachers.
On the other hand, LLMs can also shortcut learning by offering answers instead of prompting understanding. As such, teachers will need to carefully define how and when students can use AI in math classes.
EdWeek asked teachers about use cases in math, and their views align pretty closely with what the research says. Interestingly, teachers see AI as most appropriate at the high school level.
Especially in light of the longstanding ideological fight about how to teach math (listen to a great discussion on this below), developers will need to be savvy about following the evidence, not the rhetoric, on what instruction students need to gain key math skills and to repair foundational skill gaps.
This last point is especially pressing: a shocking number of students today are multiple grade levels behind in math or lack essential foundational skills. A new TNTP analysis found that almost half of the students sampled started Algebra I knowing just one-third of all the algebra-related concepts and skills from prior grades. Teachers say addressing the varied gaps in foundational skills is one of the hardest challenges they face.
How Are Math Tools Integrating AI?
The number of AI-integrated math tools is growing fast:
Startup boom: Between 2018 and 2023, the number of AI-in-education startups grew by 200%, and a little less than half were STEM-related (Gitnux). Platforms like Mathful, MathGPT, and CoachON Math are designed to assist students with mathematical problem-solving through AI technologies.
Platform integration: Khan Academy, DreamBox, Carnegie Learning (MATHia), and others embed AI into their systems, though many use machine learning rather than generative AI.
New tools emerging: Khanmigo, Saga AI Tutor, and Microsoft’s Phi-4 aim to provide more human-like tutoring at scale (Dataconomy).
However, not all “smart” math tools use generative AI. For example, Zearn, a widely used and research-backed platform, uses adaptive logic, not AI. This means that it dynamically adjusts based on student performance but doesn’t incorporate machine learning or natural language processing.
What’s the Evidence So Far?
The evidence base is limited, but growing. A few tools stand out (note however that some of these tools use early forms of AI or adaptive machine learning, not LLMs):
ASSISTments
Evidence of Impact on Math Scores: ✅ Statistically significant gains on state tests
Research Quality: ✅ RCTs (SRI, NSF-funded)
MATHia (Carnegie)
Evidence of Impact on Math Scores: ⚠️ Mixed: positive in algebra, negative in geometry (*edited from original post)
Research Quality: ✅ RCTs (*edited from original post) (RAND)
DreamBox
Evidence of Impact on Math Scores: ✅ Gains in K–5 math with moderate use
Research Quality: ✅ Third-party studies (Harvard CEPR)
Zearn
Evidence of Impact on Math Scores: ✅ Strong results, especially for low-income students
Research Quality: ✅ Quasi-Experimental (with RCT coming) + large-scale implementation (Zearn) (*edited from original post)
Khanmigo
Evidence of Impact on Math Scores: 🚧 Promising pilot, no formal outcomes yet
Research Quality: ❌ No RCTs or test-linked studies yet
Saga AI Tutor
Evidence of Impact on Math Scores: 🚧 Early stage; human version is proven
Research Quality: ❌ AI version not yet tested
What’s Coming Next?
AI is evolving rapidly, and several developments could significantly improve math-solving capabilities. In particular, advances in image recognition and visual understanding (see Photomath, Google’s Socratic, Microsoft’s Math Solver, or GeoGebra), along with new hybrid models that combine LLM reasoning with the computing power of dedicated calculators (this one from Google came out while I was drafting this piece), could dramatically improve accuracy. AI will also likely become embedded in more and more of the classroom and school infrastructure that supports excellent math instruction, including data analysis.
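The hybrid idea is simple to sketch: let the language model translate a word problem into an expression, then hand the actual arithmetic to deterministic code. In this toy illustration, the model call is replaced by a hard-coded stand-in (the `solve` function and its expression are hypothetical, shown only to make the division of labor concrete):

```python
import ast
import operator as op

# Safely evaluate the arithmetic expression a model emits, rather than
# trusting the model's own (sometimes hallucinated) arithmetic.
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def calc(node):
    if isinstance(node, ast.Expression):
        return calc(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](calc(node.left), calc(node.right))
    raise ValueError("unsupported expression")

def solve(word_problem: str) -> float:
    # Stand-in for an LLM call that would translate the word problem
    # into an expression; here it is hard-coded for illustration.
    expression = "3 * 24 + 12"  # "3 boxes of 24 pencils plus 12 loose ones"
    return calc(ast.parse(expression, mode="eval"))

print(solve("3 boxes of 24 pencils plus 12 loose ones"))  # 84
```

The model does what it is good at (reading the problem); the calculator does what it is good at (getting the number right every time).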
As CRPE has previously argued, it’s critical that education leaders envision what they want education to look like and then work with ed tech leaders to see if AI can help overcome long-standing challenges. The new TNTP report shows that there are critical “key predecessor” skills that kids need to know before being successful in Algebra I, yet nearly half of students don’t have them by 8th grade. Could AI be leveraged across a student’s K-8 experience to ensure every child enters Algebra I proficient in these skills?
Could a trained AI be used to assess how well various math curricula use evidence-based instructional strategies? An article in Nature argues for a dedicated research agenda to see if AI can help reduce math anxiety. Math experts, what would you want AI to solve for?
In other words, what comes will in part be determined by emerging technologies but will also depend on what education leaders (and hopefully students and parents) signal they need from the ed tech community. This speaks to the need for a more demand-driven agenda (research, development, investment, etc.) in ed tech.
The Bottom Line
Right now, proven tools like Zearn, DreamBox, and in-person tutoring remain the best bets for accelerating math achievement, especially for students who are behind. But AI is already proving valuable in helping teachers work smarter and giving students on-demand support outside class.
If AI can eventually become a reliable tutor for foundational math recovery, it could be a game-changer. But it’s unlikely a single tool will do this alone. The real breakthrough opportunities, in my view, will be integrated systems that support teachers, students, and families holistically, combined with out-of-the-box whole-school models that challenge current thinking, especially around the use of student time and motivation, staffing configurations, and leveraging community resources.
We’ll continue to track AI and math, so please send thoughts, corrections, additions! Post a comment or shoot me a note.
In Other News…
New AI developments (some creepier than others)
Does ChatGPT seem annoyingly and cloyingly accommodating lately? Well, OpenAI went off the rails with sycophancy and had to scale things back—a good reminder that these tools can be used for emotional manipulation, even unintentionally. In related news, a group of researchers attempted a large-scale but unauthorized experiment on Reddit to change users’ views on highly controversial topics (yet another reason why we need guardrails!).
This new wearable recording technology could clearly be used for evil. And maybe good? (Do we see a positive use case in classrooms? I don’t know.)
One major manifesto predicts AGI by 2027, while another predicts a much longer, “normal” evolution of the technology. Which do you believe? I believe we in education should plan for both scenarios.
Professors are using AI to improve or augment lesson planning—but some students at prestigious institutions feel misled, filing formal complaints and requesting the return of tuition fees.
Final Thoughts
For those of you wondering what the future of AI will mean for work, “The Alchemy of Job Transitions” in American Council on Science and Health offers a crucial reminder: While the jobs of the future may prioritize skills like collaboration and creativity, students will need, more than ever before, strong foundations in basic skills that “every child in this country deserves but increasingly doesn’t get.”
“Transforming coal miners into coders sounds efficient, even inspiring. But without foundational skills, all the job training in the world will, like the alchemists, yield only lead.”
From "The Alchemy of Job Transitions” in American Council on Science and Health
Thoughtful, as always. I appreciate the detail you provided about the research and the difference in AI use with the tools. I am curious if you uncovered anything that would support the transition from very traditional math learning to a more inquiry-based model? I am engaging with math folks in curriculum and instruction conversations, and much of it is about ensuring that kids don't use the tools to just get the answers, but not much in the realm of rethinking the design.