Adaptive AI Tutors in 2026: What Khanmigo, Synthesis, and Class Companion Actually Deliver
Benjamin Bloom's 1984 paper, "The 2 Sigma Problem," established a haunting baseline: one-on-one human tutoring moved student achievement roughly two standard deviations above the mean of conventional classroom instruction, a gap so large that virtually no scalable intervention has closed it since. Generative AI arrived promising to finally crack that problem cheaply and at scale. Three years into production deployments by Khan Academy's Khanmigo, Synthesis Tutor, and Class Companion, the honest answer has three parts. The technology is genuinely promising. The methodology underlying most effectiveness claims is soft. And vendors willing to acknowledge this openly are rarer than the press releases suggest.
As of April 2026, none of the three flagship LLM-based tutoring platforms has published a peer-reviewed randomized controlled trial showing sustained gains on independent standardized assessments. What exists is a growing body of observational data, a handful of district-level pilots, and a consistent pattern. Adaptive AI tutors improve engagement, surface learning gaps faster than traditional homework cycles, and modestly raise scores on the vendors' own in-platform formative assessments. The leap from those metrics to durable state test gains requires bridging evidence that almost no vendor has yet built.
The Evidence Landscape: RCTs vs. Press Releases
The richest evidentiary tradition in AI-assisted tutoring predates the LLM era entirely. ASSISTments, a homework-feedback platform built by researchers at Worcester Polytechnic Institute, was evaluated in one of ed-tech's few genuine randomized controlled trials. The Roschelle et al. study, published in AERA Open in 2016, followed approximately 2,850 seventh-grade students across 43 schools in Maine, randomized at the school level. Students in schools using the platform for nightly math homework scored a statistically significant 0.18 standard deviations higher on an end-of-year standardized math test than peers in control schools. That effect size was modest, rigorously obtained, and replicated in follow-up work. It remains the benchmark that LLM-based tutors have not yet cleared in a comparable study design.
Carnegie Learning's MATHia adaptive algebra platform earned similar treatment when the Department of Education's What Works Clearinghouse reviewed the literature in 2023. The review found effect sizes in the modest-but-positive range across multiple algebra studies, with consistent positive findings in Algebra I. These are not dramatic numbers. They are, however, real, independently verified, and meaningful when applied across millions of students. They are also the product of years of iterative research, not the first or second school year of deployment.
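Effect sizes like 0.18 and "two standard deviations" are standardized mean differences: the gap between treatment and control means, divided by the spread of scores. The short Python sketch below computes one as Cohen's d from two score lists. It then translates a reported d into the percentile of the control distribution that the average treated student would reach.

The helper names are our own. The percentile translation assumes normally distributed scores with equal variance in both groups. Published trials usually estimate effects with models that also adjust for school clustering and pretest scores, so this only illustrates the arithmetic; it does not reproduce either study.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference: (mean_t - mean_c) / pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = ((n_t - 1) * stdev(treatment) ** 2
                  + (n_c - 1) * stdev(control) ** 2) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


def percentile_of_average_treated(d: float) -> float:
    """Percentile of the control distribution reached by the average treated
    student, assuming normal scores with equal variance in both groups."""
    return 100 * NormalDist().cdf(d)


# Effect sizes as reported in the text; they are not re-estimated here.
for label, d in [("ASSISTments RCT", 0.18), ("Bloom's 1:1 tutoring", 2.0)]:
    print(f"{label}: d = {d:.2f} -> ~{percentile_of_average_treated(d):.0f}th percentile")
```

Under those assumptions, 0.18 SD moves the average student from the 50th to roughly the 57th percentile. Bloom's two sigma corresponds to roughly the 98th percentile, which is why the tutoring gap has proved so hard to close.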
LLM-based tutoring arrived at scale in 2023 without that research infrastructure. Khanmigo, Synthesis Tutor, and Class Companion have had, at most, two to three school years to accumulate data. The gap between what their internal dashboards show and what a properly designed randomized trial would reveal is currently unknown — and that uncertainty is the central fact any honest assessment must lead with.
Khanmigo: The Most Transparent Platform in the Room
Khan Academy launched Khanmigo publicly in March 2023, powered by OpenAI's GPT-4 under a partnership announced the same month. The design constraint is pedagogically deliberate: Khanmigo asks Socratic questions rather than providing direct answers, explicitly to prevent students from using it as a homework-completion service. Khan Academy made Khanmigo free for all U.S. teachers in 2024, consistent with its nonprofit mandate.
Khan Academy's 2024 effectiveness data came from districts where students logged more than 30 Khanmigo sessions. It showed measurable improvements on Khan Academy's own formative assessments, particularly in algebra readiness and foundational reading comprehension. In his May 2024 book Brave New Words (Viking), Sal Khan explicitly described these as in-platform metrics and called for rigorous third-party controlled studies. That transparency is worth noting: Khan Academy has been more candid about methodological limits than most competitors in this space.
What districts have reported publicly is more complicated. Pilots across Texas and Colorado found that sustained engagement, the prerequisite for any measurable effect, was difficult to maintain past the sixth week without substantial teacher scaffolding. Students who maintained consistent engagement showed vocabulary gains and improved self-correction on embedded assessments. Students who churned showed no measurable change. In field deployment, implementation quality, not the AI itself, emerged as the dominant variable.
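Churn also shapes how results get reported. When only some students stay engaged, an intent-to-treat estimate (the average over everyone given access) is diluted relative to the effect among engaged students.

The sketch below shows that arithmetic and the standard "no-show" adjustment that reverses it. The function names are hypothetical, and the numbers are illustrative; they do not come from any Khanmigo pilot. The adjustment holds only if churned students receive no effect at all and the comparison group has no access to the tool.

```python
def itt_effect(engagement_rate: float, effect_if_engaged: float,
               effect_if_churned: float = 0.0) -> float:
    """Average effect across everyone assigned to the tool (intent-to-treat)."""
    return engagement_rate * effect_if_engaged + (1 - engagement_rate) * effect_if_churned


def effect_among_engaged(itt: float, engagement_rate: float) -> float:
    """No-show adjustment: recover the engaged-student effect from an ITT
    estimate. Valid only if churned students receive zero effect."""
    if not 0 < engagement_rate <= 1:
        raise ValueError("engagement_rate must be in (0, 1]")
    return itt / engagement_rate


# Hypothetical inputs: a 0.20 SD gain for students who stay engaged,
# with 40% of students still engaged after week six.
itt = itt_effect(engagement_rate=0.40, effect_if_engaged=0.20)
print(f"ITT effect: {itt:.2f} SD")  # diluted by churn
print(f"Recovered engaged effect: {effect_among_engaged(itt, 0.40):.2f} SD")
```

With 40 percent of students still engaged and a hypothetical 0.20 SD gain among them, the district-wide effect is 0.08 SD. The recovered 0.20 SD also describes only the students who chose to stay, and they may differ from those who left.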
Synthesis Tutor: Viral Claims and Their Context
Synthesis was founded by Josh Dahn — the principal of Ad Astra, the school Elon Musk created for SpaceX employees' children at its Hawthorne, California campus — and launched to the public around 2022. The platform originally focused on collaborative simulation games designed to build strategic and mathematical reasoning; beginning in 2023, it incorporated a conversational AI math tutor. The company raised a Series A funding round in 2023–2024.
Synthesis's most prominent marketing claim — that students on the platform advance in math at a rate several times faster than the national average — spread widely through education and technology media during 2023 and 2024. The figure originated from internal cohort analysis: Synthesis measured skill-progression rates on its own platform for high-engagement subscribers, converted those rates to grade-level equivalents, and compared them against national NAEP growth norms.
The methodological problems are substantial. The comparison pits a self-selected population — families motivated enough to pay for and persist with an educational subscription — against a national average that includes every student in every context. There is no control group. The outcome measure is internal and not independently validated. NAEP growth norms were not designed as a benchmark for subscription-program comparisons. Synthesis has not published this figure in a peer-reviewed venue; it appears on the company blog and marketing materials, labeled as internal data.
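Selection alone can generate an impressive-looking multiple. In the toy simulation below, the platform has zero effect on anyone. Every student draws growth from the same distribution, and faster-growing students are then more likely to join and stay in a high-engagement cohort.

The growth distribution and the logistic selection rule are invented assumptions for illustration, not Synthesis data. The only point is that the cohort's average exceeds the population's without any treatment.

```python
import math
import random
from statistics import mean


def growth_multiple_under_selection(n: int = 100_000, seed: int = 0) -> float:
    """Ratio of a self-selected cohort's mean growth to the population mean,
    when the platform itself has zero effect on anyone's growth."""
    rng = random.Random(seed)
    everyone, cohort = [], []
    for _ in range(n):
        # Hypothetical: annual growth in grade levels, the same with or without the platform.
        growth = max(0.0, rng.gauss(1.0, 0.4))
        everyone.append(growth)
        # Hypothetical selection rule: faster-growing (e.g. more motivated) students
        # are more likely to subscribe and to stay in the high-engagement group.
        p_select = 1 / (1 + math.exp(-3 * (growth - 1.2)))
        if rng.random() < p_select:
            cohort.append(growth)
    return mean(cohort) / mean(everyone)


print(f"Cohort grows {growth_multiple_under_selection():.2f}x the population average")
```

This particular setup yields a modest multiple. Sharper selection, or measuring the cohort on a different scale from the benchmark, pushes it higher. Nothing in the calculation depends on the product.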
None of this means Synthesis does not help students. The simulation-based approach to mathematical reasoning has genuine theoretical grounding in constructivist and collaborative learning research. The AI tutoring layer also shares a core feature, immediate and targeted feedback, with the homework-feedback approach behind ASSISTments' RCT-backed effect size, though that result cannot simply be transferred to a different product. What it means is that the specific viral figures are marketing metrics, not study outcomes, and should be read accordingly.
Class Companion: Writing Feedback's Honest Record
Class Companion occupies a narrower niche: AI-mediated Socratic feedback on student writing, aimed particularly at AP History, Civics, and argumentative writing courses. Teachers configure the AI's focus areas; students engage in iterative back-and-forth dialogue before revising drafts. The platform positions itself as a teacher-efficiency tool as much as a student-learning tool, and targets secondary-education writing rather than math.
Class Companion's published outcomes data is largely teacher-facing. Internal reports from 2024 showed students revised essays an average of 2.3 times per assignment, compared to 0.4 times in comparison classrooms relying on traditional feedback cycles alone. Teacher grading and feedback time dropped by a reported 60 to 70 percent. These are operationally meaningful figures for teachers managing 120-student course loads.
What Class Companion has not yet published is whether those additional revision cycles translate into higher scores on standardized assessments. Examples include AP exam free-response sections, the SAT Reading and Writing section, and state ELA assessments. The mechanism is plausible: increased revision with targeted feedback is a well-evidenced route to writing improvement in the research literature. But the link has so far been demonstrated only at the process level, not at the level of high-stakes outcomes.
What the District Data Actually Shows
The most reliable independent signals come from state assessment trend analyses and district accountability reports. As of the 2024–2025 school year, no large-scale comparative analysis has attributed statistically significant test score improvements specifically to AI tutoring deployments, separate from broader pandemic-recovery trends. The strongest documented recovery gains in math since 2022 have come from high-dosage human tutoring programs — intensive interventions providing three or more sessions per week with trained human tutors. AI platforms deployed at typical usage rates have not produced comparable signals in available accountability data.
District-reported correlations between AI tutoring engagement and in-platform mastery completion are more positive, but these self-reported figures are vulnerable to standard selection effects: students who engage consistently with an AI tutor are not a random sample of all students, and families who maintain subscription engagement are systematically different from those who churn within the first month.
Parent Reception: Enthusiasm and Anxiety in Equal Measure
Common Sense Media and EdWeek Research Center surveys from 2024 show a consistent split: parents in higher-income districts with existing 1:1 device programs are broadly supportive of AI tutoring tools, particularly when they can review their child's activity history. Parents in under-resourced districts more frequently cite data privacy concerns, screen time management, and worry that students are performing for a platform rather than developing genuine independent understanding. The equity concern is structural — districts most in need of tutoring access are frequently the least positioned to implement AI tools reliably due to broadband and device-maintenance infrastructure gaps.
What Would Actually Constitute Proof
A credible RCT for an LLM-based tutoring platform in 2026 would require at least 60 schools randomized at the school level to prevent contamination, a 12–18 month treatment window, pre-registered primary outcomes on state assessments or independently validated instruments rather than platform-internal scores, and intent-to-treat analysis that captures real-world engagement drop-off rather than restricting results to completers. Both Khanmigo and Synthesis have publicly acknowledged discussions with university research partners toward this end. Class Companion has not announced a comparable initiative as of April 2026.
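School-level randomization is what makes such trials expensive. Students in the same school resemble one another, so each school contributes less independent information than its headcount suggests.

The sketch below uses the standard design-effect formula, 1 + (m - 1) times the intraclass correlation (ICC). It also uses a common approximation for the minimum detectable effect size (MDES) of a two-level cluster-randomized trial. The 65 students per school, the ICC of 0.20, and the covariate R-squared values are hypothetical inputs chosen for illustration. The multiplier of 2.8 is a large-sample approximation for 80 percent power at a two-sided 0.05 significance level.

```python
import math


def design_effect(students_per_school: int, icc: float) -> float:
    """Variance inflation from randomizing whole schools: 1 + (m - 1) * ICC."""
    return 1 + (students_per_school - 1) * icc


def mdes_cluster_rct(n_schools: int, students_per_school: int, icc: float,
                     r2_school: float = 0.0, r2_student: float = 0.0,
                     p_treated: float = 0.5, multiplier: float = 2.8) -> float:
    """Approximate minimum detectable effect size (SD units) for a two-level
    cluster-randomized trial. multiplier=2.8 approximates 80% power at a
    two-sided alpha of 0.05 with many schools; r2_* are the shares of school-
    and student-level variance explained by covariates such as pretests."""
    denom = p_treated * (1 - p_treated) * n_schools
    between = icc * (1 - r2_school) / denom
    within = (1 - icc) * (1 - r2_student) / (denom * students_per_school)
    return multiplier * math.sqrt(between + within)


# Hypothetical inputs: 60 schools, 65 tested students each, ICC of 0.20.
print(f"Design effect: {design_effect(65, 0.20):.1f}")
print(f"MDES, no covariates: {mdes_cluster_rct(60, 65, 0.20):.2f} SD")
print(f"MDES, strong pretest covariates: "
      f"{mdes_cluster_rct(60, 65, 0.20, r2_school=0.8, r2_student=0.5):.2f} SD")
```

Under these assumptions the design effect is 13.8, so 3,900 students carry roughly the information of 280 independently randomized ones. Without covariates, the trial could reliably detect only effects of about 0.33 SD. With strong pretest covariates, the threshold falls to about 0.16 SD, just below the 0.18 SD that ASSISTments achieved. That is one reason pre-registered designs commonly include prior-year test scores as covariates.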
The honest framing for school administrators right now: AI tutoring platforms likely provide real value in closing formative feedback loops, surfacing where students are stuck, and giving teachers better visibility into learning gaps. The technology is mature enough to be deployed thoughtfully. The jump to reliably improved state test scores requires evidence that does not yet exist for LLM-based tutors — and any vendor who claims otherwise is operating ahead of the data.
Frequently asked
Does Khanmigo raise scores on standardized tests, or only on Khan Academy's own platform?
What is Synthesis's claim of learning math several times faster than average actually based on?
Has any LLM-based AI tutoring platform completed a genuine randomized controlled trial as of 2026?
Why do some districts report gains while others see nothing from these tools?
Are AI tutoring tools worth the cost for budget-constrained schools?
Sources & further reading
- What Works Clearinghouse — Institute of Education Sciences, U.S. Dept. of Education
- ASSISTments Research Platform — Worcester Polytechnic Institute
- Carnegie Learning — Research and Efficacy
- Khan Academy — Khanmigo
- Synthesis — AI Math Tutor
- Common Sense Media — Education Technology
- Education Week — Artificial Intelligence in K-12
Last reviewed Apr 30, 2026. AI Pulled News is editorial; corrections welcome at /news/about.html.