AI Innovation · Apr 30, 2026
An honest accounting of published RCTs, district-level pilots, and parent reception — where the evidence falls short of the press releases

Adaptive AI Tutors in 2026: What Khanmigo, Synthesis, and Class Companion Actually Deliver

Benjamin Bloom's 1984 paper established a haunting baseline: one-on-one human tutoring moves student achievement roughly two standard deviations above the classroom mean, a gap so large that virtually no scalable intervention has closed it since. Generative AI arrived promising to finally crack that problem at an affordable cost. Three years into production deployments of Khan Academy's Khanmigo, Synthesis Tutor, and Class Companion, the honest answer is that the technology is genuinely promising, the methodology underlying most effectiveness claims is soft, and the vendors willing to acknowledge this openly are rarer than the press releases suggest.

As of April 2026, none of the three flagship LLM-based tutoring platforms has published a peer-reviewed randomized controlled trial showing sustained gains on independent standardized assessments. What exists is a growing body of observational data, a handful of district-level pilots, and a consistent pattern: adaptive AI tutors improve engagement, surface learning gaps faster than traditional homework cycles, and modestly raise formative assessment scores on the platforms' own measures. Getting from those metrics to durable state test gains requires bridging evidence that almost no vendor has yet produced.

The Evidence Landscape: RCTs vs. Press Releases

The richest evidentiary tradition in AI-assisted tutoring predates the LLM era entirely. ASSISTments, a homework-feedback platform built by researchers at Worcester Polytechnic Institute, was the subject of one of ed-tech's few genuine randomized controlled trials, published in 2016. The Roschelle et al. study, which appeared in AERA Open, followed approximately 2,850 seventh-grade students across 43 Maine schools. Students using the platform for nightly math homework scored a statistically significant 0.18 standard deviations higher on the end-of-year standardized math assessment than peers in the control condition. That effect size (modest, rigorously obtained, replicated in follow-up work) remains the benchmark that LLM-based tutors have not yet cleared in a comparable study design.
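
To put the two effect sizes mentioned so far on a common footing, a standardized effect can be converted into the percentile the average treated student would reach within the control distribution. The sketch below does this for Bloom's two-standard-deviation figure and the 0.18 SD ASSISTments result. It assumes normally distributed scores with equal variance in both groups, which is a simplification, and the function name is illustrative rather than taken from either study.

```python
from statistics import NormalDist


def percentile_of_average_treated_student(effect_size_sd: float) -> float:
    """Percentile (0-100) of the control distribution reached by the
    average treated student, assuming normal scores and equal variance."""
    return 100 * NormalDist().cdf(effect_size_sd)


# Effect sizes as quoted in the article text.
effects = [
    ("One-on-one human tutoring (Bloom, 1984)", 2.0),
    ("ASSISTments RCT (Roschelle et al., 2016)", 0.18),
]

for label, d in effects:
    pct = percentile_of_average_treated_student(d)
    print(f"{label}: d = {d:.2f} SD -> about the {pct:.1f}th percentile")
```

Under these assumptions, a 2.0 SD effect moves the average student from the 50th to roughly the 98th percentile, while 0.18 SD moves the same student to roughly the 57th. Both are real effects; they differ by an order of magnitude.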

Carnegie Learning's MATHia adaptive algebra platform earned similar treatment when the Department of Education's What Works Clearinghouse reviewed the literature in 2023: effect sizes in the modest-but-positive range across multiple algebra studies, with consistent positive findings in Algebra I. These are not dramatic numbers. They are, however, real, independently verified, and meaningful when applied across millions of students. They are also the product of years of iterative research, not of the first or second school year of deployment.

LLM-based tutoring arrived at scale in 2023 without that research infrastructure. Khanmigo, Synthesis Tutor, and Class Companion have had, at most, two to three school years to accumulate data. The gap between what their internal dashboards show and what a properly designed randomized trial would reveal is currently unknown — and that uncertainty is the central fact any honest assessment must lead with.

Khanmigo: The Most Transparent Platform in the Room

Khan Academy launched Khanmigo publicly in March 2023, powered by OpenAI's GPT-4 under a partnership announced the same month. The design constraint is pedagogically deliberate: Khanmigo asks Socratic questions rather than providing direct answers, explicitly to prevent students from using it as a homework-completion service. Khan Academy made Khanmigo free for all U.S. teachers in 2024, consistent with its nonprofit mandate.

Khan Academy's 2024 effectiveness data, drawn from districts where students logged more than 30 Khanmigo sessions, showed measurable improvements on Khan Academy's own formative assessments, particularly in algebra readiness and foundational reading comprehension. Sal Khan, in his May 2024 book Brave New Words (Viking), explicitly described these as in-platform metrics and called for rigorous third-party controlled studies. That transparency is worth noting: Khan Academy has been more candid about methodological limits than most competitors in this space.

What districts have reported publicly is more complicated. Pilots across Texas and Colorado found that sustained engagement — the prerequisite for any measurable effect — was difficult to maintain past the sixth week without substantial teacher scaffolding. Students who maintained consistent engagement showed vocabulary gains and improved self-correction on embedded assessments. Students who churned showed nothing. Implementation quality, not the AI itself, emerged as the dominant variable in field deployment.
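
One way to see why persistence dominates the field results: if gains accrue only to students who stay engaged, the average effect across everyone who was given the tool is the persisters' gain scaled down by the persistence rate. The sketch below works through that arithmetic. The persistence rates and the 0.30 SD gain are hypothetical placeholders chosen for illustration; they are not figures reported by the Texas or Colorado pilots.

```python
def cohort_average_effect(
    persist_rate: float,
    effect_among_persisters_sd: float,
    effect_among_churners_sd: float = 0.0,
) -> float:
    """Average effect over all assigned students, as a weighted mean of
    the effect among persisters and the effect among churners."""
    return (
        persist_rate * effect_among_persisters_sd
        + (1 - persist_rate) * effect_among_churners_sd
    )


# Hypothetical gain among students who stay engaged (assumption, in SD units).
HYPOTHETICAL_PERSISTER_GAIN_SD = 0.30

for rate in (0.2, 0.5, 0.8):
    avg = cohort_average_effect(rate, HYPOTHETICAL_PERSISTER_GAIN_SD)
    print(f"persistence {rate:.0%}: cohort-average effect = {avg:.2f} SD")
```

With these placeholder numbers, the same tool yields a cohort-average effect of 0.06 SD at 20 percent persistence and 0.24 SD at 80 percent, which is why teacher scaffolding that keeps students engaged can matter as much as the tutor itself.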

Note: Khanmigo's initial 2023 rollout produced documented mathematical errors — particularly in geometry — that Khan Academy patched progressively through 2023 and 2024. The Socratic structure meant errors were sometimes embedded in leading questions rather than stated answers, making them harder for students to detect. Khan Academy responded with model updates and added a teacher-review interface for flagging suspect AI responses.

Synthesis Tutor: Viral Claims and Their Context

Synthesis was founded by Josh Dahn — the principal of Ad Astra, the school Elon Musk created for SpaceX employees' children at its Hawthorne, California campus — and launched to the public around 2022. The platform originally focused on collaborative simulation games designed to build strategic and mathematical reasoning; beginning in 2023, it incorporated a conversational AI math tutor. The company raised a Series A funding round in 2023–2024.

Synthesis's most prominent marketing claim — that students on the platform advance in math at a rate several times faster than the national average — spread widely through education and technology media during 2023 and 2024. The figure originated from internal cohort analysis: Synthesis measured skill-progression rates on its own platform for high-engagement subscribers, converted those rates to grade-level equivalents, and compared them against national NAEP growth norms.

The methodological problems are substantial. The comparison pits a self-selected population — families motivated enough to pay for and persist with an educational subscription — against a national average that includes every student in every context. There is no control group. The outcome measure is internal and not independently validated. NAEP growth norms were not designed as a benchmark for subscription-program comparisons. Synthesis has not published this figure in a peer-reviewed venue; it appears on the company blog and marketing materials, labeled as internal data.
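
The selection problem can be made concrete with a small simulation. In the sketch below, a hypothetical platform adds exactly zero learning, yet its high-engagement subscribers still appear to outpace the national average, purely because of who opts in and persists. Every parameter (the latent "motivation" trait, its link to growth, the opt-in threshold) is an assumption made for illustration and does not describe Synthesis's actual data or users.

```python
import random
from statistics import mean


def simulate_selection(n_students: int = 100_000, seed: int = 0) -> tuple[float, float]:
    """Return (national_mean_growth, cohort_mean_growth) in grade levels
    per year. The simulated platform contributes no growth at all."""
    rng = random.Random(seed)
    national, cohort = [], []
    for _ in range(n_students):
        # Latent family/student trait, standardized (assumed).
        motivation = rng.gauss(0, 1)
        # Annual growth depends on motivation plus noise; no platform term.
        growth = 1.0 + 0.3 * motivation + rng.gauss(0, 0.3)
        national.append(growth)
        # Only highly motivated families subscribe and persist (assumed cutoff).
        if motivation > 1.5:
            cohort.append(growth)
    return mean(national), mean(cohort)


national_mean, cohort_mean = simulate_selection()
print(f"national mean growth:  {national_mean:.2f} grade levels/year")
print(f"self-selected cohort:  {cohort_mean:.2f} grade levels/year")
print(f"apparent multiple:     {cohort_mean / national_mean:.2f}x with zero platform effect")
```

Under these assumed parameters the self-selected cohort appears to grow around 1.6 times as fast as the national average without any treatment effect. The exact multiple depends entirely on the assumptions; the point is only that a cohort-versus-norm comparison cannot separate the product's contribution from the population's.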

None of this means Synthesis does not help students. The simulation-based approach to mathematical reasoning has genuine theoretical grounding in constructivist and collaborative learning research, and the AI tutoring layer mirrors the pedagogical structure that gave ASSISTments its RCT-backed effect size. What it means is that the specific viral figures are marketing metrics, not study outcomes, and should be read accordingly.

Conjecture, marked clearly: Synthesis has not publicly disclosed the foundation models powering its AI tutor as of April 2026. Based on developer forum discussions and response timing patterns noted in 2024, the tutoring component most likely runs on OpenAI's GPT-4o or a fine-tuned variant; some secondary sources suggest Anthropic's Claude handles certain extended-explanation tasks where it outperforms GPT-4o on sustained mathematical reasoning. Neither company has confirmed any specific commercial arrangement publicly.

Class Companion: Writing Feedback's Honest Record

Class Companion occupies a narrower niche: AI-mediated Socratic feedback on student writing, aimed particularly at AP History, Civics, and argumentative writing courses. Teachers configure the AI's focus areas; students engage in iterative back-and-forth dialogue before revising drafts. The platform positions itself as a teacher-efficiency tool as much as a student-learning tool, and targets secondary-education writing rather than math.

Class Companion's published outcomes data is largely teacher-facing. Internal reports from 2024 showed students revised essays an average of 2.3 times per assignment, versus 0.4 times in comparison classrooms relying on traditional feedback cycles alone. Teacher grading and feedback time dropped by a reported 60 to 70 percent. These are operationally meaningful figures for teachers managing 120-student course loads.

What Class Companion has not yet published is whether those additional revision cycles translate into improved scores on standardized writing assessments such as AP exam free-response sections, SAT Evidence-Based Reading and Writing, or state ELA assessments. The mechanism is plausible: increased revision with targeted feedback is a well-evidenced route to writing improvement in the research literature. But so far the benefit has been demonstrated only at the process level (more revision), not at the outcome level (higher scores on high-stakes assessments).

Conjecture, marked clearly: Class Companion's model stack is not publicly disclosed. Based on response latency and multi-turn coherence in AP essay feedback, the backend most likely uses OpenAI's GPT-4o. Revenue estimate for Class Companion as of late 2025: $4–10 million ARR, based on publicly reported school district partnerships and per-seat pricing observed in the $6–15 per student per year range. Both figures are estimates from public information; the company has not disclosed financials.
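
As a sanity check on the estimate above, the ARR and per-seat price ranges can be combined to see what student seat counts they would imply. The sketch assumes that all revenue comes from per-student licenses at a single flat price, which is a simplification; the function name is illustrative, and the inputs are the article's own estimates rather than disclosed financials.

```python
def implied_seats(arr_usd: float, price_per_seat_usd: float) -> float:
    """Student seats implied by annual recurring revenue at a flat per-seat price."""
    return arr_usd / price_per_seat_usd


# Ranges taken from the estimate in the text (not company-disclosed figures).
ARR_LOW, ARR_HIGH = 4_000_000, 10_000_000
PRICE_LOW, PRICE_HIGH = 6, 15

fewest = implied_seats(ARR_LOW, PRICE_HIGH)   # low revenue at the high price
most = implied_seats(ARR_HIGH, PRICE_LOW)     # high revenue at the low price

print(f"implied paid seats: {fewest:,.0f} to {most:,.0f}")
```

The two ranges together imply somewhere between roughly 270,000 and 1.7 million paid seats. The width of that band is a reminder of how loose the estimate is.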

What the District Data Actually Shows

The most reliable independent signals come from state assessment trend analyses and district accountability reports. As of the 2024–2025 school year, no large-scale comparative analysis has attributed statistically significant test score improvements specifically to AI tutoring deployments, separate from broader pandemic-recovery trends. The strongest documented recovery gains in math since 2022 have come from high-dosage human tutoring programs — intensive interventions providing three or more sessions per week with trained human tutors. AI platforms deployed at typical usage rates have not produced comparable signals in available accountability data.

District-reported correlations between AI tutoring engagement and in-platform mastery completion are more positive, but these self-reported figures are vulnerable to standard selection effects: students who engage consistently with an AI tutor are not a random sample of all students, and families who maintain subscription engagement are systematically different from those who churn within the first month.

Parent Reception: Enthusiasm and Anxiety in Equal Measure

Common Sense Media and EdWeek Research Center surveys from 2024 show a consistent split: parents in higher-income districts with existing 1:1 device programs are broadly supportive of AI tutoring tools, particularly when they can review their child's activity history. Parents in under-resourced districts more frequently cite data privacy concerns, screen time management, and worry that students are performing for a platform rather than developing genuine independent understanding. The equity concern is structural — districts most in need of tutoring access are frequently the least positioned to implement AI tools reliably due to broadband and device-maintenance infrastructure gaps.

What Would Actually Constitute Proof

A credible RCT for an LLM-based tutoring platform in 2026 would require at least 60 schools randomized at the school level to prevent contamination, a 12–18 month treatment window, pre-registered primary outcomes on state assessments or independently validated instruments rather than platform-internal scores, and intent-to-treat analysis that captures real-world engagement drop-off rather than restricting results to completers. Both Khanmigo and Synthesis have publicly acknowledged discussions with university research partners toward this end. Class Companion has not announced a comparable initiative as of April 2026.
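
A rough sense of why a design on this scale is needed comes from a minimum detectable effect size (MDES) calculation for a two-arm, school-randomized trial. The sketch below uses the standard two-level approximation with a normal rather than a t multiplier, so it is slightly optimistic. The 60 schools come from the text; the students per school, the intraclass correlation, and the covariate R-squared values are assumptions chosen for illustration, not parameters from any planned study.

```python
from math import sqrt
from statistics import NormalDist


def mdes_cluster_rct(
    n_schools: int,
    students_per_school: int,
    icc: float,
    r2_school: float = 0.0,
    r2_student: float = 0.0,
    alpha: float = 0.05,
    power: float = 0.80,
    treated_share: float = 0.5,
) -> float:
    """Approximate MDES (in SD units) for a school-randomized trial.

    icc        : share of outcome variance lying between schools
    r2_school  : between-school variance explained by covariates (e.g. pretest)
    r2_student : within-school variance explained by covariates
    """
    z = NormalDist()
    # Two-sided test at `alpha` with the requested power; about 2.80 by default.
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    denom = treated_share * (1 - treated_share) * n_schools
    between = icc * (1 - r2_school) / denom
    within = (1 - icc) * (1 - r2_student) / (denom * students_per_school)
    return multiplier * sqrt(between + within)


# Assumed values: 60 tested students per school, ICC of 0.15.
no_covariates = mdes_cluster_rct(60, 60, icc=0.15)
with_pretest = mdes_cluster_rct(60, 60, icc=0.15, r2_school=0.7, r2_student=0.5)

print(f"MDES without covariates:       {no_covariates:.2f} SD")
print(f"MDES with pretest covariates:  {with_pretest:.2f} SD")
```

Under these assumptions, 60 schools can detect an effect of about 0.29 SD without covariates and about 0.17 SD once a strong pretest is included. A trial much smaller than this would struggle to detect an effect the size of the 0.18 SD ASSISTments result even if it were real.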

The honest framing for school administrators right now: AI tutoring platforms likely provide real value in closing formative feedback loops, surfacing where students are stuck, and giving teachers better visibility into learning gaps. The technology is mature enough to be deployed thoughtfully. The jump to reliably improved state test scores requires evidence that does not yet exist for LLM-based tutors — and any vendor who claims otherwise is operating ahead of the data.

Frequently asked

Does Khanmigo raise scores on standardized tests, or only on Khan Academy's own platform?
As of April 2026, Khan Academy's published effectiveness data uses in-platform formative metrics: algebra readiness and reading comprehension scores measured inside Khan Academy's own system. The company has not published peer-reviewed data showing gains on independent assessments like state tests or NAEP. Sal Khan explicitly acknowledged this limitation in Brave New Words (May 2024) and called for rigorous third-party controlled studies.

What is Synthesis's claim of learning math several times faster than average actually based on?
Synthesis compared internal skill-progression rates for high-engagement subscribers to national NAEP growth norms, a comparison between a self-selected paying population and a general national average, with no control group. The outcome measure is internal to the Synthesis platform and not independently validated. The underlying pedagogy has real theoretical support in constructivist learning research, but this specific marketing figure is not a peer-reviewed study result.

Has any LLM-based AI tutoring platform completed a genuine randomized controlled trial as of 2026?
Not as of April 2026. The strongest RCT evidence in AI-assisted tutoring belongs to pre-LLM platforms: ASSISTments produced a 0.18 SD effect on 7th-grade state math (Roschelle et al., AERA Open, 2016), and Carnegie Learning's MATHia earned positive findings in the What Works Clearinghouse review. Both Khanmigo and Synthesis have signaled intentions to pursue university-partnered controlled trials; Class Companion has not announced a comparable effort.

Why do some districts report gains while others see nothing from these tools?
Implementation quality is the dominant variable in field data. Districts that pair AI tutoring with teacher check-ins, minimum usage requirements, and professional development consistently outperform districts that deploy the tools with minimal human scaffolding. The AI platform appears necessary but not sufficient; the human layer connecting platform feedback to classroom instruction matters at least as much as the technology itself.

Are AI tutoring tools worth the cost for budget-constrained schools?
As a supplement to human instruction at $5–15 per student per year, these tools likely provide positive value even with modest measured effects. As a replacement for high-dosage human tutoring, which shows the strongest documented pandemic-recovery outcomes, the evidence does not support the substitution. AI tools that free teacher bandwidth for intensive human interactions are probably net positive; AI deployed in place of those interactions is a bet the data has not yet validated.

Sources & further reading

  1. What Works Clearinghouse — Institute of Education Sciences, U.S. Dept. of Education
  2. ASSISTments Research Platform — Worcester Polytechnic Institute
  3. Carnegie Learning — Research and Efficacy
  4. Khan Academy — Khanmigo
  5. Synthesis — AI Math Tutor
  6. Common Sense Media — Education Technology
  7. Education Week — Artificial Intelligence in K-12

Last reviewed Apr 30, 2026. AI Pulled News is editorial; corrections welcome at /news/about.html.