GM. This is The Great Unlock, your weekly cheat sheet for the AI revolution. We filter out the noise, red-pen the fake news, and give out stickers to those deserving.

A couple of issues ago, we told you to trust the research on overhelping. This issue, we give you the tools to interrogate any research — including that one.

Let's unlock it.

 

DEEP DIVE

When the Research Doesn't Hold

Half of the social-science findings researchers tried to replicate didn't hold up. Here's what that means for every decision you make.

Two issues ago, we anchored an argument in a 44-study meta-analysis. We stand by the argument. But a paper published in Nature this week is a reminder that even well-constructed research deserves scrutiny — and that district leaders who buy programs on the strength of a single study are operating on thinner ice than they realize.

The paper comes from the SCORE project — the Systematizing Confidence in Open Research and Evidence initiative — a seven-year effort involving 865 researchers who examined nearly 4,000 social-science studies across economics, education, psychology, and sociology.

Only half of the studies that researchers attempted to fully replicate — gathering data from scratch and re-running the analysis — held up with statistical significance. That figure aligns with earlier, smaller projects. But the scale of SCORE makes it harder to dismiss as an outlier result or a methodological quirk.

Treat any single study as a piece of the puzzle, not the whole picture. The research says so — and so does the research on the research.

The SCORE team also assessed 600 papers for basic reproducibility — same data, same analysis — but only 145 contained enough detail to even attempt it, and of those, just 53% matched precisely. This is not a fraud problem. In most cases, it's a transparency problem: insufficient documentation of procedures, missing raw data, and methods described at too high a level of abstraction for anyone else to follow.

1. WHY EDUCATION RESEARCH IS ESPECIALLY VULNERABLE

The replication problem is not unique to education, but education research carries specific structural liabilities that amplify it. Classrooms are not laboratories. Students are not controlled variables. Teachers bring enormous variation in delivery, relationship, and context. The intervention that produced a statistically significant result in 24 classrooms in suburban Ohio is not guaranteed to do the same in your district — even if the study design was sound.

Action research compounds this further. When teachers study their own practice, sample sizes are small, controls are absent, and publication bias runs strong: studies showing that an intervention worked get shared; studies showing it didn't largely disappear. This doesn't make action research worthless — it makes it directional, not definitive. The moment it gets cited in a board presentation as proof, something has gone wrong.

FOR DISTRICT LEADERS

A study's sample size is not the same as its relevance to your district. Before citing research to justify a procurement decision, ask: which research, by whom, with what sample, and replicated where? A well-powered RCT in one context is still a hypothesis in yours until replicated locally.

2. THE PROCUREMENT PROBLEM THIS CREATES

The programs most aggressively marketed to you — the ones with the slickest decks and the most confident efficacy claims — are also the ones most likely to be citing a single study, conducted by the program's own developers, in a context that does not match yours.

The SCORE findings are not an argument against using research to make decisions. They are an argument for using research more carefully. The difference between a single study and a convergent body of evidence is real. The difference between a developer-conducted efficacy study and an independent replication is enormous. Your procurement framework should reflect both.

More support does not mean better outcomes. More precise support does. The same logic applies to research: more studies is not the same as more certainty. Convergence across independent replications is.

3. THE GREAT UNLOCK THESIS, REVISITED

The overhelping research we cited two issues ago still holds. Not because a single meta-analysis is definitive, but because the mechanism it describes — autonomy development, productive failure, independent decision-making — is supported by a convergent body of literature spanning decades and multiple fields. Meta-analyses, by design, aggregate across studies, which partly addresses the replication problem for any individual paper within them.

The replication crisis and the overhelping research point in the same direction: the answer is precision, not volume. Precision in what we call support. Precision in what we call evidence. And precision in how we distinguish between a compelling story and a replicable finding.

AI makes this more urgent, not less. A district that builds its AI-powered support layer on poorly replicated research will deploy that research at scale, faster than any human system could. The amplification works in both directions.

4. THREE QUESTIONS TO ASK BEFORE YOU CITE A STUDY

01  Has it been independently replicated?  Developer-funded efficacy studies are not independent replications. One study, however well-designed, is a hypothesis. Ask what the convergent evidence base looks like, not just the flagship study.

02  Does the sample match your context?  School demographics, resource levels, teacher experience, and cultural context all moderate how an intervention performs. A result in one setting is a starting point for your district, not a guarantee.

03  What does the study actually measure?  Engagement metrics, completion rates, and short-term test scores are not the same as durable learning gains. Ask whether the outcome measured in the study is the outcome you actually care about.

Source: Tyner, A. H. et al. Nature 652, 143–150 (2026). SCORE Project — Systematizing Confidence in Open Research and Evidence. 

 

📋  THE BULLETIN BOARD

The Research Isn't Broken. Your Relationship With It Might Be.

The SCORE findings don't mean education research is useless. They mean the way most districts consume research is broken. There's a meaningful difference between using a study as a directional signal and using it as proof. The former is epistemically honest. The latter is how programs that don't work get renewed for a fifth year.

The good news from SCORE: things are improving. Studies published in 2022–23 showed an 85% computational reproducibility rate, compared to far lower rates in papers from 2009–18. Better norms around data disclosure, pre-registration, and methods transparency are starting to show up in the literature. The question is whether the people purchasing programs on the basis of that research are keeping pace.

KEY INSIGHT FOR PROCUREMENT

When a vendor says their program is "research-backed," ask them to specify: which research, by whom, with what sample, and replicated where? If they can't answer all four questions, you don't have evidence. You have a pitch.

  

🧯  BS DETECTOR

"Proven Effective" Is Doing a Lot of Work

The phrase appears in almost every EdTech deck. Almost no two vendors mean the same thing by it.

You will see the words "proven effective" in the marketing material of the next program pitched to your district. Here is what that phrase is hiding.

"Proven" typically means: one study was conducted, usually by or on behalf of the developer, usually with a sample selected for convenience, usually measuring outcomes over a short timeframe, and the results were statistically significant at p < 0.05 — a threshold that the replication literature has spent fifteen years demonstrating is insufficient on its own.

"Effective" typically means: students in the program performed better on a specific measure than students who weren't — under conditions that no one has attempted to replicate in a different setting.

Statistical significance at p < 0.05 tells you the result probably isn't random noise. It does not tell you the result will hold in your district, with your students, with your teachers delivering it.
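For the statistically inclined, here is a minimal back-of-the-envelope sketch of that point. Every number in it is hypothetical (a small true effect, 30 students per group, a new district where the effect is half as strong), chosen for illustration rather than drawn from the SCORE paper or any vendor study. It shows that even a real effect, honestly measured, clears p < 0.05 only a fraction of the time at typical pilot-study sizes.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical numbers, purely for illustration: a program whose true effect
# is small (0.2 standard deviations) in the original study's context, and
# half that size (0.1 SD) in a new district with different conditions.
def significant(effect_size, n_per_group):
    """Simulate one two-group study; return True if it clears ~p < 0.05 (one-sided)."""
    treatment = rng.normal(effect_size, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    se = np.sqrt(treatment.var(ddof=1) / n_per_group + control.var(ddof=1) / n_per_group)
    z = (treatment.mean() - control.mean()) / se
    return z > 1.645  # roughly the one-sided 5% threshold

trials = 20_000
original = np.mean([significant(0.2, 30) for _ in range(trials)])
new_context = np.mean([significant(0.1, 30) for _ in range(trials)])
print(f"Chance of a 'significant' result in the original context: {original:.0%}")
print(f"Chance of a 'significant' result in the weaker-effect district: {new_context:.0%}")
```

The exact percentages are beside the point. The gap between "it reached significance once" and "it will reach significance again in your setting" is the gap the replication question is designed to expose.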

Ask vendors for the study itself, not just the summary slide. Look at the sample size. Look at who funded the study. Look at whether anyone outside the developer has attempted to replicate it. And look at what was actually measured — because "student growth" and "teacher-reported engagement" are not the same outcome, even if both are described as evidence of effectiveness.

ASK THE VENDOR

Show me the primary study. Who funded it? What was the control condition? Has it been independently replicated? What happens to student outcomes when the program ends? If the answer to any of these is unclear, that is the answer.

  

🍎  THE TEACHER'S LOUNGE

The Teachers Already Knew This Too

Here is the thing about the replication crisis that no press release will say out loud: experienced teachers have been living it for decades.

They've watched the reading wars flip from one evidence base to another. They've implemented three different frameworks for differentiated instruction in five years, each one "research-backed." They've sat through professional development sessions anchored in studies that later turned out not to replicate. They've been told that the science is settled — and then watched the science get revised.

The best teachers respond to this the same way the best scientists do: they treat research as one input among several, they watch what actually happens in their classrooms, and they remain skeptical of any single finding that arrives pre-packaged with a solution. That is not anti-intellectualism. That is appropriate epistemic humility.

The teachers who pushed back weren't rejecting research. They were practicing it — in the most honest sense of the word.

The shout-out this week goes to the teachers who asked "but has this actually been replicated?" in the professional development session — and didn't get a straight answer, but kept asking anyway.

  🏅 Recognition Sticker: "Asked the Follow-Up Question the Study Didn't Answer"   

 ________________________________________________________

 

That's all the unlock for today. Tune in next week.

 

Stay awesome, you unlockers!

Want more? Order The Great Unlock here.
