Digging deeper into the i3 Grant evaluation

Photo Credit: Krista Lundgren/USFWS

by Saro Mohammed

In recent weeks, I have shared my thoughts on the newly released i3 evaluation both in The Hechinger Report and on this blog. As you may suspect, news columns do not provide a venue for diving deeply into the nuances of interpreting academic research, so I would like to take this opportunity to expand a bit on my thoughts regarding that report specifically, as well as on future research into educational innovations.

Honestly, both my conversation with The Hechinger Report and my subsequent column there may have focused too much on the difficulty of research and educational innovation, and not enough on the importance of building an evidence base. Evidence is the only way to determine which innovations are promising and effective, and integrating evidence helps prevent practitioners from implementing ineffective ideas, like the one development project found to have statistically significant negative effects in the i3 evaluation.

When looking at the i3 grants evaluation holistically, I think that the grants worked exactly as intended. As I mentioned in The Hechinger Report, “The study results are not all bad. Only one of the 67 programs produced negative results, meaning that kids in the intervention ended up worse off than learning as usual. Most studies ended up producing ‘null’ results and...that means ‘we’re not doing worse than business as usual. In trying these new things, we’re not doing harm on the academic side.’”

These grants were designed to invest more funds in practices that had the strongest evidence of efficacy, and fewer funds in those that showed promise but had less evidence of efficacy. Because of the existing evidence base the grant required, we expected scale-up projects to mostly be successful – this was borne out by the data, in that 100% of the scale-up grants found positive (50%) or null (50%) effects. In contrast, 93% of the findings for validation grants were positive (40%) or null, and 62% of the development grants had positive (8%) or null effects. Again, by design, the development grants were the riskiest to begin with.

The reported percentages should further be contextualized by the fact that 18 of the 67 grants did not meet the evaluation’s inclusion criteria for supporting causal claims. Therefore, for those 18 grants (27%), we simply do not know if their effects were positive, null, or negative.
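
For readers who want to check the arithmetic behind these figures, here is a minimal Python sketch. It uses only the percentages and counts quoted above; the variable and function names are illustrative assumptions, not anything taken from the i3 evaluation itself. The "implied null" share is simply the combined positive-or-null percentage minus the positive percentage.

```python
# Percentages quoted in this post, by grant type (illustrative structure).
reported = {
    "scale-up": {"positive_or_null": 100, "positive": 50},
    "validation": {"positive_or_null": 93, "positive": 40},
    "development": {"positive_or_null": 62, "positive": 8},
}


def implied_null(row):
    """Null share implied by the combined figure minus the positive share."""
    return row["positive_or_null"] - row["positive"]


# Implied null percentage for each grant type.
implied = {name: implied_null(row) for name, row in reported.items()}

# Grants that did not meet the inclusion criteria for causal claims.
total_grants = 67
excluded = 18
included = total_grants - excluded
excluded_share = round(100 * excluded / total_grants)

for name, null_pct in implied.items():
    print(f"{name}: implied null share = {null_pct}%")
print(f"excluded: {excluded} of {total_grants} grants ({excluded_share}%)")
print(f"remaining grants with interpretable findings: {included}")
```

The sketch does not try to break down the share of findings that were neither positive nor null, because the figures quoted here do not specify how that remainder is divided.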

As I mentioned in The Hechinger Report piece,

“It’s sometimes hard to prove that an innovation works because of unintended consequences when schools try something new. For example, if a school increases the amount of time that children read independently to try to boost reading achievement, it might shorten the amount of time that students work together collaboratively or engage in a group discussion. Your reading outcomes may turn out to be the same [as the control group], but it’s not because independent reading doesn’t work. It’s because you inadvertently changed something else. Education is super complex. There are lots of moving pieces.”

In fact, a null finding may actually be good if there are other outcomes we care about (e.g., cost, exposure to technology, social-emotional skills) – because this implies we are not doing harm by introducing a new intervention that provides other benefits. In other words, it is unlikely that there are unintended negative academic consequences from an intervention with null findings.

Academic achievement is complex and usually requires behavior change across multiple domains. Isolating just one intervention is hard to do – both in a research study and in practice in the classroom. Even if we are successful in holding all else constant, changing just one instructional practice may inadvertently change other (often unmeasured) things that could undermine outcomes.

To add yet another wrinkle to innovating in education, as I mentioned in the previous pieces, learning improvements are slow and incremental. It can take longer than even the three-to-five-year time horizon that the innovation grants allowed.

Innovation, by definition, has a weaker evidence base – not because it's inherently less effective, but because we don't yet know which innovations are effective. Making measurement a larger and more accessible part of blended and personalized learning is a crucial piece of our work at The Learning Accelerator, and we are excited for the opportunities to engage with other partners, like the Digital Learning Collaborative, to help push this work forward.

About the Author

At The Learning Accelerator, Saro focuses on understanding if, how, and when K-12 blended learning is effective nationally. She has ten years’ experience in researching/evaluating public, private, and non-profit education programs.