Rethinking Charter School Evaluations When the Gold Standard Is Off the Table

Way back in 2001, Northwestern Professor Tom Cook penned an article for Education Next imploring education researchers to embrace randomized experiments, complaining that education lagged behind every other area of social science research in this regard. We've come a long way since then, thanks in no small part to Russ Whitehurst's leadership of the Institute of Education Sciences within the U.S. Department of Education. The past decade in particular has witnessed substantial growth in the use of experimental methods, commonly referred to as "the gold standard," in education research. This has been particularly helpful for evaluating the effectiveness of charter schools, a controversial education reform with a mixed record overall but one that shows remarkably large gains for disadvantaged students in urban areas.

The most up-to-date data suggest that 2.5 million students attend public charter schools across the country. Without a doubt, a gold standard evaluation of each of the 6,440 charter schools these students attend would be nice, but it's impossible to run a randomized experiment at a school that isn't oversubscribed: the random assignment comes from the admissions lottery, and a lottery only happens when applicants outnumber seats. No waiting list? No experimental evaluation. It's a bit of a conundrum for policymakers, advocates, and researchers with an interest in evaluating all charters, not just the most popular ones.

Enter the Stanford-based Center for Research on Educational Outcomes (CREDO). Led by Macke Raymond, this research organization regularly publishes national, state, and city evaluations of charter effectiveness. How much stock can we put in an observational model of charter effectiveness, such as that used by CREDO? The answer to this question matters greatly, both to supporters and opponents of charter schools. Voicing the uncertainty and confusion shared by many in an interview with the Wall Street Journal last week, Center for Education Reform Senior Fellow and President Emeritus Jeanne Allen asked, "Can we know without a doubt that [these CREDO studies] are valid?"

In a new working paper released today, we kick the tires on CREDO's model, identifying four potential methodological weaknesses that could lead to bias, and conclude that none raise serious red flags:

1. A well-publicized critique of the CREDO approach relates to their use of "virtual control records" that are generated from the records of up to seven public school "virtual twins," instead of one-to-one matching. We find this choice does not materially affect the results (a simplified sketch of the matching construction appears after this list).

2. Patrick Wolf, John Witte, and David Fleming have previously noted that the private school sector in Milwaukee systematically under-classifies students for government programs such as the National School Lunch Program. We applied that same logic to the charter sector in Florida, testing whether student participation in such programs is inconsistently measured across sectors. We find evidence that this is indeed the case, but that CREDO's approach of matching on these variables only modestly affects its estimates of charter effectiveness.

3. We test the internal validity of the CREDO model by comparing its estimates to those produced by an instrumental variables (IV) approach, finding that a rigorous IV method produces results similar to CREDO's observational estimates (the second sketch after this list illustrates the IV mechanics).

4. Finally, we test the wisdom of relying on experimental evaluations of popular charter schools to make inferences about all other charter schools. We do this by comparing impact estimates for oversubscribed and undersubscribed charter schools in Florida, and we find that the two groups differ. Because lottery-based studies can, by construction, only cover oversubscribed schools, this helps to explain why they tend to find charter impacts that are much larger than those reported by CREDO.
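
For readers curious about the mechanics behind point 1, here is a minimal Python sketch of how a CREDO-style "virtual control record" might be assembled. The column names, the list of exact-match variables, and the test-score caliper are our assumptions for illustration, not CREDO's actual specification.

```python
import pandas as pd

def build_virtual_control(charter_student, tps_pool, max_twins=7, score_band=0.1):
    """Illustrative sketch of a CREDO-style virtual control record (VCR).

    Finds up to `max_twins` traditional public school (TPS) students who
    match the charter student exactly on categorical observables and fall
    within `score_band` of her baseline test score, then averages their
    outcomes into a single comparison record.
    """
    # Hypothetical exact-match variables (assumed column names)
    exact_cols = ["grade", "race", "gender", "lunch_status", "ell", "sped"]
    mask = (tps_pool[exact_cols] == charter_student[exact_cols]).all(axis=1)
    # Caliper match on the prior-year test score
    mask &= (tps_pool["baseline_score"]
             - charter_student["baseline_score"]).abs() <= score_band
    twins = tps_pool[mask].head(max_twins)
    if twins.empty:
        return None  # unmatched charter students drop out of the analysis
    # The VCR's outcome is the mean outcome of the matched twins
    return twins["outcome_score"].mean()
```

Averaging several twins rather than picking a single nearest neighbor mainly reduces noise in the comparison record, which is consistent with our finding that the one-to-one versus up-to-seven choice does not materially move the results.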
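Point 3 benchmarks the observational model against two-stage least squares. Since this post doesn't describe the working paper's actual instrument, the sketch below uses a hypothetical randomized admission offer as the instrument, purely to show the 2SLS mechanics, with simulated data and the linearmodels package.

```python
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS  # pip install linearmodels

rng = np.random.default_rng(0)
n = 1_000

# Simulated applicant data: 'offer' is a hypothetical randomized
# admission offer (instrument); 'attend' is actual charter enrollment
# (endogenous); 'score' is the follow-up test outcome.
offer = rng.binomial(1, 0.5, n)
attend = rng.binomial(1, np.where(offer == 1, 0.8, 0.2))     # offers shift attendance
baseline = rng.normal(size=n)
score = 0.15 * attend + 0.6 * baseline + rng.normal(size=n)  # true effect = 0.15

df = pd.DataFrame({"score": score, "attend": attend,
                   "offer": offer, "baseline": baseline, "const": 1.0})

# Two-stage least squares: the offer instruments for attendance.
fit = IV2SLS(dependent=df["score"],
             exog=df[["const", "baseline"]],
             endog=df["attend"],
             instruments=df["offer"]).fit()
print(fit.params["attend"])  # IV estimate of the attendance effect
```

If an observational model is capturing the causal effect, its estimate should land close to the IV estimate for the same schools; that is the comparison run in the working paper.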

The findings we’re releasing today are good news for the education research community because they demonstrate that CREDO’s synthetic matching approach offers a reasonable alternative when the gold standard is not feasible. This approach appears to produce reliable estimates of charter effectiveness and does so in a manner that ensures broad coverage of many different types of charter schools in diverse locations across the country.

— Anna Egalite and Matthew Ackerman

Anna J. Egalite is an Assistant Professor in the Department of Educational Leadership, Policy, and Human Development at North Carolina State University. Matthew Ackerman graduated with an AB in Economics from Harvard in 2014 and is now a graduate student in Economics at the London School of Economics. 
