Critique of Study of Voucher Impact on College Enrollment Misguided

By Matthew M. Chingos and Paul E. Peterson | 09/13/2012


We recently released a study showing that school vouchers in New York City had a positive impact on the college enrollment rate of African-American students but not of Hispanic students.  We think the study is important because it provides the first experimental estimate of the impact of vouchers on college enrollment.

The National Education Policy Center has just released a critique of our study by Sara Goldrick-Rab of the University of Wisconsin-Madison.

Several of the issues raised by Goldrick-Rab have no merit, and none undermines the primary conclusion of our study: the voucher intervention in New York City increased the college enrollment rates of African-American students.  Below are responses to the primary criticisms raised in the review:

1. The review questions the equivalence of the treatment and control groups by pointing to a modest difference between the treatment and control groups in the share of African-American students’ parents who completed a bachelor’s degree.  This difference is only marginally statistically significant and, as the review notes, there are other differences that favor the control group.  For example, control group families are less likely to have a father absent.  Because chance differences can appear for any one characteristic, statisticians have developed a test that uses information on all background characteristics to ascertain whether two groups are equivalent.  The overall treatment and control groups and the African-American and Hispanic subgroups all survive this test.
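The joint equivalence test described above can be sketched as follows. This is an illustrative simulation with made-up covariates and sample sizes, not the study's actual data or test: treatment status is regressed on all baseline characteristics at once, and a joint F-test asks whether the covariates collectively predict assignment (under random assignment, they should not).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1000  # hypothetical sample size

# Hypothetical baseline covariates (e.g., parent education, family structure)
X = rng.normal(size=(n, 4))
treat = rng.integers(0, 2, size=n)  # random assignment, independent of X

# Regress treatment on all covariates jointly via OLS
Xc = np.column_stack([np.ones(n), X])
beta, _, _, _ = np.linalg.lstsq(Xc, treat, rcond=None)
resid = treat - Xc @ beta
rss = resid @ resid
tss = ((treat - treat.mean()) ** 2).sum()
k = X.shape[1]

# Joint F-test: do the covariates collectively predict treatment status?
F = ((tss - rss) / k) / (rss / (n - k - 1))
p = 1 - stats.f.cdf(F, k, n - k - 1)
print(f"joint balance test p-value: {p:.3f}")
```

A large p-value is consistent with the groups being equivalent on observables taken together, which is the sense in which the overall groups and the African-American and Hispanic subgroups "survive" the test.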

2. The review says that an interpretation of the results for African-American students is not appropriate because they do not differ significantly from those observed for Hispanic students. As stated in our report, it is true that the effects for African Americans and Hispanics are both positive and do not differ from one another by an amount that is statistically significant.  But we can confidently say that the effect for African-American students is positive (i.e. greater than zero), whereas we cannot say the same for Hispanic students.

3. The review asks for an interpretation of the results for the small number of white and Asian students.  But the treatment and control groups for this small number of students do not survive the equivalence test mentioned in item one; interpreting the results is therefore inappropriate.

4. The review raises a technical point related to measurement error, but its claim is incorrect.  It is true that our college attendance measure is imperfect, because the process used to match students to college enrollment records is imprecise.  But those errors are already reflected in the standard errors we report, and no further adjustment is appropriate.
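The logic of this point can be illustrated with a quick simulation (the enrollment rates and error rate below are hypothetical, not the study's): random, non-differential errors in a binary outcome shrink the measured treatment effect toward zero and add noise, so such errors work against finding a significant effect rather than manufacturing one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000  # large n so sampling noise is small relative to the effect

treat = rng.integers(0, 2, n)
# Hypothetical true enrollment: 40% in control, 46% in treatment (6 pp effect)
y_true = rng.random(n) < (0.40 + 0.06 * treat)

# Non-differential matching error: flip 10% of outcomes at random
flip = rng.random(n) < 0.10
y_obs = np.where(flip, ~y_true, y_true)

def effect(y):
    """Difference in mean enrollment, treatment minus control."""
    return y[treat == 1].mean() - y[treat == 0].mean()

print(f"true effect: {effect(y_true):.3f}, observed effect: {effect(y_obs):.3f}")
# With a 10% flip rate the observed effect shrinks by roughly (1 - 2*0.10)
```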

5. The review makes an error in its interpretation of a null finding.  It concludes that our report “convincingly demonstrates that in New York City a private voucher program failed to increase the college enrollment rates of students from low-income families.”  That statement is false.  The overall impact estimate is not estimated with enough precision to conclude that the voucher intervention had no effect.  The overall impact is not statistically distinguishable from zero, but neither is it statistically distinguishable from a negative impact of 3 percentage points or a positive impact of 4 percentage points.
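The precision point can be made concrete with a hypothetical point estimate and standard error chosen to reproduce an interval like the one above (these numbers are illustrative, not the study's estimates):

```python
import scipy.stats as st

estimate = 0.5  # hypothetical point estimate, in percentage points
se = 1.8        # hypothetical standard error

z = st.norm.ppf(0.975)  # ~1.96 for a 95% confidence interval
lo, hi = estimate - z * se, estimate + z * se
print(f"95% CI: [{lo:.1f}, {hi:.1f}]")
# The interval contains 0, so we cannot reject "no effect" -- but it also
# contains effects as large as +4 pp, so we cannot conclude there IS no effect.
```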

6. The one result that can be reached with confidence is that the impact of vouchers for African Americans was positive.  None of the issues raised in this commentary compromise that conclusion.

– Matthew M. Chingos and Paul E. Peterson

Comment on this article

  • Stuart Buck says:

    Interesting that this was published by NEPC, which is often guilty of far more egregious claims of causality (for example, its recent press release claiming that K12 students were “falling behind” public school students, based solely on cross-sectional test score means, as far as I could tell).

    Anyway, re: number 5: When I studied econometrics, the professor hammered home the point, in about the first week or so, that failure to reject the null is NOT the same thing as proof that the null is correct, and that you should never ever say that the null has been proven correct (that’s not what hypothesis testing is set up to do).

  • Liz says:

If researchers were dedicated to the success of youth rather than to the politics that occur at young people’s expense, the education debate (voucher, school choice, union, whatever) would emphasize what needs to be done in schools for young people to succeed. “Statistically significant” findings mean nothing unless they are accompanied by a discussion of the mechanisms by which these vouchers create potentially better educational opportunities for African-American youth. The conversation needs to focus more on what institutions can do better for kids and less on the preferences of the political groups competing to “educate” them.


  • Scott McLeod says:

    “Chingos’ response to the review does not address any of the analysis concerns, nor does it provide further supporting data. The response is largely anecdotal, and in essence, says ‘we know what we are doing…trust us.'”


  • Marshall says:

Regarding #4 more generally: the measurement error inflates the standard errors, making it harder to reject the null. Thus it actually lends more evidence to the claim that the effect exists. That she states this clearly, but then implies that somehow it isn’t already reflected in the variance estimates, is very odd and very sloppy.

    It’s also interesting that she doesn’t work through the implications of her critique for other subgroups, particularly those where the estimate of the treatment could be downward biased/error estimates inflated because of non-random measurement error for different subgroups (so, the null was not rejected where it should have been). For instance, Hispanic students are notoriously underrepresented in these data sets, particularly recent immigrant students who are undocumented.

  • Sara Goldrick-Rab says:

    Dear Marshall,

    Nice try, but my claims aren’t “odd” or “sloppy” in the least. You have assumed random error, whereas I described differential error.

    See here for my full response to C&P’s piece.


  • Scott McLeod says:

    Here is Sara Goldrick-Rab’s reply to this post:

  • Stuart Buck says:

    A further comment on Goldrick-Rab’s final line:

“Misconception #2: A nonsignificant difference (e.g., P > .05) means there is no difference between groups.

    “A nonsignificant difference merely means that a null effect is statistically consistent with the observed results, together with the range of effects included in the confidence interval. It does not make the null effect the most likely. The effect best supported by the data from a given experiment is always the observed effect, regardless of its significance.”

For even more reading on what significance testing means, Ziliak and McCloskey’s book is useful.
