Climate Change and Value-Added: New Evidence Requires New Thinking

By Thomas Kane | 10/24/2013


Anyone participating in the education policy debate for five years or more probably staked out their position on the use of value-added (or student achievement growth) in teacher evaluations long ago.  That’s unfortunate, because, as has happened with research on climate change, there has been a slew of new research, especially in the last three years, on the strengths and weaknesses of such measures.  Given what we have learned, one wonders whether there would have been more consensus by now on the appropriate use of test-based measures in teacher evaluation if the debate had not started out so polarized.

On statistical volatility (or reliability) of value-added

Remarkably, there is no disagreement about the facts regarding volatility: the correlation in teacher-level value-added scores from one year to the next is in the range of .35 to .60.  For those teaching English Language Arts, the results tend toward the bottom end of the range.  For those teaching math, the results tend toward the top end of the range.  Also, in middle school and high school, where the number of students taught in a given subject is larger, the stability of the measures tends to be higher.

Critics of value-added measures frequently cite year-to-year volatility as a primary reason for not using such measures for evaluating individual teachers.  Indeed, if the measurements were so volatile that student achievement gains in one year were completely unrelated to a teacher’s likely success with future students, they would be right.

That is simply not the case.  For many purposes, such as tenure or retention decisions, it is not the “year to year” correlation that matters, but the “year-to-career”—that is, the degree to which a single year’s value-added measure would provide information about a teacher’s likely impact on students over their future careers.  It turns out that a year-to-year correlation of .35 to .60 implies that a single year of achievement gains is correlated .60 to .77 with a teacher’s average student achievement gain over their career.  (The year-to-year correlation is diminished by the fact that each single year is subject to measurement error. Such errors are averaged out over a career.)
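The arithmetic behind the .60-to-.77 figure can be illustrated with a classical test-theory sketch (an assumption for illustration, not necessarily the authors’ exact model): if each year’s observed value-added is a stable career effect plus independent measurement noise, then the year-to-year correlation equals the single-year reliability r, and one year’s correlation with the noise-free career average is approximately the square root of r.

```python
import math

def year_to_career(r):
    """Approximate correlation of one year's value-added with the career
    average, under a classical test-theory model in which the year-to-year
    correlation r equals the single-year reliability."""
    return math.sqrt(r)

for r in (0.35, 0.60):
    print(f"year-to-year r = {r:.2f} -> year-to-career = {year_to_career(r):.2f}")
```

Plugging in the observed range (.35 to .60) reproduces roughly the .60-to-.77 range cited above, since sqrt(.35) is about .59 and sqrt(.60) is about .77.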

For example, in a forthcoming analysis in three school districts where it was possible to track teachers’ student achievement growth over many years, Doug Staiger and I found that of those teachers who were in the bottom quartile of value-added in a single year, 55 to 65 percent were in the bottom quartile over their careers and 82 to 87 percent were in the bottom half.
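A small simulation shows how a correlation in that range translates into quartile persistence of roughly that magnitude. This is a hypothetical sketch (simulated bivariate-normal data with an assumed year-to-career correlation of 0.7, near the middle of the range above), not the actual district data from the Staiger–Kane analysis.

```python
import math
import random

# Simulate teachers whose single-year value-added correlates ~0.7 with
# their (unobserved) career average, then ask: of teachers in the bottom
# quartile in one year, what share are bottom-quartile (or bottom-half)
# over the career?
random.seed(0)
rho = 0.7       # assumed year-to-career correlation
n = 100_000     # simulated teachers

pairs = []
for _ in range(n):
    career = random.gauss(0, 1)
    single = rho * career + math.sqrt(1 - rho**2) * random.gauss(0, 1)
    pairs.append((single, career))

singles = sorted(s for s, c in pairs)
careers = sorted(c for s, c in pairs)
q25_single, q25_career = singles[n // 4], careers[n // 4]
median_career = careers[n // 2]

bottom_year = [c for s, c in pairs if s < q25_single]
frac_bottom_quartile = sum(c < q25_career for c in bottom_year) / len(bottom_year)
frac_bottom_half = sum(c < median_career for c in bottom_year) / len(bottom_year)
print(round(frac_bottom_quartile, 2), round(frac_bottom_half, 2))
```

Under these assumptions, the simulated persistence rates land in the same neighborhood as the 55-to-65 percent (bottom quartile) and 82-to-87 percent (bottom half) figures reported in the districts studied.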

Therefore, although they are subject to volatility, value-added measures do have predictive power.  They do provide information about a teacher’s likely future success with students.   Consequently, such evidence does deserve some weight in a supervisor’s decision about whether or not to retain a teacher, even if it is not the sole factor.

On the role of unmeasured student traits

As we all know, students are not randomly assigned to teachers or to schools.  The Measures of Effective Teaching study confirmed considerable differences in the baseline achievement of students assigned to different teachers, even in the same schools.  Indeed, such tracking persists over multiple school years, as some teachers are assigned higher- or lower-achieving students than their colleagues year after year.

Fortunately, most state data systems make it possible to track individual students’ scores over multiple years and to control for prior student achievement.  Indeed, the whole point of “value-added” measures is to control for observed traits such as students’ prior achievement and characteristics.

However, skeptics have raised appropriate questions about whether such controls capture all the relevant traits which are used to sort students into different teachers’ classrooms.  A frequently cited paper by Jesse Rothstein in the Quarterly Journal of Economics in 2010 correctly points out that such selection on so-called “unobserved” student traits could lead to bias. Do some teachers receive students year after year that are different in other ways that are much more difficult to control for?

There have been three new studies in recent years which test that concern directly. And, again, remarkably, there’s been little dispute about the findings.  One study, by Raj Chetty, Jonah Rockoff and John Friedman, examined what happened when high value-added or low value-added teachers moved across schools or across grades.  If a teacher’s apparent success was due to his or her students (and not to the teacher’s talent and skill), then we should not see scores move when a particularly high value-added (or low value-added) teacher moves between schools or grades.  However, they found that scores do move when teachers move.  In fact, the magnitude of the changes in achievement is indistinguishable from what we would predict if the value-added measures reflected causal teacher effects.

Two other studies—one involving 79 pairs of teachers in Los Angeles (which I wrote with Douglas Staiger) and the Measures of Effective Teaching study involving 1,591 teachers in six different school districts (which I wrote with Dan McCaffrey, Trey Miller and Douglas Staiger)—randomly assigned teachers to different groups of students within a grade and subject in a school.  In both studies, we used teachers’ effectiveness as measured using value-added methods in prior years.  We then tested whether the teachers who had been identified as more effective using the value-added measures had students who achieved more following random assignment.  They did.  And, in fact, the differences were statistically indistinguishable from what one would have predicted based on the value-added measures.

We should know even more in the coming months: The Chetty et al. findings are currently being replicated in at least one other school district.  And a Mathematica study is due out soon that examines the impact on student achievement when high value-added teachers were offered bonuses to move and were randomly assigned among a set of schools that had volunteered to hire them.

On the long-term life consequences of high achievement growth teachers

Even if convinced by the evidence of the causal effects of test-based measures and their predictive power, skeptics have raised questions about the quality of the tests being used.  Multiple choice tests are inherently limited.  Skeptics—and many parents—worry about whether the teachers who generate success on those tests have any long-term positive impacts on children.

There are two new studies which shed light on these concerns.  One, by the same team of researchers who studied teachers switching between schools (Chetty, Friedman and Rockoff), tracked teachers’ long-term effects on student earnings.  They found that teachers do have long-term effects on student earnings.  Even more remarkably, they found that a teacher’s impact on future student earnings was associated with his or her effectiveness as measured by value-added.

A second study, recently published in the Proceedings of the National Academy of Sciences (PNAS) by Gary Chamberlain, using the same data as Chetty and his colleagues, provides fodder both for skeptics and supporters of the use of value-added: while confirming Chetty’s finding that the teachers who have impacts on contemporaneous measures of student learning also have impacts on earnings and college going, Chamberlain also found that test scores are a very imperfect proxy for those impacts.  Only a fraction of the impact teachers have on earnings and college going is mediated through their apparent effect on test-based measures.

On the comparative advantages of other measures

We have also learned a lot about the alternatives to value-added measures—especially, classroom observations and student surveys—in the past three years.   Many of the same concerns about reliability and bias due to unmeasured student traits apply to these as well.  For instance, in the Measures of Effective Teaching project, we learned that even with trained raters, a single observation of a single lesson is an unreliable measure of a teacher’s practice.  Indeed, the reliability that we saw with single classroom observations (around .4) would have been on the low-end of the reliability of value-added measures.  Higher reliability requires multiple observers and multiple observations.
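One standard way to see why multiple observations help is the Spearman-Brown formula, which gives the reliability of an average of k parallel measurements. This is an illustrative sketch assuming the observations behave as parallel measures with a single-observation reliability of .4, not a reproduction of the MET project’s exact analysis.

```python
def spearman_brown(r1, k):
    """Reliability of the average of k parallel observations,
    each with single-observation reliability r1 (Spearman-Brown)."""
    return k * r1 / (1 + (k - 1) * r1)

r1 = 0.4  # approximate single-observation reliability reported in MET
for k in (1, 2, 4):
    print(f"{k} observation(s): reliability = {spearman_brown(r1, k):.2f}")
```

Under these assumptions, averaging four observations pushes reliability from .40 to roughly .73, which is why multiple observers and multiple lessons matter.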

In sum

Reasonable people can disagree on how to include achievement growth measures in teacher evaluations, such as whether the weight attached should be 20 percent or 50 percent, but it is no longer reasonable to question whether to include them.  For a number of reasons—limited reliability, the potential for abuse, the recent evidence that teachers have effects on student earnings and college going which are largely not captured by test-based measures—it would not make sense to attach 100 percent of the weight to test-based measures (or any of the available measures, including classroom observations, for that matter).  But, at the same time, given what we have learned about the causal impacts on students and the long-term impacts on earnings, it is increasingly hard to sustain the argument that test-based measures have no role to play, that the weight ought to be zero.  Although that may have been a reasonable position five years ago, when so many questions about value-added remained unanswered, the evidence which has been released since then simply does not support that view.

-Tom Kane

Thomas Kane is a fellow at the Brown Center on Education Policy and a professor at the Harvard Graduate School of Education.

This post first appeared on the Brown Center Chalkboard.

Comment on this article
  • Bill Bryan says:

    What a bunch of Edu-Speak gobbledygook! Over thirty years ago the Atlanta Public School System in Georgia proposed a Teacher Certification Test, independent from SACS (the Southern Association of Colleges and Schools), because of the horrible drop-out rates AND tests given to recent APS graduates, where about 40% of the recent grads could not pass all sections of the internationally recognized GED, or High School Equivalency Test, the first time.

    Long story short: 200+ tenured APS teachers, all with Masters Degrees, were given the proposed Teacher Certification Test. About 50% of the Black teachers failed the test and about 5% of the White teachers failed it. The test? The International GED test!

    One of my APS teacher friends took this test and easily passed all sections the first time. “Bill, the Math questions where most Black teachers failed were 6th grade level! And they still failed! English was 9th grade. We have a lot of ignorant teachers who are greatly damaging the students who attend APS schools.”

    I just couldn’t believe my friend’s statement about how easy the GED Test was, so I took it. My teacher friend was correct! If you have 1950s-era 6th grade Math skills and 9th grade English skills, you should have no problems passing the APS Teacher Certification Test.


  • Jersey Jazzman says:

    Dr. Kane:

    Do you believe climate change is caused by human activity and is imperiling the planet?

    Because you never say…

  • Wayne Gersen says:
    I wrote a post on this drawing on a white paper I wrote when NH was seeking RTTT funds. Here’s the bottom line: If districts across the country were required to use the new computerized tests to implement value added measures for ALL teachers it would require several years. Each State would be required to develop new assessments for those content areas not covered by existing tests, field test those assessments, and implement them for multiple years before receiving the results needed to make any meaningful decisions on teacher performance. All of this assumes it is possible to design such an assessment for small rural schools and assumes assessments can be designed and implemented for secondary teachers and K-12 teachers in specialized subjects. In short, it would be a costly and administratively complicated undertaking. The money and time would be better spent helping the children raised in poverty who struggle in schools.
