How States Should Redesign Their Accountability Systems Under ESSA


Those of us at the Fordham Institute have long held that there’s no one best way to design a state accountability system. It’s not just that we can’t even agree amongst ourselves about the relative importance of measuring student growth vs. student proficiency (though that’s true). It’s also because we understand that, as with all policy endeavors, this one amounts to a series of trade-offs. Perhaps there are some “wrong” answers (such as relying exclusively on proficiency rates in reading and math to judge school quality, or measuring school spending and other inputs and calling it accountability) but mostly there are a whole bunch of right and partially-right answers, depending on policymakers’ goals and states’ idiosyncrasies. That’s why, nine months ago, when we hosted our ESSA Accountability Design Competition, we intentionally decided not to declare a “winner.”

Still, we know that states are now putting pen to paper on their accountability plans and that many of them want advice about what to do. So no more hesitating or prevaricating. Here’s our attempt—just David and Mike, mind you, not Fordham at large—to lay out an ideal accountability system for states. Consistent with the guidelines for the original accountability contest, our proposal includes design objectives, the indicators we would use to assign summative ratings to schools (should the feds in fact force states to do this), as well as recommendations for the U.S. Department of Education.

Design Objectives

Our proposed accountability system has three goals:

1. To gauge school performance as fairly and accurately as possible (most importantly, by not penalizing schools for factors outside their control, such as students’ prior achievement)

2. To encourage schools to focus on all students’ long-term success

3. To foster an environment that rewards quality by empowering parents

As these goals suggest, we believe that ESSA provides an opportunity to move “accountability” beyond the constrained vision of No Child Left Behind—and that states should seize that opportunity. Rather than hold schools to account for student performance at one point in time, we can now hold them accountable for all students’ progress from one year to the next. Rather than focus exclusively on ensuring that kids are minimally proficient, we can now reward improvement across the achievement spectrum. Rather than making state bureaucrats solely responsible for holding hundreds or thousands of schools to account, we can share this responsibility with those with the greatest stake in the final outcome: parents and other adult caregivers.

System for Rating Schools

Consistent with the Obama Administration’s interpretation of ESSA, our proposed accountability system assigns school ratings based on a range of indicators, which we describe below. To be clear, we believe ESSA can be read to allow for multiple grades or ratings for schools. For instance, states might assign separate ratings to each of the five indicator types the law requires: academic achievement, student growth, graduation rates, progress toward English language proficiency, and other indicators of school quality and student success. That would be our preferred interpretation, and one we hope the Trump Administration adopts. Still, should the feds decide to stick to their guns, here’s how we’d combine these factors into a summative grade (A–F).

Indicators of Academic Achievement (10–25 percent of summative school ratings)

In the median state, measures of academic achievement currently count for about half of schools’ summative ratings. However, because these measures are strongly correlated with student demographics and prior achievement, we believe they should count for at most a quarter of schools’ ratings going forward. Further, instead of merely using proficiency rates, which encourage schools to focus on the “bubble kids” (i.e., those just above or below the proficiency threshold) states should use average test scores (like Nebraska) or a “performance index” (like Ohio). Currently, at least seventeen states use (or plan to use) one of these methods, though it’s unclear whether the Department of Education will allow them to continue to do so under the new law.
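To make the "bubble kids" problem concrete, here is a minimal Python sketch comparing a proficiency rate with a performance index. The four achievement levels and their weights are hypothetical values chosen for illustration; they are not Ohio's, Nebraska's, or any other state's actual scheme.

```python
# Hypothetical achievement levels and index weights (assumptions for
# illustration only, not any state's actual performance index).
LEVEL_WEIGHTS = {
    "below_basic": 0.0,
    "basic": 0.5,
    "proficient": 1.0,
    "advanced": 1.2,
}
PROFICIENT_LEVELS = {"proficient", "advanced"}


def proficiency_rate(levels):
    """Share of students at or above the proficiency threshold."""
    return sum(level in PROFICIENT_LEVELS for level in levels) / len(levels)


def performance_index(levels):
    """Average of the per-level weights, so every student's level counts."""
    return sum(LEVEL_WEIGHTS[level] for level in levels) / len(levels)


# Same four students in two consecutive years. Nobody crosses the
# proficiency threshold, but three of the four move up a level.
before = ["below_basic", "below_basic", "proficient", "proficient"]
after = ["basic", "basic", "proficient", "advanced"]

print(proficiency_rate(before), proficiency_rate(after))
print(performance_index(before), performance_index(after))
```

In this toy example the proficiency rate does not move, while the index rises, because the index gives credit for improvement at the bottom and the top of the achievement spectrum.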

Indicators of Student Growth (50–90 percent of elementary and middle school ratings, 40–80 percent of high school ratings)

Because they are the best indicators of schools’ overall performance, measures that capture the academic growth of all students should count for at least half of elementary and middle schools’ ratings (as they already do in six states) and at least 40 percent of high schools’ ratings (as they do in Michigan and New Mexico). Currently, forty-five states estimate growth in English Language Arts and math at the K-8 level, and thirty-five do so in high school, so assigning more weight to these measures is something most states could do right away. In the medium term, states should also continue to develop their capacity to estimate growth at the high school level, as well as in other core subjects such as science and social studies, which will allow them to weight growth even more heavily in the future. Because growth scores can be unstable, even at the school level, we encourage states to average over two years (or three, if necessary) when calculating school grades. We also strongly urge states not to use “growth to proficiency” measures, as these encourage schools to ignore the needs of their high-achievers (and are poor indicators of school quality). Similarly, we urge those states that base a portion of their grade on the progress of low-achieving students, or other subgroups, not to overdo it. In our view, at least three quarters of whatever weight states assign to growth should be based on growth for all students.
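The two arithmetic suggestions above (averaging growth over two or three years, and keeping at least three quarters of the growth weight on all students) can be sketched as follows. The function names, the score scale, and the example numbers are assumptions made for illustration, not part of any state's system.

```python
def averaged_growth(yearly_scores, years=2):
    """Mean of the most recent `years` school-level growth scores.

    `yearly_scores` is ordered oldest to newest.
    """
    if years < 1 or len(yearly_scores) < years:
        raise ValueError("need at least `years` growth scores")
    recent = yearly_scores[-years:]
    return sum(recent) / len(recent)


def blended_growth(all_students, low_achievers, all_students_share=0.75):
    """Blend all-student growth with growth for a low-achieving subgroup.

    The share on all students must be at least three quarters, following
    the suggestion in the text.
    """
    if not 0.75 <= all_students_share <= 1.0:
        raise ValueError("all-students share should be between 0.75 and 1.0")
    subgroup_share = 1.0 - all_students_share
    return all_students_share * all_students + subgroup_share * low_achievers


# Hypothetical growth scores for one school, oldest to newest.
school_growth = [48.0, 56.0, 52.0]
two_year = averaged_growth(school_growth, years=2)
three_year = averaged_growth(school_growth, years=3)
growth_score = blended_growth(two_year, low_achievers=60.0)
print(two_year, three_year, growth_score)
```

Averaging smooths out a single unusual year, and the floor on the all-students share keeps subgroup growth from dominating the growth component.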

Indicators of Progress toward English Language Proficiency (Variable)

We will leave the debates over how to best serve English Language Learners to those with expertise in this area. However, common sense suggests that the weight assigned to ELL measures should vary based on the percentage of a school’s students who are classified as ELL.
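One way a state might implement a variable ELL weight is sketched below. Everything specific here is an assumption made for illustration: the linear rule, the 10 percent cap, and the 50 percent "saturation" share are hypothetical, and a state could reasonably choose something quite different.

```python
def ell_weight(ell_share, max_weight=0.10, saturation_share=0.50):
    """Weight on the ELL indicator, rising linearly with the ELL share.

    The weight reaches `max_weight` once `saturation_share` of a school's
    students are classified as ELL. Both defaults are hypothetical.
    """
    if not 0.0 <= ell_share <= 1.0:
        raise ValueError("ell_share must be between 0 and 1")
    return max_weight * min(1.0, ell_share / saturation_share)


def rescale_weights(base_weights, ell_share):
    """Shrink the other indicators proportionally to make room for ELL."""
    w_ell = ell_weight(ell_share)
    total = sum(base_weights.values())
    scaled = {
        name: weight * (1.0 - w_ell) / total
        for name, weight in base_weights.items()
    }
    scaled["ell_progress"] = w_ell
    return scaled


# Hypothetical base weights for an elementary school with 25 percent ELL students.
base = {"achievement": 0.20, "growth": 0.65, "quality": 0.15}
weights = rescale_weights(base, ell_share=0.25)
print(weights)
```

Under this rule a school with no ELL students keeps its base weights unchanged, while a school with many ELL students has a meaningful share of its rating tied to their progress.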

High School Graduation (10–25 percent)

Though we don’t have strong opinions about how states measure graduation rates, it’s important that they don’t assign too much weight to this indicator, lest they encourage schools to lower their standards for earning a diploma. In our view, basing 10–25 percent of high schools’ ratings on some combination of 4- and 5-year graduation rates is a reasonable approach.

Indicators of Student Success or School Quality (10–20 percent)

There is broad agreement that states’ current accountability systems are overly dependent on standardized tests that do not (and cannot) capture all the skills that students need to acquire, and that have sometimes encouraged teachers to engage in harmful curriculum narrowing and “test prep.” Yet many of the alternatives to testing that have been proposed, while promising in theory, are problematic in practice. Consequently, though we support the goal of reducing the emphasis on testing, we encourage states to be deliberate in their approach. ESSA’s “school quality” indicator provides an opportunity to experiment cautiously with new indicators and approaches, but that does not mean every new idea is worth trying. (For example, we are wary of indicators that can easily be gamed, such as those based on teacher surveys.) Over time, the weight assigned to these indicators may grow beyond the parameters we specify here, but first we need to figure out what works.

For now, here are five ideas we believe states should consider:

“College and Career Ready” indicators: Many states already include AP, IB, ACT, and SAT achievement in their high school rating systems, and we heartily endorse all of these measures, especially those tied to achievement on AP/IB tests, which are precisely the sort of high-quality assessments that critics of dumbed-down standardized tests have long called for. Likewise, we support dual enrollment-based measures, provided there is some form of quality control (e.g., provided that the credits students earn are accepted by state universities). Finally, we endorse indicators that are tied to industry credentials or certificates, which can be useful to students who are entering the job market directly out of high school. New Mexico, which already includes more than a dozen “college and career readiness” indicators in its high school accountability system, is a good example of what is possible in this area.

Subsequent performance/persistence: How students fare after they leave a school says a lot about what they learned while they were enrolled, and the degree to which that learning was accurately reflected in their test scores—or not. To guard against illusory achievement gains, states should rate elementary and middle schools based on the on-time promotion rate of students in the next two grades after they leave a school (as Morgan Polikoff recommends). Similarly, they should rate high schools based on postsecondary remediation and/or completion rates, which are preferable to enrollment rates.

Student/teacher retention: In a choice-based system, the rate at which students reenroll is an important indicator of school quality that also disincentivizes “creaming.” Similarly, teacher retention is an important indicator of teacher satisfaction that is strongly correlated with student growth. Because they are essentially immune to gaming, both of these ideas deserve more attention than they have received to date. In neither case is 100 percent retention the goal, but very low reenrollment and/or retention rates are surely a sign that something is amiss.

Chronic Absenteeism: Because the link between attendance and students’ long-term success is so clear (and because most states already collect attendance data), chronic absenteeism is an obvious candidate for the “school quality” indicator. States should also consider including chronic absenteeism for teachers.

Student surveys: Many teacher evaluation systems already incorporate the results of student surveys, which research suggests can also predict school and principal value-added. Unlike teacher surveys, which are easily gamed, student surveys are a potentially useful addition to existing evaluation systems, provided that states take sensible steps to ensure the integrity of the results.

Obviously, none of these measures is perfect. In particular, because schools that serve difficult populations are likely to have higher student/teacher turnover, higher remediation rates, and lower attendance, these measures are likely to be biased if the goal of the system is to gauge school performance fairly. In light of this concern, depending on their goals, states may wish to adjust schools’ scores on these indicators by controlling for demographics, geography, and other factors, much as they already do when estimating student growth.
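Pulling the weight ranges from this section together, here is a minimal sketch of how a summative A–F grade could be computed. The ranges come from the headings above; the 0–100 indicator scale, the letter-grade cutoffs, and the example school are assumptions made for illustration, and the variable ELL weight is left out for simplicity.

```python
# Weight ranges taken from the section headings above (as fractions).
WEIGHT_RANGES = {
    "elementary_middle": {
        "achievement": (0.10, 0.25),
        "growth": (0.50, 0.90),
        "quality": (0.10, 0.20),
    },
    "high": {
        "achievement": (0.10, 0.25),
        "growth": (0.40, 0.80),
        "graduation": (0.10, 0.25),
        "quality": (0.10, 0.20),
    },
}

# Hypothetical cutoffs on a 0-100 composite; the post does not specify any.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def check_weights(level, weights):
    """Raise ValueError unless weights sum to 1 and fall within the ranges."""
    ranges = WEIGHT_RANGES[level]
    if set(weights) != set(ranges):
        raise ValueError("weights must cover exactly the indicators for this level")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name, weight in weights.items():
        low, high = ranges[name]
        if not low - 1e-9 <= weight <= high + 1e-9:
            raise ValueError(f"{name} weight {weight} is outside {low}-{high}")


def summative_grade(level, weights, scores):
    """Return (composite score, letter grade) for one school."""
    check_weights(level, weights)
    composite = sum(weights[name] * scores[name] for name in weights)
    for cutoff, letter in GRADE_CUTOFFS:
        if composite >= cutoff:
            return composite, letter
    return composite, "F"


# Hypothetical high school: weights within the ranges, indicator scores on 0-100.
hs_weights = {"achievement": 0.15, "growth": 0.50, "graduation": 0.20, "quality": 0.15}
hs_scores = {"achievement": 60, "growth": 88, "graduation": 90, "quality": 70}
composite, letter = summative_grade("high", hs_weights, hs_scores)
print(round(composite, 1), letter)
```

Because growth carries half the weight in this example, a school with modest achievement but strong growth can still earn a good grade, which is the intent of the weighting described above.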

Recommendations for the U.S. Department of Education

We have three recommendations for the U.S. Department of Education, which we hope both the outgoing and incoming administrations will consider.

1. Allow states to use a performance index as their measure of academic achievement.

Almost none of the participants in our ESSA design competition recommended that states use proficiency rates, reflecting the near-universal consensus that such rates are a bad measure of school quality.

2. Allow states to vary their approach to rating schools in reasonable ways.

As noted above, common sense suggests that the weight assigned to the ELL indicator should vary with the proportion of the student body classified as ELL. Similarly, because growth measures may do a poor job of capturing the progress of high-achieving students, some states may want the weights assigned to achievement and growth to vary based on the level at which a school’s students are achieving.

3. Let states decide whether they will assign summative ratings to schools.

There is a case for summative school ratings, which send an unambiguous message about the quality of a school to parents who might otherwise be overwhelmed with information. But there is also a case against such ratings, which often obscure more than they illuminate by conflating fundamentally incommensurable indicators, such as growth and achievement. Consequently, states ought to decide for themselves whether to assign such ratings.

* * *

We are in one of those rare moments in the education world when real change is not only possible but likely. For months now, beneath the hideous clamor of the presidential campaign, the quiet murmur of state boards of education and gubernatorial subcommittees has provided incontrovertible evidence of the survival of a different America, in which Democrats and Republicans can still come together—yes, still, despite everything—to do what’s best for kids. Getting accountability right is important if we truly care about their future. Let’s not let them down.

— David Griffith and Michael J. Petrilli

David Griffith is a Research and Policy Associate at the Thomas B. Fordham Institute. Mike Petrilli is president of the Thomas B. Fordham Institute, research fellow at Stanford University’s Hoover Institution, and executive editor of Education Next.

This post originally appeared on Flypaper.
