<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
xmlns:rawvoice="http://www.rawvoice.com/rawvoiceRssModule/"
>
<channel>
	<title>Education Next &#187; Check the Facts</title>
	<atom:link href="http://educationnext.org/category/check-the-facts/feed/" rel="self" type="application/rss+xml" />
	<link>http://educationnext.org</link>
	<description>Education Next is a journal of opinion and research about education policy.</description>
	<lastBuildDate>Mon, 20 May 2013 20:30:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
<!-- podcast_generator="Blubrry PowerPress/2.0.4" -->
	<itunes:summary>Education Next is a journal of opinion and research about education policy. Our podcasts include stories, interviews, and discussions of the latest developments in education policy.

The Education Next Book Club features in-depth interviews by Mike Petrilli with authors of new and classic books about education.

For more information, visit educationnext.org.</itunes:summary>
	<itunes:author>Education Next</itunes:author>
	<itunes:explicit>clean</itunes:explicit>
	<itunes:image href="http://educationnext.org/images/itunes.jpg" />
	<itunes:owner>
		<itunes:name>Education Next</itunes:name>
		<itunes:email>education_next@hks.harvard.edu</itunes:email>
	</itunes:owner>
	<managingEditor>education_next@hks.harvard.edu (Education Next)</managingEditor>
	<itunes:subtitle>Education Next is a journal of opinion and research about education policy.</itunes:subtitle>
	<itunes:keywords>ednext, educationnext, education, school, reform, k-12, charter, voucher, teacher, NCLB, curriculum</itunes:keywords>
	<image>
		<title>Education Next &#187; Check the Facts</title>
		<url>http://educationnext.org/images/rss.jpg</url>
		<link>http://educationnext.org/category/check-the-facts/</link>
	</image>
	<itunes:category text="Education">
		<itunes:category text="K-12" />
	</itunes:category>
		<item>
		<title>Questioning the Quality of Virtual Schools</title>
		<link>http://educationnext.org/questioning-the-quality-of-virtual-schools/</link>
		<comments>http://educationnext.org/questioning-the-quality-of-virtual-schools/#comments</comments>
		<pubDate>Thu, 10 Jan 2013 10:16:02 +0000</pubDate>
		<dc:creator>Matthew M. Chingos</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[Homepage]]></category>
		<category><![CDATA[Journal]]></category>

		<guid isPermaLink="false">http://educationnext.org/?p=49651993</guid>
		<description><![CDATA[NEPC report uses flawed measures]]></description>
			<content:encoded><![CDATA[<p><em>Gary Miron and Jessica L. Urschel, “Understanding and Improving Full-Time Virtual Schools: A study of student characteristics, school finance, and school performance in schools operated by K12 Inc.,” National Education Policy Center, School of Education, University of Colorado-Boulder (July 2012)</em></p>
<p><strong>Checked by Matthew M. Chingos</strong></p>
<p>Proponents of school choice have sought for at least two decades to expand the education options available to families who lack the financial means to move to a neighborhood with high-quality public schools or to pay private-school tuition. Forty-one states and the District of Columbia now allow the founding of charter schools, which enrolled just over 2 million students in 2011–12, or about 4 percent of students nationwide, more than triple the number a decade earlier. Some states have voucher-type programs that enable children to use public funding to attend private schools, and some districts allow students to attend a traditional public school other than the one in their neighborhood.</p>
<p>Families certainly have more education options for their children than they did 20 years ago, but the growth of high-quality alternatives to the neighborhood school has often been constrained by geography: a student may not live within a reasonable distance of a desirable charter school or may lack reliable transportation to a school of choice if the district does not provide it. In rural communities, it may not make financial sense to have more than one school, and even populous areas may not have enough students to support a range of schools targeted at students with different needs and interests.</p>
<p>The potential to eliminate such geographic constraints on school choice at both the course and school levels may lie in digital learning. For instance, a student at a small high school that does not have enough students to justify offering an Advanced Placement course in physics can now take a course through an online provider if her school permits and funds such opportunities. In 31 states, students can enroll in a full-time virtual school, often from anywhere in the state, free of limitations based on geography or the physical constraints of a building.</p>
<p>Full-time virtual schools have gone from barely a blip on the radar screen a decade ago to enrolling approximately 275,000 students in 2011–12, according to one estimate. The schools have attracted the kind of scrutiny that most innovations receive before they either establish a track record of success or fail and die out. The fact that many virtual schools are operated by for-profit education management organizations (EMOs) has surely contributed to the degree of scrutiny, prompting such publications as a recent report by the National Education Policy Center (NEPC) on the largest operator of these schools, K12 Inc.</p>
<p>The NEPC report presents data from a variety of public sources on a portion of the schools operated by K12 Inc. (referred to henceforth as “K12”), including 48 full-time virtual schools that served more than 65,000 students in 2010–11. The report contains some useful descriptive information on the population of K12 schools across the country but is ultimately of little use to policymakers or researchers. The NEPC report uses badly flawed measures of school performance that provide little information about how much students learn as a result of attending K12 schools. Consequently, it is unclear how to interpret the report’s comparisons of school finances without knowing whether K12’s schools are performing well, poorly, or in between.</p>
<p><a href="http://educationnext.org/files/ednext_20132_EN_chingos_img01.jpg"><img class="alignright size-full wp-image-49651994" style="float: right; padding-top: 5px; padding-bottom: 5px; padding-left: 5px;" src="http://educationnext.org/files/ednext_20132_EN_chingos_img01.jpg" alt="" width="400" height="632" /></a><br />
<strong>The NEPC Report</strong></p>
<p>Written by Gary Miron and Jessica Urschel, NEPC’s July 2012 report, “Understanding and Improving Full-Time Virtual Schools,” is billed as a “systematic review and analysis of student characteristics, school finance, and school performance of K12-operated schools.” These three sections of the report use publicly available data to compare K12-operated schools with all public schools in the same states.</p>
<p>The report first examines students’ demographic characteristics using data from the 2010–11 school year. Compared to all students in the same states, students at K12-operated schools are more likely to be white (75 vs. 55 percent), less likely to be Hispanic (10 vs. 28 percent), and about equally likely to be black (11 percent). K12 students are modestly less likely to participate in the federal free or reduced-price lunch program (40 vs. 47 percent), roughly as likely to be classified as having a learning disability (9 vs. 12 percent), and much less likely to be English language learners (less than 1 vs. 14 percent). K12 students are disproportionately enrolled in the middle grades rather than in the elementary or high-school grades.</p>
<p>The NEPC report’s analysis of revenues and spending in 2008–09 is limited to seven K12 schools in five states (representing approximately 60 percent of K12 enrollment nationwide) due to data constraints. The available data indicate that this subset of K12 schools received an average of $7,393 in public revenue per student, which is 20 percent less than the charter school average ($9,258) and 37 percent less than the district school average ($11,708) for the same states. K12 schools spend more on instructional costs but less on teacher salaries and benefits, and more on administration but less on administrator salaries and benefits. The NEPC report refers to these differences as cost advantages and disadvantages. For example, the fact that K12 schools spend $715 per student less on support services than public schools in the same states is interpreted as a “cost advantage” for the virtual schools.</p>
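<p>The percentage gaps quoted above can be checked directly from the three per-student revenue figures. The following is a minimal sketch for that purpose only; the function name <code>percent_below</code> and the constant names are illustrative and do not come from the NEPC report.</p>

```python
# Per-student public revenue figures quoted in the paragraph above (2008-09).
K12_REVENUE = 7393
CHARTER_REVENUE = 9258
DISTRICT_REVENUE = 11708


def percent_below(value, reference):
    """Return how far `value` falls below `reference`, as a percentage of `reference`."""
    return (reference - value) / reference * 100


if __name__ == "__main__":
    # Compare the K12 subset with the charter and district averages in the same states.
    print(f"vs. charter average:  {percent_below(K12_REVENUE, CHARTER_REVENUE):.1f}% less")
    print(f"vs. district average: {percent_below(K12_REVENUE, DISTRICT_REVENUE):.1f}% less")
```

<p>The two gaps come out at about 20.1 and 36.9 percent, matching the rounded 20 and 37 percent cited in the text.</p>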
<p>Finally, the NEPC report summarizes a number of measures of what it calls “school performance.” In 2010–11, 28 percent of K12 schools made Adequate Yearly Progress (AYP) under the federal No Child Left Behind accountability law, compared to 52 percent of schools nationwide. In the same year, only 19 percent of K12 schools rated by state education agencies (7 out of 36) received satisfactory grades. Many of these ratings reflect the fact that K12 students are less likely to score at the “proficient” level or above on statewide assessments, with differences (compared to the state average) varying by grade from 2 to 11 percentage points in reading and 14 to 36 points in math. High-school students at K12 schools have an on-time graduation rate of 49 percent, compared to 79 percent at schools in the same states.</p>
<p><strong>Measuring School Quality</strong></p>
<p>The NEPC report paints a dismal picture of student learning at K12-operated schools, but the fatal flaw of the report is that the measures of “performance” it employs are based primarily on outcomes such as test scores that may reveal more about student background than about the quality of the school, and on inappropriate comparisons between virtual schools and all schools in the same state. What parents and policymakers need to know about a school is how much its students learn relative to what they would have learned at the school they would otherwise have attended. In the case of virtual schools, policymakers need to know how well the students at those schools do relative to how they would have done if the virtual schools didn’t exist.</p>
<p>The measures used in the NEPC report—whether schools make AYP, state accountability system ratings, the percentage of students that score proficient on state tests, and high-school graduation rates—are at best rough proxies for the quality of education provided by any school. Using these metrics to compare one group of schools to another is as potentially misleading as inferring that private schools are better simply because their students score higher than their public-school counterparts on the National Assessment of Educational Progress.</p>
<p>Rigorous efforts to measure school quality focus instead on the growth in individual students’ scores on standardized tests from one year to the next. These “value-added” measures are subject to some of the same problems, but by focusing on what students learn over the course of the year, they are a significant improvement over a simple average test score (or, worse yet, the percentage of students that score above an arbitrary “proficiency” threshold). These measures can be adjusted for student background characteristics. However, such adjustments are particularly challenging in the case of virtual schools, because their students may be less likely to participate in some of the programs that are used to measure student backgrounds, such as the federal lunch program.</p>
<p>In addition to using poor performance measures, the NEPC report makes highly questionable comparisons between K12 students and all students in the same state. Parents don’t choose between a virtual school and any school in the state, but rather between a virtual school and the schools in the vicinity of where they live. A credible measure of the effectiveness of a virtual school would compare the achievement growth of students at that school to the performance of students in the schools those students would have attended otherwise. These comparison schools may look very different from the average school in the state, especially if families are most likely to choose the virtual option when their traditional options are unsatisfactory.</p>
<p>Measures of school performance based on carefully constructed comparisons of student achievement growth, and other important outcomes, such as high-school graduation and college enrollment rates, require student-level data that are not publicly available. Most states now have such information in their longitudinal databases, but no published studies have used these data to compare the achievement growth of students at virtual schools with demographically similar students at carefully selected comparison schools.</p>
<p>Research that painstakingly tries to separate out the actual effects of schools clearly has value, but it is important to bear in mind that, in the absence of random assignment of students to schools (such as occurs via charter school lotteries), families that choose for their children to be educated in their home (through virtual schools) are likely to be very different from other families. The parents of virtual-school students need to provide (or arrange for) supervision of their children during the school day. These families may use virtual schools as a form of home-schooling, or as a way to provide stability for students whose parents frequently relocate, for example.</p>
<p>Assembling descriptive information about the students attending virtual schools is a necessary first step to designing such careful comparisons. The NEPC report provides some basic demographic information, such as race/ethnicity, and data on participation in programs, such as free and reduced-price lunch and special education. These data are a useful starting point, but may be confounded by comparisons to statewide averages instead of to the other schools in these students’ neighborhoods as well as the differences in program participation discussed earlier. A useful addition would be data based on surveys of parents with children enrolled in virtual schools and in their brick-and-mortar counterparts.</p>
<p><strong>Comparing Finances</strong></p>
<p>The NEPC report presents information comparing the finances of a subset of K12-operated schools with other schools in the same states, but it is hard to interpret the spending data without good information on the performance of K12 schools. If a rigorous study found that K12 schools produced equivalent (or superior) learning outcomes to traditional schools, then it would be useful to determine whether the virtual schools were able to achieve the same (or better) outcomes at lower costs. But the NEPC report contains no information that can be used to accurately measure the effect of K12 schools on how much their students learn.</p>
<p>The comparison of specific categories of expenditures is also difficult to interpret, in large part due to the fundamentally different instructional and operational models of virtual and brick-and-mortar schools. It is misleading to refer to all differences in spending as “cost advantages and disadvantages,” when many of them reflect choices made by schools. The unsurprising fact that virtual schools do not spend much on transportation or food services likely reflects a true cost advantage of the virtual model. But differences in spending on teacher salaries as compared to student support services are not necessarily cost advantages or disadvantages, but rather decisions made by the school.</p>
<p>Describing differences in expenditures in this way is also confounded by differences in the overall amounts of funding provided to virtual and traditional schools. Unless states’ school-finance formulas are perfectly calibrated to reflect costs, variations in spending between groups of schools will reflect both differences in costs and differences in available funding. Describing reduced spending on various categories of expenditures as cost advantages when overall spending levels differ is like telling a poor person that he has a “cost advantage” relative to a wealthier individual.</p>
<p>Describing the different models of education offered by virtual and traditional schools, and the implications for different categories of costs, would certainly be a useful endeavor. For example, how much can student-teacher ratios be increased, and at what cost savings, by leveraging technology in the virtual education model? But the NEPC report’s conclusion that virtual schools have a cost advantage because they spend less money, when they receive less money, is simply a tautology. The publicly available data do not allow one to calculate the profits made by for-profit education providers such as K12.</p>
<p>The NEPC report recommends that schools be provided with funding based on the costs of educating students. This is sensible to the degree that funding is adjusted to reflect the challenge of educating certain kinds of students, such as those with special needs. But a broader policy that ties funding to costs creates perverse incentives for schools to drive up costs in order to increase their public funding. A better solution is to provide the same allocation to all schools that serve similar student populations, and then allow them to compete on quality. If parents can choose among schools and new schools can enter the market, then schools that provide a subpar education in order to increase profits would be driven from the market by higher-quality providers.</p>
<p><strong>Policy Implications</strong></p>
<p>Full-time virtual schools, in which students learn primarily from their own homes, clearly are not for everyone. Even after their recent enrollment growth, only one-half of 1 percent of public-school students in the U.S. attend full-time virtual schools. The key question for policymakers is whether virtual schools should be among the choices available to families deciding how best to educate their children. The NEPC report argues they should not be, calling for states to “slow or put a moratorium on the growth of full-time virtual schools.” But policymakers only control the growth of enrollment in virtual schools when they decide whether or not to allow them to exist and what cap, if any, to put on their enrollments. Once those decisions are made, enrollment in virtual schools is mostly up to parents.</p>
<p>The success or failure of virtual schools therefore depends on the ability of policymakers and parents to evaluate their quality. Policymakers need to know whether a given virtual school meets some minimum standard so as to be acceptable as a choice for parents dissatisfied with their traditional options. Parents need to have information on which to base decisions about what school is best for their child. It is simply not possible to make these sorts of decisions with the data in the NEPC report. For example, the report tells us that 70 percent of 8th-grade students at K12-operated schools met proficiency standards in reading, as compared to 77 percent in all public schools in the same states. But we have no idea what the scores are at the neighborhood schools of the K12 students, much less what the actual effect is of attending one school or another.</p>
<p>The NEPC report gets one important point right: the need for better information on school quality, especially when it comes to nontraditional schools. Acknowledging that some of the measures it uses to judge the quality of K12 schools are “inadequate or inappropriate,” the report calls for states to develop new and better instruments. Some states, such as Florida, already incorporate measures of student learning growth into their accountability metrics. But much more sophisticated measures will be needed to allow policymakers and parents to adequately judge the quality of the expanding diversity of education options.</p>
<p><em>Matthew Chingos is a fellow in the Brookings Institution’s Brown Center on Education Policy.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/questioning-the-quality-of-virtual-schools/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Florida Defeats the Skeptics</title>
		<link>http://educationnext.org/florida-defeats-the-skeptics/</link>
		<comments>http://educationnext.org/florida-defeats-the-skeptics/#comments</comments>
		<pubDate>Tue, 14 Aug 2012 04:02:58 +0000</pubDate>
		<dc:creator>Marcus Winters</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[Journal]]></category>
		<category><![CDATA[Standards, Testing, and Accountability]]></category>
		<category><![CDATA[check the facts]]></category>
		<category><![CDATA[Florida]]></category>
		<category><![CDATA[NAEP]]></category>

		<guid isPermaLink="false">http://educationnext.org/?p=49649617</guid>
		<description><![CDATA[Test scores show genuine progress in the Sunshine State]]></description>
			<content:encoded><![CDATA[<p><em><a href="http://educationnext.org/files/ednext_20124_Winters_Opener.jpg"><img class="alignright size-full wp-image-49649618" style="float: right;padding-top: 5px;padding-bottom: 5px;padding-left: 5px" src="http://educationnext.org/files/ednext_20124_Winters_Opener.jpg" alt="" width="345" height="263" /></a><strong>Florida’s gains in reading and math achievement, as measured by the National Assessment of Educational Progress<br />
</strong></em><strong>Checked by Marcus A. Winters</strong></p>
<p>Among the 50 states, Florida’s gains on the National Assessment of Educational Progress (NAEP) between 1992 and 2011 ranked second only to Maryland’s (see &#8220;<a href="http://educationnext.org/is-the-us-catching-up/">Is the U.S. Catching Up?</a>&#8221; <em>features</em>, Fall 2012). Florida’s progress has been particularly impressive in the early grades. In 1998, Florida scored about one grade level below the national average on the 4th-grade NAEP reading test, but it was scoring above that average by 2003, and made further gains in subsequent years (see Figure 1). Scores on Florida’s own state examinations revealed an equally dramatic upward trend.</p>
<p>Many have cited the series of accountability and choice reforms that Florida adopted between 1998 and 2006, under the leadership of Governor Jeb Bush, as the driving force behind the large and rapid improvement in student achievement (see “<a href="http://educationnext.org/advice-for-education-reformers-be-bold/" target="_blank">Advice for Education Reformers: Be Bold!</a>” <em>features</em>, Fall 2012). Others have insisted that Florida’s NAEP scores do not represent true improvements in student reading achievement. Boston College professor Walter Haney, for example, argues that the scores are “dubious” and “highly misleading.” He contends that it is “abundantly clear” that Florida’s aggregate test-score improvements are a mirage caused by changes in the students enrolled in the 4th grade after the state began holding back a large number of 3rd-grade students in 2004 (all school years are reported by the year in which they ended). His argument has been touted by other researchers, most notably by some at the National Education Policy Center, and it has been cited in testimony presented before state legislatures considering the adoption of Florida-style reforms.</p>
<p>It is certainly true, as Haney has said, that one of the Florida reforms was to curtail social promotion of underachieving students from 3rd to 4th grade. In most school districts, students who do not warrant promotion on academic grounds move on to the next grade regardless, because many educators believe that keeping students with their peer group is desirable. But in Florida, students completing 3rd grade in the spring of 2003 or later have had to meet a minimum threshold on the Florida Comprehensive Assessment Test (FCAT) reading examination in order to be promoted to the 4th grade, unless they receive a special waiver. As a result, the percentage of students retained in 3rd grade increased substantially. In the two years prior to the policy change, only 2.9 percent of 3rd-grade students were retained, while in the two years following the policy’s implementation, 11.7 percent of Florida’s 3rd-grade students were told they had to remain in the same grade for the coming year.</p>
<p>Haney and others have concluded that this policy change artificially drove up 4th-grade test scores, because it removed from the cohort of students tested those who were retained in 3rd grade, the very students most likely to score the lowest on standardized tests. Although the point would seem to be well worth considering, it has not been subjected to serious empirical analysis. Does the holding back of the lowest-performing students in 3rd grade explain all the 4th-grade gains in Florida, as Haney contends? Does it explain some of the gains? Or none at all? The best way to answer the question is to look at changes in student test-score performance among those in 3rd grade for the first time, as their test scores are unaffected by the retention policy. If the gains observed for 4th graders were a function of differences in the type of students entering that grade due to the retention policy, then the performance of those entering 3rd grade should look essentially the same after 2002 as it did before the retention policy was put into place.</p>
<p>Drawing on information on student performance available from the Florida Department of Education, I was able to analyze test-score trends of students enrolled in the 3rd grade for the first time. I find that the gains among initial 3rd graders were not as dramatic as those shown on the 4th-grade NAEP, thereby suggesting that the 4th-grade scores did create the appearance of steeper achievement growth than actually took place. Nonetheless, the gains among initial 3rd graders were very substantial, about 0.36 standard deviations between 1998 and 2009, and more than enough to justify Florida’s claims that its gains have outpaced those in most other states.</p>
<p><strong>Reading Test Scores for 3rd Graders</strong></p>
<p><a href="http://educationnext.org/files/ednext_20124_Winters_fig1.jpg"><img class="alignright size-full wp-image-49649619" style="float: right;padding-top: 5px;padding-bottom: 5px;padding-left: 5px" src="http://educationnext.org/files/ednext_20124_Winters_fig1.jpg" alt="" width="460" height="546" /></a>I first analyze changes over time in the FCAT test scores of students in their initial 3rd-grade year in order to discern the extent to which Florida’s elementary-school students made true achievement gains during the period in question. Because the state has not yet identified students for retention, the test scores of students the first time they are in the 3rd grade are not affected by any change in the student cohort resulting from the retention policy.</p>
<p>The administrative data set for the State of Florida contains individual test scores and demographic information for the universe of test-taking students in grades 3 through 10 in Florida from 2001 through 2009. The data set includes a unique student identifier, which allows me to follow the progress of each student over time and to determine which students have been retained.</p>
<p>Figure 2 shows the changes since 2001 in the performance of students at the 25th, 50th, and 75th percentiles in their initial 3rd-grade year. The figure documents clear positive movement across the test-score distribution for the first cohort of students that needed to reach a minimal score on the FCAT exam in order to be promoted from the 3rd to the 4th grade (2003). The achievement distribution makes another leap forward the following year (2004), which was the first year that began with a sizable number of retained students due to implementation of the policy. Student achievement continued to grow in subsequent years.</p>
<p>The test-score improvements shown on the figure are substantial. By 2009, the median reading test score of students in their initial 3rd-grade year had improved by more than one-third of a standard deviation since 2001, as had nearly all points on the distribution. A gain of this magnitude amounts to roughly a full year of academic progress for students in the early elementary grades. The test-score gains among the state’s lowest-performing students were even more impressive; for instance, students at the 10th percentile improved by more than half a standard deviation. The gains made by initial 3rd-grade students on the math exam are even larger than the gains in reading at all points on the distribution.</p>
<p>The results do suggest, however, that the aggregate test scores on the 4th-grade NAEP could well be inflated by the retention policy. The improvement in the median reading score for those students entering 3rd grade is smaller than the NAEP increase for 4th graders over the same time period. Even so, the 3rd-grade gains remain noteworthy enough to substantiate the basic claims of those who praise the Florida track record.</p>
<p><strong>Rescaling NAEP Reading Scores </strong></p>
<p><a href="http://educationnext.org/files/ednext_20124_Winters_fig2.jpg"><img class="alignright size-full wp-image-49649620" style="float: right;padding-top: 5px;padding-bottom: 5px;padding-left: 5px" src="http://educationnext.org/files/ednext_20124_Winters_fig2.jpg" alt="" width="460" height="789" /></a>To assess how well Florida performed relative to the rest of the nation, one can use the results for initial 3rd-grade students on the FCAT to rescale the state’s 4th-grade scores on the NAEP reading exam. The rescaling assumes that test-score improvements on the FCAT for cohorts in their initial year as 3rd graders are a good proxy for gains in reading achievement made by Florida’s students in the next elementary grade. Though imperfect, this assumption is justified to the extent that most consider 4th-grade NAEP scores to be an assessment of overall elementary-school performance.</p>
<p>Because Florida did not participate in the NAEP in 2000, I use as the state’s baseline score its median score on the 4th-grade NAEP reading exam in 1998. Thus, I also assume that the state made no meaningful gains in 4th-grade reading between 1998 and 2000 that would have shown up on NAEP, which squares with the scores on the state’s own reading assessment. I then use the improvements of the median reading test score for initial 3rd-grade students on the FCAT since 2001 in order to rescale the state’s mean NAEP test score in the spring of the same year.</p>
<p>In addition to providing the originally reported NAEP score trend in median scores between 1998 and 2009 for Florida and the United States as a whole, Figure 1 shows the rescaled trend in Florida after making the adjustment described above. The first class affected by the retention policy entered the 4th grade during the 2004 school year, and thus the first NAEP score that could have been influenced by the exclusion of low-performing students from the 4th-grade NAEP sample was the spring 2005 administration.</p>
<p>The figure shows that Florida’s reading gains prior to the introduction of the policy were actually larger on the NAEP than on the FCAT. Such a difference cannot be explained by the retention policy, because students had not yet been retained. After introduction of the policy, Florida’s achievement on the state exam after accounting for sample selection increased between 2003 and 2005 in a way that did not show up on the NAEP scores. But the state’s NAEP scores quickly caught up to the FCAT performance. Adjusting the state’s NAEP scores for sample selection in 2007 and 2009 leads to a decrease in the state’s performance of about 0.07 and 0.08 standard deviations, respectively. However, Florida’s adjusted median score remains above the median score for all U.S. public-school students, and it continues to show substantial improvements relative to the prior decade.</p>
<p>Even after the adjustment, Florida’s students still made larger gains in reading than did the rest of the nation. The national gain, at 7 points (or about 0.19 standard deviations), was only slightly larger than half Florida’s rate. Prior to the adjustment, only Washington, D.C., made larger gains on the 4th-grade NAEP reading exam during this period. After the adjustment, only D.C. and Delaware made a larger test-score improvement.</p>
<p><strong>What Reforms Might Have Produced the Reading Gains? </strong></p>
<p>Putting a finger on exactly which policy changes produced the test-score improvements is remarkably difficult, because the state adopted a wide array of policies that may have had a beneficial effect. It is possible, however, to rule out some potential candidates.</p>
<p>For example, some have noted the state’s participation in the federal Reading First program, in which public schools received grant money to implement instructional and assessment tools. Florida also supplemented the Reading First grants with its own financing of reading coaches for schools across the state. The data clearly show, however, that any additional test-score gains made by schools that participated in Reading First or had reading coaches were far too small to explain the substantial improvements observed on both the NAEP and the FCAT.</p>
<p>Others have found it tempting to argue that the state’s constitutional amendments to reduce class size and provide universal pre-kindergarten services—both of which could have a sustained positive effect on young kids—are the most likely driver of the gains. Perhaps those reforms will prove effective. The 3rd-grade class of 2003, for which the large gains begin, however, was subject to neither policy.</p>
<p>Current research findings for the accountability and choice reforms adopted by Florida during this time period also appear insufficient to explain such large test-score improvements. Florida assigned letter grades—A, B, C, D, and F—to schools based on their performance on the FCAT. It put into place a school voucher program for students who were attending schools that received the grade of F twice in a row. A tax credit provided scholarships for low-income students. Studies of all these programs have shown that each had a positive effect. And studies have also shown that the retention policy has a positive impact on the performance of students who were retained. Though each of these policies has been tied to student test-score improvements, either the effect size was too small or the policy affected too few students to alone account for the substantial test-score improvements seen on the NAEP and FCAT.</p>
<p><strong>Conclusion</strong></p>
<p>The evidence presented here shows that Florida’s elementary-school students did in fact make large improvements in reading proficiency in the 2000s. As critics contend, the state’s aggregate test-score improvements on the 4th-grade FCAT reading exam—and likely on the NAEP exam as well—are inflated by the change in the number of students who were retained in 3rd grade in accordance with the state’s new test-based promotion policy. Large test-score improvements are also observed, however, among students whose scores were not influenced by changes in the sample selected.</p>
<p>Though somewhat smaller than what is apparent on the NAEP test, the portion of Florida’s reading test-score improvements during this time period that cannot be attributed to changes in the sample of students tested due to the retention policy is nonetheless substantial. Identifying the causes of these improvements remains an important task for future research.</p>
<p><em>Marcus A. Winters is senior fellow at the Manhattan Institute’s Center for State and Local Leadership and assistant professor at the University of Colorado–Colorado Springs.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/florida-defeats-the-skeptics/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Neither Broad Nor Bold</title>
		<link>http://educationnext.org/neither-broad-nor-bold/</link>
		<comments>http://educationnext.org/neither-broad-nor-bold/#comments</comments>
		<pubDate>Wed, 15 Feb 2012 18:00:27 +0000</pubDate>
		<dc:creator>Paul E. Peterson</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[Homepage]]></category>
		<category><![CDATA[Journal]]></category>
		<category><![CDATA[Bolder Approach to Education]]></category>
		<category><![CDATA[helen ladd]]></category>
		<category><![CDATA[The Broader]]></category>

		<guid isPermaLink="false">http://educationnext.org/?p=49647328</guid>
		<description><![CDATA[A narrow-minded approach to school reform]]></description>
			<content:encoded><![CDATA[<p><strong><em>Helen Ladd, Presidential Address to the Association for Public Policy Analysis and Management in Washington, D.C., November 4, 2011.</em></strong></p>
<p><strong>Checked by Paul E. Peterson</strong></p>
<p><a href="http://educationnext.org/files/ednext_20123_CTF_opener.jpg"><img class="alignright size-full wp-image-49647330" style="float: right; padding-top: 5px; padding-bottom: 5px; padding-left: 5px;" src="http://educationnext.org/files/ednext_20123_CTF_opener.jpg" alt="" width="360" height="325" /></a>Children raised in families with higher incomes score higher on math and reading tests. That is no less true in the Age of Obama than it was in the Age of Pericles or, for that matter, in the Age of Mao. But is parental income the <em>cause</em> of a child’s success? Or is the connection between income and achievement largely a symptom of something else: genetic heritage, parental skill, or a supportive educational setting?</p>
<p>The Broader, Bolder Approach to Education, a coalition of education professors and interest-group leaders, including the heads of the country’s two largest teachers unions, has concluded that family income itself determines whether or not a child learns. In the first paragraph of its mission statement, the coalition claims that it has identified “a powerful association between social and economic disadvantage and low student achievement.”</p>
<p>“Weakening that link,” the Broader, Bolder group goes on to say, “is the fundamental challenge facing America’s education policy makers.” For this group, poverty and income inequality, not inadequate schools, are the fundamental problem in American education that needs to be fixed. Other possible approaches to improving student achievement—school accountability, school choice, reform of the teaching profession—are misguided, counterproductive, and even dangerous. The energy now being wasted on attempts to enhance the country’s education system should be redirected toward a campaign to either redistribute income or expand the network of social services.</p>
<p>The Broader, Bolder platform has won the wholehearted support of the country’s teachers unions. But it’s much to the credit of the current U.S. secretary of education, Arne Duncan, that he has carefully kept his distance, insisting instead on accountability, choice, and teacher policy reforms that the Broader, Bolder group finds dispensable.</p>
<p>Inasmuch as the Broader, Bolder movement can be expected to gather steam in an election year, especially given the success of Occupy Wall Street and the “1 percent” campaign, it is worth giving attention to the scholarly foundation on which its claims rest. That is best done by looking closely at the presidential address given before the Association for Public Policy Analysis and Management by one of the coalition’s cochairs, Helen Ladd, a Duke University professor, which she summarized in a December 2011 op-ed piece published in the <em>New York Times</em>.</p>
<p><strong>The Platform</strong></p>
<p>The central thesis of the Ladd presidential address is certainly sweeping and bold: The income of a child’s family determines his or her educational achievement. Those who come from low-income families learn little because they are poor. Those who come from prosperous families learn a lot because they are rich. Her solution to the nation’s education woes is almost biblical. According to St. Matthew, Jesus advised the rich man to “Sell what you possess and give to the poor.” Not quite as willing as St. Matthew to rely on the charitable instinct, Ladd modifies the biblical injunction by asking for government intervention to make sure the good deed happens. But she is no less confident than Matthew that wonderful things will happen when the transfer of wealth takes place. Once income redistribution occurs, student achievement will reach a new, higher, and more egalitarian level. Meanwhile, any attempt to fix the schools that ignores this imperative is as doomed to failure as the camel that struggles to pass through the eye of a needle.</p>
<p>Of course, Ladd does not put it quite that bluntly. But her meaning is clear enough from what she does say: education reform policies “are not likely to contribute much in the future—to raising overall student achievement or to reducing [gaps in] achievement.”</p>
<p>The “logical policy response,” she continues, “would be to pursue policies to reduce the incidence of poverty…. That might be done, for example, through macro-economic policies designed to reduce unemployment, cash assistance programs for poor families, tax credits for low wage workers, or an all-out assault ‘war on poverty.’”</p>
<p>Ladd is particularly enthusiastic about her approach “given the current high unemployment rates and also the dramatic increase in income inequality in this country since the 1970s.”</p>
<p>She continues, “Many considerations…make a compelling case for the country to take strong steps to reduce income inequality.”</p>
<p>Though income redistribution is the preferred option, Ladd decides it is not politically feasible. “Such a policy thrust is not in the cards, at least in the near term…unless the current protests in New York City and elsewhere…[put] income inequality back on the policy agenda.” In the meantime, the best course of action is for the government to fund a host of new services for the poor.</p>
<p><strong>Why do the better-off have higher-performing children?</strong></p>
<p>Key to Ladd’s case is a graph that shows a correlation between family income and student achievement in 14 industrialized nations. To no one’s surprise, that graph shows that in every country students who come from higher-income families score higher on math and reading tests. But is the connection causal? Do some students do better than others because their parents earn more money? Or are the parents who make a better living also the ones who do a better job of raising their children?</p>
<p>In work published in 1997, Susan Mayer, former dean of the University of Chicago’s Harris School of Public Policy Studies, tried to answer this question by carrying out a variety of tests, each of them an attempt to see exactly how much changes in income directly affect student achievement. In one test, she looked at those on welfare who lived in states where welfare benefits were higher. She found little if any benefit for those children living in one-parent families. Overall, she found that the direct relationship between income and education outcomes varies between negligible and small.</p>
<p>In a 2011 Brookings Institution report, Julia Isaacs and Katherine Magnuson explored this topic by looking specifically at the impact of family income on child readiness for school, a primary concern of the Broader, Bolder coalition. The authors rely on recently collected data from a U.S. Department of Education survey of a representative sample of U.S. families that tracked children from birth to the year they entered school. They look at the impact of a host of family characteristics on school readiness and student achievement in the first year of school. When they calculate the simple correlation between income and math achievement, Helen Ladd’s approach, they find that a $4,000 increment (a 50 percent increase in the $8,000 average income reported by the families in this study) in the income of a poor family will lift student achievement by 20 percent of a standard deviation (close to a year’s worth of learning in the middle years of schooling), a substantial impact that seems to support the Broader, Bolder claims. But when the authors adjust for other factors—race, mother’s and father’s education, single- or two-parent family, smoking during pregnancy, and so forth—the distinctive impact of family income on math achievement drops to just 6.4 percent of a standard deviation. It is more than twice as important for achievement that children living in a low-income family have a mother with a high school diploma (as compared to one without the diploma) than that the family has 50 percent more income.</p>
<p><strong>Is it absolute income or relative income that counts?</strong></p>
<p>Ladd claims that Finland, Canada, and the Netherlands have higher student performance because they have fewer children living in poverty. To arrive at this conclusion, she excludes the value of medical programs and other government services, the very items that later become part of her policy agenda. This is no small matter, as the U.S. poverty rate in 2003 was just 8.1 percent if those items are included, 23 percent less than the officially reported 10.5 percent poverty rate for that year (which fails to take into account food stamps, Medicaid, school lunch programs, earned income credits, and other cash transfers). In addition, Ladd defines poverty in relative, not absolute, terms. Anyone is poor if he has an income more than 1 standard deviation below the average. With that definition, she decides that only 4 percent of the children in Finland live in poverty compared to 20 percent of the children in the United States, despite the fact that average income in the U.S. is a third higher than it is in Finland.</p>
<p>Of course, one could also conclude that Finland’s rising test-score performance is due to the growing income gap in that country. In 2008, the Organisation for Economic Co-operation and Development (OECD) reported that “the gap between rich and poor has widened more in Finland than in any other wealthy industrialized country over the past decade.” When one picks out stray facts from a country one likes, anything goes.</p>
<p>Using the sociologist’s relative definition of poverty, and not the absolute definition used by ordinary people, fits the Broader, Bolder agenda. The point is not to provide opportunities for the poor but to equalize wealth across society as a whole. Never mind if everyone, rich or poor, ends up with less.</p>
<p><strong>Do changes between 1940 and 2000 explain the larger achievement gap? </strong></p>
<p>Drawing on a study by Stanford education professor Sean Reardon, Ladd says that the gap in reading achievement between students from families in the lowest and highest income deciles is larger for those born in 2001 than for those born in the early 1940s. She suspects it is because those living in poor families today have “poor health, limited access to home environments with rich language and experiences, low birth weight, limited access to high-quality pre-school opportunities, less participation in many activities in the summer and after school that middle class families take for granted, and more movement in and out of schools because of the way that the housing market operates.”</p>
<p>But her trend data hardly support that conclusion. Those born to poor families in 2000 had much better access to medical and preschool facilities than those born in 1940. Medicaid, food stamps, Head Start, summer programs, housing subsidies, and the other components of Johnson’s War on Poverty did not become available until 1965. Why didn’t those broad, bold strokes reduce the achievement gap?</p>
<p>What has changed for the worse during the intervening period is not access to food and medical services for the poor but the increase in the percentage of children living in single-parent households. In 1969, 85 percent of children under the age of 18 were living with two married parents; by 2010, that percentage had declined to 65 percent. According to sociologist Sara McLanahan, income levels in single-parent households are one-half those in two-parent households. The median income level of a single-parent family is just over $27,000 (in 1992 dollars), compared to more than $61,000 for a two-parent family. Meanwhile, the risk of dropping out of high school doubles. The risk increases from 11 percent to 28 percent if a white student comes from a single-parent instead of a two-parent family. For blacks, the increase is from 17 percent to 30 percent, and for Hispanics, the risk rises from 25 percent to 49 percent. In other words, a parent who has to both earn money and raise a child has to perform at a heroic level to succeed.</p>
<p>A better case can be made that the growing achievement gap is more the result of changing family structure than of inadequate medical services or preschool education. If the Broader, Bolder group really wanted to address the social problems that complicate the education of children, they would explore ways in which public policy could help sustain two-parent families, a subject well explored in a recent book by Mitch Pearlstein (<em>Shortchanging Student Achievement: The Educational, Economic, and Social Costs of Family Fragmentation</em>) but one that goes virtually unmentioned in the Ladd report.</p>
<p><strong>Why do states differ?</strong></p>
<p>Ladd tells us that states that have a high poverty rate—for example, Mississippi, Arkansas, Alabama, and Louisiana—have lower math and reading scores than states with low poverty rates, such as New Hampshire, Connecticut, Massachusetts, Utah, and Maryland. While Ladd comes close to saying that high state poverty rates produce low achievement, the opposite connection is more plausible. The New England states and Utah have the lowest child-poverty rates because the commitment to education in those states has deep historical and cultural roots, and the families in those states are more likely to remain intact. Meanwhile, the southern parts of the United States all but closed the school doors to African Americans and only opened them a small crack for all but well-to-do white students throughout most of the 19th century, and even well into the 20th. It’s easier to make the case that the wide range in educational opportunity and achievement among the states in the not-too-distant past is the cause—not the consequence—of the variation in state poverty rates today.</p>
<p>Even in contemporary America, the places that have strong education systems tend to attract business, industry, and a skilled workforce. Where high-quality schools are abundant, incomes are generally high and poverty low. If a state is well endowed with human capital, its citizens are prosperous and its students will be learning at school. Does anyone believe that the federal government could reverse Connecticut’s and Alabama’s places on the student achievement scale if it took the money from the Constitution State and gave it to the Heart of Dixie?</p>
<p>Of course, we are not making the claim that the quality of a state’s schools is the only thing that affects poverty levels. Economic life is too complex to be reduced to any single factor. No matter what the Broader, Bolder group says, any inference that might be drawn from a simple correlation between achievement and poverty is problematic.</p>
<p>Perhaps recognizing the weaknesses in her case, Ladd tries to bolster it by correlating changes in achievement with changes in the child poverty rate within states. She finds that in recent years a 1 percentage point increase in the poverty rate reduces achievement by about .03 standard deviations. But she does little to control for other factors that may be changing at the same time. If single-parent households in a state are increasing, they could be adversely affecting student achievement and child poverty rates simultaneously. And if the state economy is sliding, talented, eager workers might be moving elsewhere and leaving behind the less ambitious, who are likely to be those with low-achieving children. In other words, any simultaneous shift in poverty rates and achievement is likely to be the result of a third factor that affects both simultaneously. Even the most devoted Broader, Bolder fan can hardly claim that a child’s test scores bounce up and down with the number of bills in Daddy’s pocket.</p>
<p><strong>Why do people deny the poverty reality and claim that schools can teach poor students?</strong></p>
<p>Ladd is so confident of her data that she attacks as deniers those who question a strong correlation between income and achievement. “Can anyone credibly believe that the mediocre overall performance of American students on international tests is unrelated to the fact that one-fifth of American children live in poverty?” she asks in her <em>New York Times</em> essay. Well, yes, they can. Even if we compare with <em>all</em> students in other countries the math performance of only those U.S. students from families where one parent has a college degree, the U.S. ranks 19th among the nations of the world that took the 2006 Program for International Student Assessment (PISA) test; just 10 percent of students from college-educated families performed at the advanced level. More than 20 percent of <em>all</em> Koreans and Finns do that well, as do 15 percent of <em>all</em> Canadians. Surely, those telling facts about the state of American math education cannot be attributed simply to child poverty.</p>
<p><strong>Attacking the Reforms </strong></p>
<p>But if poverty is the Broader, Bolder whip, the horses to be flogged are those pulling the school reform chariot: not to get them to run faster but to punish them for their efforts. School reformers, Ladd says, have been recklessly trying to improve education “by better use of information and incentives.”</p>
<p>She objects to the “no excuses” approach to education, which expects strong performance from students regardless of family background, saying that the few schools that are able to accomplish the task are unusual places filled with kids from families with especially devoted parents. She criticizes George W. Bush for worrying about the “soft bigotry of low expectations.” That kind of talk goes “a long way toward explaining why No Child Left Behind has not worked,” she says, overlooking the fact that gains in math and reading since its passage have amounted to 8 percent of a standard deviation, with even larger gains among minority students (see “<a href="http://educationnext.org/grinding-the-antitesting-ax/" target="_blank">Grinding the Antitesting Ax</a>,” <em>check the facts</em>, Spring 2012).</p>
<p>Ladd condemns the use of test-score information for the purpose of evaluating and compensating teachers. “Extensive research shows that…valid and reliable measures of teacher effectiveness” have yet to be generated, she says, blithely ignoring important work by Thomas Kane, Eric Hanushek, and Raj Chetty and his colleagues, which shows that students learn in any given year somewhere between 10 and 20 percent of a standard deviation more if they have an especially effective teacher rather than a very ineffective one.</p>
<p>Ignoring the potential impact that would accompany the recruitment and retention of more-effective teachers, Ladd condemns merit-pay policies based on student test performance on the grounds that such policies “provide…incentives for [school officials] to narrow the curriculum to the tested subjects of math and reading, and to direct teacher attention to basic skills away from student reasoning skills.” Even worse, they lead to “unfair and arbitrary treatment of teachers.” Once schools “place heavy weight on student test scores,” they are “likely to do more harm than good.” One can hear the applause ringing out in union halls across the country.</p>
<p>Charter schools are rejected because they constitute merely a “governance change” that “ignores the educational challenges facing disadvantaged children.” She worries that such schools are “draining funds from the traditional public schools,” even though there is not a single state that takes money away from public schools unless a child leaves them for a school the parent prefers. Ladd apparently thinks public schools should receive money whether or not they have students.</p>
<p><strong>What Is to Be Done?</strong></p>
<p>Eschewing all school reforms, and conceding that the rich cannot be robbed quite yet, what does Ladd actually want to do? When we turn to her practical agenda, we can see just how important the teachers unions are to the Broader, Bolder coalition: most of the key reforms Ladd proposes have nothing to do with ending poverty in any direct way, but instead are directed toward employing more professionals for tasks outside the regular K–12 classroom:</p>
<p>Establish preschool programs. Though she admits the evidence on the effectiveness of Head Start and other large-scale preschool programs is disappointing, she calls for their expansion. Yet the poor already have better access to government-funded preschool programs than other families do. If this were the solution to the achievement gap, we would already be well on our way.</p>
<p>Expand school-based health clinics and social services. Ladd wants to hire a vast new number of “school nurses, social welfare counselors and teachers” who would “meet on a regular basis to discuss and address the challenges of individual children,” as if that were not already part and parcel of the special education program into which 15 percent of school-age students already are placed. If that program has not borne fruit, why would its expansion do anything other than provide more adult employment?</p>
<p>Establish quality afterschool and summer programs. Rather than fix the regular day school, Ladd would have the United States pour its energy into programs that would extend the days and hours that children are in school. Although she admits that “research shows&#8230;that marginally expanding in-school time without improving how that time is used does not improve learning,” she is confident that “high intensity summer programs” can do the job, as if any such program could be brought to scale.</p>
<p>Provide high-quality schools for disadvantaged students. “Children in schools serving large proportions of disadvantaged students” must “have access to high quality teachers, principals, supports for students, and other resources, and…schools” must “be held accountable for the quality of their internal processes and practices.” Ladd plans to hold these schools accountable while at the same time ending the “obsession with test-based outcome measures” by making sure that every school has a certified teacher, shifting good teachers to schools teaching disadvantaged students (without telling us how to identify those teachers), and looking at the total climate of a school, not just its test scores, when deciding whether it is effective.</p>
<p>Eliminate No Child Left Behind. “In its place the federal government should implement strategies designed to help state and local governments address in a more constructive and positive manner the educational needs of low SES children.” Just exactly how schools themselves are to do this is left unsaid.</p>
<p>In sum, the Broader, Bolder platform is narrow, niggling, naïve, and negligible. Contrary to Ladd’s claims, the unique effects of family income on student achievement are only modest, less than the effects of many of the education reforms Ladd regards as inadequate or worse. Most of the proposals to lift student achievement offered by Ladd and her Broader, Bolder colleagues ignore the many hours children spend at school, proposing instead a potpourri of noneducational services; those services that do have an educational component are to be offered either to preschoolers or to students during their summer vacation or after school. Such initiatives will increase the number of unionized workers in the public sector, but they have never been shown to have more than modest effects on student achievement. They promise little hope of stemming the rising number of single-parent families, a major contributor to both child poverty and low levels of student performance. If reducing poverty and lifting student achievement are the goals, dollars would be better allocated by cutting the taxes on earned income paid by two-parent, working families with children.</p>
<p><em>Paul E. Peterson is director of the Program on Education Policy and Governance at Harvard University and senior fellow at the Hoover Institution.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/neither-broad-nor-bold/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Grinding the Antitesting Ax</title>
		<link>http://educationnext.org/grinding-the-antitesting-ax/</link>
		<comments>http://educationnext.org/grinding-the-antitesting-ax/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 12:15:08 +0000</pubDate>
		<dc:creator>Eric A. Hanushek</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[Homepage]]></category>
		<category><![CDATA[Journal]]></category>
		<category><![CDATA[No Child Left Behind]]></category>
		<category><![CDATA[Standards, Testing, and Accountability]]></category>
		<category><![CDATA[National Research Council]]></category>
		<category><![CDATA[NCLB]]></category>

		<guid isPermaLink="false">http://educationnext.org/?p=49645318</guid>
		<description><![CDATA[More bias than evidence behind NRC panel’s conclusions]]></description>
			<content:encoded><![CDATA[<p><strong><em>Incentives and Test-Based Accountability in Education<br />
</em></strong>A report from the National Research Council</p>
<p><strong>Checked by Eric A. Hanushek</strong></p>
<p><a href="http://educationnext.org/files/ednext_20122_CTF_img1.jpg"><img class="alignright size-full wp-image-49645320" style="float: right; padding-top: 5px; padding-bottom: 5px; padding-left: 5px;" title="ednext_20122_CTF_img1" src="http://educationnext.org/files/ednext_20122_CTF_img1.jpg" alt="" width="230" height="303" /></a></p>
<p>The No Child Left Behind Act of 2001 (NCLB) was scheduled for reauthorization in 2007, and its future has in recent months garnered renewed attention. Yet so far, Congress has found it impossible to reach sufficient consensus to update the legislation, as competing groups want to a) keep all the essential features of the current law as a way of maintaining the pressure on schools to teach all students, b) modify the federal law by moving to a value-added or some alternative testing and accountability system, or c) eliminate federal testing and accountability requirements altogether, reverting to the days when the compensatory education law was simply a framework for distributing federal funds to school districts. Critics of NCLB’s testing and accountability requirements have a litany of complaints: The tests are inaccurate, schools and teachers should not be responsible for the test performance of unprepared or unmotivated students, the measure of school inadequacy used under NCLB is misleading, the tests narrow the curriculum to what is being tested, and burdens imposed upon teachers and administrators are excessively onerous.</p>
<p>But in all the acrimonious discussion surrounding NCLB, surprisingly little attention has been given to the actual impact of that legislation and other accountability systems on student performance. Now a reputable body, a committee set up by the National Research Council (NRC), the research arm of the National Academy of Sciences, has reached a conclusion on this matter. In its report, <em>Incentives and Test-Based Accountability in Education</em>, the committee says that NCLB and state accountability systems have been so ineffective at lifting student achievement that accountability as we know it should probably be dropped by federal and state governments alike. Further, the committee objects to state laws that require students to pass an examination for a high school diploma. There is no evidence that such tests boost student achievement, the committee says, and some students, about 2 percent, are not getting their diplomas because they can’t—or think they can’t—pass the test. The headline of the May 2011 NRC press release is frank and bold in the way committee reports seldom are: “Current test-based incentive programs have not consistently raised student achievement in U.S.; Improved approaches should be developed and evaluated.”</p>
<p>Needless to say, the report can be expected to play an important role in the continuing debate over NCLB. Upon its initial release, the report captured top billing, appearing on <em>Education Week</em>’s front page. Certainly, the NRC intends for the report to influence the NCLB conversation: it rushed a draft version to the media five months before the completed report was available to the public.</p>
<p>Unfortunately, the NRC’s strongly worded conclusions are only weakly supported by scientific evidence, despite the fact that NRC’s stated mission is “to improve government decision making and public policy, increase public understanding, and promote the acquisition and dissemination of knowledge.”</p>
<p><strong>The Report</strong></p>
<p><a href="http://educationnext.org/files/ednext_20122_CTF_side.jpg"><img class="alignright size-full wp-image-49645322" style="float: right; padding-top: 5px; padding-bottom: 5px; padding-left: 5px;" title="ednext_20122_CTF_side" src="http://educationnext.org/files/ednext_20122_CTF_side.jpg" alt="" width="460" height="513" /></a></p>
<p>Reports from the NRC are generally treated as highly credible. The NRC convenes panels of outside experts who volunteer their time to provide consensus opinions on issues of policy significance. And this particular panel includes a number of especially qualified researchers (see sidebar). The committee chair, Michael Hout, is a member of the National Academy of Sciences; 7 of the 17 panel members have named professorships; 2 are deans (of law and education schools); and a majority have published articles about testing, accountability, or incentives.</p>
<p>When it comes to gathering together the general literature, both theoretical and empirical, on the use of incentives in various contexts, the committee’s work is solidly constructed. But this strong scientific discussion of theory and empirical analysis of incentives and accountability breaks down when it comes to the committee’s core purpose: evaluating accountability regimes in education that employ incentives and tests.</p>
<p>The report comes to two policy conclusions: NCLB and state accountability systems have proven ineffective and state-required high-school exams are counterproductive. The unequivocal presentation of the conclusions is clearly designed to leave little doubt in the minds of policymakers. When the underlying evidence is examined, however, it becomes apparent that neither conclusion is warranted. Instead of weighing the full evidence before it in the neutral manner expected of an NRC committee, the panel selectively uses available evidence and then twists it into bizarre, one might say biased, conclusions.</p>
<p><strong>Selecting Evidence</strong></p>
<p>To get a grasp of the bias that motivated the report’s authors, consider how its first conclusion is phrased:</p>
<blockquote><p>Test-based incentive programs, as designed and implemented in the programs that have been carefully studied, have not increased student achievement enough to bring the United States close to the levels of the highest achieving countries.</p></blockquote>
<p>Note especially that the conclusion does not say that there is no evidence that testing and accountability work. It says that testing and accountability, by themselves, cannot lift the United States to the level of accomplishment reached by the world’s highest-achieving countries, an extraordinary standard for evaluating a policy innovation. To catch up to the leading countries would require gains of at least half of a standard deviation, or roughly two years of learning (see “Are U.S. Students Ready to Compete?” <em>features</em>, Fall 2011). No individual reform on the public agenda—neither merit pay, class size reduction, salary jumps for teachers, nor Race to the Top—can claim or even hope for anything close to that level of impact. The appropriate question is not whether testing and accountability is a panacea, but whether it has proven worthwhile.</p>
<p>By that more appropriate standard of judgment, the committee’s own data indicate that testing and accountability have proven effective, if not quite the spectacular success promised by those who enacted NCLB into law. The committee report tells us that the average estimated impact of these interventions is 0.08 standard deviations of student achievement. In other words, the average student in a state without accountability would have performed at the 53rd percentile of achievement had that student been in a state with an accountability system, all other things being equal.</p>
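<p>The percentile conversion above can be checked with a short calculation. The sketch below is an illustration only: it assumes achievement is approximately normally distributed, which is an assumption made here rather than a claim taken from the report, and the function name <code>percentile_after_gain</code> is hypothetical.</p>

```python
from statistics import NormalDist

def percentile_after_gain(effect_size_sd, start_percentile=50.0):
    """Percentile reached after a gain of effect_size_sd standard deviations.

    Assumes achievement follows a normal distribution (illustrative assumption).
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = standard_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * standard_normal.cdf(start_z + effect_size_sd)

# A student who starts at the median and gains 0.08 standard deviations.
print(round(percentile_after_gain(0.08), 1))  # 53.2
```

<p>Under that assumption, a 0.08 standard deviation gain moves a median student to roughly the 53rd percentile, which matches the figure cited above.</p>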
<p>That estimate may well be too low. The report states that “our literature review is limited to studies that allow us to draw causal conclusions about the overall effects of incentive policies and programs,” and then it goes on to describe several types of studies that would be excluded by this criterion. Where does the 0.08 come from? The committee considers a review from 2008 of 14 studies, and 4 studies conducted after that review. The review presents an average impact of 0.08. The NRC committee apparently felt no need to look any further and ignored the fact that a majority of the 14 studies would not come close to meeting its standard of enabling a “causal conclusion.” The committee determines that one of the more recent studies also supports an estimate of 0.08, although that study’s authors prefer estimates that are much higher. The 14 earlier studies and the 4 later ones produce a wide distribution of estimated impacts, but the committee makes no attempt to investigate whether the unusual estimates suggest circumstances under which accountability seems particularly effective (or ineffective). The committee chooses to emphasize the studies with negative findings (10 percent) while downplaying a number of those that have positive findings (90 percent). Thus the NRC mantra, repeated with slightly different wording throughout the report: “Despite using them for several decades, policymakers and educators do not yet know how to use test-based incentives to consistently generate positive effects on achievement and to improve education.” Apparently, the inconsistent results heralded in the press release reflect the 10 percent of studies that differed from the overwhelming majority.</p>
<p><strong>Small Gains Add Up</strong></p>
<p>Let us put this concern aside and consider the increment in student performance of 0.08 standard deviations of individual achievement that the committee presents as its best estimate. Is that so small an effect that it cannot justify continuation of testing and accountability? Consider that this is the average effect of a program that has been implemented on a national scale, affecting students across the country. We are hard pressed to come up with <em>any</em> other education program working at scale that has produced such results. Moreover, these average gains are the result of accountability systems that many people believe have important flaws. Even larger gains might be expected if those flaws could be corrected, as many experts, though not the NRC panel, have suggested.</p>
<p>The estimated benefits from a 0.08 standard deviation gain in student performance vastly outweigh its estimated costs. The cost of designing, administering, grading, and reporting the results from statewide examinations has been estimated at between $20 and $50 per pupil, a trivial sum considering that per-pupil education expenditures in the United States run above $12,000 annually. Most reforms (including class size reduction, merit pay, across-the-board raises for teachers, in-service training programs, or the scaling up of charter schools) would cost many, many times as much. For these innovations to have the same kick for every dollar invested, results would have to be improbably large.</p>
<p>The NRC, instead of considering these actual costs, suggests that implicit costs in the form of narrowed curricula are the most important, but it provides no evidence for its view.</p>
<p>What might the economic impact of a 0.08 standard deviation improvement in average achievement nationwide be? Along with University of Munich professor Ludger Woessmann, I have estimated the impact on U.S. Gross Domestic Product (GDP) of higher levels of student achievement. These estimates project the historical pattern of growth to determine the result of gains in student achievement, calculate the additions to GDP over the next 80 years, and discount them back to today so that they are comparable to other current investments. A 0.08 improvement has a present value of some $14 trillion, very close to the current $15 trillion level of our entire GDP, and equivalent to $45,000 for every man, woman, and child in the U.S. today. In other words, an inexpensive program that affects every student nationwide can, over the long run, have a very large impact, even if its average effect seems at first glance to be quite small. Indeed, if we figured testing cost $100 per student each year for the next 80 years and we tested all students rather than the limited grades tested now, the rate of return on the investment would be 9,189 percent. Google investors would be envious.</p>
<p>Several omissions from the report are also noteworthy. The report gives only passing attention to the positive impact of NCLB on the education of the most disadvantaged students, a consequence of the requirement to report performance by specific subgroups (e.g., racial and ethnic groups and the economically disadvantaged). The NRC report’s main reference to this feature of current accountability systems is that consideration of subgroup performance has added analytical difficulties because of the smaller samples.</p>
<p>Perhaps more telling, this panel of experts on testing and incentives makes absolutely no effort to describe how accountability programs could be improved. Being good researchers themselves, they do favor continued research on testing, however, and provide recommendations on what research should be done, which not surprisingly matches their own interests and expertise.</p>
<p><strong>Lower the Bar?</strong></p>
<p>The report also addresses a second, widely used accountability policy: high-school exit exams that hold students responsible for meeting a set of content standards. The report’s second conclusion reads,</p>
<blockquote><p>The evidence we have reviewed suggests that high school exit exam programs, as currently implemented in the United States, decrease the rate of high school graduation without increasing achievement.</p></blockquote>
<p>The panel strongly suggests that states that impose an exit exam should repeal this requirement. To understand this conclusion, it is necessary to understand the exams themselves and to evaluate the evidence behind the committee’s conclusion.</p>
<p>Currently, more than half of the states require that students pass a test of some sort to obtain a normal diploma (see Figure 1), and virtually all of these current requirements have been put in place since 2000. The tests almost always cover English and math, but many states add science and history. Test difficulty varies by state, but the modal level is grade 10. Although that standard may seem low, it is considerably more stringent than the standards that existed prior to 1990, when no state had a test reaching even the 9th-grade level. The current tests are not as high a barrier to high school graduation as they are often alleged to be, as a student may generally take the exam multiple times in order to achieve a passing score. And in all but three states (South Carolina, Tennessee, and Texas), students can either appeal the test result, if they feel the score misrepresents their accomplishments, or obtain a diploma by some alternative path.</p>
<p><a href="http://educationnext.org/files/ednext_20122_CTF_map.jpg"><img class="alignright size-full wp-image-49645321" title="ednext_20122_CTF_map" src="http://educationnext.org/files/ednext_20122_CTF_map.jpg" alt="" width="690" height="507" /></a></p>
<p>The motivations for administering exit exams are to create incentives for students to apply themselves to the task of learning and to set uniform (minimum) quality standards for the state’s schools. Such content standards provide guidelines to schools about what to teach. They also indicate to colleges and universities what knowledge and abilities a graduate can be expected to possess. And they give similar information to prospective employers.</p>
<p>According to the best available evidence (discussed below), perhaps 2 percent of students are induced to drop out of school either because of failure to pass the exam or because of fear of not being able to pass the exam. Implicitly, the committee assumes this consequence does considerable harm to the affected students, given the substantial economic rewards that accrue, on average, from receiving a high school diploma. But average effects do not necessarily apply to the 2 percent on the borderline between graduating and failing to graduate from high school. The impact for this particular group of students is likely to be much less, unless you make the bizarre assumption that it is only the diploma, not what the student learns, that affects job prospects and future income. The people who are induced to drop out because they cannot pass a 10th-grade exam would most likely be near the bottom of the earnings distribution of graduates were they to be handed a diploma. The economic impact on these students will be much lower than the average difference between graduate and dropout.</p>
<p>Perhaps the best argument against exit exams is simple: If a student shows up for school for 12-plus years and cannot pass a 10th-grade exam, it must be the school’s fault, and it would be unfair to hold the student responsible. This argument, interestingly enough, is the precise opposite of one of the primary arguments against the testing and accountability provisions of NCLB: We should not hold schools responsible for low achievement, because achievement is affected by student motivation and family background characteristics beyond the school’s control. Taken together, the arguments embedded in the committee’s two conclusions imply that nobody—not schools, not teachers, not even students themselves—bears responsibility for low student achievement.</p>
<p>Interestingly, the committee’s conclusion with respect to exit exams does not pick up on the full report’s emphasis on the importance of the design features of incentive systems, which include warnings that tests aimed at ensuring minimum competency may lower expectations, and concerns about both the potential narrowing of the curriculum and the tendency for score inflation on a known test. Instead, the presumed problem is the inherent unfairness of denying a diploma to a student who has met the attendance and course distribution requirements for a diploma.</p>
<p>If the main objective is to maximize high school graduation, there are many ways to do that. We could eliminate all exams, even those administered by teachers. We could loosen up course requirements. We could offer the diploma after 10 or 11 years of schooling, instead of 12. Of course, nobody is willing to take such steps, even though class exams, course requirements, and the inclusion of the 12th grade of schooling all have negative impacts on graduation rates. So why then does the NRC promote the idea of eliminating a 10th-grade-level examination as a requirement for high school graduation on the narrow basis that a few students will, as a result, not earn the degree? Is the NRC also against the movement of many states toward increasing the required amount of math or moving to college and career-ready standards?</p>
<p><strong>The Data Shuffle</strong></p>
<p>Let’s examine the evidence the committee supplies for its exit exam conclusion. The report marshals three studies that explore the issue: two on dropouts and one on achievement. Evaluating the impact of exit exams on achievement is inherently difficult. Because the exams apply to everybody in a state at the same time, it is not possible to compare students of the same age within the same state to find out the impact of exams. It is possible, however, to look at different cohorts of students, for example, those who attended school before the exam was in place and those who attended after, and to compare these to similar cohorts in other states where no such change in policy took place. In conducting this type of study, one must rule out other differences, such as those in family background or those in state education policies that might also affect student performance over time. Even when these challenges are met, one cannot be entirely sure of the results, as exit exams may influence student and school performance even before they come into effect, if teachers and students know that they will soon be introduced, which is usually the case.</p>
<p>The committee tosses out every exit-exam study (save three) that has ever been conducted on the grounds that it is not possible “to draw causal conclusions about the overall effects of test-based incentives” (that is, the very same criterion the committee ignored in considering school-level accountability). Some of the excluded studies use the well-regarded quasi-experimental technique known as regression discontinuity analysis. In the committee’s view, “Such regression discontinuity studies provide interesting causal information about the effect of being above or below the threshold, but they do not provide information about the overall effect of implementing an incentives program.” That criticism is odd, since the impact of an exit examination is of special interest for exactly those students on the cusp of adequate levels of achievement. While these excluded studies are not really appropriate for studying achievement, they tend to show little impact of exit exams on dropout behavior or graduation outcomes.</p>
<p>The committee relies for its conclusion regarding exit examinations exclusively on a 2009 study by Eric Grodsky, John Robert Warren, and Demetra Kalogrides. Because of the significance of this piece of research for the committee project as a whole, it is worth considering in some depth. The Grodsky team identified trends in student achievement in each state that administers an exit examination by drawing on data provided by the long-term trend assessments of the National Assessment of Educational Progress (NAEP). The long-term NAEP, begun in the late 1960s and continued with testing every few years, was designed to provide consistent score information to judge achievement of the nation as a whole. It was not designed to be used to evaluate the schools of any particular state or district. As a result, NAEP never collected in its long-term trend assessment a representative sample of students for any specific state, and the median number of tested students in each state was very small.</p>
<p>Grodsky et al. pretend that the NAEP provides them with just that: a representative sample of students for each state. They assume that the average performance of students in each state on the long-term NAEP provides an accurate measure of the average performance of students in that state, thereby violating the first principle of statistical sampling.</p>
<p>They then merge the information with information on the timing of the adoption of an exit exam by a state between 1971 and 2004. The study includes observations of math and reading achievement at 9 and 10 different points in time, respectively. The researchers report results for achievement of 13-year-olds and 17-year-olds separately, acknowledging that there are limitations to using either cohort. Thirteen-year-olds may be too young to detect the impact of exit exams, while the sample of 17-year-olds suffers from the noninclusion of school dropouts.</p>
<p>The Grodsky analysis encounters a further difficulty. For the most part, the researchers consider only the very early years, when exit exams were first introduced, a time when the exams were set at a very low level of difficulty, below that of a 9th-grade student. Only 1 percent of the observations included in their analysis are for states that had an exit exam rated at the 9th-grade level or higher, as most current examinations are.</p>
<p>Not only does the Grodsky team rely on inadequate data, but the analysis itself is flawed. Any attempt to see the effects of state tests should compare the changes that occur in the states that introduce them with changes in the states that do not. But the Grodsky study effectively tosses out all the information available for the 27 states that do not have an exit examination before 2004. As important, the analysis does not consider any measures of state policies except for exit exams, implying that any other policy changes for the three decades between 1971 and 2004 are either irrelevant for student performance or are not correlated with the introduction and use of exit exams.</p>
<p>The central finding is that exit exams do not have a statistically significant effect on test scores. But this insignificance could arise because of any or all of the above-mentioned problems rather than the absence of an effect of exit exams, as the NRC committee wants us to presume.</p>
<p>The committee’s estimate of the effects of exit exams on school dropout rates is less controversial. It relies on two quite reliable studies, although they are not without limitations: they study the effects of specific exit exams, which may not generalize to other arrangements. The studies indicate that perhaps 2 percent of potential high-school graduates would have received the diploma had it not been for the exit exams.</p>
<p>The committee touts the possibility of alternative incentives to exit exams: “Several experiments with providing incentives for graduation in the form of rewards, while keeping graduation standards constant, suggest that such incentives might be used to increase high school completion.” The key of course is just what the phrase “while keeping graduation standards constant” means. The idea behind exit exams is to ensure a minimum level of quality, as distinct from meeting the course completion requirements. Moreover, the report never makes the case that exit exams and other potential incentive programs are mutually exclusive. In principle, nobody would argue against employing other incentive programs as long as they were worth the expense and, as the committee says elsewhere, do not introduce perverse incentives of one kind or another.</p>
<p><strong>The Takeaway</strong></p>
<p>The NRC clearly wants to enter into the current debate about the reauthorization of NCLB. And the NRC has an unmistakable opinion: its report concludes that current test-based incentive programs that hold schools and students accountable should be abandoned. The report committee then offers three recommendations: more research, more research, and more research. But if one looks at the evidence and science behind the NRC conclusions, it becomes clear that the nation would be ill advised to give credence to the implications for either NCLB or high-school exit exams that are highlighted in the press release issued along with this report.</p>
<p>The framing of policy in the NRC report is simple: “The small or nonexistent benefits that have been demonstrated to date suggest that incentives need to be carefully designed and combined with other elements of the educational system to be effective.” Nobody would oppose careful design of incentives. Nobody would oppose evaluating the intended and unintended outcomes of incentives. And nobody would oppose combining carefully designed incentives with “other elements of the educational system to [make them] effective.”</p>
<p>The NRC is careful to offer no guidance on how NCLB or state exit exams might be modified to make them more effective. And the NRC is very careful not to offer any guidance on “other elements of the educational system.” The message that comes through is clear: keep working on test development, but never use tests for any incentive or policy purposes.</p>
<p>A better takeaway message might be, “Never rely on the conclusions of this NRC report for any policy purpose.”</p>
<p><em>Eric Hanushek is senior fellow at the Hoover Institution of Stanford University and member of the Koret Task Force on K–12 Education.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/grinding-the-antitesting-ax/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>The Case Against Michelle Rhee</title>
		<link>http://educationnext.org/the-case-against-michelle-rhee/</link>
		<comments>http://educationnext.org/the-case-against-michelle-rhee/#comments</comments>
		<pubDate>Tue, 12 Apr 2011 02:46:49 +0000</pubDate>
		<dc:creator>Paul E. Peterson</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[Governance and Leadership]]></category>
		<category><![CDATA[Homepage]]></category>
		<category><![CDATA[Journal]]></category>
		<category><![CDATA[On Top of the News]]></category>
		<category><![CDATA[Alan Ginsburg]]></category>
		<category><![CDATA[Chancellor of Schools for the District of Columbia]]></category>
		<category><![CDATA[Michelle Rhee]]></category>
		<category><![CDATA[National Academy of Science]]></category>
		<category><![CDATA[National Research Council]]></category>
		<category><![CDATA[NRC]]></category>
		<category><![CDATA[Paul E. Peterson]]></category>

		<guid isPermaLink="false">http://educationnext.org/?p=49641326</guid>
		<description><![CDATA[How persuasive is it?]]></description>
			<content:encoded><![CDATA[<p><img style="width: 7px;height: 9px" src="http://educationnext.org/wp-content/themes/ednxt/img/podcast_icon.jpg" border="0" alt="" width="7" height="9" /> Podcast: <a href="http://educationnext.org/taking-the-measure-of-michelle-rhee/">Paul Peterson describes his new findings on the gains made by D.C. students</a></p>
<p>A footnoted version of this article is <a href="http://educationnext.org/files/Case_Against_Rhee_Unabridged.pdf">available here</a>.</p>
<hr /><p><a href="http://educationnext.org/files/20103_ctf_open.jpg"><img class="alignright size-full wp-image-49634363" style="float: right;padding-top: 5px;padding-bottom: 5px;padding-left: 5px" src="http://educationnext.org/files/20103_ctf_open.jpg" alt="" width="339" height="249" /></a></p>
<p>Recently, two separate studies—one by Alan Ginsburg, a former director of Policy and Program Studies in the U.S. Department of Education, the other by a committee constituted by the National Research Council (NRC)—have sought to discredit the work of Michelle Rhee, former chancellor of schools for the District of Columbia.</p>
<p>According to Ginsburg, Rhee was no more effective—probably even less effective—than her predecessors. Not surprisingly, his argument was quickly picked up by American Federation of Teachers president Randi Weingarten. In a <em>Wall Street Journal</em> interview, she asserts that Michelle Rhee “had a record that is actually no better than the previous two chancellors.” In a blog post dated March 29, 2011, Diane Ravitch makes the same point: “The gains under Rhee were no greater than the gains registered under her predecessor Clifford Janey, who did not use Rhee’s high-powered tactics, such as firing massive numbers of teachers.” Yet the evidence Ginsburg musters to support such claims falls well short of its mark.</p>
<p>In the second study, the NRC committee does not deny that student performance in the District of Columbia improved under Michelle Rhee’s chancellorship between 2007 and 2010, but it says there is no scientific evidence that proves the work of the chancellor is responsible for those gains. “The problem was the [test score] changes that seem to be going in the right direction can’t be attributed to the specific changes in the system,” the study committee’s co-chair Robert M. Hauser told an <em>Education Week</em> reporter. While it is certainly true that one cannot, in the absence of experimental evidence, establish a connection between policy changes and test-score outcomes, Hauser added a carefully worded slap at Rhee: “All districts should be cautious about generalizing from the kind of aggregate overview data that have been used to suggest successes of changes made in the district to date.” The reporter is then informed that “students’ NAEP scores started to improve before the overhaul law passed, as noted in a report last month by Alan Ginsburg.”</p>
<p>The NRC study bears the more prestigious imprimatur, but it is the Ginsburg study that is most likely to be cited in future discussions of merit pay, teacher tenure, and the like. So our fact-checking of the two studies begins with his contribution to the discussion.</p>
<p><strong>The Ginsburg Report</strong></p>
<p>Alan Ginsburg, though now retired, was until very recently the ultimate Washington insider. For more than a generation he was known as the Department of Education’s data-collection guru, the person inside the bureaucracy who understood best what information to collect and how to collect it. So it is of considerable interest that Ginsburg has now chosen to give aid and comfort to Weingarten and other union leaders by leveling a hard-core attack on “The Rhee DC Record.”</p>
<p>To an <em>Education Week</em> reporter, Ginsburg insisted that his critique of “The Rhee DC Record” is not “intended to be anti-Rhee.” He is reported as saying that he acted only because “he believes they [his findings] should serve as a check on a policy of mass dismissals of teachers as a way to improve districts. ‘For me, it’s the much larger question in this country of building a large teaching force.’” It is nonetheless quite disconcerting that he—and those who rely on his work—say that she was engaged in “large-scale firing” and “mass dismissals” when in fact she released in 2010 just 241 teachers for low performance.</p>
<p>Ginsburg excludes any and all information coming from the D.C. exams, known as the Comprehensive Assessment System (CAS), required by the federal law known as No Child Left Behind. He explains that decision on the grounds that “performance levels for 2006 and afterwards are not comparable with those from prior years.” But that does not preclude a comparison of Rhee’s record for the years beginning in 2007 with the situation in the year before she arrived. Had Ginsburg taken a look at that information, he would have found an acceleration of the gains in the percentage of students deemed proficient. In the year before Rhee’s tenure, between 2006 and 2007, the share of students deemed proficient rose by about 1 percentage point in reading and 4 percentage points in math. But over the three years between 2007 and 2010, the gains in percent proficient were 9 percentage points in reading and 15 percentage points in math, or roughly 3 and 5 percentage points per year.</p>
<p><strong>District Performance on National Assessment of Educational Progress</strong></p>
<p>Although these gains are impressive, a <em>USA Today</em> investigative team has expressed concerns that, at least in some schools, those test-score results might have been improperly inflated. No conclusive evidence of cheating has yet been established, but it may well be prudent to focus, as Ginsburg does, on the performance of D.C. students on the National Assessment of Educational Progress (NAEP), commonly known as the nation’s report card. That is a low-stakes test taken only by a representative sample of students, none of whom answer all the questions and for whom no results are reported by student, teacher, or school. As the NAEP is not part of any accountability system, incentives to cheat on the test are minimal, and no allegations of cheating have been made.</p>
<p>At first glance, Ginsburg does not seem to have much of a case against Rhee. D.C. scores on the NAEP shifted upward during the first two years Rhee was in office. In both 4th-grade math and reading they jumped by 6 points, and in 8th-grade math they leaped by 7 points, though they slipped a point in 8th-grade reading (see Figure 1).</p>
<p><a href="http://educationnext.org/files/ednext_20113_CTF_fig1.jpg"><img class="alignright size-full wp-image-49641329" style="margin-bottom: 10px" src="http://educationnext.org/files/ednext_20113_CTF_fig1.jpg" alt="" width="690" height="531" /></a></p>
<p>But Ginsburg says those gains are actually no greater than the ones students had been making in prior years, when superintendents Paul Vance and Clifford Janey were in charge. He reports, “With respect to the distribution of DC’s total gains in NAEP scores over grades 4 and 8 between 2000-09, Vance accounted for a 46% share of the total gain, Janey 30% and Rhee 24%.”</p>
<p>These numbers grab headlines, but they are quite misleading. Between 2000 and 2009, Rhee was in office for only two years, while Vance was in office for three, and Janey for four. If scores had risen at the same rate throughout the nine-year period, each superintendent would account for 11.1 percent of the gains for each year in office: Vance 33.3 percent, Janey 44.4 percent, and Rhee 22.2 percent. So based on Ginsburg’s own calculations, Rhee, with 24 percent of the gain against an even-pace share of 22.2 percent, outperformed her immediate predecessor, Janey, whose 30 percent fell well short of his 44.4 percent.</p>
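<p>The arithmetic behind that comparison can be sketched in a few lines. This is only an illustration: the years in office and the shares are the figures quoted above, while the function names and the idea of dividing the reported share by the even-pace share are my own shorthand, not anything Ginsburg computes.</p>

```python
# Years each superintendent was in office during 2000-2009 (figures from the text).
years_in_office = {"Vance": 3, "Janey": 4, "Rhee": 2}

# Share of the total NAEP gain that Ginsburg attributes to each, in percent.
ginsburg_share = {"Vance": 46, "Janey": 30, "Rhee": 24}

total_years = sum(years_in_office.values())


def expected_share(name):
    """Percent of the gain a superintendent would hold if scores rose evenly each year."""
    return 100 * years_in_office[name] / total_years


def share_ratio(name):
    """Reported share divided by the even-pace share (hypothetical helper).

    A value above 1 means gains came faster than the nine-year average.
    """
    return ginsburg_share[name] / expected_share(name)


for name in years_in_office:
    print(f"{name}: expected {expected_share(name):.1f}%, "
          f"reported {ginsburg_share[name]}%, ratio {share_ratio(name):.2f}")
```

<p>A ratio above 1 for Rhee and below 1 for Janey is all the claim in the text requires; the sketch says nothing about why the pace differed.</p>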
<p>More significantly, Ginsburg ignores the fact that the D.C. NAEP sample in 2009 did not include students attending charter schools not authorized by the district, while in 2007 all charter school students were included. Because charter schools outside district control were outperforming district schools, the latter appeared to be doing better in 2007 than they actually were. NAEP corrected its data-collection procedures in 2009, but, except for 8th-grade math, it failed to provide the data that allow for an apples-to-apples comparison between 2007 and 2009. For 8th-grade math, NAEP explains that had it followed the same policy in 2007 that it adopted in 2009, 8th-grade math scores under Rhee would have increased by 7 points, a statistically significant gain, not just the 3 points that are officially reported.</p>
<p>Similar underreporting of gains may have occurred on the 4th- and 8th-grade reading exams and the 4th-grade math tests, but NAEP unfortunately does not tell us how large they were. Its report only says that giving us that information would not alter the findings as to the statistical significance of gains. So in the analysis below, I provide the corrected results for 8th-grade math, but I cannot provide corrected results for the other exams.</p>
<p><strong>Closing the Gap between District and National Performance</strong></p>
<p>Most importantly, Ginsburg did not adjust for national trends in student performance occurring between 2000 and 2009. Unless one adjusts for national trends, one does not know whether gains in the district are due to district-specific events or to some larger developments in the nation, such as changes in the economy, or the waning effectiveness of No Child Left Behind, or permutations in the design and administration of the NAEP examination, or some other large-scale factor.</p>
<p><a href="http://educationnext.org/files/ednext_20113_CTF_fig2.jpg"><img class="alignright size-full wp-image-49641330" style="float: right;padding-top: 5px;padding-bottom: 5px;padding-left: 5px" src="http://educationnext.org/files/ednext_20113_CTF_fig2.jpg" alt="" width="450" height="436" /></a>The most straightforward way of adjusting for national trends is to look at the extent to which D.C. closed the gap between its students’ performances and those of students nationwide. Once that adjustment is made, it can be shown that Rhee did considerably better at that task than did her predecessors (see Figure 2). For example, during the Rhee years, 4th-grade students, in both reading and math, gained an average of 3 points each year relative to the scores earned by students nationwide, a gain twice that of Rhee’s predecessors.</p>
<p>These numbers seem small, but they add up. In 2000, the gap between D.C. and the nation in 4th-grade math was 34 points. Had students gained as much every year between 2000 and 2009 as they did during the Rhee era, that gap would in 2009 have been just 7 points. Three more years of Rhee-like progress and the gap is closed. In 8th-grade math, the gap in 2000 was 38 points. Had Rhee-like progress been made over the next nine years, the gap would in 2009 have been just 14 points, with near closure in 2012. In 4th-grade reading, the gap was 30 points in 2003 (scores are unavailable for 2000); if Rhee-like gains had taken place over the next six years, the gap in 2009 would have been cut in half.</p>
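<p>For readers who want to check the 4th-grade math projection, here is a minimal sketch. It assumes a constant 3 points of gap closed per year, the rounded Rhee-era rate cited above, and a simple linear path; the function name is hypothetical.</p>

```python
def projected_gap(start_gap, points_closed_per_year, years):
    """Gap remaining after closing a fixed number of points each year, floored at zero."""
    return max(0, start_gap - points_closed_per_year * years)


# 4th-grade math: a 34-point gap in 2000, with about 3 points closed per year.
gap_2009 = projected_gap(34, 3, 9)   # nine years of Rhee-like progress
gap_2012 = projected_gap(34, 3, 12)  # three more years at the same pace
print(gap_2009, gap_2012)
```

<p>The 8th-grade math and 4th-grade reading projections rest on annual rates that are not stated in the text, so they are not reproduced here.</p>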
<p>None of this proves that Rhee could sustain the gains observed over a two-year period. That is too short a time to draw conclusions about a leader based on NAEP results alone. Also, no improvement in 8th-grade reading is detected. The overall results do, however, cast doubt on Ginsburg’s claim that Rhee did no better than her predecessors.</p>
<p>But perhaps the other report, the one issued by a committee of the prestigious National Research Council, makes a more persuasive case that Rhee’s performance is less than it seems.</p>
<p><strong>The National Research Council Report</strong></p>
<p>The National Academy of Sciences dates its lineage back to the presidency of Abraham Lincoln, who asked three scientists to help in the “war against the rebellion.” Operating under its aegis, the NRC has positioned itself as the only nonprofit organization that can sign contracts with federal agencies without submitting a competitive bid. In the hard sciences, NRC periodically issues major reports of public significance. But on too many occasions it exploits its reputation for objectivity by wandering into domains where scientific knowledge is thin.</p>
<p>NRC has expanded its operations beyond reports to federal agencies. In the case at hand, it acted on a 2007 request of the D.C. City Council “under the leadership of Vincent C. Gray” to carry out an independent evaluation of D.C. public schools. Despite the fact that Gray was already planning his run for mayor, NRC responded enthusiastically to his request by undertaking an energetic fundraising campaign that supplemented the council’s own $325,000 in funding with a like amount from a variety of foundations and agencies, including the Spencer Foundation, the National Science Foundation (which contributed $200,000), and the World Bank (which contributed $25,000).</p>
<p>With $650,000 in hand, NRC staff formed the 14-member, largely academic Committee on the Independent Evaluation of DC Public Schools, consisting of a variety of professors and researchers. Its co-chairs are Christopher Edley, the left-leaning dean of Berkeley law school, and, as mentioned, Robert Hauser, a former University of Wisconsin sociology of education professor and liberal critic of accountability systems, who has recently assumed the leadership of NRC’s division responsible for education reports.</p>
<p><strong>Guidance for a Future Evaluation</strong></p>
<p>The committee’s official assignment was not to carry out an independent evaluation, as its title implies, but only to 1) “provide guidance on how to structure” that evaluation and 2) “provide feedback about implementation” of the Rhee reforms. As part of its “guidance,” the committee calls for “systematic yearly public reporting of key data as well as in-depth studies of high priority issues.” One needs to look at more than just “student test scores,” it says. One needs to establish “suitable indicators” that “track how well the city’s public schools are doing.” “In-depth studies should be designed to provide deeper analysis of specific questions about high priority issues,” such as “teacher recruitment and retention.”</p>
<p>If most of this guidance consists of harmless bromides, one recommendation has an edge to it: The evaluation “must be independent of school and city leaders and responsive to the needs of all stakeholders.” Read in the context of D.C. politics, this seems to say: Keep the mayor and chancellor out of any independent evaluation, but let the unions play a major role. Now that Vincent Gray is mayor, one wonders just how eager he will be to act on that recommendation!</p>
<p>The committee has not issued a final document, but it has put out a press release and an unedited prepublication version of the report. The rush to print seems to have been necessary in order to carry out the committee’s second objective: providing “feedback” on the Rhee record, which it apparently wanted to accomplish before her successor officially assumed office. The first substantive information in the committee’s press release reads as follows: “Data suggest that a modest improvement in student test scores has continued&#8230;but the committee cautions that it is premature to draw general conclusions about the reforms’ effectiveness at this time.” Note that the press release talks about a “continuation,” not an “acceleration,” in “modest,” not “striking,” improvement in student achievement. An <em>Education Week</em> reporter explains that “the evaluators confirmed that students’ NAEP scores started to improve before the overhaul law passed, as noted in a report last month by Alan Ginsburg.” Clearly, the NRC committee leadership was willing to put an NRC stamp on Ginsburg’s claims.</p>
<p><strong>Do Teachers Need to Be at School for Students to Learn?</strong></p>
<p>How did the committee cast doubt on Rhee’s effectiveness? The general strategy is to admit the evidence on school improvement in D.C., but then insist that it is impossible to see any connection between that improvement and the work of the chancellor. Of course, it is, as we have said, quite impossible, without experimental evidence, to prove connections between Rhee policies and changes in student gains, but that is not the committee’s agenda. Nowhere in its executive summary, its press release, or the report itself does the committee call for the conduct of experiments that could establish causal relationships between policies and outcomes. On the contrary, the committee recommends gathering still more trend data and conducting old-fashioned case studies that in the end will prove little more than what is already known. And in the pursuit of its second objective, giving feedback on the Rhee reforms, it does not carry out even minimal case-study research to see whether a probable relationship may exist between Rhee policies and classroom outcomes.</p>
<p>Take, for example, the decline in student and teacher truancy. According to 8th-grade student self-reports, the rate of absenteeism declined significantly between 2007 and 2009. Teacher absenteeism also dropped noticeably over these same two years. The days on which 98 percent or more of the teachers were at school climbed from about 68 percent to approximately 85 percent.</p>
<p>Instead of congratulating the district on this improvement, the committee cautions: “It is important to note&#8230;that the fact that teacher absenteeism is correlated with achievement does not mean that the absenteeism causes the low achievement. There are many other factors, such as school safety, that affect both teacher absenteeism and student achievement. This is just one example of the many limitations of these data.”</p>
<p>In this passage we see a certain bias at work. The incidence of student and teacher truancy declined, the committee admits. But that hardly proves Rhee was a success or that students, in order to learn, need the stability that comes with the presence of their regular teacher. Perhaps school safety also improved, but the committee makes no effort to gather statistics on this point or carry out a case study to see whether Rhee had worked to make schools safer. We are simply left with the caution that a drop in the rate of absenteeism might not prove anything.</p>
<p><strong>Comparing D.C. to Other Big Cities</strong></p>
<p>The committee also acknowledges a notable climb in test scores on the DC CAS test and says that “NAEP shows increases similar to those seen on the CAS.” But, it says, “in comparison with other urban districts, the District’s scores were similar; many others also showed consistently significant gains.”</p>
<p><a href="http://educationnext.org/files/ednext_20113_CTF_fig3.jpg"><img class="alignright size-full wp-image-49641331" style="float: right;padding-top: 5px;padding-bottom: 5px;padding-left: 5px" src="http://educationnext.org/files/ednext_20113_CTF_fig3.jpg" alt="" width="450" height="458" /></a>Really? At the 4th-grade level, D.C. students in math and reading gained 6 scale score points between 2007 and 2009, while the average gain in the other 10 cities for which comparable data are available was only 1 point and 2.2 points, respectively. In 8th-grade math, the D.C. gains were 7 points, as compared to an average of 2.9 points for the other cities. Only in 8th-grade reading does the District of Columbia lag behind, dropping a point, while the others gained 1.7 points (see Figure 3).</p>
<p><strong>Do Demographics Explain Gains?</strong></p>
<p>The committee next worries over whether the gains may be due to a change in the composition of the student population in D.C. “The composition of students tested in DCPS&#8230;has changed markedly since 2007,” the report says. “These patterns could bias the&#8230;statistics.” <em>Education Week</em>’s reporter was told that “the numbers of students with disabilities or limited English proficiency fell during that time. The district also had fewer black students and more white and Hispanic students by 2010.”</p>
<p>But is there any reason to believe the gains on the NAEP between 2007 and 2009 were attributable to a shift in the D.C. demography? Did high-income whites and blacks bring their children into the district’s public schools, while low-income blacks and Hispanics moved out? According to the committee’s own report, signs point in the opposite direction. The percentage of students identified as economically disadvantaged grew from 63 percent in 2007 to 70 percent in 2009. The percentage African American slipped slightly from 85 percent to 83 percent of the total, but the percentage Hispanic increased from 9 percent to 10 percent, while the white population rose from 4 percent to 5 percent. Those needing instruction in the English language increased from 7 percent to 10 percent. It’s true that the percentage identified as in need of special education budged downward by 1 percentage point, but the participation rates of special education students on the NAEP increased by 1.5 percent over the two-year period. Nothing in these data indicates that the D.C. schools had fewer challenges in 2009 than they had in 2007.</p>
<p><a href="http://educationnext.org/files/ednext_20113_CTF_img1.jpg"><img class="alignright size-full wp-image-49641353" style="float: right;padding-top: 5px;padding-bottom: 5px;padding-left: 5px" src="http://educationnext.org/files/ednext_20113_CTF_img1.jpg" alt="" width="300" height="295" /></a></p>
<p><strong>Rhee’s Record</strong></p>
<p>In all the numbers Rhee’s critics have assembled, the two facts that stand out have nothing to do with test scores, but rather with student and teacher absenteeism. One does not know how quickly leaders can have an impact on student learning, but strong educational leaders are known for their impact on school culture. If we take Rhee at her word, changing culture was what she was trying to do, and those falling absenteeism indicators suggest that she may have had an effect, even in a short period of time. It’s even possible that a change in the D.C. school climate accelerated learning gains. About that one cannot be certain when only two years of NAEP data are available. But one can be quite sure that a case against Rhee has yet to be established.</p>
<p><em>Paul E. Peterson directs Harvard’s Program on Education Policy and Governance.</em></p>
<p>A footnoted version of this article is <a href="../files/Case_Against_Rhee_Unabridged.pdf">available here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/the-case-against-michelle-rhee/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>A Closer Look at Charter Schools and Segregation</title>
		<link>http://educationnext.org/a-closer-look-at-charter-schools-and-segregation/</link>
		<comments>http://educationnext.org/a-closer-look-at-charter-schools-and-segregation/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 11:05:09 +0000</pubDate>
		<dc:creator>Gary Ritter</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[Homepage]]></category>
		<category><![CDATA[Journal]]></category>
		<category><![CDATA[charter schools]]></category>
		<category><![CDATA[Choice without Equity: Charter School Segregation and the Need for Civil Rights Standards]]></category>
		<category><![CDATA[Civil Rights Project]]></category>
		<category><![CDATA[CRP]]></category>
		<category><![CDATA[racial composition]]></category>
		<category><![CDATA[segregation]]></category>

		<guid isPermaLink="false">http://educationnext.org/?p=49634360</guid>
		<description><![CDATA[Flawed comparisons lead to overstated conclusions]]></description>
			<content:encoded><![CDATA[<p><a href="http://educationnext.org/files/20103_ctf_open.jpg"><img class="alignright size-full wp-image-49634363" style="float: right; padding-top: 5px; padding-bottom: 5px; padding-left: 5px;" title="20103_ctf_open" src="http://educationnext.org/files/20103_ctf_open.jpg" alt="" width="339" height="249" /></a>In January 2010, the UCLA-based Civil Rights Project (CRP) released “Choice without Equity: Charter School Segregation and the Need for Civil Rights Standards.” The study set out to report on, among other things, levels of racial segregation in charter schools across the United States. The authors use 2007–08 data from the U.S. Department of Education’s Common Core of Data (CCD) to compare the racial composition of charter schools to that of traditional public schools at three different levels of aggregation: nationwide; within 40 states and the District of Columbia; and within 39 metropolitan areas with large enrollments of charter school students. Based on these comparisons, the authors conclude, incorrectly in our view, that charter schools experience severe levels of racial segregation compared to traditional public schools (TPS).</p>
<p>We will show that, when examined more appropriately, the data actually reveal small differences in the level of overall segregation between the charter school sector and the traditional public-school sector. Indeed, we find the majority of students in the central cities of metropolitan areas, in both charter and traditional public schools, attend school in intensely segregated settings. Our findings are similar to those in a 2009 report by RAND, in which researchers focused on segregation in five large metropolitan areas (Chicago, Denver, Milwaukee, Philadelphia, and San Diego) — areas that were also included in the CRP report. The RAND authors, with the benefit of student-level data, follow students who move from traditional public schools into charter schools and conclude that these transfers have “surprisingly little effect on racial distributions across the sites.” The authors of the RAND report write:</p>
<p style="padding-left: 30px;">Across 21 comparisons (seven sites with three racial groups each), we find only two cases in which the average difference between the sending TPS and the receiving charter school is greater than 10 percentage points in the concentration of the transferring student’s race.</p>
<p>The RAND report, based on a superior methodology, provides strong evidence that the CRP claims are off base. The RAND findings, coupled with our own, offer a significantly different portrayal of segregation in charter schools than does the CRP report. We find no basis for the allegations made by the CRP authors, who argue that charter-school enrollment growth, based on the free choices of mostly minority families, represents a “civil rights failure.”</p>
<p>While we find fault with the methodology employed by the CRP authors, and with their conclusions, we recognize that the questions addressed by the CRP, in this report and in scores of earlier ones, concern issues of importance for policymakers and the public alike. With the billions of dollars invested each year in public schools, both traditional and charter, and the millions of hours that we compel our children to attend these schools, it is critical that we have a basic understanding of the school environment that we are providing. Moreover, given the history of forced racial segregation in our nation’s schools, we must be ever-attentive to these issues.</p>
<p>Indeed, because these questions are of such significance, it is imperative that they be addressed carefully and correctly.</p>
<p><strong>The Wrong Approach</strong></p>
<p>Unfortunately, the analyses employed in the CRP report do not meet this standard. The authors begin by presenting a great deal of descriptive data on the overall enrollment and aggregate racial composition in public charter schools compared to traditional public schools. Based only on enrollments aggregated to the national and state level, the authors repeatedly highlight the overrepresentation of black students in charter schools in an attempt to portray a harmful degree of segregation. But comparisons of simple averages at such a high level of aggregation can obscure wide differences in school-level demographics among both charter and traditional public schools. It is like having your feet in the oven and your head in the icebox, and saying that, on average, the temperature is just right.</p>
<p>After this descriptive overview, the authors address the question of racial segregation in a more appropriate way. In this analysis, the CRP authors define as “hypersegregated” any school with a 90 percent minority population or a 90 percent white population. Their aim is to determine if charter students nationwide are more or less likely to attend school in such hypersegregated environments. However, a critical flaw undermines this comparison and all of the analyses that follow. In every case, whether the authors examine the numbers at the national, state, or metropolitan level, they compare the racial composition of <em>all</em> charter schools to that of <em>all</em> traditional public schools. This comparison is likely to generate misleading conclusions for one simple reason, as the authors themselves point out on the first page of the executive summary and then again on page 57 of the full report: “the concentration of charter schools in urban areas skews the charter school enrollment towards having higher percentages of poor and minority students.”</p>
<p>In other words, the geographic placement of charter schools practically ensures that they will enroll higher percentages of minorities than will the average public school in the nation, in states, and in large metropolitan areas. Further, because serving disadvantaged populations is the stated mission of many charter schools, they seek out locations near disadvantaged populations intentionally. Instead of asking whether all students in charter schools are more likely to attend segregated schools than are all students in traditional public schools, we should be comparing the racial composition of charter schools to that of nearby traditional public schools. Employing this method, we could compare the levels of segregation for the students in charter schools to what they would have experienced had they remained in their residentially assigned public schools.</p>
<p>If we acknowledge this standard for valid comparisons, we can quickly dismiss the national and state-level comparisons, which constitute the bulk of the CRP report. According to the authors’ own numbers in Table 20, more than half (56 percent) of charter school students attend school in a city, compared to less than one-third (30 percent) of traditional public school students. Thus, any national comparisons are inappropriate, as these two groups of students are inherently dissimilar. The authors employ this same flawed strategy individually for each of the 40 states included in their analysis. Again, comparing the segregation in charter schools in a state, which are concentrated in heavily minority central cities, to that in traditional public schools throughout the state, reveals nothing about the reality of racial segregation in charter schools.</p>
<p>The examples that the authors draw from these state-level comparisons are almost humorous at times. For example, consider the following point from page 43 of the report:</p>
<p style="padding-left: 30px;">In some cases, like Idaho, charter school students across all races attend schools of white isolation: majorities of students of all races are in 90–100% white charter schools.</p>
<p>No kidding! The state of Idaho is nearly 95 percent white. Obviously, this is not a charter phenomenon, yet the authors brazenly use this as evidence for their claims without making any mention of the corresponding figure for the traditional public schools in the state.</p>
<p>Finally, the authors consider the hypersegregation in charter and traditional public schools individually within 39 metropolitan areas. But even within the large Census Bureau–defined Core-Based Statistical Areas (CBSAs) used as proxies for metropolitan areas, charters are still disproportionately located in low-SES (socioeconomic status) urban areas, while traditional public schools are dispersed throughout the entire CBSA. For example, the authors note that in the Washington, D.C., CBSA, 91 percent of students in charter schools attend hypersegregated schools, while only 20 percent of students in that same area attend hypersegregated traditional public schools. A quick look at the geographical placement of charter schools in the D.C. metro area, however, shows why such a comparison is inappropriate. The D.C. metro CBSA contains 1,186 traditional public schools, 1,026 of which are in Virginia, Maryland, and even West Virginia; only 13 percent of the traditional public schools in the D.C. CBSA are actually situated in the racially isolated District of Columbia. On the other hand, 93 percent of the charter schools in the D.C. CBSA are located in D.C. In other words, nearly all of the area’s charter schools are in D.C., while the vast majority of the traditional public schools the authors use in their comparisons are located in the largely suburban or exurban areas of surrounding states. For the 39 CBSAs examined by the authors, only 22 percent of the traditional public schools were located in central cities, compared to 51 percent of the charter schools.</p>
<p><strong>A Tighter Comparison</strong></p>
<p>It is indeed likely that, with the right analysis and the proper questions, the conclusion would not be as clear as portrayed by the CRP authors. We modified the CRP analysis by comparing the percentage of students in hypersegregated minority charters within the central city of each CBSA to the percentage of students in hypersegregated minority traditional public schools within the same central city. For example, for the Washington, D.C., CBSA, we included only schools located within the District of Columbia. The data we obtained for this comparison are publicly available from the Common Core of Data, so the CRP researchers could have conducted their analysis at this level. Of course, even this analysis is not perfect. Only following students at the individual level would reveal precisely what effect charters are having on segregation.</p>
<p><a href="http://educationnext.org/files/20103_ctf_tbl1.jpg"><img class="alignright size-full wp-image-49634361" style="float: right; padding-top: 5px; padding-bottom: 5px; padding-left: 5px;" title="20103_ctf_tbl1" src="http://educationnext.org/files/20103_ctf_tbl1.jpg" alt="" width="414" height="328" /></a>We focus our reanalysis on the data presented by the authors in Table 10 of their report. The focal measures in this table are shown in the last two columns, where the authors present the percentage of charter school students (from the entire metropolitan area) in schools with greater than 90 percent minority students alongside the similar figure for traditional public schools. The problematic figure in this table is the percentage of traditional public school students in hypersegregated schools used as the point of comparison. Table 1 shows the bias that the CRP method entails for the eight largest metropolitan areas.</p>
<p>Whether or not we believe that charter schools are more segregated than traditional public schools depends largely on which set of traditional public schools serves as a comparison. The data for these eight very large metropolitan areas, representing more than half of the enrollment for the entire dataset, demonstrate how the CRP method overstates the relative levels of segregation in the charter sector. For example, under the CRP method, 91.2 percent of the charter students in the D.C. CBSA are in hypersegregated minority schools, as compared to just 20.9 percent of the students in traditional public schools. Using the central-city method, the percentage of students in hypersegregated minority charters stays roughly the same, but the percentage of students in hypersegregated minority traditional publics skyrockets to 85 percent.</p>
<p><a href="http://educationnext.org/files/20103_CTF_figure1.jpg"><img class="alignright size-full wp-image-49634362" style="float: right; padding-top: 5px; padding-bottom: 5px; padding-left: 5px;" title="20103_CTF_figure1" src="http://educationnext.org/files/20103_CTF_figure1.jpg" alt="" width="414" height="395" /></a>In fact, in the vast majority of the 39 metro areas reviewed in the CRP report, the application of our central-city comparison decreases (relative to the flawed CRP analysis) the level of segregation in the charter sector as compared to the traditional public school sector. (<a href="http://educationnext.org/files/20103_Ritter_Supplement.pdf" target="_blank">Click here to view a table with these figures for all 39 CBSAs</a>.) Importantly, unlike the CRP authors, we also compute and present the overall average results. Using the best available unit of comparison, we find that 63 percent of charter students in these central cities attend school in intensely segregated minority schools, as do 53 percent of traditional public school students (see Figure 1). Thus, while it appears that charter students are, on average, more likely to attend hypersegregated minority schools, the difference between the charter and traditional public sector is far less stark than the CRP authors suggest.</p>
<p><strong>The Right Question</strong></p>
<p>Our analysis presents a more accurate, but still imperfect, picture of the levels of racial segregation in the charter sector relative to the traditional public-school sector. Ideally, to examine the issue of segregation, we would pose the question, Are the charter schools that students attend more or less segregated than the traditional public schools these students would otherwise attend? Unfortunately, our data linking schools to cities do not allow for this analysis.</p>
<p>Even within many of the central cities in the metropolitan areas listed above, there is a great deal of racial segregation. And most available data suggest that charter schools are popping up in areas where the students are poor and disadvantaged and need additional educational options. Public charter schools are simply less likely to open in economically advantaged, mostly white neighborhoods. Thus, even our analysis likely underestimates the true levels of racial segregation in the specific traditional public schools that charter students are leaving. Indeed, a more fine-grained analysis (similar to the study conducted by RAND) in which we compared the levels of segregation in public charter schools to those in the traditional public schools in the same neighborhood would be preferable. The RAND report is particularly relevant here because it focuses on student-level data from Chicago, Denver, Milwaukee, Philadelphia, and San Diego, five metropolitan areas highlighted in the CRP report. By examining student-level transfers, the authors are able to determine the extent to which students move into schools with higher concentrations of their own race and thereby increase the overall level of segregation. Using this strategy, the RAND researchers found,</p>
<p style="padding-left: 30px;">Transfers to charter schools did not create dramatic shifts in the sorting of students by race or ethnicity in any of the sites included in the study. In most sites, the racial composition of the charter schools entered by transferring students was similar to that of the TPSs from which the students came.</p>
<p>Our own similar analysis of student-level transfers to charters in the Little Rock, Arkansas, area over the past five years tells much the same story. While many of the students transferred into Little Rock charter schools that were racially segregated, these students generally left traditional public schools that were even more heavily segregated.</p>
<p><strong>Conclusion</strong></p>
<p>The authors of the Civil Rights Project report conclude,</p>
<p style="padding-left: 30px;">Our new findings demonstrate that, while segregation for blacks among all public schools has been increasing for nearly two decades, black students in charter schools are far more likely than their traditional public school counterparts to be educated in intensely segregated settings.</p>
<p>Our analysis suggests that these claims are certainly overstated. Furthermore, the authors fail to acknowledge two significant truths.</p>
<p>First, the majority of students in central cities, in both the public charter sector and in the traditional public sector, attend intensely segregated minority schools. Neither sector has cause to brag about racial diversity, but it seems clear that the CRP report points its lens in the wrong direction by focusing on the failings of charter schools. As the authors themselves note, across the country only 2.5 percent of public school children roam the halls in charter schools each day; the remaining 97.5 percent are <em>compelled</em> to attend traditional public schools. And we know that, more often than not, the students attending traditional public schools in cities are in intensely segregated schools. If we are truly concerned about limiting segregation, then this is where we should look to address the problem.</p>
<p>Second, and perhaps more important, the fact that poor and minority students flee segregated traditional public schools for similarly segregated charters does not imply that charter school policy is imposing segregation upon these students. Rather, the racial patterns we observe in charter schools are the result of the choices students and families make as they seek more attractive schooling options. To compare these active parental choices to the forced segregation of our nation’s past (the authors of the report actually call some charter schools “apartheid” schools) trivializes the true oppression that was imposed on the grandparents and great-grandparents of many of the students seeking charter options today.</p>
<p><em>Gary Ritter is professor of education policy at the University of Arkansas. Nathan Jensen, Brian Kisida, and Joshua McGee are research associates in the Department of Education Reform at the University of Arkansas.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/a-closer-look-at-charter-schools-and-segregation/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Quality Counts and the Chance-for-Success Index</title>
		<link>http://educationnext.org/quality-counts-and-the-chance-for-success-index/</link>
		<comments>http://educationnext.org/quality-counts-and-the-chance-for-success-index/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 12:20:22 +0000</pubDate>
		<dc:creator> </dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[Homepage]]></category>
		<category><![CDATA[School Life]]></category>
		<category><![CDATA[Standards, Testing, and Accountability]]></category>
		<category><![CDATA[CFSI]]></category>
		<category><![CDATA[Chance-for-Success Index]]></category>
		<category><![CDATA[Editorial Projects in Education Research Center]]></category>
		<category><![CDATA[Education Week]]></category>
		<category><![CDATA[EPE]]></category>
		<category><![CDATA[Quality Counts]]></category>

		<guid isPermaLink="false">http://educationnext.org/?p=49632355</guid>
		<description><![CDATA[Narrowing its scope to factors schools can control would give the measure greater value]]></description>
			<content:encoded><![CDATA[<p><a href="http://educationnext.org/files/20102_77_opener.jpg"><img class="alignright size-full wp-image-49632368" style="float: right;padding-top: 5px;padding-bottom: 5px;padding-left: 5px" src="http://educationnext.org/files/20102_77_opener.jpg" alt="20102_77_opener" width="220" height="193" /></a>From the moment of birth, Americans have a fascination with seeing how we measure up. Apgar scores assess the vitality of a newborn. Growth charts compare a youngster to his peers. Report cards throughout school equate a student’s academic performance with a grading standard. Professional athletes, corporations, and communities all have rating systems designed to reveal their quality. We are a nation obsessed with the story told in numbers. And we seem to take on faith that the rating systems behind the scores are on target.</p>
<p>The quality of our public schools has been measured in innumerable ways, and stakeholders may draw on any number of sources for rankings to support a particular agenda. Each winter, <em>Education Week</em> issues <em>Quality Counts</em> as a magazine supplement to its weekly newspaper. Report cards track and compare state education policies and outcomes in six areas: chance-for-success; K–12 achievement; standards, assessments, and accountability; transitions and alignment; the teaching profession; and school finance. For example, the grade for transitions and alignment is based on 14 indicators related to “early-childhood education, college readiness, and economy and workforce,” while the school finance indicators measure spending patterns and resource distribution. Through these report cards, <em>Education Week</em> purports to “offer a comprehensive state-by-state analysis of key indicators of student success.”</p>
<p>The <em>Quality Counts</em> rankings are eagerly anticipated, thoroughly perused, and widely quoted. After the 2009 rankings were released, the Maryland State Department of Education issued a press release touting the state’s place at “the top of the list in <em>Education Week’s</em> tally, just ahead of Massachusetts.” Florida governor Charlie Crist celebrated the news that <em>Education Week’s Quality Counts</em> rated Florida’s schools 10th in the nation, based on its average rating across the six categories that make up the analysis. Are Florida’s schools among the nation’s best? It depends on what you measure. By November of 2009, two lawsuits had been filed in Florida claiming the state was <em>failing</em> to provide high-quality education to its students. The plaintiffs claimed the state had low graduation rates, frequent school violence, and low levels of education spending and teacher pay compared to other states.</p>
<p><a href="http://educationnext.org/files/20102_77_indicate.jpg"><img class="alignright size-full wp-image-49632372" style="float: right;padding-top: 5px;padding-bottom: 5px;padding-left: 5px" src="http://educationnext.org/files/20102_77_indicate.jpg" alt="20102_77_indicate" width="230" height="852" /></a>The rankings are also frequently misunderstood. Among the most widely cited of the <em>Quality Counts</em> ranking schemes is the Chance-for-Success Index (CFSI), which attempts to measure a state’s capacity for helping young people succeed. Here’s what <em>Education Week’s </em>Editorial Projects in Education (EPE) Research Center has to say about the index:</p>
<p style="padding-left: 30px;">The Chance-for-Success Index captures the critical role that education plays at all stages of an individual’s life, with a particular focus on state-to-state differences in opportunities. While early foundations and the returns in the labor market from a quality education are important elements of success, we find that the school years consistently trump those factors. In every state, indicators associated with participation and performance in formal schooling constitute the largest source of points awarded in this category, and help explain much of the disparity between the highest- and lowest-ranked states.</p>
<p>The CFSI’s stated aim is to show the role that education plays as a student moves from childhood through the formal K–12 system and into the workforce, but the rest of the description is ambiguous. Many states nonetheless interpret the index as a simple measure of school quality. Maryland came in fifth in 2009, with a B+. The Maryland schools’ press release cited above reported that the state “ranked among the nation’s leaders in ‘Chance for Success,’ which looks at how well graduates achieve beyond high school.” Of course, some states choose not to emphasize their CFSI score. For example, the New Mexico education department’s January 2009 press release led with its number-two rank and A grade for transitions and alignment policies and buried in the middle its 51st-place CFSI grade of D+.</p>
<p>Does CFSI measure the school system’s contributions to achievement beyond high school? It’s hard to say. Most of its components, described as “key facets of education spanning stages from childhood to adulthood,” are a grab bag of demographic characteristics. The index combines indicators related to family background, wealth, education levels, and employment with schooling measures, including kindergarten enrollment and selected National Assessment of Educational Progress (NAEP) test scores. The 13 components of success are identified in the sidebar. Not all of these have a clear relationship to postsecondary success, and several are beyond the control of state policymakers.</p>
<p>Consider the parental employment indicator and its role in an index that is updated annually or even every other year. Short-run trends in parental employment may not have any impact on the overall quality of a state’s education system; even the direction of possible influence is unclear. Parents who see how difficult it is to get and retain employment without education may stress the value of school completion, but it is also conceivable that underemployed parents may seek to accelerate their children’s entry into the labor force, even at the expense of their education. A similar problem exists with annual income: many factors outside of education quality influence the vitality of a state economy. Even if strong gains in public education are realized, it will be years before the effects are reflected in adults’ annual income. Income trends over the next few years will have little or nothing to do with current levels of education quality.</p>
<p><strong>A Different Approach</strong></p>
<p>Absent a sound theory of action, it is easy to go on a data spree. As seen in the CFSI, the more the merrier. As an experiment, we reconstructed the Chance-for-Success Index. First, we selected a clear standard for our index: we defined “success” as the percentage of young adults, aged 18 to 24, who are productively engaged in postsecondary endeavors (pursuing a college degree, active military service, or full-time employment). We limited the indicators to only those factors for which a reasonable empirical base of evidence shows an association between the indicator and our definition of success and that are plausibly under the control of education policymakers. Five indicators have a clear bearing on education outcomes: preschool enrollment, kindergarten enrollment, 4th-grade reading, 8th-grade mathematics, and high school graduation. Using the same source data as the 2009 CFSI and giving each factor equal weight, we computed new averages for each state and compared the new rankings to the originals.</p>
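<p>The reconstruction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes the five indicators are already expressed on a common scale, it breaks ties by input order, and the state names, scores, and function names are hypothetical rather than the authors’ data or procedure.</p>

```python
from statistics import mean

# The five indicators the article keeps in its revised index.
INDICATORS = (
    "preschool_enrollment",
    "kindergarten_enrollment",
    "grade4_reading",
    "grade8_math",
    "high_school_graduation",
)

def equal_weight_index(scores):
    """Average the five indicator scores, giving each equal weight."""
    return mean(scores[name] for name in INDICATORS)

def rank_states(table):
    """Return {state: rank}, where rank 1 is the highest index value."""
    ordered = sorted(table, key=lambda s: equal_weight_index(table[s]), reverse=True)
    return {state: position for position, state in enumerate(ordered, start=1)}

def rank_shifts(original_ranks, revised_ranks):
    """Places moved per state (positive means the state rose)."""
    return {s: original_ranks[s] - revised_ranks[s] for s in revised_ranks}

# Hypothetical scores on a common 0-100 scale; not real state data.
example = {
    "State A": dict(zip(INDICATORS, (60, 80, 70, 65, 75))),
    "State B": dict(zip(INDICATORS, (50, 70, 60, 55, 65))),
    "State C": dict(zip(INDICATORS, (70, 90, 80, 75, 85))),
}
revised = rank_states(example)
shifts = rank_shifts({"State A": 1, "State B": 2, "State C": 3}, revised)
```

<p>Counting how many entries in <code>shifts</code> have an absolute value of 3, 5, or 8 or more would yield the kind of tallies reported for the revised rankings.</p>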
<p>Our results show marked divergence from the CFSI rankings (see Table 1). Only Maryland (5th) and Arizona (43rd) retained their rankings, although four of the top five stayed within that band. Looking down the list, however, 34 states moved 3 or more places, 21 shifted by 5 or more places, and 13 states moved by 8 or more places. Does our revised index precisely rank states’ public education systems? Probably not. The ideal index would be one that measured how well states and schools did, given their demography. Still, this exercise shows how sensitive the CFSI is to the choice of indicators.</p>
<p>Removing family background characteristics from the index changes states’ rankings substantially. The states that drop the most in the revised rankings are Hawaii, Rhode Island, Indiana, Alaska, Nebraska, and North Dakota. The states that gain the most are Florida, Texas, Maine, Idaho, Arkansas, and Mississippi, mostly poor, rural states.</p>
<p>Is the CFSI largely a measure of parental education? We looked at where the states would fall if we ranked them by individual family background variables. The variable that by itself provides a ranking with the closest fit to the CFSI is percentage of children with at least one parent with a postsecondary degree (parent education). Ranked by that measure alone, only 8 states would move by 8 or more places from their positions in the CFSI. Indicators of family income and adult education levels also produce rankings similar to the CFSI. Ranking states by either the percentage of children in families with incomes at least 200 percent of poverty level (the family income indicator) or the percentage of adults (25–64) with a 2- or 4-year postsecondary degree (adult educational attainment), only 15 states would move 8 or more places.</p>
<p><a href="http://educationnext.org/files/20102_77_table1.jpg"><img class="alignright size-full wp-image-49632365" src="http://educationnext.org/files/20102_77_table1.jpg" alt="20102_77_tbl1" width="690" height="695" /></a></p>
<p><strong>Measuring the Measure</strong></p>
<p>Report cards must meet a number of conditions if they are to be reliable. First, they need to clearly define the condition or result being examined. None of the descriptions provided by the CFSI editors accomplishes this; they never reveal exactly what they take the “chance for success” to be, asserting only that some states provide better opportunities than others. Explained the EPE Research Center’s director, “a child’s life prospects depend greatly on where he or she lives.”</p>
<p>Second, the indicators that are employed should have direct and proven association with the outcome being measured. The CFSI’s current approach mixes inputs such as demographics with outcomes like academic results to arrive at a single score. The result is a tautology: success is the sum of the parts; the parts are by default the components of success.</p>
<p>The editors of <em>Quality Counts</em> gather and report a variety of measures that reflect current education and policy performance across all 50 states and the District of Columbia and, through comparison, encourage states to take actions that would lead to improvements in their ratings. Nowhere do the <em>Quality Counts </em>editors show how or why the Chance-for-Success Index is a good predictor of success. Instead, they provide statistics that divert attention away from the things that actually do matter, such as high-quality teaching, a good range of school options, and success in early elementary schools. There is risk in including variables that have no real impact on the result being studied. States may view the results as motivators to improvement, and ineffective indicators may lead to ineffective attention and investment. Narrowing the scope of the Chance-for-Success Index to factors both causally related to school achievement and under the control of state education officials or school districts would improve its value and deliver the right signals to states.</p>
<p><em>CREDO at Stanford University supports education organizations and policymakers in using research and program evaluation to assess the performance of education initiatives. The team is led by Margaret Raymond and includes Kenneth Surratt, Devora Davis, Edward Cremata, Emily Peltason, Meghan Cotter Mazzola, Kathleen Dickey, and Rosemary Brock.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/quality-counts-and-the-chance-for-success-index/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Fraud in the Lunchroom?</title>
		<link>http://educationnext.org/fraud-in-the-lunchroom/</link>
		<comments>http://educationnext.org/fraud-in-the-lunchroom/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 09:00:13 +0000</pubDate>
		<dc:creator> </dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[Governance and Leadership]]></category>
		<category><![CDATA[On Top of the News]]></category>
		<category><![CDATA[School Spending]]></category>
		<category><![CDATA[State and Federal]]></category>
		<category><![CDATA[E-Rate program]]></category>
		<category><![CDATA[Eligibility Manual for School Meals]]></category>
		<category><![CDATA[NAEP]]></category>
		<category><![CDATA[National Assessment of Educational Progress]]></category>
		<category><![CDATA[National School Lunch Act]]></category>
		<category><![CDATA[National School Lunch Program]]></category>
		<category><![CDATA[Nation’s Report Card]]></category>
		<category><![CDATA[NSLP]]></category>
		<category><![CDATA[TANF]]></category>
		<category><![CDATA[Temporary Assistance to Needy Families]]></category>
		<category><![CDATA[Title I funds]]></category>
		<category><![CDATA[U.S. Department of Agriculture]]></category>
		<category><![CDATA[USDA]]></category>

		<guid isPermaLink="false">http://educationnext.org/?p=49631358</guid>
		<description><![CDATA[Federal school-lunch program may not be a reliable measure of poverty]]></description>
			<content:encoded><![CDATA[<p><a href="http://educationnext.org/files/20101_67_fig1.gif"><img class="alignright size-full wp-image-49631362" style="float: right;padding-top: 5px;padding-bottom: 5px;padding-left: 5px" src="http://educationnext.org/files/20101_67_fig1.gif" alt="20101_67_fig1" width="329" height="418" /></a>Fill it out and turn it in: that’s the message thousands of school districts send parents each year when they offer applications for the federal government’s National School Lunch Program (NSLP). And each year, millions of parents comply. But new data suggest that the process for verifying eligibility for the program is fundamentally broken and that taxpayers may be picking up the tab for participation by ineligible families. The NSLP, which is administered by the U.S. Department of Agriculture (USDA) at an annual cost of $8 billion, serves 31 million American children each day. The program’s goal is to help low-income students succeed in public and private school classrooms by ensuring they have adequate nutrition, a mission that is compromised if substantial funds are being spent on ineligible families or the program fails to reach the neediest students.</p>
<p>Determining the extent of program fraud and error is important, as the entitlement is associated with other streams of federal, state, and local taxpayer dollars. Eligibility data are widely used as proxies for poverty rates, thereby influencing funding for myriad government programs and informing both school district policies and policy research. For example, NSLP participation rates serve as the main criterion for the allocation of federal Title I funds to schools. Those schools with a higher percentage of students eligible for free or reduced-price lunch also receive a larger discount on the federal government’s E-Rate program, which facilitates access to telecommunications services for schools and libraries.</p>
<p>State governments dole out benefits according to free and reduced-price lunch percentages, too. The Wisconsin Department of Public Instruction, for instance, allocates $2,250 to schools for each low-income child enrolled in kindergarten through 3rd grade. The program gauges poverty using NSLP participation.</p>
<p>Because of the financial benefits, local school districts have a clear incentive to register as many students in NSLP as possible. Some districts encourage parents to fill out applications, even if they are not sure they qualify. One district in Chillicothe, Missouri, offered parents a $10 Wal-Mart gift card for turning in an application. “Even if you choose to pay for your child’s lunches and or breakfasts, each qualified application earns $1,025 per child of state money for our school district,” said Assistant Superintendent Wade Schroeder.</p>
<p>School districts often use free and reduced-price lunch percentages for student assignment and resource allocation as well. North Carolina’s largest school district, Charlotte-Mecklenburg Schools, gives schools 30 percent more funds for every student enrolled in the entitlement. Wake County Public School System, in central North Carolina, employs a costly busing strategy to foster socioeconomic diversity in the classroom, measured in part by NSLP participation. These districts and others could be basing policy on faulty numbers if the lunch program data are not a valid indicator of socioeconomic status.</p>
<p>In addition, the federal government’s evaluation programs routinely employ school lunch subsidies as a poverty indicator. The National Assessment of Educational Progress (NAEP), commonly known as the “Nation’s Report Card,” uses the scores of students eligible for the lunch program to track the performance of states in educating low-income children over time. No Child Left Behind requires that schools meet performance benchmarks for program-eligible students in order to make adequate yearly progress. Academic researchers also make use of NSLP participation data, raising the question of whether researchers could be producing skewed results if program participation is not a reliable indicator of income.</p>
<p><strong>How It Works</strong></p>
<p>Parents who apply for school lunch benefits, or for the smaller school breakfast program, report their yearly income on the application. Children living in households at or below 130 percent of the federal poverty level ($27,560 per year for a family of four) qualify for free meals at school; those in households between 131 percent and 185 percent (up to $39,220 per year for a family of four) qualify for reduced-price meals. Children can also qualify automatically based on residential status in areas of concentrated poverty or participation in other means-tested government programs, including food stamps and Temporary Assistance to Needy Families (TANF). The USDA reimburses districts for each free or discounted meal served.</p>
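<p>The income thresholds above reduce to a simple rule, sketched here in Python for illustration. The $21,200 poverty level for a family of four is an assumption inferred from the figures quoted above ($27,560 is 130 percent of it and $39,220 is 185 percent), and the function and constant names are hypothetical. The sketch ignores automatic qualification through food stamps, TANF, or residence.</p>

```python
# Poverty level for a family of four implied by the article's figures
# (27,560 / 1.30 and 39,220 / 1.85 both give 21,200). Treat it as an assumption.
POVERTY_LEVEL_FAMILY_OF_FOUR = 21_200

def meal_category(annual_income, poverty_level=POVERTY_LEVEL_FAMILY_OF_FOUR):
    """Classify a household by the income thresholds described in the article."""
    free_cap = poverty_level * 130 / 100      # at or below 130 percent: free meals
    reduced_cap = poverty_level * 185 / 100   # up to 185 percent: reduced-price meals
    if annual_income > reduced_cap:
        return "paid"
    if annual_income > free_cap:
        return "reduced-price"
    return "free"

# Self-reported incomes near the cutoffs; no documentation is checked at this stage.
categories = [meal_category(income) for income in (27_560, 27_561, 39_220, 39_221)]
```

<p>Because the only input is a self-reported number, a household a few dollars on either side of a cutoff lands in a different category, which is why verification targets applications close to the limits.</p>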
<p>No proof of income, such as a pay stub or W-2 form, is required when parents apply. That’s in contrast to other federal nutrition entitlements, including the food stamp program, now called the Supplemental Nutrition Assistance Program (SNAP). Normally, SNAP applicants must “file an application form, have a face-to-face interview, and provide proof (verification) of certain information, such as income and expenses.” Assuming a 180-day school year, students eligible for free lunch receive on average $462.60 per year in benefits, compared with an average of $1,152 per year in benefits for individuals receiving food stamps.</p>
<p>Each NSLP application contains a certification statement that parents or guardians are required to sign in which they promise that their reported income level is accurate. The statement warns that adults “may be prosecuted” if they “purposefully give false information,” but the threat doesn’t have teeth, as few, if any, applicants have been held accountable for cheating. It isn’t even clear which level of government—federal, state, or local—would be responsible for prosecuting fraud.</p>
<p>The only verification mechanism in place for the NSLP is outlined in the Richard B. Russell National School Lunch Act, as most recently amended by Congress in 2004. The act requires school districts to try each year to verify the incomes of 3 percent (or 3,000, whichever is less) of participants considered “error prone,” meaning households whose reported earnings are within $100 monthly or $1,200 yearly of the income eligibility limitation. School districts can also qualify for an alternate sample size of 1 percent if they meet certain requirements.</p>
<p>To verify eligibility, school officials request proof of income by mail from parents to justify the amount initially put on the application. If applicants fail to respond, it raises the possibility that they may not in fact be eligible, and officials terminate their benefits. If applicants respond with evidence that shows too high an income, officials reduce or terminate their benefits accordingly. In some cases, officials raise benefits if initial reports of income are too high.</p>
<p><strong>Fraud or Error?</strong></p>
<p>Verification summaries obtained from 10 of the nation’s largest school districts show a high proportion of those asked to provide proof of income could not or would not comply. The data are prompting some school officials to question the way the program is administered.</p>
<p>Of the 10 districts, all but 1 had a rate of reduced or repealed benefits above 70 percent for those in the verification sample for the 2007–08 school year (see sidebar). Most of those benefit reductions and repeals were due to participants’ failure to respond to the mailing, which automatically revoked their benefits. The average nonresponse rate among the 10 districts was 58 percent. Significantly, an average of only 1.5 percent of those who did respond had their benefits increased, suggesting that parents were more likely to understate than overstate their income on the forms.</p>
<div>
<p><strong>Trust, but Verify</strong></p>
<p>The Los Angeles Unified School District (LAUSD), the nation’s second-largest district with an enrollment of about 700,000 students, had the highest rate of reduced or repealed benefits (93 percent) for the 2007–08 school year. Of 3,401 program participants asked to verify their income, 2,650 (78 percent) did not respond to the verification request; 215 (6 percent) provided evidence that reduced their benefits from free or reduced-price to paid; 291 (9 percent) provided income evidence that reduced their meal benefits from free to reduced-price; 233 (7 percent) provided evidence to justify their initial report of income; and 12 (less than 1 percent) provided evidence that increased their benefits.</p>
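<p>The LAUSD percentages for 2007–08 can be reproduced from the counts given above. The Python sketch below is for illustration only; the dictionary keys and function names are hypothetical labels, and the counts are the ones reported in this sidebar.</p>

```python
# Counts reported in the article for LAUSD's 2007-08 verification sample.
lausd_2007_08 = {
    "no_response": 2650,
    "reduced_to_paid": 215,
    "free_to_reduced_price": 291,
    "no_change": 233,
    "benefits_increased": 12,
}

def outcome_shares(counts):
    """Return each outcome as a whole-number percentage of the sample."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}

def reduced_or_repealed_rate(counts):
    """Share of the sample whose benefits were cut or revoked, nonresponse included."""
    total = sum(counts.values())
    cut = (
        counts["no_response"]
        + counts["reduced_to_paid"]
        + counts["free_to_reduced_price"]
    )
    return round(100 * cut / total)

sample_size = sum(lausd_2007_08.values())
shares = outcome_shares(lausd_2007_08)
rate = reduced_or_repealed_rate(lausd_2007_08)
```

<p>The five counts sum to the 3,401 participants sampled, and nonresponse alone accounts for most of the 93 percent rate.</p>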
<p>The LAUSD results were similar for the 2006–07 school year, when 2,856 (90 percent) of those asked to verify income failed to respond and 206 who did respond (6 percent) provided income information that reduced or repealed their benefits, which means that almost all families surveyed had their meal privileges reduced or revoked. In contrast, 120 respondents (4 percent) saw no change in their eligibility status and just 6 respondents had their benefits increased.</p>
<p>The nation’s largest school district, in New York City, had nonresponse rates of 56 percent and 62 percent for the 2007–08 and 2006–07 school years, respectively. The district had reduced or repealed benefits rates that were somewhat lower than those for Los Angeles: 70 percent of the sample for the 2007–08 school year and 71 percent for the 2006–07 school year. Once again, nonresponse accounted for most of the revocations. The New York City schools serve 1.1 million students, of whom 801,596 qualified for either free or reduced-price lunch in 2006–07.</p>
<p>The Chicago Public Schools (CPS) had the lowest potential fraud rate among the 10 districts at 28 percent for 2007–08, with only 258 out of 1,655 parents (16 percent) not responding. Most (69 percent) of the participants verified their income and saw no change in eligibility status. Relative to other school districts, the nonresponse rate for the Chicago schools was quite low. It’s unclear how CPS got so many parents to respond to the verification. Requests for more information on the school district’s verification methods were not returned.</p>
</div>
<p>Smaller school districts show a similarly high rate of reduced or repealed benefits. Wake County Public Schools had a nonresponse rate of 36 percent and a total reduced or repealed rate of 64 percent for its verification sample in 2007–08. Charlotte-Mecklenburg had a nonresponse rate of 31 percent and a reduced or repealed rate of 68 percent for the same school year.</p>
<p>Child nutrition officials say even the high percentages of reduced or revoked benefits do not suggest widespread fraud because the state samples are made up of “error-prone” applicants and are not random. They argue that disparities on the applications of those who do respond to the verification request are mostly due to honest mistakes, such as rounding errors or inserting weekly rather than monthly income, which could put applicants under the income threshold unintentionally.</p>
<p>Marilyn Moody, senior director of child nutrition services for the Wake County schools, pointed to intimidation as one reason her district’s nonresponse rate was so high. “Some people fail to respond because when we send a federal form that says you must send us proof of income, it’s intimidating,” she said. “They may not be educated to the point of realizing the significance of that.”</p>
<p>But others see a deliberate attempt to cheat the system. “I don’t think there is any doubt in anyone’s mind, even though we’re pussyfooting around, that there are thousands of students here that probably are not entitled to this government benefit,” said Larry Gauvreau, school board member in Charlotte-Mecklenburg.</p>
<p>“They know at the district and school level that it generates funding for a lot of other programs,” said Lisa Snell, director of education and child welfare at the Reason Foundation, a libertarian think tank. “It may not be intentional to be fraudulent in the program, but it is an unintended consequence of the program.”</p>
<p>Other research has found evidence of potential fraud in the NSLP. A study by Mathematica Policy Research published in February 2009 found that 15 percent of students enrolled in the breakfast and lunch programs receive more benefits than they are eligible for and 7.5 percent receive less. The most common source of error was parents or guardians misreporting income on applications. Mathematica estimated the total cost for the errors at around $1 billion annually.</p>
<p>The authors of the Mathematica study used a multistage-clustered sample design, selecting 7,800 applicants and students directly certified in 87 school districts across the country. The report stopped short of advocating an overhaul, instead suggesting that policymakers find a way to get more accurate income data from households. The authors did not offer specific recommendations on how to accomplish that goal.</p>
<p>Another study, commissioned by the USDA and published by Mathematica in 2005, argued that requiring applicants to submit proof of income would hurt needy children. The study compared districts pilot testing an approach that required families to document their income on the initial applications to a comparison group of districts using the current system. Study authors Philip Gleason and John Burghardt found that the same proportion of ineligible children were certified in both sets of districts but that in districts requiring up-front documentation, “the process reduced eligible students’ access to free and reduced-price meals.”</p>
<p><strong>Food Fight</strong></p>
<p>School board members in Charlotte-Mecklenburg upset the school-lunch apple cart last year by requesting more thorough verification of student eligibility for the lunch program, which, as noted above, partly determines the funding each school receives from the district. The move touched off a heated debate and led to weeks of uncertainty as school attorneys tried to obtain a written order from the USDA on the permissibility of a comprehensive audit. The controversy also aggravated old tensions over integration and racial busing, two sore spots in the district.</p>
<p>Like many cities in the South, Charlotte has a contentious history on the issue of school segregation. After the Warren Court in 1954 declared the separate but equal doctrine unconstitutional, the city adopted a neighborhood school policy that had the effect of sending most black students to inner-city schools and most white students to suburban schools in wealthier parts of the district. The U.S. Supreme Court attempted to remedy the situation in 1971 in its <em>Swann v. Charlotte-Mecklenburg Board of Education</em> ruling. The decision paved the way for school districts to adopt busing strategies aimed at creating greater diversity in the classroom.</p>
<p>In 1997, a white parent challenged the busing policy in court after a magnet school denied his daughter admission because of her race. Two years later, a federal judge ruled that the district’s 30-year busing policy had fulfilled its purpose of racial integration and was no longer necessary. The ruling stood after an appellate court upheld the decision and the Supreme Court declined to hear an appeal.</p>
<p>Today, Charlotte-Mecklenburg has a community-based assignment policy, but the issue remains divisive. And questions of cheating among free lunch recipients, the majority of whom are minorities, have poured more salt into the wound.</p>
<p>In August 2008, Ken Gjertsen became the first Charlotte-Mecklenburg school board member to raise questions about the program after learning of the potential fraud rate. The issue remained on the school board’s agenda for two months, as members went back and forth about the merits of a comprehensive audit. “Poor people don’t know how to steal from the federal government. They’re not smart enough,” said school board member Vilma Leake. She characterized a comprehensive audit as a “witch hunt” aimed at poor families.</p>
<p>Others claimed the school board had a responsibility to weed out cheating. “There are thousands of people who shouldn’t be in that program. We know that. Everybody up here knows that,” said Gauvreau, who twice proposed a motion, voted down both times, that would have directed the district superintendent to verify a larger percentage of applications.</p>
<p>Efforts to authorize an audit came crashing down in September when the USDA threatened to cut off the district’s $34 million lunch-program subsidy for the 2007–08 school year if it proceeded with a full verification. School-district attorneys subsequently received a written order from the USDA saying that an audit beyond the mandated 3 percent would be illegal under federal law.</p>
<p>The National School Lunch Act does not specifically address the legality of a school district going beyond the 3,000 or 3 percent benchmark. The USDA, however, interprets the law to disallow a comprehensive verification. The 2008 version of the “Eligibility Manual for School Meals,” published by the USDA, says that school districts “must not verify more than or less than the standard sample size … and <em>must not</em> verify all (100% of) applications” (emphasis in original).</p>
<p>The guidelines do provide one narrow window for school districts to cut down on fraud. Officials can pursue verification on a case-by-case basis if they see questionable content on an application, but it appears that districts rarely take advantage of this option. Charlotte-Mecklenburg conducted no verifications for cause during the 2006–07 and 2007–08 school years. Wake County verified 2 applicants for cause in 2007–08 and fewer than 10 in 2006–07. Due to the politically sensitive nature of the NSLP, it’s likely that school nutrition officials worry that verifying too many applicants would cause blowback.</p>
<p><strong>To Verify or Not to Verify</strong></p>
<p>With a recession hitting the family pocketbook hard, more parents are turning to free school lunches for relief. Rising food costs have put a strain on school districts, too, prompting President Obama to include $100 million in additional funding for the program in his economic stimulus bill, passed by Congress in February 2009. Obama has proposed another $1 billion for school nutrition programs in his 2010 budget.</p>
<p>Many government officials are quick to tout the benefits of the NSLP, arguing that some students would go hungry if the program did not exist. In a letter signed by a bipartisan group of 40 senators in January, Sen. Tom Harkin, an Iowa Democrat and chairman of the Senate Agriculture Committee, said that child nutrition programs “play a critical role in preventing hunger and promoting healthy diets among children from birth until the end of secondary school.”</p>
<p>The political climate in Washington makes it doubtful Congress will revise the verification structure of the NSLP in the near future. The entitlement has a long history of partisan strife and is generally recognized as a political hot potato. To make matters more complicated, the program is the product of a political alliance between agriculture Republicans and metropolitan-area liberals, which means that critics are few and far between. But the possibility of waste and fraud warrants a closer look from elected officials. Because the NSLP is the nation’s second-largest food entitlement, unqualified families could be costing taxpayers billions each year. The challenge is balancing program integrity with income verification policies that might have a chilling effect on eligible families. At the very least, Congress should establish clearer guidelines for school districts to investigate suspected fraud and explore alternative income-documentation methods that would provide greater reliability for program data. Given the amount of taxpayer dollars devoted to school lunch, and the range of policies and research based on the program, lawmakers can’t afford to do nothing.</p>
<p><em>David N. Bass is an investigative reporter and associate editor with the John Locke Foundation.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/fraud-in-the-lunchroom/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Few States Set World-Class Standards</title>
		<link>http://educationnext.org/few-states-set-worldclass-standards/</link>
		<comments>http://educationnext.org/few-states-set-worldclass-standards/#comments</comments>
		<pubDate>Thu, 29 Oct 2009 16:00:28 +0000</pubDate>
		<dc:creator>Paul E. Peterson</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[No Child Left Behind]]></category>
		<category><![CDATA[On Top of the News]]></category>
		<category><![CDATA[Standards, Testing, and Accountability]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=18845034</guid>
		<description><![CDATA[In fact, most render the notion of proficiency meaningless]]></description>
			<content:encoded><![CDATA[<p>As the debate over the reauthorization of No Child Left Behind (NCLB) makes its murky way through the political swamp, one thing has become crystal clear: Though NCLB requires that virtually all children become proficient by the year 2014, states disagree on the level of accomplishment in math and reading a proficient child should possess. A few states have been setting world-class standards, but most are well off that mark, in some cases to a laughable degree.</p>
<p>In this report, we use 2007 test-score information to evaluate the rigor of each state’s proficiency standards against the National Assessment of Educational Progress (NAEP), an achievement measure that is recognized nationally and has international credibility as well. The analysis extends previous work (see “<a href="http://educationnext.org/johnnycanreadinsomestates/">Johnny Can Read&#8230;in Some States</a>,” <span class="italic">features</span>, Summer 2005, and “<a href="http://educationnext.org/keeping-an-eye-on-state-standards/">Keeping an Eye on State Standards</a>,” <span class="italic">features</span>, Summer 2006) that used 2003 and 2005 test-score data and finds in the new data a noticeable decline, especially at the 8th-grade level. In Figure 1, we rank the rigor of state proficiency standards using the same A to F scale teachers use to grade students. Those that receive an A have the toughest definitions of student proficiency, while those with an F have the least rigorous.</p>
<p class="tocheading"><span class="bold">Measuring Standards </span></p>
<p>That states vary widely in their definitions of student proficiency seems little short of bizarre. Agreement on what constitutes “proficiency” would seem the essential starting point: if students are to know what is expected of them, teachers are to know what to teach, and parents are to have a measuring stick for their schools. In the absence of such agreement, it is impossible to determine how student achievement stacks up across states and countries.</p>
<p>One national metric for performance does exist, the National Assessment of Educational Progress. The NAEP is a series of tests administered under the auspices of the U.S. Department of Education’s National Center for Education Statistics. Known as the Nation’s Report Card, the NAEP tests measure proficiency in reading and math among 4th and 8th graders nationwide as well as in every state. The NAEP sets its proficiency standard through a well-established, if complex, technical process. Basically, it asks informed experts to judge the difficulty of each of the items in its test bank. The experts’ handiwork received a pat on the back recently when the American Institutes for Research (AIR) showed that NAEP’s definition of “proficiency” was very similar to the standard used by designers of international tests of student achievement. Proficiency has acquired roughly the same meaning in Europe and Asia, and in the United States, as long as the NAEP standard is employed.</p>
<p>This is not to say students are proficient either in this country or elsewhere. According to NAEP standards, only 31 percent of 8th graders in the United States are proficient in mathematics. Using that same standard, just 73 percent of 8th graders are proficient in math in the highest-achieving country, Singapore, according to the AIR study. In other words, bringing virtually all 8th graders in the United States up to a NAEP-like level of proficiency in mathematics constitutes a challenge no country has ever mastered.</p>
<p class="tocheading"><span class="bold">Comparing the States </span></p>
<p>Three states (Massachusetts, South Carolina, and Missouri) have established world-class standards in math and reading as the goal for all students. Every other state has established a lower proficiency standard, and some states (for example, Georgia and Tennessee) declare most students proficient even when their performance is miles short of the NAEP standard. By setting widely varying standards, states render the very notion of proficiency meaningless. If Billy and Sally cannot read in South Carolina, they should not be able to pass muster simply by crossing the state’s western border.</p>
<p>We gauge the differences among states by comparing how students do on state assessments with how they perform on NAEP tests. By comparing the percentage of students deemed proficient on each, it is possible to determine whether states are setting expectations higher than, lower than, or equal to the NAEP standard. If the percentages are identical (or roughly so), then state proficiency standards can be fairly labeled as “world-class.” If state assessments identify many more students as proficient than the NAEP, then state proficiency figures should be regarded as inflated. In short, comparing state assessment results to NAEP scores can help reveal whether states are giving parents and voters the real scoop about where the state’s children stack up when measured against world-class benchmarks.</p>
<p>In Figure 1, we give Massachusetts, Missouri, and South Carolina an A for establishing rigorous expectations regarding what proficient students must know and be able to do. Note that a grade of A does not indicate students are performing at the highest level. Rather, the high grade indicates that the three states have set a high bar for students to reach if they are to be deemed proficient. So, for example, the state reading test and the NAEP reading test each deemed only 25 percent of 8th graders in South Carolina proficient, an honest, if embarrassing, reckoning of the education situation in the state.</p>
<p>The remaining 47 states (information is not yet available for the District of Columbia) had distinctly lower standards. Three states (Georgia, Oklahoma, and Tennessee) expected so little of students that they received the grade of F. The state of Georgia, for instance, declared 88 percent of 8th graders proficient in reading, even though just 26 percent scored at or above the proficiency level on the NAEP. According to our calculations, Georgia 8th-grade reading standards are 4.0 standard deviations below those in South Carolina, an extraordinarily large difference. Thus, while students in Georgia and South Carolina perform at similar levels on the NAEP, the casual observer would be misled by Georgia’s reporting that its students achieve proficiency at three times the rate that South Carolina’s students do.</p>
<p>Twelve states (Alabama, Alaska, Idaho, Illinois, Michigan, Mississippi, Nebraska, North Carolina, Texas, Utah, Virginia, and West Virginia) received Ds because they had pitched their expectations far below other states. Illinois set its proficiency bar for 8th-grade reading at a level that is 1.01 standard deviations below the national average. If you believe those who set the Illinois standards, 82 percent of its 8th graders are proficient in reading, even though the NAEP says only 30 percent are.</p>
<p>In general, the states of the Northeast have the highest standards, while the states of the South and Midwest have the lowest. Western states fall in between.</p>
<div><img src="http://educationnext.org/files/ednext_20083_70_fig1.gif" border="0" alt="Figure 1." align="middle" /></div>
<p class="tocheading"><span class="bold">A Downward Trend </span></p>
<p>There is some evidence of slippage in standards since our original report card was published in 2005 (see Figure 2). In 8th-grade reading, for example, standards overall are down by 0.2 standard deviations. This means that, in 8th-grade reading, states are reporting a substantial improvement that is not evident on the NAEP. The smallest amount of slippage was in 4th-grade math, where standards fell by 0.06 standard deviations. Most of the slippage at the 4th-grade level is due to the lower standards adopted by those states that were initially slow in complying with the NCLB accountability system; those that have had standards since 2003 have not altered them significantly. But at the 8th-grade level, standards are falling across the board, in both reading and math, and among both the states that had standards in 2003 and the states that have only adopted them more recently.</p>
<div><img src="http://educationnext.org/files/ednext_20083_70_fig2.gif" border="0" alt="Figure 2." align="middle" /></div>
<p>We also see slight convergence among the states. For example, the variation in 4th-grade math standards narrowed 0.11 standard deviations between 2003 and 2007. The good news is that differences among state standards are shrinking; the bad news is that states are converging downward, not upward.</p>
<p>By and large, the changes that are taking place in individual states are fairly small, perhaps so they do not stir controversy. A few states, though, have made big adjustments since 2003. Colorado and Texas have raised their proficiency bars enough to warrant a grade one letter better than the one given initially. Five states (Arizona, Illinois, Maine, Michigan, and Wyoming) have lowered the bar enough that their grades have dropped by a full letter.</p>
<table border="0" cellspacing="0" cellpadding="5" bgcolor="#f7e4da">
<tbody>
<tr>
<td><p><span class="bold">Grading Procedure </span></p>
<p>In 2003, 2005, and 2007, both state and NAEP tests were given in math and reading for 4th- and 8th-grade students. The grades reported here are based on the comparison of state and NAEP proficiency scores in 2007, and changes for each are calculated relative to 2003. For each available test, we computed the difference between the percentage of students who were proficient on the NAEP and the percentage reported to be proficient on the state’s own tests for the same year. We also computed the standard deviation for this difference. We then determined how many standard deviations each state’s difference was above or below the average difference on each test. The scale for the grades was set so that if grades had been randomly assigned, 10 percent of the states would earn As, 20 percent Bs, 40 percent Cs, 20 percent Ds, and 10 percent Fs. The grade given each state is based on how much easier it was to be labeled proficient on the state assessment compared with the NAEP. For example, on the 4th-grade math test in 2007, South Carolina reported that 41.4 percent of its students had achieved proficiency, but 35.9 percent were proficient on the NAEP. The difference (41.4 percent − 35.9 percent = 5.5 percentage points) is about 1.6 standard deviations better than the average difference between the state test and the NAEP, which is 32 percentage points. This was good enough for South Carolina to earn an A for its standards in 4th-grade math. The overall grade for each state was determined by taking the average for the standard deviations on the tests for which the state reported proficiency percentages.</p></td>
</tr>
</tbody>
</table>
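<p>The grading procedure can be sketched in a few lines of Python. This is an illustration only, not the authors’ own calculation: the function names are hypothetical, the letter-grade cutoffs are derived by assuming the standardized differences follow a normal distribution (the sidebar gives only the 10/20/40/20/10 percent shares), the use of a population standard deviation is a guess, and the standard deviation of 16.6 points is back-calculated from the South Carolina example, (32 − 5.5) / 1.6, rather than reported in the text.</p>

```python
from statistics import NormalDist, mean, pstdev

# Share of states per grade, taken from the grading procedure.
GRADE_SHARES = [("A", 0.10), ("B", 0.20), ("C", 0.40), ("D", 0.20), ("F", 0.10)]


def grade_cutoffs():
    """Return (grade, minimum z-score) pairs for A through D.

    Assumption: z-scores are treated as normally distributed, so the
    cutoffs are normal quantiles matching the published grade shares.
    """
    cutoffs, remaining = [], 1.0
    for grade, share in GRADE_SHARES[:-1]:
        remaining -= share
        cutoffs.append((grade, NormalDist().inv_cdf(remaining)))
    return cutoffs


def rigor_z(state_pct, naep_pct, mean_gap, sd_gap):
    """Standard deviations by which a state's gap is smaller than the average gap."""
    gap = state_pct - naep_pct  # how much easier the state test is than the NAEP
    return (mean_gap - gap) / sd_gap


def rigor_scores(gaps):
    """Standardize a {state: state_pct - naep_pct} mapping for one test.

    Assumption: the population standard deviation is used; the source
    does not say which estimator the authors chose.
    """
    values = list(gaps.values())
    avg, sd = mean(values), pstdev(values)
    return {state: (avg - gap) / sd for state, gap in gaps.items()}


def letter_grade(z):
    """Map a rigor z-score to a letter grade using the assumed cutoffs."""
    for grade, minimum in grade_cutoffs():
        if z >= minimum:
            return grade
    return "F"


# South Carolina, 4th-grade math, 2007. The 16.6-point standard deviation
# is inferred from the sidebar's example, not stated in it.
sc_z = rigor_z(41.4, 35.9, mean_gap=32.0, sd_gap=16.6)
print(f"South Carolina, 4th-grade math: z = {sc_z:.2f}, grade {letter_grade(sc_z)}")
```

<p>Under these assumptions the South Carolina example lands at roughly 1.6 standard deviations and earns an A, matching the sidebar. An overall grade would average a state’s z-scores across the tests it reported before applying <code>letter_grade</code>.</p>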
<p>Two years ago, we could see small evidence for a decline in standards but detected no race to the bottom. That is still true for 4th graders. But 8th-grade standards, if not exactly racing downward, are moving steadily away from world-class standards. Those responsible for NCLB reauthorization, as they struggle forward, should first and foremost establish a clear and consistent definition of grade-level proficiency in reading and math, even if it means giving up the cherished but decidedly unrealistic goal of proficiency for all students by 2014.</p>
<p><span class="italic">Paul E. Peterson and Frederick M. Hess are editors of </span>Education Next<span class="italic">.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/few-states-set-worldclass-standards/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>The International PISA Test</title>
		<link>http://educationnext.org/the-international-pisa-test/</link>
		<comments>http://educationnext.org/the-international-pisa-test/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 13:00:13 +0000</pubDate>
		<dc:creator>Mark Schneider</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[On Top of the News]]></category>
		<category><![CDATA[Standards, Testing, and Accountability]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=49626490</guid>
		<description><![CDATA[States should think twice before paying for more testing. There are easier ways to compare students to their global peers.]]></description>
			<content:encoded><![CDATA[<p><img style="float: right;margin-left: 10px" src="http://educationnext.org/files/pisa-illustration.jpg" alt="pisa-illustration" width="450" height="577" />Recent months have brought an ever-louder drumbeat in support of state-level participation in the Programme for International Student Assessment (PISA), with a weaker chorus calling for states to participate in the Trends in International Mathematics and Science Study (TIMSS). What would be gained if, in addition to the nation as a whole, individual states were to participate directly in these assessments by testing a much larger and more representative sample of students?<span id="more-49626490"></span> Not as much as many advocates would have us believe, and probably not enough to justify the considerable cost. Despite the growing infatuation with international comparisons of student performance and the illuminating feedback they can provide on how young Americans are doing relative to students in other countries, current international assessments cannot generate a great deal of reliable policy advice. In other words, they’re better at showing how our children’s academic performance (in certain subjects) compares with that of their overseas agemates than at guiding us toward stronger U.S. schools.</p>
<p><img style="float: left;margin-right: 10px" src="http://educationnext.org/files/pisa-students.jpg" alt="pisa-students" width="450" height="322" />If states choose to spend a fortune on such tests, it’s because their leaders believe, in the words of the National Governors Association (NGA), that doing so will lead to “policy solutions to U.S. education system shortcomings.” This is the first of two justifications given for these efforts, which are widely referred to as “international benchmarking.” The second purpose is the straightforward comparison of U.S. students and their peers in other countries. Here, the resulting “league tables” are the center of attention, and these results are often described, and almost as often decried, as a “horse race” or an “Olympics contest” with an attendant fixation on who’s winning and who’s losing. Absent is an acknowledgment that these assessments embody particular assumptions and beliefs about what should be measured.</p>
<p>As states consider funding and administering international assessments for themselves, such facts must be kept in mind and key questions addressed: What will states get as a result of participation? How much will it cost? Are there easier and cheaper ways of obtaining the desired information? We begin with a comparison of PISA and TIMSS, the two most prominent international assessments (see sidebar for additional information).</p>
<p><strong>PISA v. TIMSS</strong><br />
PISA is a self-proclaimed “yield study” assessing the “literacy” of 15-year-olds and is not tied to any specific curricula. PISA claims to be assessing the skills young adults will need in the emerging global economy. Never modest, PISA titled the report releasing its 2003 results “Learning for Tomorrow’s World” and the report on its 2006 results “Science Competencies for Tomorrow’s World.”</p>
<div>
<h1><strong>A Quick Guide to International Student Assessments</strong></h1>
<p><em>The three main programs are known widely by their acronyms:</em></p>
<ul>
<li>PIRLS (Progress in International Reading Literacy Study) is an assessment of 4th-grade reading administered every five years under the auspices of the International Association for the Evaluation of Educational Achievement (IEA). Fifty-five countries are expected to participate in 2011. <a href="http://nces.ed.gov/surveys/pirls/">http://nces.ed.gov/surveys/pirls/</a></li>
<li>TIMSS (Trends in International Mathematics and Science Study) is IEA’s assessment of student achievement in 4th- and 8th-grade science and math and is conducted every four years. Some 67 countries administer TIMSS. <a href="http://nces.ed.gov/timss/">http://nces.ed.gov/timss/</a></li>
<li>PISA (Programme for International Student Assessment) is an evaluation of reading, math, and science “literacy” among 15-year-olds. The Organisation for Economic Co-operation and Development (OECD) conducts PISA every three years, emphasizing one of the subjects on a revolving basis. The emphasis in 2006 was science; in 2009 it will be reading. Participation in PISA has grown from 43 countries in 2000 to an expected 65 countries in 2009. <a href="http://nces.ed.gov/Surveys/PISA/">http://nces.ed.gov/Surveys/PISA/</a></li>
</ul>
</div>
<p>TIMSS is more grade- and curriculum-centered and far more modest in its claims. As described in the release of the 2007 data, “TIMSS is designed to align broadly with mathematics and science curricula in the participating countries. The results, therefore, suggest the degree to which students have learned mathematics and science concepts and skills likely to have been taught in school.”</p>
<p>PISA has not hesitated in making the most of these differences, and the high visibility of the Organisation for Economic Co-operation and Development (OECD) has ensured that PISA is the most prominent of the international assessments. Indeed, while in public the NGA is careful to say that it is in favor of international benchmarking in general and not in favor of any particular test, in practice the organization has tended to favor PISA. For example, a background paper the NGA released in February 2008 concludes by promising that “the National Governors Association will work with states interested in state-level PISA administration to identify available resources and cost-savings options to support state participation.” (In this three-page memo, TIMSS rated just one sentence.)</p>
<p>More recently, the NGA, the Council of Chief State School Officers, and Achieve released a report titled “Benchmarking for Success: Ensuring U.S. Students Receive a World-Class Education,” calling for state scores on international assessments. PISA was mentioned far more frequently than TIMSS, and the head of the PISA project at OECD, Andreas Schleicher, was repeatedly quoted.</p>
<p>The Alliance for Excellent Education has made an even stronger appeal for state participation in PISA, calling our failure to participate at that level “myopic.”</p>
<p>Most state interest is in the science and math assessments of PISA and TIMSS. In 1995, TIMSS participants included five states and one consortium of school districts. In 1999, just before PISA was launched, 13 states plus 14 consortia of school districts participated in TIMSS. Only two states, Massachusetts and Minnesota, funded their own participation in the 2007 TIMSS. No U.S. states have independently participated in PISA or PIRLS thus far.</p>
<p>The results of the three assessments are often compared to those of the National Assessment of Educational Progress (NAEP), which in the United States currently dwarfs the international assessments in scope. In 2007, the NAEP 8th-grade math exams involved more than 150,000 students in approximately 7,000 schools. The 8th-grade 2007 TIMSS assessed some 7,400 U.S. students in fewer than 250 schools. PISA in 2006 involved only 5,600 15-year-olds in around 170 schools.</p>
<p>The difference in size of these assessments is important: in reading and math in 4th and 8th grade, NAEP can report state-by-state performance every two years and, through the Trial Urban District Assessment program, NAEP can present data on a growing number of large school districts. In contrast, the small size of the PISA and TIMSS samples in most states makes state-level reporting difficult, though not impossible.</p>
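<p>A rough calculation suggests why sample size matters so much for state-level reporting. The sketch below is illustrative only: it divides each national sample evenly across 50 states, which is not how any of these assessments is actually drawn, and it uses the textbook margin of error for a proportion under simple random sampling, ignoring the school-level clustering that real designs must account for. The function name and the even split are assumptions, not details from the assessments themselves.</p>

```python
from math import sqrt


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes simple random sampling; clustered school samples would
    produce wider margins than this formula suggests.
    """
    return z * sqrt(p * (1 - p) / n)


# National sample sizes cited in the article (rounded).
NATIONAL_SAMPLES = {"NAEP 2007": 150_000, "TIMSS 2007": 7_400, "PISA 2006": 5_600}
STATES = 50  # hypothetical even split, for illustration only

for name, total in NATIONAL_SAMPLES.items():
    per_state = total // STATES
    points = 100 * margin_of_error(per_state)
    print(f"{name}: about {per_state} students per state, "
          f"margin of error roughly {points:.1f} percentage points")
```

<p>Because the margin shrinks only with the square root of the sample size, a state share of a hundred or so students yields a margin several times wider than one of a few thousand, which is why reliable state scores on PISA or TIMSS require testing many more students.</p>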
<p><img style="float: right;margin-left: 20px;margin-bottom: 20px" src="http://educationnext.org/files/pisa-test.png" alt="pisa-test" width="634" height="452" /></p>
<p><strong>Statistical Benchmarking</strong><br />
One relatively inexpensive way in which state performance can be gauged against international measures is through statistical linking. Figure 1 shows TIMSS scores for 8th-grade math for seven states plus the District of Columbia and an equal number of countries. This figure is based on work by Gary Phillips of the American Institutes for Research, who has estimated 2007 TIMSS scores for all states by placing TIMSS and NAEP on the same scale. Such statistical linking can be more easily done with TIMSS than with PISA because TIMSS is administered at the same grade levels as NAEP, and their purposes and frameworks are similar.<br />
Figure 1 presents patterns more precisely than we might infer from NAEP or TIMSS alone. For example, we know from the 2007 NAEP math report that Massachusetts and Minnesota are high-performing states, that South Carolina and Missouri perform around the national average, that Mississippi and Alabama are low-performing states, and that the District of Columbia is the lowest-performing “state” in the nation. The 2007 TIMSS ranks countries, ranging from the highest-performing, including Chinese Taipei (Taiwan) and the Republic of Korea, through the middle ranks, including the United States, and going down to the lowest-performing countries, including Turkey (the lowest shown in Figure 1), Syria, Egypt, Algeria, and Colombia.</p>
<p>By combining these two sets of results, we know that even the best-performing American states do not score nearly as high as Chinese Taipei or Korea, that the average-performing American states are about on par with England, the Russian Federation, and Lithuania, and that the District of Columbia’s performance is more comparable to those of Thailand and Turkey.</p>
<p>While the Phillips method creates this interwoven list at minimal cost (because he is estimating state scores from already extant data), obtaining actual state-level scores is very expensive. How much a more precise list would be worth is an issue that states must consider.</p>
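<p>To give a sense of what placing two tests on the same scale can mean, the sketch below shows mean-sigma linear linking, one of the simplest linking techniques. It is not a description of the Phillips method, whose details the article does not give, and every number in it is a placeholder chosen for illustration rather than an actual NAEP or TIMSS statistic.</p>

```python
def link_to_timss(naep_score, naep_mean, naep_sd, timss_mean, timss_sd):
    """Mean-sigma linear linking: keep a score's z-score on the target scale.

    A score that sits k standard deviations above the national NAEP mean
    is mapped to k standard deviations above the national TIMSS mean.
    """
    z = (naep_score - naep_mean) / naep_sd
    return timss_mean + z * timss_sd


# Placeholder national statistics (hypothetical, not published figures).
NAEP_MEAN, NAEP_SD = 280.0, 35.0
TIMSS_MEAN, TIMSS_SD = 500.0, 75.0

# A hypothetical state whose NAEP average is 298.
estimate = link_to_timss(298.0, NAEP_MEAN, NAEP_SD, TIMSS_MEAN, TIMSS_SD)
print(f"Estimated TIMSS-scale score: {estimate:.1f}")
```

<p>The approach costs almost nothing because it reuses existing results, but it rests on the assumption that the two tests measure similar skills in similar populations, which is more plausible for TIMSS and NAEP than for PISA and NAEP.</p>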
<p><strong>Generating Policy Advice</strong><br />
<img style="float: right;margin-left: 10px" src="http://educationnext.org/files/pisa-student.jpg" alt="pisa-student" width="450" height="304" />Figure 1 reflects the “how are we doing” aspect of benchmarking. But what policy advice can be garnered from these international studies? The quick answer is not as much as many would have you believe. This is particularly true of PISA, which is very aggressive in drawing policy implications out of the data. Reflecting the mission of the OECD, which combines both statistical and policy functions, the very organizational structure of PISA combines the collecting and release of statistical data with policy advice in a single unit. In contrast, in the United States, strict Office of Management and Budget guidelines separate the release of federal statistical reports by time and place from any policy statements.</p>
<p>The pressure from policymakers for advice based on PISA interacts with this unhealthy mix of policy and technical people. The technical experts make sure that the appropriate caveats are noted, but the warnings are all too often overridden by the needs of the policy arm of PISA. As a result, PISA reports often list the known problems with the data, but then the policy advice flows as though those problems didn’t exist. Consequently, some have argued that PISA has become a vehicle for policy advocacy in which advice is built on flimsy data and flawed analysis. Much of the critical work has come from Germany, where PISA has had a profound effect. There has also been debate in Finland, in part because students do well on PISA but did relatively poorly on TIMSS in 1999.</p>
<p>Among the limitations is the fact that both PISA and TIMSS produce cross-sectional (single point in time) data and therefore do not allow longitudinal analysis (comparisons over time) at the student level, which is how researchers prefer to measure growth in student achievement and to identify the factors associated with such changes. But too often, countries and education ministers demand information and don’t want to know how cross-sectional data using flawed measures can lead to bad advice or advice that seems “right” but has little or no basis in fact.</p>
<p>To be sure, international data can in some cases be useful for addressing some policy questions, and often these data are the only way to examine the consequences of differences in policy that vary across countries. But the obstacles to drawing strong causal inferences based on such data are substantial. While both the International Association for the Evaluation of Educational Achievement (IEA) and the OECD work hard to make sure that measures are comparable, there are large variations in how people in different countries understand similar questions, and how statistical agencies measure and report indicators. This is especially true for the United States, where, as a federal system of continental size in which education is still predominantly a state and local function, variance is often more important than the mean national value calculated by many international studies. Unfortunately, both TIMSS and PISA data are used, too often with little regard for their limitations, to formulate policy recommendations.</p>
<p>It should be noted that NAEP has taken a different approach to releasing its results, one that PISA has rejected. While early NAEP reports resembled the current voluminous PISA reports, over time, the Nation’s Report Cards have become shorter and more tightly focused on data that show cross-sectional results by state and by reporting categories called for in U.S. legislation (for example, race or ethnicity and income). NAEP reports also compare the performance of students at the same grade level over time; these comparisons have become a critical tool for measuring the nation’s progress. In short, these are highly focused and valuable benchmarking reports. Further, a typical NAEP release is far more accessible to policymakers, reporters, and the general public than PISA reports are. And NAEP reports do not contain policy implications or recommendations: the dividing line between statistical reporting and policy prescription is honored.</p>
<p>Within guidelines designed to protect confidentiality, data from all three assessments are made widely available both online and through other avenues of dissemination, allowing researchers to explore statistical patterns and the policy implications of the data. However, with PISA, the OECD megaphone attracts the attention of policymakers and drowns out alternate interpretations.</p>
<p><strong>Accountability Problem</strong><br />
OECD is a high-level intergovernmental organization with wide sweep across many domains, including finance, trade, environment, agriculture, technology, and taxation as well as education. Its pronouncements have a gravitas that cannot be matched by the IEA. OECD is hardly modest in describing the scope of PISA, and it has been just as loud in pushing the policy lessons that can be “learned” from PISA. But that loud voice has a few cracks in it.</p>
<p>For example, consider Chapter 5 of the 2006 PISA report, “Science Competencies for Tomorrow’s World,” which is devoted to analyzing the effects of school resources on student performance. Clearly, the stakes are high in identifying the best ways to allocate money, teachers, school leadership, and the like. Box 5.1 on page 215 notes all the appropriate caveats: PISA is cross-sectional; some things are not measured well; and other important factors are unmeasured. Moreover, it notes that the school characteristics measured are from the student’s current school, and in many countries a 15-year-old might have been in that school for only a year or two. The report warns that “these restrictions limit the ability of PISA to provide direct statistical estimates of the effects of school resources on educational outcomes.”</p>
<p>Nonetheless, the chapter proceeds with dozens of charts and tables relating different school resources to student outcomes. Finally, the chapter ends with “policy implications,” even though the foundations for these implications are weak. For example, the report reads, “what is noticeable about the strongest effects measured in this chapter is that they are not the ones most closely associated with finite material resources, such as the distribution of good teachers. Rather, such effects are related to how schools and the school system are run—for example, the amount of time that students spend in class and the extent to which schools are accountable for results.”</p>
<p>This is powerful stuff: hold schools accountable, and other issues (such as the allocation of good teachers) don’t matter. What is the evidence for PISA’s confident endorsement of accountability?</p>
<p>On close examination, “accountability” turns out to mean merely the public posting of school-level results. No other measures of accountability were tied to PISA scores at a level of statistical significance once students’ socioeconomic status was introduced. But the report uses the highly evocative word “accountable” rather than “posting.” The reader can easily have the mistaken impression that a broad set of conditions has been found to be important.</p>
<p>Standards of evidence and the research practices that are found in this analysis would not pass muster in the equivalent statistical agencies and among most researchers in the United States. Indeed, the National Center for Education Statistics (NCES) frequently objects to many of the findings in PISA reports, but is almost as often politely (and sometimes not politely) ignored.</p>
<p><strong>Questions Remain</strong><br />
Despite these limits, the chorus for state-level assessments will continue to grow louder. But how would state-level international assessments fit into the already complex world of large-scale assessments? In the 2006 PISA assessment, the U.S. barely met the minimum school participation rate required for inclusion in the analysis. If governors and chief state school officers are behind a state administration of PISA or TIMSS, getting school participation may be easier, but student engagement in low-stakes tests declines as students get older—likely a situation that even gubernatorial exhortations cannot resolve.</p>
<p>If it is true that what gets tested gets taught, are governors and chief state school officers really willing to allow OECD or IEA to drive their curricula? In the rush to employ rigorous international standards, we need to keep in mind that TIMSS and PISA each embody particular views about what those standards should be.</p>
<p>Can PISA really inform a state’s policymakers about how to improve their school system? PISA assesses mathematical and science “literacy,” a broad domain encompassing skills and knowledge learned both inside and outside of school. PISA 2006 cautions, “If a country’s scale scores in reading, scientific or mathematical literacy are significantly higher than those in another country, it cannot automatically be inferred that the schools or particular parts of the education system in the first country are more effective than those in the second. However, one can legitimately conclude that the cumulative impact of learning experiences in the first country, starting in early childhood and up to the age of 15 and embracing experiences both in school and at home, have resulted in higher outcomes in the literacy domains that PISA measures.” In other words, PISA itself admits that it cannot reliably identify which parts of the education pipeline are working well and which need improvement.</p>
<p>How expensive are these tests, and are there less costly ways to get these or similar data? To get a state-level score, about 1,500 students in the state need to be tested. The cost to test that number of students is around $700,000 for PISA and, based on the experience of Massachusetts, approximately $500,000 per grade for TIMSS. Nationwide, that’s around $25 million per PISA assessment and $15 million per grade in TIMSS if all states participated. There are less expensive alternatives, such as the TIMSS estimates for the states generated by Gary Phillips and noted earlier. While the alternatives would not produce all the details that might come from the full assessment (and mercifully avoid the temptation to use these data for unwarranted policy analysis), they could produce reliable estimates of state performance relative to international performance. Phillips has recently worked out the technical details for obtaining international benchmarks with PISA, by embedding PISA items within the state assessment. The PISA items on the state assessment are then used to link the state assessment to the PISA scale. Once this is done, the state can benchmark its performance against PISA without having students take the entire PISA assessment.</p>
<p>An alternative to statistical linking for PISA would be “small area estimation.” These estimates are model-based and “borrow” information from other data available for the state together with any state-level PISA data collected. The results are known as “indirect” projections to distinguish them from standard or “direct” estimates. NCES produced such state and county estimates of adults with low literacy based on the National Assessment of Adult Literacy (NAAL), for example. Further research would be needed to determine the feasibility of conducting small area estimates to generate state-level PISA scores. And it is likely that the PISA sample would have to be increased somewhat. This would add costs, but far less than full state-level assessments.</p>
<p><strong>Double-Edged Sword</strong><br />
<img style="float: right;margin-left: 10px" src="http://educationnext.org/files/pisa-students-3.jpg" alt="pisa-students-3" width="450" height="674" />Compared to students in most other OECD countries, American students do not perform well on PISA. In the aftermath of the dismal results from the 2006 PISA, the clamor for producing state-level PISA scores was thunderous. State interest in administering PISA has since receded, with the current financial crisis and the fading of the fanfare surrounding PISA 2006, but the economy will eventually turn around, PISA 2009 is in the offing, and the Obama administration and many governors are sympathetic to the idea of international benchmarking.</p>
<p>While TIMSS has its partisans and states have chosen to participate in past years, momentum is behind PISA. If more states do choose to participate in PISA, what should they expect? First, they would get a PISA score that would allow them to compare themselves to other PISA participants. In some cases, this would provide bragging rights (“Our students are better than those in Turkey”). In most states, disappointing results would provide reform-minded governors with ammunition to push for their own legislative agenda. But if states choose a full state assessment, along with the PISA scale score would come all of the OECD’s policy advice and its approach to standards, which might make it harder for reform-minded governors to choose the options they prefer. Caveat emptor.</p>
<p><em>Mark Schneider is a vice president at American Institutes for Research and former commissioner of the National Center for Education Statistics.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/the-international-pisa-test/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>The Odd Couple</title>
		<link>http://educationnext.org/the-odd-couple/</link>
		<comments>http://educationnext.org/the-odd-couple/#comments</comments>
		<pubDate>Fri, 17 Aug 2007 20:27:27 +0000</pubDate>
		<dc:creator>Jay P. Greene</dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=9223746</guid>
		<description><![CDATA[Murray and Rothstein find some unexpected common ground]]></description>
			<content:encoded><![CDATA[<p><img src="http://educationnext.org/files/ednext_20074_75_opener.gif" border="0" alt="" align="right" /><span class="italic">Checked: </span><span class="italic">Charles Murray, “Intelligence in             the Classroom,” </span>Wall Street             Journal<span class="italic">, January 16, 2007; </span> <span class="italic">Richard Rothstein, </span>Class and Schools: Using Social, Economic, and Educational             Reform to Close the Black-White Achievement Gap<span class="italic"> (Economic Policy Institute, 2004)</span><span class="italic"> </span></p>
<p><span class="bold">Checked by Jay P. Greene</span></p>
<p>Welfare             critic and American Enterprise             Institute fellow Charles Murray and former union organizer and <span class="italic">New York Times </span>columnist             Richard Rothstein don’t usually have much in common. But one             thing on which they agree is that there is little that schools can             do to improve educational achievement, particularly for poor and             minority students. Both Murray and Rothstein contend that schools             face severe constraints that hinder their ability to alter student             outcomes. The net effect of their arguments is to provide aid and             comfort to those who would resign themselves to the educational             status quo and explain away the school system’s shortcomings.</p>
<p class="tocheading"><span class="bold">The Argument against Reform </span></p>
<p>While both Murray and Rothstein argue that schools are operating under severe constraints, they disagree about what those constraints are. According to Murray, in a recent commentary in the <span class="italic">Wall Street Journal</span> and his controversial book, <span class="italic">The Bell Curve</span>, the major factor hindering school improvement is the cognitive potential of students. No matter how hard they try, Murray argues, schools cannot get students to achieve more than their intelligence will allow. As he puts it, “Our ability to improve the academic accomplishment of students in the lower half of the distribution of intelligence is severely limited. It is a matter of ceilings.”</p>
<p>According to Rothstein, in his book <span class="italic">Class and Schools</span>,             the major factor hindering school improvement is poverty and its             attendant social ills. Rothstein argues that “the influence             of social class characteristics is probably so powerful that             schools cannot overcome it, no matter how well trained are their             teachers and no matter how well designed are                                          their instructional programs and climates.”         It is only a slight exaggeration to say that Rothstein views demography         as destiny, at least in the aggregate. In his words, “No matter         how competent the teacher, the academic achievement of lower-class         children will, on average, almost inevitably be less than that of         middle-class children.”</p>
<p>Given their convictions about the severity of the constraints facing schools, both Murray and Rothstein have a defeatist attitude about school reform efforts. Murray warns against false hope: “Some say that the public schools are so awful that there is huge room for improvement in academic performance just by improving education. There are two problems with that position.” The first problem, he suggests, is that the high percentage of students performing below the basic standard on the National Assessment of Educational Progress (NAEP) may not be inconsistent with the upper bounds of achievement, given the cognitive constraints of students. “The second problem,” Murray continues, “with the argument that education can be vastly improved is the false assumption that educators already know how to educate everyone and that they just need to try harder—the assumption that prompted No Child Left Behind. We have never known how to educate everyone.” Accept the facts, he urges, as little can be expected from school reform.</p>
<p>Rothstein is similarly gloomy about the             prospects for school improvement. He admits that some reforms may             be well designed and have limited success, “but a careful             examination of each claim that a particular school or practice has             closed the race or social class achievement gap shows that the             claim is unfounded.” In most cases, he argues, claims of             effective reforms are based on either a misanalysis of test scores             or the selection of advantaged students into reform programs. His             thesis is that school reform by itself can hardly make a dent in             the achievement of low-income and minority students.</p>
<p>The argument that schools face constraints,             whether cognitive or social, that significantly hinder progress has             some superficial plausibility to it. We can all understand limits             and the futility of trying to exceed them. If, for example, there             is a constraint on the human life span, then efforts to extend life             expectancy beyond that duration would obviously be unproductive.             But before we give up on investing in medical research and             improving our health, we might want to be convinced that we are             already approaching the limits of how long humans can live. We             wouldn’t want to be deterred from making improvements unless             we believed that we had reached the point where limitations made advances virtually impossible.</p>
<p class="tocheading"><span class="bold">Evidence on Constraints </span></p>
<p>What evidence do Murray and Rothstein provide             that we are already at the upper bounds of what schools can do? Not             very much. Murray points to the fact that national gains in             educational achievement, particularly for those beginning on the             lower end of the distribution, have been very hard to come by in             the past few decades: “If we confine the discussion to             children in the lower half of the intelligence distribution             (education of the gifted is another story), the overall trend of             the 20th century was one of slow, hard-won improvement.” If             educational progress has stalled, Murray suggests, it must be a             sign that we are bumping up against the cognitive limits of             students.</p>
<p>This argument is not very compelling. The             stalled growth in educational achievement could be the result of             diminishing returns on reform efforts, as Murray suggests. But the             stall could also have been caused by a failure to adopt new,             effective reform strategies. We wouldn’t want to give up on             trying to improve the school system in the absence of convincing             evidence that no more gains could be wrought.</p>
<p>Similarly, observers of Russia would note that             gains in average life expectancy have been very hard to come by in             recent decades. In fact, the average age of Russians at death has             declined in recent years. But it would be completely wrong to             conclude from this that no reforms could be pursued to improve life             expectancy among the Russian populace. Simply observing a delay in             progress and pointing out that there is a limit to the human life             span (it is a matter of ceilings, you know) would blind one to the             obvious reforms that the nation could adopt that might improve life             expectancy, such as reducing rampant alcoholism, bringing its AIDS             epidemic under control, and cleaning up its environmental messes.             Murray’s reasoning is no more persuasive in advocating             against improving Russian public health than it is against             bettering American public schools.</p>
<p><img src="http://educationnext.org/files/ednext_20074_75_figure1.gif" border="0" alt="" align="right" />Of course, an astute observer might also note             that Russian life expectancy is considerably lower than in other             countries. Unless Russians are genetically cursed with a shorter             life span, shouldn’t the fact that people can live longer in other countries prove that Russian life             expectancy is not hitting the limit? The same could be said of             American education (see Figure 1). U.S. students perform             significantly worse than students in many developed countries,             according to the Program for International Student Assessment             (PISA), Trends in International Mathematics and Science Study             (TIMSS), and other international comparisons, trailing the leaders             by more than half of a standard deviation. Shouldn’t that             prove that U.S. achievement has considerable room to improve before             it hits the ceiling of cognitive constraints?</p>
<p>Unless we believe that (as Murray suggests in <span class="italic">The Bell Curve</span>)             Singapore, Korea, Japan, and the Benelux countries demonstrate             higher achievement because they are genetically blessed with higher             IQs, then the existence of higher and still increasing achievement             elsewhere in the world suggests considerable potential for school             reform (see Figure 2). And we can’t attribute the success of             other developed countries simply to higher performance of their             best students. Even their lower-achieving students outperform ours.             The fact that the entire distribution of students in other             developed countries outperforms the entire distribution of U.S.             students suggests that there is a difference in the effectiveness             of school systems across countries that school reform could remedy.</p>
<p>Rothstein’s tack differs from             Murray’s in that Rothstein tries to provide evidence of the             limited potential of school reform by debunking specific claims of             successful efforts. For example, Rothstein contends that KIPP             (Knowledge Is Power Program), a network of charter schools that             have produced impressive results with disadvantaged students,             isn’t as effective as it seems. KIPP’s success, he             argues, is largely attributable to the selection of more advantaged             students into their schools, the investment of significantly             greater resources, and the unusual motivation of their staff. These             sources of KIPP’s success, he suggests, cannot be replicated             on a larger scale, so imitating or expanding KIPP cannot             meaningfully reform the school system as a whole.</p>
<p>He makes similar arguments about how efforts             to improve teacher quality, instructional approaches like Success             for All, and high-expectation techniques practiced by educators             like Jaime Escalante and Rafe Esquith are not promising models for             reform because their success is due to the selection of students or             other factors that cannot be replicated on a broader scale. By             undercutting these reform strategies and presenting evidence on the             powerful influence of social class on student achievement,             Rothstein hopes to convince us that we can expect little from             focusing on reform within the school system.</p>
<p><img style="margin-right: 96px" src="http://educationnext.org/files/ednext_20074_75_figure2.gif" alt="" width="594" height="433" /></p>
<p>Leaving aside the merits of Rothstein’s             critique of these specific reforms, the general problem with             Rothstein’s argument is that he attempts to demonstrate the             limited potential of all school reforms by attacking a handful of             them. So he picks a few prominent reform models that have not             demonstrated large gains using the most rigorous evaluation             techniques. Mind you, he doesn’t             produce any evidence to demonstrate that these reforms are             ineffective; he only raises plausible doubts given their lack of             convincing proof.</p>
<p>But what about reforms that have produced             significant gains for students, according to evaluations adhering             to the highest social science standards? He doesn’t address             those. Studies have evaluated several reforms using             random-assignment research designs, also used in most medical             experiments, in which subjects are randomly assigned to treatment             and control groups. Random assignment helps eliminate concerns that             program outcomes result from the selection of students, to which             Rothstein attributes the apparent success of many reform claims.</p>
<p>Expanding school choice has been shown to improve achievement for minority students by about one-third of a standard deviation after a few years of intervention, according to seven of eight random-assignment evaluations (the eighth showed positive but statistically insignificant effects). The famous Tennessee STAR random-assignment study evaluated class size reduction, which also produced an improvement of about one-third of a standard deviation in achievement for minority students. The What Works Clearinghouse, which the U.S. Department of Education operates, lists more than a dozen school interventions that have shown significant effectiveness in rigorous evaluations, several of which used random assignment.</p>
<p>We don’t just have evidence of effective             school reforms from well-studied pilot programs; we also have such             evidence from large-scale initiatives that have produced             improvements for low-income and minority students. For example,             when Massachusetts began to require passage of a 10th-grade exam             for a regular diploma, the percentage of African American and Hispanic students passing more than doubled. When             Florida threatened to offer vouchers to students at chronically             failing public schools, those schools made significant gains. These             achievements in Massachusetts and Florida aren’t simply             improvements on state tests, which could be subject to             manipulation, but are confirmed by progress on national tests             administered in those states.</p>
<p>We may not all agree on which reforms have             been proven effective, but we could all agree that at least some of             these reforms, perhaps used in combination, could make a large             difference in the academic achievement of low-income and minority             students. To dismiss the potential of significant change through             school reform is to dismiss a large collection of rigorous social             science and the bulk of experience.</p>
<p>Rothstein may feel justified in downplaying             hopes for school reform because of the evidence he presents on the             large influence social class has on academic achievement. He is             entirely correct in observing the strong evidence showing that             family income, parental education, family composition, housing             stability, and other social factors have a substantial effect on             student achievement. But it does not follow that we should             therefore hold little hope for results from school reform.</p>
<p>Similarly, it is well established that health             behaviors, such as diet, exercise, smoking, and drinking, have a             very large influence on health outcomes. But almost no one uses             this evidence to reduce our expectations for health improvements             from medical interventions. We still (rightly) believe that doctors             can and should make a difference in our health. Why shouldn’t             we expect the same from teachers, even if we acknowledge the strong influence of factors outside their control?</p>
<p class="tocheading"><span class="bold">Comments in Context </span></p>
<p>Murray and Rothstein are likely to contend             that I have made a caricature of their views. They can point to             portions of their writings that affirm their commitment to school             reform. For example, Murray writes, “This is not to say that             American public schools cannot be improved. Many of them,             especially in large cities, are dreadful.” And Rothstein             observes, “Readers should not misinterpret this emphasis as             implying that better schools are not important, or that school             improvement will not make a contribution to narrowing the             achievement gap.”</p>
<p>But these are throwaway lines, completely at             odds with the clear, overall thrust of their arguments. For             example, Murray immediately follows his acknowledgment that there             might be some potential in school reform with a stern warning not             to expect much: “But even the best schools under the best             conditions cannot repeal the limits on achievement set by limits on             intelligence.”</p>
<p>Rothstein makes more concessions to the             possibility of progress in schools, but his pessimism about reform             is also patently obvious. Although his book contains a chapter of             suggested reforms that might help close the achievement gap,             reforms within schools are noticeably absent. The chapter does             include the need to integrate schools, which is not presented as a             reform in how schools operate but as a change in the societal             composition of schools. The chapter also advocates addressing             income inequality, providing stable housing, expanding access to             health clinics, and strengthening preschool, afterschool, and             summer school programs. The only in-school reform that Rothstein             mentions is the need to stop invoking slogans like “no             excuses” to raise expectations for results because it             undermines teacher morale. He adds, “‘no excuses’             slogans provide ideological respectability for those wanting to             hold schools accountable for inevitable failure.” With             phrases like “inevitable failure” and a complete focus             on out-of-school reforms, it is clear that Rothstein believes we can expect little from school reform.</p>
<p class="tocheading"><span class="bold">Politics of Inaction </span></p>
<p>Both Murray and Rothstein have large             constituencies for their views. Murray’s following is less             visible, largely because his views on these matters lost             respectability in polite company after the publication of <span class="italic">The Bell Curve</span>. But             don’t let his low profile on this issue fool you into             believing that there aren’t a significant number of             influential people who share Murray’s perspective. Believing             that race and class differences in education outcomes can largely             be explained by differences in cognitive limitations reinforces             many people’s private prejudices. Murray draws strength from             his marginalized status, playing the role of the man brave enough             to tell us the truth. Unfortunately, not every heretic is Galileo;             sometimes they are just cranks.</p>
<p>Rothstein has a much larger and more vocal             constituency. Everyone wishing to shift attention (and blame) away             from schools pays heed to Rothstein’s arguments. Saying that             we cannot expect significant progress in schools until we first             address a host of social ills outside of school is a recipe for             inaction. Waiting for society to fix all of its injustices before             we can really fix schools is like waiting for Godot. It will never             come.</p>
<p>Not surprisingly, the teacher unions             don’t mind waiting. For example, the February 2006 issue of <span class="italic">NEA Today</span> features an             article by David Berliner, the former head of the American             Educational Research Association and professor at Arizona State             University, that repeats Rothstein’s argument with greater             force and fewer reservations: “So why, when we have as much             credible research making connections between poverty and school             success, do we keep looking for other answers? (For example, it             must be the low expectations of teachers!)             What’s surprising is, in the face of that research, we still             concentrate our attention and resources on what happens inside             low-performing schools when the real problems are outside those             schools.” In an appearance on C-SPAN, Berliner observed that             students spend only 1,000 hours a year in school and another 5,000             waking hours with their families and friends. How are schools             supposed to counter these larger influences, he wondered? Of             course, we spend even less time each year with our doctor than we             do in school, but we still have very high expectations for medicine             to make a difference.</p>
<p>And to whom does Berliner credit his ideas?             “These musings could have been written also by Jean Anyon,             Bruce Biddle, Greg Duncan, Jeanne Brooks-Gunn, Gary Orfield,             Richard Rothstein, and many others whose work I admire and from             whom I borrow.” So Rothstein’s views reflect a broad             and deep tradition within education circles. Preceding him are the             likes of Jean Anyon, for example, who writes in the <span class="italic">Teachers College Record</span>,             “The structural basis for failure in inner-city schools is             political, economic, and cultural, and must be changed before             meaningful school improvement projects can be successfully             implemented. Educational reforms cannot compensate for the ravages             of society.”</p>
<p>To be sure, Anyon, Berliner, and Rothstein are             right to warn us that social forces play a significant role in             educational achievement. And Murray is right that at some point             cognitive limits do place a ceiling on student outcomes. But             without strong evidence that ours are the best of all possible             schools, we should reject attempts to shift attention from efforts             to improve schools. Recognizing constraints is not the same as             being paralyzed by them.</p>
<p class="italic">Jay P. Greene is professor of education             reform, University of Arkansas, a senior fellow at the Manhattan             Institute for Policy Research, and a contributing editor of Education Next.</p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/the-odd-couple/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Confidence Men</title>
		<link>http://educationnext.org/the-confidence-men/</link>
		<comments>http://educationnext.org/the-confidence-men/#comments</comments>
		<pubDate>Thu, 17 May 2007 21:32:08 +0000</pubDate>
		<dc:creator>Eric A. Hanushek</dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=7560457</guid>
		<description><![CDATA[Selling adequacy, making millions]]></description>
			<content:encoded><![CDATA[<p><img src="http://educationnext.org/files/ednext_20073_73_opener.gif" border="0" alt="" align="right" /><span class="italic">Checked: </span><span class="italic">Picus and Associates. 2006. An             Evidence-Based Approach to School Finance Adequacy in Washington. </span></p>
<p><span class="italic"> </span><span class="bold">Checked by Eric A. Hanushek </span></p>
<p>Lawsuits             aimed at compelling legislatures to             increase school funding have been filed in some 42 states.             Courts have found for the plaintiffs in more than half of the cases             on the grounds that schools are not “adequately” funded             (see Figure 1). These decisions have, in effect, changed the way             education appropriations are made, moving decisionmaking from             legislatures to the courts. Instead of flowing from the political             process, determinations of adequate             appropriations come from judges who are informed by paid             consultants. Recently, adequacy plaintiffs have suffered some             serious setbacks (<a href="http://educationnext.org/adequately-fatigued/">see<span class="italic"> legal beat</span></a>). Undaunted,             they soldier on.</p>
<p>In the state of Washington, adequacy plaintiffs filed a new lawsuit in early 2007 that is expected to rely heavily on a report prepared at the request of Washington Learns, a commission appointed by the governor. This report, “An Evidence-Based Approach to School Finance Adequacy in Washington,” claims to present scientific evidence of exactly what needs to be done to bring every child to proficiency as defined under state and federal law. If the claim is true, the advance would reach far beyond this specific court case and could revolutionize American education. For if, indeed, we now know how to create an effective educational system, and only the funds are lacking, then the country’s education problems can be solved.</p>
<p><img src="http://educationnext.org/files/ednext_20073_73_figure1.gif" border="0" alt="" align="center" /></p>
<p>The analysts who purport to have assembled this knowledge are led by two professors, Lawrence Picus of the University of Southern California and Allan Odden of the University of Wisconsin. The two formed a consulting group known as Picus and Associates and have become increasingly popular among groups seeking to expand school spending, be they plaintiffs in funding lawsuits, teachers unions, or state departments of education. The Washington Learns commission asked Picus and Associates to recommend policy changes that will place the state’s education system on a sound footing. Specifically, Picus and Odden answer the question, “What are the high-impact education programs and strategies that will allow every school to provide each Washington student with the opportunity to learn at or above proficiency on state standards as measured by the Washington Assessment of Student Learning, with proficiency standards calibrated over time to those of NAEP [National Assessment of Educational Progress], or even the performance of students in other countries?”</p>
<p>Even if only the state of Washington were             getting precise, scientific answers to such critical questions, the             work of the Picus-Odden team would command the attention of             national policymakers. But the consulting group has already             established a national reputation for its ability to ascertain,             scientifically, what needs to be done in education—and             precisely how much it costs to do it—through prior studies             along much the same lines prepared for policymakers in Kentucky,             Arkansas, Arizona, and Wyoming.</p>
<p>Of course, the evidence base does not change             very rapidly, as is evident from the various reports, which were             carried out between 2003 and 2006. The 2006 study conducted for             Washington Learns has an extensive bibliography, some 260 entries.             But, since the production of the cost study for Kentucky in 2003,             only 30 new references were added (including the obligatory             reference to Thomas Friedman’s <span class="italic">The             World Is Flat</span>). So similar are the             studies that at times it seems the copy function of the Microsoft             word processor deserves to be listed among the authors.</p>
<p>The ease with which one report can build on             another does not seem to translate into efficiencies in the             consulting group’s operations, at least as reflected in the             fees charged. According to available records, the Kentucky study,             conducted in 2003, was executed for $349,000. Arkansas’s             original study, conducted the same year, cost about the same             initially but rose to over twice that amount ($800,000) when the             authors accepted a commission to ascertain whether districts used             their extra money in a way consistent with the consultants’             evidence-based policies. Wyoming, a small but rich state, was asked             to pay $1,260,000 in 2005 for a calibration of its finance formula             along evidence-based lines and a subsequent implementation study.             Washington, in 2006, managed to squeeze the price back down to the             total Arkansas figure, although Washington could get only the             original evidence-based analysis without the follow-up.</p>
<p>Even the Wyoming deal is a bargain, however,             if the study can answer the question posed by the Washington Learns             commission. After all, we spend some $500 billion nationally on             K–12 education, and even small improvements applied to the             nation’s schools could quickly cover the study costs.</p>
<p class="tocheading"><span class="bold">The Picus-Odden Miracle </span></p>
<p>The frequency with which education policy             initiatives of the past, though based on high hopes, have yielded             disappointing results when implemented in the field has led to             rather low expectations. As a general rule, in education             discussions a policy is considered successful if an evaluation has             shown it to have a statistically significant positive effect on             student outcomes. Translated, there must be a high degree of             certainty that positive results were not simply the result of             chance. But just finding that some policy is likely to improve             student outcomes does not mean that the improvement will reach the             high levels sought by Washington Learns, or by others with similar             views about what students should know. The research would have to             provide evidence about the magnitude of improvements in achievement             that can be expected, and these improvements would have to be             large.</p>
<p>Such evidence is precisely what Picus and Odden             purport to provide for their fees. They have combed the research             evidence to provide rather precise, and remarkable, predictions             about the achievement effects of programs whose power has             apparently escaped the attention of almost all other researchers.</p>
<p>Picus and Odden convey the magnitude of             achievement gains that can be expected from their evidence-based             policies through a unit of measurement known as effect size. Effect             size is the change in standard deviations of achievement that can             be expected, according to the research, from the introduction of a             given policy. In itself, that step is perfectly acceptable, as the             unit is widely used in education research.</p>
<p>Discussion of effect sizes and standard deviations is something most policymakers, even those introduced to the concepts in an undergraduate statistics course, would rather avoid. But some heuristics will help convey the essence of effect sizes and make clear the import of the Picus and Odden evidence. The National Assessment of Educational Progress (NAEP) measures achievement in different grades and attempts to put it on a common scale. One full standard deviation (an effect size of 1.0) is roughly equal to the average difference in test score performance between a 4th grader and an 8th grader. In other words, it is a big effect, as the typical 8th grader has learned quite a bit since 4th grade.</p>
<p>From this perspective, any education strategy that in a single year can raise average achievement of a large aggregate of students by one full standard deviation must be taken very seriously. Pursued systematically, it could eliminate the persistent ethnic test-score gap (which is about one full standard deviation) or could vault the math and science performance of U.S. students beyond counterparts in Korea, Singapore, and Japan (who are about one-half of a standard deviation ahead now).</p>
<p>Picus and Odden identify strategies they claim             can do that, and much more. They provide “scientific             evidence” to support the claim that a specific set of             policies can shift average student performance upward by <span class="italic">three to six standard deviations</span>, an extraordinary gain. The policies they identify             include providing a year of full-day kindergarten, reducing class             size to 15 students through grade 3, using multi-age classrooms,             hiring classroom coaches, employing one-to-one tutoring for             disadvantaged students, getting half of the students eligible for             free and reduced-price lunch to attend summer school, embedding technology within the classroom, creating a gifted             and talented program for the top 5 percent of all students, and             accelerating instruction for the 2 percent of students capable of             benefiting from it (see Figure 2). The range in claimed impact             reflects the fact that they sometimes admit to uncertainty about             the exact effect size from a specific program.</p>
<p><img src="http://educationnext.org/files/ednext_20073_73_figure2.gif" border="0" alt="" align="right" />Most Americans would be extraordinarily             satisfied with average gains of one full standard deviation for a             school or district. Picus and Odden claim to be able to do that             three or possibly even six times over for all students in             Washington. After their policies are fully implemented in             Washington, Albert Einstein, were he not participating in these             programs, would find himself achieving at or below the state             average.</p>
<p>This can all happen within one year of             application of these policies, the consultants say. But they would             not give these programs just a single year. They would apply them,             where appropriate, across all years of schooling. (Full-day             kindergarten, for example, happens just once for each student.) If             one then assumes a cumulative impact from giving students not just             a single application but continuing treatment through grade 12, the             gains reach astronomical proportions, somewhere in the range of 23             to 57 standard deviations.</p>
<p class="tocheading"><span class="bold">The Truth behind the Numbers </span></p>
<p>This, of course, is the stuff of science             fiction novels, not research-based school policies. How does a             well-funded study, conducted by scholars of national reputation,             reach such startling conclusions? The procedure is roughly as             follows:</p>
<p><span class="bold">1) </span>Find a study,             preferably one that has some surface credibility, that shows that a             particular intervention had a certain effect on a particular group             of students.</p>
<p><span class="bold">2)</span> Ignore all             the studies of that intervention that show a smaller effect or no             effect at all.</p>
<p><span class="bold">3)</span> Interpret             the study as identifying a true causal relationship, not just a             correlation or association.</p>
<p><span class="bold">4)</span> Finally,             assume that the conditions that produced the very large effect can             be perfectly replicated throughout the state of Washington.</p>
<p>Take full-day kindergarten, for example, which Picus and Odden estimate to have by itself an impact of 0.77 standard deviations on student achievement for advantaged and disadvantaged students alike. (In NAEP terms, this by itself would be equivalent to three full years of later schooling.) Picus and Odden cite a 1997 meta-analysis by John Fusaro that shows such an impact. But they disregard Fusaro’s own strong warning: “A seductive conclusion from these results is that attendance at full-day kindergartens causes students to achieve at a higher level than attendance at half-day kindergartens. It is imperative, however, that we strenuously resist succumbing to such a seduction.” Meanwhile, Picus and Odden ignore a large body of literature that shows little impact on advantaged students and smaller impacts on disadvantaged ones, to say nothing of the empirical reality that the 56 percent of students currently attending schools that have full-day kindergarten do not surpass the remaining 44 percent attending schools without full-day kindergarten by anything like a margin of 0.77 standard deviations. Note, for example, that black students and disadvantaged students are currently more likely than more advantaged students to attend schools with full-day kindergarten.</p>
<p>Or take summer school, which Picus and Odden estimate would have an effect size of 0.45 standard deviations. This policy recommendation is apparently based on a single study in 2000 of the Voyager summer learning program, although they note that a major meta-analysis suggests widely varying effect sizes across different evaluations. Note also that in Odden’s 2004 peer review of William Driscoll and Howard Fleeter’s Ohio study of the costs of bringing all students to proficiency in math and reading to comply with NCLB, he castigates the study’s authors, who called for expanded summer school, because they “reference no research to support this assertion, when in fact most research shows that summer school as typically administered has little if any impact on learning.”</p>
<p>These patterns are repeated when one goes to             the other “evidence-based” recommendations of Picus and             Odden, including class size reduction and professional development.             Their estimate of the benefits of professional development comes             directly from the professional association representing those who             supply professional development. And so on. There is little reason             to believe that the effect sizes identified in their work indicate             what can be expected from implementing any policy on a broad scale.</p>
<p>The approach of Picus and Odden to policies is             simple: if a program shows a large positive effect in one study, it             should immediately be implemented across the state. Indeed, they             assert in public hearings that adopting anything less than the             complete set of recommended programs would constitute an inadequate             program, and that they would testify to the inadequacy in court.</p>
<p class="tocheading"><span class="bold">Are Costs Important? </span></p>
<p>The primary purpose of reviewing the evidence             on programs is to establish the cost of providing a new and             improved (adequate) education. The various programs suggested by             Picus and Odden have very different price tags associated with             them. They make it hard to tell from their report what prices might             go with each of the programs, because they bury the costs within             the staffing of each prototypical school. It is, nonetheless,             relatively easy to obtain reasonable cost estimates for each             program.</p>
<p>The basic building blocks for calculating the             cost per pupil of the various policies Picus and Odden propose are             the approximate average expenditure of $7,800 per pupil and average             teacher compensation (salary plus benefits) of $60,000 for the             state of Washington. We can first translate these into the cost per             recipient for each program based on resource demands and then take             into account the proportion of all students who receive the             program. The results show wide variations in costs. For example,             full-day kindergarten would increase average spending in the state             by $154 to $300 per student, while the K–3 class size             reduction would increase average spending by $410 to $800 per             student. Some programs have no obvious costs. For example,             multi-age classrooms might reasonably be taken as free. Similarly,             changes in curriculum do not in general have significant added             costs (past, say, an initial teacher-training period). Other             programs, such as skipping grades, would actually save money, since             students would spend 12 rather than 13 years in the system.</p>
<p>Once program costs are separated, one can immediately see the variation that exists and can make judgments about where money is better (more efficiently) spent. A simple cost calculation gives the improvements in student achievement (measured again in standard deviations) that could, by the Picus and Odden estimates of benefits, be expected for a $100 addition to spending per pupil from each of the separate programs. By their low-end estimates of benefits (which total just three standard deviations), each $100 spent on classroom coaches would be expected to yield a gain in achievement of at least 0.25 standard deviations, very similar to the expected gain for full-day kindergarten. Their class-size reduction proposal would yield only <span class="italic">one-sixth</span> that gain, or 0.04 standard deviations, an effect very similar to that for one-to-one tutoring.</p>
<p>Using the upper range of their effect size estimates, $100 spent on classroom coaches would yield a gain of more than one-half of a standard deviation in student achievement, and one-to-one tutoring would yield an improvement of one-quarter of a standard deviation. According to their estimates, some of their favored programs (such as classroom coaches) are more than 10 times as cost efficient as others, such as class size reduction for K–3.</p>
<p>Picus and Odden contend that all programs,             regardless of cost, must be simultaneously undertaken. But it is             clear that the programs they identify have very different expected             returns on spending. Their method of distributing costs through             their prototypical schools provides no information on the relative             efficiency of investing in the various components. Nor does it say             anything about the costs of improving outcomes if done efficiently.             Unless there are unlimited funds to spend on educational programs,             it would not make sense to put the money into all the programs             without regard to cost.</p>
<p class="tocheading"><span class="bold">What Are States Paying For? </span></p>
<p>Cost estimates are an important component in             the politics of court and legislative deliberations on schools. The             adequacy debates are typically motivated by obvious and real             shortfalls in the achievement of a state’s students, but a             combination of naive concerned citizens and self-interested parties             invariably pushes to translate these debates into a simple dollar             figure. Such translation is salient for courts and legislatures and             both simplifies and focuses the issue for the media.</p>
<p>What Picus and Odden provide in their reports             is essentially a selective review of the published literature on             program effects. Why do different states and organizations pay             ever-increasing amounts to see this research review when Google             would bring up the most recent version immediately and without             expense? The answer is simple. Clients want a bottom-line statement             about how much spending would provide an adequate education, and             they want this cost estimate attached to their specific state. Few             people care about the “studies” on which consultants             base their reports, or even their validity, because nobody really             expects schools to implement these specific programs if given extra             funding. Clients simply want a requisite amount of scientific aura             around the number that will become the rallying flag for political             and legal actions.</p>
<p>Summing the added cost of the separate programs             suggested by Picus and Odden, I estimate that the overall plan, if             fully applied, would increase average spending in Washington by             $1,760 to $2,760 per student, or 23 to 35 percent. This estimate of             the increased spending necessary to achieve “adequacy”             is very similar to the percentage increases they have recommended             to other states, and numbers like these will presumably become part             of the headlines surrounding the new court case.</p>
<p>But pity the poor states that actually             implement the Picus and Odden plan. They are sure to be             disappointed by the results, and most taxpayers (those who do not             work for the schools) will be noticeably poorer.</p>
<p><span class="italic">Eric A. Hanushek is a senior fellow at the             Hoover Institution, Stanford University, and a member of its Koret             Task Force on K–12 Education. </span></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/the-confidence-men/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Debunking a Special Education Myth</title>
		<link>http://educationnext.org/debunking-a-special-education-myth/</link>
		<comments>http://educationnext.org/debunking-a-special-education-myth/#comments</comments>
		<pubDate>Fri, 23 Feb 2007 18:03:08 +0000</pubDate>
		<dc:creator>Jay P. Greene</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[On Top of the News]]></category>

		<guid isPermaLink="false">http://educationnext.org/?p=6018321</guid>
		<description><![CDATA[Don't blame private options for rising costs]]></description>
			<content:encoded><![CDATA[<p>Can spiraling special education costs explain why educational achievement remained stagnant over the past three decades while real education spending more than doubled? Policy makers, education researchers, and school district officials often make this claim. Special education students—goes the argument—are draining resources away from regular education students.</p>
<p>In 1975, the federal government enacted the Education for All Handicapped Children Act, now called the Individuals with Disabilities Education Act (IDEA), which requires states to provide a “free appropriate public education” to all students with disabilities. Parents have the right to work with school officials to devise an individualized education plan for their child. They also have the right to pursue legal action if they and the district cannot agree on what services their child will receive and whether the public school or a private provider will deliver those services. Since the implementation of the federal law and subsequent state laws, the percentage of students in the nation identified as requiring special education has risen sharply, from 8.3 percent of all students in 1977 to about 13.7 percent in 2004, according to the U.S. Department of Education.</p>
<p class="tocheading"><strong>The “Two-Step”</strong></p>
<p>A popular riff on the idea that special education students are bleeding public school budgets blames private placements. A large number of mostly undeserving disabled students and their clever parents, critics allege, have managed to get public schools to pay for attendance at expensive private schools. Tales of the “greedy needy”—disabled students who receive unreasonably expensive services—appear regularly in the media. The <span class="italic">San Francisco Chronicle </span>describes the case of a student with learning disabilities and an anxiety disorder whose parents “enrolled him in a $30,000-a-year prep school in Maine—then sent the bill to their local public school district.” The <span class="italic">Chronicle</span> declares that similar situations are “playing out up and down California as more parents of special education students seek extra-special education at public expense: private day schools, boarding schools, summer camps, aqua therapy, horseback therapy, travel costs, personal aides and more.” The <span class="italic">Chronicle</span> cites a school finance consultant to the California Department of Education to make the harm to general education parents clear: “This is not sustainable&#8230; Special education is a growing portion of budgets in many districts, squeezing out services for other pupils.”</p>
<p><span class="italic">Time</span> magazine relates a story about an autistic child whose parents put him in an expensive private school and then “informed Colorado’s Thompson school district it had to pick up the bill for Boston Higashi’s $135,000 annual tuition.” <span class="italic">Time</span> warns, “Special ed costs threaten to eat into budgets for school endeavors that are not federally mandated, like athletics or the gifted-and-talented program. The money has to come from somewhere, says Becky Jay, who was president of the local school board when the [family] first asked for tuition reimbursement, ‘and regular kids lose out.’”</p>
<p><span class="italic">The New York Times</span> does a similar dance routine. The paper profiles a wealthy community—Westport, Connecticut—where “some [special education students] are getting as little as a few hours of weekly speech therapy. Others get tuition for private school or home tutoring.” The superintendent, we are told, has held the line on special education services: “His administration has denied many special education requests—horseback riding and personal trainers, for instance—that it deemed extravagant.” Again, we are warned that runaway special education costs pose a threat: “The strain on the bottom line can be intense, even in Westport, where in the 2002–03 school year the $10.9 million spent on special education consumed 15.9 percent of the district’s education spending.”</p>
<p>It’s a two-step. First, provide colorful anecdotes of unreasonably expensive-sounding private placement, and then warn about how general education may suffer.</p>
<p>As it turns out, the evidence contradicts the private placement myth. Only a very small fraction of disabled students are placed in private schools at public expense. And contrary to claims that this is increasingly common, the likelihood that disabled students will be placed in a private school has not grown in the last 15 years. While some of those private placements are indeed expensive, the overall cost of private placement nationwide constitutes a tiny portion of public school spending.</p>
<p class="tocheading"><strong>The Extent of Private Placement</strong></p>
<p>The media dance would be an engaging one if the plural of anecdote were data. But the data on private placement are actually dancing to a very different tune. The U.S. Department of Education’s Office of Special Education Programs tracks the number of private placements. The information for each state and by disability classification is posted on its web site. We have reproduced the relevant information in Table 1.</p>
<p><a href="http://educationnext.org/files/20074_CTF_Tbl11.GIF"><img class="aligncenter size-full wp-image-49628720" src="http://educationnext.org/files/20074_CTF_Tbl11.GIF" alt="20074_CTF_Tbl1" width="237" height="716" /></a></p>
<p>As of 2004, private schools served, at public expense, a total of 88,156 students with disabilities of the 5,963,129 students with disabilities nationally, which amounts to 1.48 percent. And these privately placed students amounted to 0.18 percent of the 47,917,774 students enrolled in public education. Nor has the percentage of students who are privately placed substantially increased in recent years. According to the <span class="italic">Digest of Education Statistics</span>, a similar proportion, about 1.6 percent, of students receiving services under IDEA were educated in a private school setting in 1989. The percentage of all students who were privately placed has increased slightly since then, due to an increase in the percentage of students diagnosed as disabled, but there has been no surge in the proportion of special education students in private settings.</p>
<p>The fact is that private placement is extremely rare.</p>
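<p>As a rough check, the two shares quoted above can be recomputed from the counts given in the text. The sketch below uses only those three figures; the variable and function names are illustrative and do not come from the Department of Education data.</p>

```python
# Counts quoted in the text for 2004.
privately_placed = 88_156                # students with disabilities in private schools at public expense
students_with_disabilities = 5_963_129   # all students with disabilities nationally
public_enrollment = 47_917_774           # all students enrolled in public education


def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of students with disabilities who are privately placed.
share_of_disabled = share_percent(privately_placed, students_with_disabilities)

# Share of all public school students who are privately placed.
share_of_all = share_percent(privately_placed, public_enrollment)

print(f"{share_of_disabled:.2f} percent of students with disabilities")
print(f"{share_of_all:.2f} percent of all public school students")
```

<p>Rounded to two decimal places, the results match the 1.48 percent and 0.18 percent reported in the text.</p>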
<p>Instances of private placement that occur as a result of parental requests rather than at the initiative of school districts appear to be even more rare. In many cases, public schools simply do not have the facilities or staff to accommodate students with certain disabilities, and those students are sent by the public schools to specialized private schools. For example, some public schools are incapable of serving blind and deaf students, who constituted 3,022 of the 88,156 privately placed students in 2004. Another 19,876 students in private placements are mentally retarded, have multiple disabilities, or have suffered a traumatic brain injury. The lion’s share of the 38,510 emotionally disturbed students attending private school at public expense were also likely sent because public school officials believed that they were unable to handle the students’ needs. The number of private placements for students with mild disabilities that resulted from parental action is likely to be a very small portion of the 88,156 students in private placements. (See Figure 1 for a breakout of private placements by disability.) Media reports are often just the tip of the iceberg, but in this case there may not be much more beneath the waterline.</p>
<p><a href="http://educationnext.org/files/ednext_20072_67_2_fig11.gif"><img class="alignright size-full wp-image-49629908" src="http://educationnext.org/files/ednext_20072_67_2_fig11.gif" alt="ednext_20072_67_2_fig1" width="691" height="491" /></a></p>
<p class="tocheading"><strong>Evidence on Cost</strong></p>
<p>Perhaps the private placement of disabled students is so expensive that it still diverts significant resources from general education. To estimate the additional cost involved in placing disabled students in private schools, we relied on a study sponsored by the U.S. Department of Education. According to the Special Education Expenditure Project, the average cost of a private placement in 2000 was $25,580. The project report also provides the average cost of serving students in a public school setting, broken out by disability type (e.g., specific learning disability, mental retardation, autism, etc.). Weighting the costs by the type of disabilities among students placed in private schools, we can estimate that the average privately placed student would have cost $15,117 if he had instead been served in a public school. That is, we estimate that private placement cost an additional $10,463 per student.</p>
<p>This estimate likely overstates by a fair margin the additional cost of serving disabled students in private schools. It assumes that the cost of serving a privately placed student would be the same as the cost of serving the average student with the same disability. But we have good reason to believe that most privately placed students are more severely disabled and therefore more expensive to educate than the average student in the same disability classification. An emotionally disturbed student who requires private placement, for example, is likely to be more challenging and expensive to educate than the average emotionally disturbed student who remains in public schools. With the law’s emphasis on providing services in the least restrictive environment, the severity of the disability is likely to increase the probability of private placement.</p>
<p>Given the conservative estimate of $10,463 per pupil as the additional cost of private placement and given 88,156 privately placed students, the total additional cost of placing disabled students in private schools may be as high as $922 million. This sounds like a lot of money, but to public schools it is almost a rounding error. In the school year ending in 2000, the same year as our cost estimates, public schools spent $382 billion. The $922 million for private placement amounts to just 0.24 percent of the total budget. The cost of family-driven private placement is certainly less.</p>
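<p>The arithmetic behind this estimate can be laid out step by step. The sketch below uses only the figures quoted in the text and, like the text, applies the 2000 per-student costs to the 2004 count of privately placed students; the variable names are illustrative.</p>

```python
# Figures quoted in the text.
average_private_cost = 25_580             # average cost of a private placement, 2000
estimated_public_cost = 15_117            # estimated cost of the same students in public schools
privately_placed = 88_156                 # privately placed students, 2004
total_public_spending = 382_000_000_000   # public school spending, school year ending 2000

# Additional cost per privately placed student.
additional_per_student = average_private_cost - estimated_public_cost

# Total additional cost across all privately placed students.
total_additional = additional_per_student * privately_placed

# Total additional cost as a percentage of public school spending.
share_of_spending = 100.0 * total_additional / total_public_spending

print(f"Additional cost per student: ${additional_per_student:,}")
print(f"Total additional cost: ${total_additional:,}")
print(f"Share of public school spending: {share_of_spending:.2f} percent")
```

<p>The total comes to roughly $922 million, or about 0.24 percent of spending, in line with the figures above.</p>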
<p class="tocheading"><strong>The Exceptions Do Not Make the Rule</strong></p>
<p>There are some school districts and states where private placement is more burdensome. In Washington, D.C., for example, privately placed students constitute 3.03 percent of enrollment as of 2004. According to the <span class="italic">Washington Post</span>, the cost of private placements represents 15 percent of the school district’s budget.</p>
<p>But Washington, D.C., is the exception, not the rule. No state has more than 1 percent of its students privately placed. Only four states (Connecticut, Massachusetts, New Hampshire, and New Jersey) have more than 0.5 percent of their students attending private schools at public expense. And to repeat, nationwide only 0.18 percent of all students are privately placed. Nor is D.C. typical for a large urban school district. According to the <span class="italic">New York Times</span>, New York City schools have 2,000 privately placed students at a cost of $24 million. That amounts to only 0.19 percent of student enrollment and only 0.17 percent of the budget.</p>
<p>Why do some places have an unusually large proportion of privately placed students? In Washington, D.C., the explanation might be found in the dysfunction of the school district (see “Old Wine, New Bottles,” <span class="italic">forum</span>, Fall 2001, and “How Vouchers Came to D.C.,” <span class="italic">features</span>, Fall 2004). The D.C. schools struggle to provide an adequate education to any of their students. Disabled students are entitled under federal law to demand an adequate education and to obtain one in a private school if the public schools are unable to provide it. The nondisabled students who remain in D.C. public schools lack the same mechanism for exiting failing schools. That is, the high rate of private placement in D.C. may be more a function of the quality of D.C. public schools than a function of special education per se.</p>
<p>The higher rate of private placement found in D.C. and a handful of northeastern states could also be explained by a self-reinforcing process. Once some students obtain private placements, it is easier for others to do so. A network of parents and lawyers develops as private placements become more common, spreading information about options and strategies. So private placement may beget more private placement.</p>
<p>If obtaining a private placement requires a battle with school officials, parents with greater awareness of their legal rights and greater resources to engage in the fight are more likely to win. Washington, D.C., and northeastern states have a high concentration of wealthy and well-educated parents. Because it is the nation’s capital, D.C. has an unusually high number of lawyers, disability advocates, and policy-savvy parents.</p>
<table border="0" cellspacing="0" cellpadding="5" bgcolor="#f7e4da">
<tbody>
<tr>
<td><p><span class="bold">Blaming Special Education </span></p>
<p>Following anecdotes about expensive and unreasonable-sounding private placements, news stories often segue into reports of the total cost of special education—not private placement costs per se. Perhaps special education as a whole is the legitimate target of complaint.</p>
<p>This claim also appears at odds with the facts. It is true that special education enrollments have been increasing at a rapid rate, but that doesn’t mean special education costs are rising faster than the resources available for regular education. To estimate the relative burden of providing special education services over time, we use information on the cost of these services by disability type reported by the Special Education Expenditure Project. We know the number of students in each disability classification over time from the U.S. Department of Education’s <span class="italic">Digest of Education Statistics</span>. If we multiply the number of students in each disability category by the cost of services in each disability, we can estimate the total cost of special education services.</p>
<p>Of course, we only have information on the costs per disabled student from a recent study, and it is possible that the cost of serving students in each disability classification has increased in real terms over time. To adjust for this, we assume that the change in the real cost of special education services is commensurate with the change in student-teacher ratios. Making that adjustment, special education services cost roughly $17.7 billion in 1977, when federal protection for special education began; spending almost doubled to $34.3 billion by 2003 as the number of students in special education increased by 76 percent.</p>
<p>The near doubling in special education costs is not attributable to a rise in rare and expensive disabilities. Media reports often emphasize the growth in students with autism but their numbers remain very small, less than 0.3 percent of enrollment. The total cost of special education services for autism does not exceed 0.45 percent of all spending. Severe disability categories like mental retardation, which are costly to serve, have actually experienced a decline in enrollment. The bulk of special education cost increases comes from explosive growth in the specific learning disability (SLD) category, which is among the least costly to serve. Students in this category grew from 796,000 in 1977 to 2,848,000 in 2003.</p>
<p>Still, the large cost increase doesn’t mean that special education is taking away more resources from general education. Total revenue for public education also nearly doubled between 1977 and 2003, adjusted for inflation. Special education costs constituted roughly the same share of total public school revenue (8.3 percent) in 2003 as in 1977. While special education does consume more money over time, the relative financial burden of special education on public education has not increased because public schools are also receiving significantly more money.</p>
<p><span class="italic">— Jay P. Greene and Marcus A. Winters </span></p></td>
</tr>
</tbody>
</table>
<p class="tocheading"><strong>Why So Few?</strong></p>
<p>Recent evidence from Florida’s McKay Scholarship Program for Students with Disabilities seems to indicate that the real question is why private placements are so rare. This program provides all students in special education with a generous voucher that they can use to attend a private school, eliminating the need for dissatisfied parents to sue their school. According to the Florida Department of Education, 16,144 students currently use a McKay Scholarship, about 4 percent of the students receiving services under IDEA. Given that only 1.48 percent of special education students are privately placed nationally, the experience with McKay suggests a pent-up demand for private schooling among the disabled.</p>
<p>In a phone survey, only one-third of parents who participated in the McKay program reported that they were satisfied with their child’s previous public school. If these results can be generalized to other states, then why don’t more parents pursue private placements? The simple answer is that it is not always easy for parents to secure their preferences if those preferences differ from the judgments of school authorities. IDEA regulations require school districts to provide services in the “least restrictive environment” possible for the child to reach full educational potential. Typically, officials consider the least restrictive environment to be the local public school.</p>
<p>Litigating against a school district costs time and money that many parents don’t have but that school districts are increasingly willing to spend. Determined public schools can outspend and outlast almost any family. In California, school officials “fought so hard to block the claims of a student that Judge Oliver W. Wanger of United States District Court took 83 pages to berate the district’s ‘hard-line position’ and its law firm for ‘willfully and vexatiously’ dragging out the case so long that the former student is now 24.” Litigated cases are extremely rare; media reports of a tidal wave of special education lawsuits are contradicted by an examination of the data. In California, only 0.6 percent of students with a disability file a formal complaint over their educational services. Far fewer ever reach the courts.</p>
<p class="tocheading"><strong>McKay as an Alternative</strong></p>
<p>The McKay program offers a number of benefits. First, our evaluation found that families reported obtaining higher-quality services in a private setting with a McKay voucher than they had received in public schools. Second, McKay offers the promise of slowing growth in the percentage of students identified as disabled. It provides schools with a disincentive for overdiagnosis, as each student identified as disabled becomes a voucher-eligible student who could leave public schools and take all of the money devoted to her education with her. Third, the McKay program should help contain legal costs, both for school districts and families. And, by allowing private placement without the cost of a legal struggle, it increases access to private placement for lower-income families. We found that lower-income families used McKay vouchers to gain access to private placement at about the same rate as higher-income families.</p>
<p>Programs like Florida’s McKay voucher for disabled students would also address the concerns that people have with the cost of private placement. The amount of the voucher is equal to what would be spent by a public school to educate a student with the same type and severity of disability. This would guarantee that private placement costs the public no more than serving the student in the public school.</p>
<p><span class="italic">Jay P. Greene is professor of education reform, University of Arkansas, and a senior fellow at the Manhattan Institute for Policy Research. Marcus A. Winters is senior research associate at the Manhattan Institute for Policy Research and doctoral fellow at the University of Arkansas. </span></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/debunking-a-special-education-myth/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>The NCES Private-Public School Study</title>
		<link>http://educationnext.org/the-nces-privatepublic-school-study/</link>
		<comments>http://educationnext.org/the-nces-privatepublic-school-study/#comments</comments>
		<pubDate>Fri, 10 Nov 2006 19:01:43 +0000</pubDate>
		<dc:creator>Paul E. Peterson</dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=4612612</guid>
		<description><![CDATA[Findings are other than they seem]]></description>
			<content:encoded><![CDATA[<p><em>Checked:<br />
Henry Braun, Frank Jenkins, and Wendy Grigg. 2006. &#8220;Comparing Private Schools and Public Schools Using Hierarchical Linear Modeling,&#8221; U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, NCES 2006-461.</em></p>
<p><strong>Checked by Paul E. Peterson and Elena Llaudet</strong></p>
<p>On July 14, 2006, the U.S. Department of Education’s National Center for Education Statistics (NCES) released a study that compared the performance in reading and math of 4th and 8th graders attending private and public schools. The study had been undertaken at the request of the NCES by the Educational Testing Service (ETS). Using information from a national sample of public and private school students collected in 2003 as part of the National Assessment of Educational Progress (NAEP), ETS compared the test scores of public school students with those of students in all private schools, taken together. Separately, it compared student performance in public schools with that in Catholic, Lutheran, and evangelical Protestant schools.</p>
<p>According to the NCES study, students attending private schools performed better than students attending public schools. But after statistical adjustments were made for student characteristics, the private school advantage among 4th graders disappeared, giving way to a 4.5-point public school advantage in math and parity between the sectors in reading. After the same adjustments were made for 8th graders, private schools retained a 7-point advantage in reading but achieved only parity in math.</p>
<p>But, in fact, the NCES study’s measures of student characteristics are flawed. Using the same data but substituting better measures of student characteristics, we estimated three alternative models that identify a private school advantage in nearly all comparisons. Similar results are found for Catholic and Lutheran schools taken separately, while evangelical Protestant schools achieve parity with public schools in math and have an advantage in reading (see Figure 1).</p>
<p>The results from our alternative models should not be understood as evidence that private schools outperform public schools. Without information on prior student achievement, one cannot make judgments about schools’ efficacy in raising student test scores. Thus, NAEP data cannot be used to compare the performance of private and public schools. However, our results clearly reveal the shortcomings of the NCES study—shortcomings so deep-seated that its purported findings lack credibility. In fact, in view of the criticisms received, NCES is reconsidering the propriety of its involvement in studies of this sort. “This is not what we should be doing.… Our job is to collect the data and get it out the door,” said Mark Schneider, the commissioner of NCES, in a recent interview with <span class="italic">Education Week</span>.</p>
<p class="tocheading"><strong><span class="bold">Problems with the NCES Model</span></strong></p>
<p>The NCES analysis is at serious risk of having produced biased estimates of the performance of public and private schools. The study’s adjustment for student characteristics suffered from two sorts of problems: a) inconsistent classification of student characteristics across sectors, and b) inclusion of student characteristics open to school influence.</p>
<p class="tocheading"><strong><span class="bold">Classification Bias </span></strong></p>
<p>To avoid bias, classification must be consistent for both groups under study. The NCES study repeatedly violates this rule when it infers a student’s background from his or her participation in federal programs intended to serve disadvantaged students. Public and private school officials have quite different obligations and incentives to classify students as participants in these federal programs: a) the Title I program for disadvantaged students; b) the free and reduced-price lunch programs; c) programs for those classified as Limited English Proficient (LEP); and d) special education, as indicated by having an Individualized Education Program (IEP). As a result, NCES undercounted the incidence of disadvantage in the private sector and overcounted its incidence in the public sector.</p>
<p><span class="bold">Title I.</span> If a public school has a schoolwide Title I program, which is permitted if 40 percent of its students are eligible for free or reduced-price lunch, then every student at the school—regardless of poverty level—is said to be a recipient of Title I services. By contrast, private schools cannot directly receive Title I funds nor can they operate Title I programs. Instead, private schools must negotiate arrangements with local public school districts, which then provide Title I services to eligible students. Many private schools lack the administrative capacity to handle these complex negotiations or do not wish to make available services that they will not administer, making private school participation haphazard. In the 2003–04 school year, only 19 percent of private schools were reported by the U.S. Department of Education (DOE) to participate in Title I, compared to 54 percent of public schools.</p>
<p><span class="bold">Free Lunch.</span> Access to free or reduced-price lunch is also an imperfect indicator of a student’s family income. According to official DOE statistics, nearly 96 percent of public schools participated in the National School Lunch Program in the 2003–04 school year, while only 24 percent of private schools did so. The disparities are explained in part by the greater administrative challenges the private sector faces, not just by differences in the neediness of the children it serves. The administration of the school lunch program is generally organized within the central office of each school district so that local schools are buffered from the responsibility of dealing with state officials. Private schools that seek to participate in the program usually must work directly with the state department of education, and many appear to have concluded that the burden of compliance with federal regulations governing the program outweighs any benefits low-income children might receive. Furthermore, as many as one-fifth of the public school students participating in the free lunch program may not be in fact eligible, a Department of Agriculture study has shown.</p>
<p>In short, using these two variables as indicators of family background undercounts the incidence of poverty among students in private schools and overcounts its incidence in public schools. In the alternative models discussed below, we employ two other indicators of family background that are less at risk of classification bias. The first, parental education, is well known to be a particularly appropriate control variable, as other studies have shown that it is the background variable most highly correlated with student achievement. Based on this indicator, 69 percent of 4th graders in public schools had parents with a college education, compared to 85 percent of those in the private school sector. The second indicator, region of the country in which the school is located, as well as its rural, urban, or suburban location, is also appropriate inasmuch as student performance is known to vary significantly by locality. Private schools are located disproportionately in central cities and in the Northeast.</p>
<p><span class="bold">Limited English Proficient (LEP).</span> Eleven percent of the 4th graders in public schools were classified as Limited English Proficient “according to school records,” while only 1 percent of private school 4th graders were so classified. Among 8th graders, the percentages were 6 and 0 percent, respectively. While LEP was used by NCES as the indicator of students’ language skills, other information in the NAEP data suggests that sector differences in language background are not that extreme. When 4th graders themselves were asked how often a language other than English was spoken at home, 18 percent in the public sector replied “all or most of the time” as did 12 percent in the private sector. Also, the percentage of students in the public sector who were Hispanic was 19 percent, while it was 9 percent in the private sector. The percentage of students who were Asian was approximately the same in the two sectors.</p>
<p>To avoid undercounting those students in the private sector with language difficulties, we substitute for the LEP indicator the students’ own reports of the frequency that a language other than English was spoken in their home. While students may not always accurately report this information, there is no reason to expect errors to vary systematically by school sector.</p>
<p><span class="bold">Special Education.</span> Fourteen percent of the public school 4th graders were reported to have an Individualized Education Program (IEP), while only 4 percent of 4th-grade students in private schools had an IEP. Among 8th graders, the percentages were 14 and 3, respectively. The NCES study assumes that these differences accurately describe the incidence of disability in the public and private sector. However, public schools must, by law, provide students with an IEP if it is determined that the student has a disability, while private schools have no such legal obligation. In addition, public schools receive extra state and federal funding for students so identified. Although some private schools also receive financial support for IEP students, the administrative costs of classifying students may dissuade private officials from seeking that aid unless disabilities are severe.</p>
<p>IEP participation may thus undercount the incidence of disability within the private sector. As a substitute for IEP, we use an indicator of whether the student received an IEP because of a severe or moderate disability. Six percent of the 4th graders in public schools were identified as having a severe or moderate disability while only 1 percent of those in the private sector were so identified.</p>
<p class="tocheading"><strong><span class="bold">Student Characteristics Open to School Influence </span></strong></p>
<p>Characteristics influenced by the school the students are attending will bias estimates if they are included in statistical adjustments for student background. Three variables open to school influence were included in the NCES analysis: a) the student’s absenteeism rate; b) number of books in the student’s home; and c) availability of a computer in the student’s home. NCES assumed absenteeism to be solely a function of a student’s background; yet, it is not unreasonable to believe that schools have an effect on students’ attendance records. In the same way, school policies—school requirements, homework, and conferences with parents, for example—can affect what is available in students’ homes. In the third alternative model, we eliminate these variables.</p>
<p class="tocheading"><strong><span class="bold">Results from the Alternative Models </span></strong></p>
<p>In order to check the sensitivity of NCES             results to the particular methodology that was employed, we first             replicated the results from the NCES study’s primary model.             With that accomplished, it was possible to identify the             consequences of relaxing the questionable assumptions that             underpinned the NCES model.</p>
<p>Figure 1 reports the original NCES results for public and private schools (both sectors taken as a whole), and then those from the three alternative models. These models gradually exclude the NCES variables that suffered from the biases discussed above, replacing them with better measures of student characteristics. Alternative Model I substitutes parents’ education and the location of the school for the Title I and Free Lunch variables in the NCES study. In addition, Model II replaces the LEP indicator with student reports of the frequency with which a language other than English is spoken at home and replaces the IEP indicator with teacher reports of whether the child was given an IEP because of a severe or moderate disability. Finally, Model III, while keeping the other improvements, eliminates the absenteeism, computer, and books-in-the-home variables, thereby avoiding the inclusion of student characteristics that can be influenced by the school. Some may think that Model III does not include sufficient indicators of the student’s family background. Those for whom this is a concern should place greater weight on Model II.</p>
<p>The number of observations under study drops             significantly when moving from the NCES model to Model I, in part             because many students did not report the level of education their             parents had attained. To ascertain whether results were influenced             by the change in the size of the sample under analysis, we ran the             NCES model on the same sample of observations as used in Model I.             The results were reassuring, as the estimated coefficients of the             effect of the private sector as a whole were never more than half a             point away from those obtained from the whole sample.</p>
<p>According to the alternative models, in             8th-grade math, the private school advantage varies between 3 and             6.5 test points; in reading, it varies between 9 and 12.5 points.             Among 4th graders in math, parity is observed in one model, but             private schools outperform public schools by 2 and 3 points in the             other two models; in 4th-grade reading, private schools have an             advantage that ranges from 7 to 10 points.</p>
<p>The results for Catholic schools using the             alternative models are very similar to those of the private sector             as a whole. Lutheran schools are estimated to have a larger             advantage in math and a similar one in reading when compared to the             results of the private sector taken together. And evangelical             Protestant schools are found to perform at a similar level to             public schools in math but at a higher level in reading. Detailed             results for these separate categories of private schools are             available at www.educationnext.org.</p>
<p class="tocheading"><strong><span class="bold">Summing Up </span></strong></p>
<p>Let us be clear. We do not offer our results as             evidence that private schools outperform public schools but rather             as a demonstration of the dependence of the NCES results on             questionable analytic decisions. Although the alternative models             are an improvement on the NCES analysis, no conclusions should be             drawn about causal relationships from these or any other results             based on snapshot NAEP test scores.</p>
<p>Asked by <span class="italic">Education Week</span> to comment on our findings, the lead author of the NCES report freely acknowledged the problems with some of the variables used in the NCES analysis, but asserted that our alternative models may be “underadjusting for the disadvantage in the public sector” because we do not control separately for mothers’ and fathers’ education. While this is desirable in principle, in practice it would have significantly reduced the number of observations available to us, as fewer than half of the 4th graders, for example, reported the educational attainment of both parents. Despite this limitation, our main conclusion still stands: NAEP data are too fragile to be used to measure the relative effectiveness of public and private schools. Making judgments about causality based on observations at one point in time is highly problematic, so much so that it is surprising that NCES commissioned a study to analyze the NAEP data set for this purpose.</p>
<p>Fortunately, the practice seems to have come             to an end. Commissioner Schneider has stated that his agency should             not have initiated the study and NCES will in the future refrain             from analyses of the raw data that it collects. Let’s hope             that private researchers also exercise responsibility by not using             NAEP data for purposes for which they are clearly not suited.</p>
<p><em><span class="italic">-Paul E. Peterson is professor of             government at Harvard University and a senior fellow at the             Hoover Institution. He serves as editor-in-chief of </span>Education Next<span class="italic">. Elena             Llaudet is a research associate in the Harvard Department of             Government, where she is pursuing her Ph.D. </span></em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/the-nces-privatepublic-school-study/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is Your Child&#8217;s School Effective?</title>
		<link>http://educationnext.org/is-your-childs-school-effective/</link>
		<comments>http://educationnext.org/is-your-childs-school-effective/#comments</comments>
		<pubDate>Wed, 06 Sep 2006 22:41:10 +0000</pubDate>
		<dc:creator>Paul E. Peterson</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[On Top of the News]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3853947</guid>
		<description><![CDATA[Don’t rely on NCLB to tell you]]></description>
			<content:encoded><![CDATA[<p><em>Checked: No Child Left Behind Act of 2002, Title I: Adequate Yearly Progress; Florida A+ Plan: School Grades</em></p>
<p><strong>Checked by Paul E. Peterson and Martin R. West</strong></p>
<p class="firstLetter">No Child Left Behind (NCLB), the federal school-accountability law, is widely held to have accomplished one good thing: requiring states to publish test-score results in math and reading for each school in grades 3 through 8 and again in grade 10. The results appear to be telling parents whether their child&#8217;s school is doing a better job than the one across town, in the neighboring city, or across the state.</p>
<p>But accountability works only if the yardstick used to measure performance is reasonably accurate. Unfortunately, the yardstick required by the federal law is not. Our analysis of its workings in Florida reveals it to be badly flawed and not as accurate as the measuring stick employed by the state of Florida for similar purposes.</p>
<p>To her credit, Secretary of Education Margaret Spellings has apparently recognized the need to fix the NCLB yardstick. In November 2005, she announced a pilot program that would allow a few selected states to incorporate student growth into their AYP grading scheme. Although 20 states initially requested to participate, only 2, Tennessee and North Carolina, have so far been given the go-ahead, and the modifications they have been allowed to make are relatively minor. Meanwhile, the yardstick to be used by the other 48 remains as defective as ever.</p>
<p>Part of the problem is that NCLB makes only crude distinctions between schools achieving performance benchmarks and schools not doing so. Florida&#8217;s grading system, by contrast, divides schools into five different categories, just as teachers do when they grade students on a scale from A to F. (See Figure 1 for the number of schools that received each mark.) Another part of the problem is that the federal approach pays only a passing nod to the improvement made by individual students, while Florida&#8217;s own method takes into account how much specific students have learned in a given year, which is exactly what parents care about.</p>
<p>It is not that Governor Jeb Bush (and his legislature) got it exactly right, while his brother (and Congress) ran amuck. But there is little doubt that NCLB needs repairing, something that Congress can do when the federal law is reauthorized.</p>
<p><img src="http://educationnext.org/files/ednext20064_76fig1.gif" border="0" alt="" width="449" height="596" /></p>
<p class="tocheading"><strong>Measuring Quality</strong></p>
<p>Finding the right yardstick is no easy task. Not everyone agrees on what makes for a good school. Some reject test scores, while others care more about building students&#8217; character than boosting their academic achievement. But Congress took a clear stance on the issue in NCLB when it determined that a school with subpar student test scores in reading and math is not doing its job. Most Americans would agree that schools should aim to ensure that all students are proficient in these core subjects.</p>
<p>NCLB requires states to divide schools into those making &#8220;Adequate Yearly Progress&#8221; (AYP) toward the goal of having all of their students proficient in math and reading by 2014 and those that aren&#8217;t. While the term &#8220;progress&#8221; would seem to imply that the law considers how much students are learning over time, the federal system in fact is based on a series of snapshots that fail to track individual students from one year to the next. Instead, to make AYP, schools must meet statewide targets for the percentage of students each year who are proficient. Those targets are gradually increased until they reach 100 percent in 2014. The percentage of proficient students within various subgroups, broken out by ethnicity, income, disability, and English-language-learner status, must also meet these same targets. If a school does not make AYP for two consecutive years, parents are given the choice of another school and, after five failing years, the school is to be restructured.</p>
<p>But does the AYP yardstick actually distinguish between higher- and lower-quality schools? The answer to this question is best obtained by looking at how much students at the school know at the end of the year, as compared to how much those same students knew one year previously. If students are making large achievement gains, the school would seem to be more effective than if student improvement is meager or nonexistent.</p>
<p>Surprisingly, in much of the United States, it is not possible to track an individual student&#8217;s achievement over the course of a year to determine how well the federal yardstick identifies schools where students are learning the most. In Florida, however, the topic can be explored systematically because that state&#8217;s Department of Education has assembled an impressive warehouse of data on student performance.</p>
<p>As long as students remain within the state, it is possible to track how well most of them are doing from one year to the next on the Florida Comprehensive Achievement Test (FCAT), the exam the state uses to comply with NCLB requirements. (Privacy concerns preclude general release of the data, but qualified researchers who sign a confidentiality agreement can apply for access.)</p>
<p class="tocheading"><strong>Checking the Federal Yardstick</strong></p>
<p>We drew on this information to calculate how much students learned, on average, in each school in Florida during the 2003-04 school year. We first subtracted from each student&#8217;s test score performance the child&#8217;s demonstrated knowledge the previous year. We then adjusted those one-year-gain scores to take into account a statistical property that artificially generates larger gains for initially low-performing students (and smaller gains for high performers). Finally, we compared the average gains by students in schools meeting and not meeting the requirements for AYP.</p>
<p>The results were telling. On average, students in schools making the AYP target gained on their math achievement test an amount that was only 9 percent of a standard deviation more than the amount gained by students at schools said not to be making the AYP grade. The difference in gains in reading was just 7 percent of a standard deviation (see Figure 2). A full standard deviation&#8217;s worth of progress equates to about four years of elementary schooling, so gains of 9 percent total a bit more than a third of a school year. A difference of that magnitude is surely worth noting, yet it is hardly enough to warrant saying one school is adequate while the other is not.</p>
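<p>The conversion from standard-deviation units to school years can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the article&#8217;s rule of thumb that one standard deviation equals about four years of elementary schooling, and the function name is hypothetical rather than part of the authors&#8217; analysis.</p>

```python
# Rule of thumb stated in the article: one standard deviation of progress
# is roughly four years of elementary schooling (an approximation).
YEARS_PER_SD = 4.0


def sd_share_to_school_years(percent_of_sd):
    """Convert a gain given as a percent of a standard deviation to school years."""
    return percent_of_sd / 100.0 * YEARS_PER_SD


# Gaps between AYP and non-AYP schools reported in the text.
math_gap_years = sd_share_to_school_years(9)
reading_gap_years = sd_share_to_school_years(7)

print(round(math_gap_years, 2), round(reading_gap_years, 2))
```

<p>Under that assumption, the 9 percent gap in math works out to 0.36 of a school year, which is the &#8220;bit more than a third&#8221; cited above, and the 7 percent gap in reading to 0.28 of a year.</p>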
<p><img src="http://educationnext.org/files/ednext20064_76fig2.gif" border="0" alt="" width="500" height="440" /></p>
<p>Nor is it the case that schools making AYP are those doing a much better job with minority student populations. In math, the differences in gains made by African Americans and Hispanics at AYP schools and non-AYP schools are 11 and 12 percent of a standard deviation, respectively. In reading, the difference in gains for both groups is 6 percent of a standard deviation. Clearly, such differences are not so dramatic as to be the basis for federal intervention.</p>
<p>Schools face varying challenges that depend in part on the populations they serve, so perhaps the federal yardstick does better when those challenges are considered. But that proved not to be the case. When we adjusted the gains made by students in each school to take into account a wide variety of individual and peer-group background characteristics, such as ethnicity, English language-learner status, family income, and student mobility rates, the yardstick&#8217;s performance actually worsened. In fact, the apparent benefit of attending a school that had made AYP was only 4 percent of a standard deviation in math performance and just 2 percent of a standard deviation in reading. To be credible, a grading system must do better than that.</p>
<p>Still another way of thinking about the accuracy of the NCLB yardstick is to calculate the probability that AYP identifies correctly the higher-performing of any two schools being compared. Of course, any two-category classification system will get it right 50 percent of the time, by chance alone, just as one can guess correctly half of the time which way a coin will flip.</p>
<p>How much better than chance did the NCLB grading system do in Florida in 2004? In math, a school that made AYP outperformed a random non-AYP school 71 percent of the time. In other words, 29 percent of the time the school in which students are making smaller gains is the one that passed AYP, a pretty hefty error rate (see Figure 3). In reading, that error rate was 28 percent. To be wrong nearly three times out of ten does not inspire confidence, especially when one can get it right half the time simply by random guessing.</p>
<p>So error-prone an emissions-testing program would soon invite the wrath of the auto-owning public.</p>
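<p>One way to picture the error rate described above is as a pairwise comparison: pair every school that made AYP with every school that did not, and count how often the AYP school shows the larger average gain. The sketch below is a minimal illustration of that idea, not the authors&#8217; actual procedure; the function name and the gain figures are hypothetical.</p>

```python
from itertools import product


def pairwise_accuracy(gains_passing, gains_failing):
    """Share of (passing, failing) school pairs in which the passing school
    has the larger average gain. Ties count as half a correct call."""
    pairs = list(product(gains_passing, gains_failing))
    wins = sum(1.0 if p > f else 0.5 if p == f else 0.0 for p, f in pairs)
    return wins / len(pairs)


# Hypothetical average gains in standard-deviation units, for illustration only.
ayp_gains = [0.10, 0.08, 0.15]
non_ayp_gains = [0.05, 0.12]

accuracy = pairwise_accuracy(ayp_gains, non_ayp_gains)
error_rate = 1.0 - accuracy
print(f"accuracy={accuracy:.2f}, error rate={error_rate:.2f}")
```

<p>A yardstick no better than a coin flip would score 0.50 on this measure; the 71 percent figure reported for math corresponds to an error rate of 0.29.</p>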
<p><img src="http://educationnext.org/files/ednext20064_76fig3.gif" border="0" alt="" width="500" height="465" /></p>
<p class="tocheading"><strong>Testing Florida&#8217;s Approach</strong></p>
<p>But can any other accountability system, especially one put together by a legislative body, do any better? Are we using the perfect to criticize the good? We can check this by comparing the federal yardstick with the one used by Florida as part of its state accountability system.</p>
<p>Florida&#8217;s A+ Plan for Education (A+ Plan) rewards schools for ensuring that their students reach a minimum level of proficiency in math and reading, just as NCLB does. But unlike the federal grading system, the A+ Plan bases half of its points on the percentage of students in each school who improved their performance against state standards over the previous year. Equally important, it divides schools into five easily recognized categories that range from A to F, instead of just the two bureaucratically labeled categories employed by the federal government.</p>
<p>The Florida accountability system has its own limitations. But by having five categories, A through F, it provides parents and taxpayers with a good deal of useful information. Admittedly, some of the finer distinctions attempted by the Florida A+ grading scheme do little better than the federal AYP grading scheme. In 2004, for example, average learning gains in math were only 7 percent of a standard deviation higher in A schools than in those given a B (see Figure 2).</p>
<p>But the performance of the A+ Plan improves when schools are assigned significantly different grades. The math learning gap between A and C schools was 11 percent of a standard deviation; between A and D schools it was 14 percent, and between A and F schools it was 25 percent. Put in more familiar language, the one-year difference between A and F schools amounted to more than a full year&#8217;s worth of learning. In reading, the differences were almost as large.</p>
<p>As with AYP, we calculated an error rate for the Florida grading system: the chance that one would make a mistake (that is, pick a school where average learning rates were lower) if one picked a school solely on the basis of its official grade. Once again, the Florida A+ Plan can be seen to be employing a more accurate measuring stick than the NCLB one, where the error rate, it should be remembered, was nearly 30 percent. Under Florida&#8217;s own accountability plan, parents would make an error 30 percent of the time if they chose an A school over a B school on that basis alone. But as Figure 3 shows, mistakes happen much less frequently if one picks an A school rather than a C, D, or F school. Indeed, one can have as much confidence in Florida&#8217;s distinction between an A and an F school as the Food and Drug Administration requires when evaluating drugs subject to rigorous clinical trials.</p>
<p>The Florida system also does a better job of isolating the seriously defective schools, helping state and local officials identify exactly where attention is needed. In 2004, only 47 of the state&#8217;s 2,649 schools were given an F, while 184 were given a D. Meanwhile, under the federal yardstick, 75 percent of schools did not make AYP, including more than half of the schools Florida had given an A (see Figure 1).</p>
<p>As these numbers suggest, having two accountability systems operating simultaneously has generated a great deal of confusion in Florida, as it has in other states. Things could be improved by melding both systems into one, but only if the revised system can do a better job of identifying schools where student achievement is rising and of isolating the worst-performing schools for remediation.</p>
<p class="tocheading"><strong>A National Problem</strong></p>
<p>The shortcomings of the federal law&#8217;s yardstick have a ready explanation. Because schools under NCLB are evaluated primarily on the basis of achievement <em>levels</em>, the evaluation cannot readily detect how much <em>growth</em> is taking place within a school, simply because children come with dramatically different educational endowments. The correlation between school average levels and growth in the 2003-04 school year was just 0.63 in math and 0.71 in reading. That is a positive relationship, to be sure, but hardly one on which to construct a meaningful accountability system.</p>
<p>Some may argue that our focus on student growth is misplaced, that Congress, when devising its formula for gauging AYP, did not intend to distinguish good schools from less effective ones. Its sole aim was to make sure that every school would by 2014 bring every student up to proficiency, and a level-based system is needed to direct reformers&#8217; attention to those schools and districts with the farthest distance to go.</p>
<p>But such claims are difficult to square with the legislators&#8217; designation of schools as not making &#8220;Adequate Yearly Progress,&#8221; much less with the fact that the law gives families the option to attend another school if their school twice fails to make AYP. Why let families move to another school without evidence that their children will learn more at the new address?</p>
<p>Of course, we have direct evidence about how the NCLB grading system is playing out from only one state. But scholars from the Northwest Evaluation Association have similarly documented the loose connection between growth scores and the level-based measures of school performance that underpin the AYP grading system in their database of 840 schools in 22 states, suggesting that the problem we have identified is hardly limited to Florida. Since the federal yardstick fails to zero in on how much each student is learning, it can hardly be otherwise.</p>
<p>It must also be admitted that most states could not have used growth scores when NCLB was enacted, simply because most states had not constructed the tracking system Florida has put together. Congress may have done all that it could in 2002. But since other states are now beginning to build their own warehouses of data that follow the progress of individual students, the time has arrived when a legislative fix should be feasible.</p>
<p>It will take Congress to do the job, since the original law was written with such specificity that it is virtually impossible to correct it through administrative action alone. Experienced authors know there&#8217;s no such thing as good writing, only good rewriting. Let&#8217;s hope that when NCLB is reauthorized Congress can avoid partisan bickering and use the information coming back from the states to improve on its first draft. People deserve to know that when the federal government says a school is not working, it means it.</p>
<p><em>-Paul E. Peterson is professor of government at Harvard University and a senior fellow at the Hoover Institution. Martin R. West is an assistant professor at Brown University. Both serve as editors of</em> Education Next.</p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/is-your-childs-school-effective/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>RAND versus RAND</title>
		<link>http://educationnext.org/randversusrand/</link>
		<comments>http://educationnext.org/randversusrand/#comments</comments>
		<pubDate>Thu, 20 Jul 2006 21:02:53 +0000</pubDate>
		<dc:creator>Eric A. Hanushek</dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3391341</guid>
		<description><![CDATA[What Do Test Scores in Texas Tell Us? by Stephen P. Klein et al. ]]></description>
			<content:encoded><![CDATA[<p class="tocheading"><strong>The Sequel</strong></p>
<p><strong>What Do Test Scores in Texas Tell Us? </strong></p>
<p><em><span>by Stephen P. Klein, Laura S. Hamilton, Daniel F. McCaffrey, and Brian M. Stecher </span></em></p>
<p><span><em>RAND Corporation, 2000.</em> </span></p>
<p>Just two weeks before the presidential election, yet another team of RAND researchers released a short paper that seemingly contradicted Grissmer et al.’s celebration of Texas’s achievement gains on the NAEP. RAND II found only small NAEP achievement gains in Texas, similar to those nationwide and contrasting sharply with “soaring” scores on the Texas Assessment of Academic Skills (TAAS). These disparities, the authors suggested, point to potentially serious flaws in Texas’s state-run testing program.</p>
<p>The direct conflicts of RAND I and RAND II underscore the fact that RAND is a collection of franchisees. The parent company attempts to maintain some degree of quality control but ultimately is not able fully to adjudicate quality—particularly, one suspects, when the answers are fuzzy and when the sponsor pressures are high.</p>
<p>RAND II presents two separate analyses that, taken together, seem to undermine Texas students’ spectacular gains on the TAAS. First, Texas students showed substantially more improvement on the TAAS than they did on the NAEP during the 1990s. Second, in a sample of 20 schools that the authors had collected for other purposes, the expected negative relationship between a student’s TAAS score and his eligibility for the federal school lunch program, a common measure of disadvantage, did not appear. This latter finding led RAND II to conclude not just that the TAAS is a poor instrument but also that high-stakes testing leads to the artificial inflating of scores through “teaching to the test,” especially for disadvantaged students.</p>
<p>It should not be particularly surprising that student performance improved more dramatically on a test that was aligned with a particular state’s curriculum (the TAAS) than on a more generic test of subject matter (the NAEP). Thus, while the question of the TAAS test’s validity is an important one, the simple evidence presented in RAND II falls very short of yielding any solid answers.</p>
<p>Likewise, the fact that data on 20 schools show a peculiar relationship with any variable is unremarkable. After all, even if the authors attempted to draw a representative sample—which they did not—the idiosyncrasies of such a small sample would preclude any ability to generalize. Indeed, a simple plot or a formal statistical analysis of TAAS scores across all Texas schools reveals a clear, and expected, strong negative relationship between students’ scores and their eligibility for subsidized school lunches.</p>
<p>The point of clearest conflict with RAND I is the consideration of NAEP performance. RAND I—not as focused on the relationship between its statistics and presidential campaigns—considered all seven NAEP tests given between 1990 and 1996 and attempted to adjust for differences in the students’ backgrounds. The result was high marks for Texas’s performance improvements on the NAEP. RAND II, by contrast, ignored student background, placed more weight on a different subset of test results (including the 1998 results, which were not included in RAND I), used somewhat different approaches, and concluded that there was nothing special about performance in Texas.</p>
<p>What lessons might we take away from the RAND I vs. RAND II debate?</p>
<p>• Analyses of small amounts of imperfect data can yield widely different conclusions. Such analyses should be heavily discounted.</p>
<p>• Consideration of a study’s quality tends to get lost in the ensuing policy discussion. Neither RAND study holds up to a modicum of scrutiny.</p>
<p>• The desire for publicity apparently pushes some researchers to prepackage their own sound bites. The PR blitzes that accompanied both RAND I and RAND II undermined any public discussion of what turn out to be relatively impotent research designs.</p>
<p>• Journalists tend to judge a study’s quality—particularly a complicated statistical study—by its conclusions and by an undue emphasis on the study’s source rather than the strength of its analysis. RAND’s undeniable history of producing solid research doesn’t mean that every study under the RAND imprimatur deserves unquestioned repeating.</p>
<p>The result is a distorted and unhealthy policy discussion.</p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/randversusrand/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deconstructing RAND</title>
		<link>http://educationnext.org/deconstructing-rand/</link>
		<comments>http://educationnext.org/deconstructing-rand/#comments</comments>
		<pubDate>Thu, 20 Jul 2006 20:52:56 +0000</pubDate>
		<dc:creator>Eric A. Hanushek</dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3391226</guid>
		<description><![CDATA[Improving Student Achievement: What NAEP State Test Scores Tell Us by David W. Grissmer et al.]]></description>
			<content:encoded><![CDATA[<p><strong>Improving Student Achievement: What NAEP State Test Scores Tell Us </strong></p>
<p><em><span>by David W. Grissmer, Ann Flanagan, Jennifer Kawata, and Stephanie Williamson </span></em></p>
<p><span><em>RAND Corporation, 2000.</em> </span></p>
<p>In the summer of 2000, perfectly timed to shape the election debate over education reform, came a new RAND study that claimed to contradict the conventional research wisdom on the connection between school expenditures and class size on the one hand and student achievement on the other. “Our results certainly challenge the traditional view of public education as ‘unreformable,’” the study’s director, David Grissmer, said in an accompanying press release. “But the achievement of disadvantaged students is still substantially affected by inadequate resources. Stronger federal compensatory programs are required to address this inequity.” While academic studies usually retire to footnote-land, a well-orchestrated PR blitz pushed the RAND report to the front pages. It even earned prominent campaign mentions: Both presidential candidates commandeered the study’s findings to their own ends—Al Gore to support his proposal to lower class sizes, George W. Bush to trumpet Texas’s accountability system.</p>
<p>A trusted name like RAND lent instant credibility to the study’s results—so much credibility that the major newspapers reported the findings without even a question mark. This, combined with the lack of statistical expertise among journalists and the crushing deadlines under which they work, allowed RAND to sculpt the dissemination of its results with a carefully worded press release that pumped its most provocative yet methodologically flawed conclusions. “The education reforms of the 1980s and 1990s seem to be working,” the release began. It went on to highlight the report’s finding that “[d]ifferences in state scores for students with similar families can be explained, in part, by per-pupil expenditures and how these funds are allocated.” In particular, RAND reported that, other things being equal, National Assessment of Educational Progress (NAEP) scores in math are higher in states that have:</p>
<ul>
<li>higher per-pupil expenditures</li>
<li>lower pupil-teacher ratios in the early grades</li>
<li>higher percentages of teachers reporting that they have adequate resources</li>
<li>more children in public prekindergarten programs</li>
<li>lower teacher turnover</li>
</ul>
<p>These highlights were asserted without qualification or doubt, without any mention of weaknesses in the data or the analysis. RAND was notably less charitable with results that accorded with past research findings. The press release at least mentioned the study’s finding that “having a higher percentage of teachers with master’s degrees and extensive teaching experience appears to have comparatively little effect on student achievement across states. Higher salaries also showed little effect.” But here the authors were sure to qualify their findings, carefully emphasizing that “salary differences may have more important achievement effects within states than between states.” The authors quickly rushed past these less popular findings to boldly propose specific policy interventions: “To raise achievement scores, the most efficient and effective use of education dollars is to target states with higher proportions of minority and disadvantaged students with funding for lower pupil-teacher ratios, more widespread prekindergarten efforts, and more adequate teaching resources.” In short, any reader of the news release—or the articles it generated—might have reasonably concluded that RAND, the highly respected think tank, had overturned years of research (including this author’s).</p>
<p>What research does the RAND study purport to contradict? Between 1960 and 1995, per-pupil spending in the United States (in constant 1996–97 dollars) grew dramatically, from $2,122 to $6,434, a threefold increase. This trend cannot be explained by the country’s increased commitment to disabled students, which at most accounts for just 20 percent of the increase. At the same time that costs were rising, the student-to-teacher ratio fell by about a third, from 26:1 to 17:1. Nevertheless, despite our greatly enhanced commitments to public education—and despite the fact that children are growing up in better-educated and smaller families than ever before—student performance during this period, as measured by NAEP test scores for high school seniors in math and reading, moved hardly a hair’s breadth. Complementing these overall trends are more than 400 studies that have searched for a connection between spending and achievement in particular schools, districts, and, occasionally, states. In general, these studies have been unable to detect any consistent, positive relationship between increased resources and student learning.</p>
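<p>The two headline ratios in the paragraph above are easy to verify. The following is a minimal sketch that uses only the figures quoted in the text; the function and variable names are illustrative and do not come from any report.</p>

```python
# Figures quoted in the paragraph above (spending in constant 1996-97 dollars).
SPENDING_1960 = 2122
SPENDING_1995 = 6434
RATIO_START = 26  # pupils per teacher at the start of the period
RATIO_END = 17    # pupils per teacher at the end of the period


def growth_multiple(start, end):
    """Return how many times larger `end` is than `start`."""
    return end / start


def fractional_decline(start, end):
    """Return the share by which a quantity fell from `start` to `end`."""
    return (start - end) / start


spending_multiple = growth_multiple(SPENDING_1960, SPENDING_1995)
ratio_decline = fractional_decline(RATIO_START, RATIO_END)

print(f"Per-pupil spending multiple: {spending_multiple:.2f}")
print(f"Decline in pupil-teacher ratio: {ratio_decline:.1%}")
```

<p>Both results agree with the rounded descriptions in the text: spending roughly tripled, and the pupil-teacher ratio fell by roughly a third.</p>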
<p>This is not to say that schools don’t matter. The best of these studies, so-called value-added studies that concentrate on the determinants of growth in achievement across individual classrooms, find that differences in teacher quality have a profound impact. But they also find that teacher quality is not closely related to school resources. The only studies that consistently find positive effects of resources are those that rely on student performance and school data averaged across all students and schools in a state. These aggregate studies, of which the RAND study is one, rely on limited data and are prone to serious statistical shortcomings, so they have been heavily discounted in the past. Undaunted, RAND’s researchers argue that their results should lead to a reinterpretation of three decades’ research.</p>
<table border="0" cellpadding="5" align="center">
<tbody>
<tr>
<td bgcolor="#eeeeee"><strong><span style="color: navy">The major newspapers reported RAND’s findings without even a question mark.</span></strong></td>
</tr>
</tbody>
</table>
<p>However much they might protest, RAND’s researchers for the most part have only confirmed what has been known all along. In fact, the RAND study is startling in its conformity to conventional wisdom. RAND’s best model for estimating the impact of spending increases on student performance yields an estimate that an additional $1,000 per student—a $50 billion annual increase nationally—would yield a rise in performance of about two percentile points (just 0.05 standard deviations), a trivial impact (see Figure 1). Moreover, the RAND study repeats the finding that teachers’ salaries, experience, and whether or not they hold a master’s degree bear little or no relationship to student performance.</p>
<table border="0" width="500" align="center">
<tbody>
<tr>
<td align="center"><p><strong><span style="font-family: verdana,arial,helvetica,sans-serif">High Costs, Low Returns (Figure 1)</span></strong></p>
<p><em>Though RAND said resources were inadequate, it actually found that huge increases in spending would raise test scores by only a trivial amount.</em></p>
<p><strong>National cost of obtaining a 2 percentile increase in NAEP performance using RAND cost estimates<a href="#fig1note">*</a></strong></p></td>
</tr>
<tr>
<td><img src="http://educationnext.org/files/ednext2001sp_65b.jpg" border="0" alt="" width="494" height="370" /></td>
</tr>
<tr>
<td><p><a name="fig1note">*</a>The hard copy of Education Matters erroneously refers to Normal Curve Equivalents (NCE) instead of percentiles in the text and presents cost estimates for changing NCE scores by two points in Figure 1. NCEs are a transformed version of percentile scores that follow a normal distribution with a mean of 50 and a standard deviation of about 21.</p>
<hr size="1" /></td>
</tr>
</tbody>
</table>
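<p>The conversion between 0.05 standard deviations and about two percentile points, and the national price tag attached to it, can be checked with a short calculation. This is a sketch under two assumptions that are not stated in the RAND report: that scores are normally distributed, and that the student in question starts at the median. The helper names are hypothetical.</p>

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def percentile_gain(effect_sd, start_percentile=50.0):
    """Percentile points gained by a student who moves up `effect_sd`
    standard deviations, assuming normally distributed scores."""
    start_z = STANDARD_NORMAL.inv_cdf(start_percentile / 100.0)
    end_percentile = 100.0 * STANDARD_NORMAL.cdf(start_z + effect_sd)
    return end_percentile - start_percentile


def implied_students(total_cost, cost_per_student):
    """Number of students implied by a national total and a per-student cost."""
    return total_cost / cost_per_student


# Figures quoted in the text: 0.05 SD, $1,000 per student, $50 billion a year.
gain = percentile_gain(0.05)
students = implied_students(50e9, 1000)

print(f"Percentile gain for a median student: {gain:.2f}")
print(f"Implied number of students: {students:,.0f}")
```

<p>Under these assumptions the gain for a median student comes to just under two percentile points, and the $50 billion total corresponds to roughly 50 million students. A student who starts far from the median would gain fewer percentile points from the same 0.05 standard deviation shift.</p>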
<p>What about the study’s most celebrated finding, on the impact of class size? The study found that class size, as measured by a state’s average pupil-to-teacher ratio, has a minuscule impact on the performance of the average student. At best, the RAND study is just another in a long list of reports that have demonstrated the minimal impact of school resources on the typical student’s performance. RAND attempts to distance itself from the conventional research wisdom by declaring that “money, if spent appropriately, is productive.” But who would be surprised by such a tautology? Only if we are told exactly which expenditures are productive can the study give much guidance. But the RAND study’s data are too weak and its methodology too flawed to support the specific policy recommendations its authors make.</p>
<p class="tocheading"><strong>A Sow’s Ear</strong></p>
<p>Be skeptical when a research analyst tells you he has fashioned a silk purse out of the proverbial sow’s ear. Consider the limitations of the data with which RAND was working. The study’s sample consisted of 44 independent observations—the states that voluntarily participated in one or more of seven NAEP tests that were administered from 1990 to 1996. Moreover, the number of states participating in any one test varied from 35 to 44. Tests were given in 8th grade math in 1990, 1992, and 1996; in 4th grade math in 1992 and 1996; and in 4th grade reading in 1992 and 1994. Although RAND attempted a variety of analytical methods, its general approach was to estimate the impact of family background and measures of school resources on average student performance on as many of these tests as were administered in a given state.</p>
<p>The NAEP tests themselves have certain advantages. They have been carefully designed, the same test is given in all states, and they allow for comparisons from one time period to the next. Schools have few incentives to score high on the NAEP, leaving little chance that much “cheating” or “teaching to the test” goes on. It is troublesome that, when asked, a sizable number of schools exercise their right to refuse to participate in NAEP testing. Despite this drawback, though, the NAEP remains one of the best available measures of average student performance in most states.</p>
<p>But RAND’s analysis of the NAEP scores is another matter. First of all, 44 observations is a very small sample, so drawing any strong, statistically valid conclusions is at best difficult, at worst misleading. Moreover, data collected at the state level are marvelously imprecise. These aggregate data ignore the enormous differences within a state, implicitly assuming that the past three decades of legal challenges to the inequitable distribution of resources among well-to-do and poor school districts are groundless. When all these differences are averaged away so that it is impossible to identify their importance, how can we possibly have high-quality data that trump all previous research on the subject?</p>
<p>While the measure of student performance with which RAND was working was adequate, not much else was. RAND attempted to control for the family background of the students taking the test, but the only information on family background available to RAND was census figures on the average statewide education and income of school-age families in 1990. RAND attempted to adjust these data to the actual years the students took the tests by assuming that these factors change precisely with changes in the racial composition of test-takers. But, of course, one cannot assume that the education and income of students of different racial groups change at the same rate in all 44 states. And RAND did not have any information from individual students; throughout its analysis it refers to average results across the state. So, from the very beginning, RAND was forced to work with an imprecise measure of the characteristics of students who actually took the tests.</p>
<p>Similarly, RAND used statewide averages as its measure of school resources, an extremely imprecise indicator of the actual resources being spent on particular students who attend specific schools. RAND also relied on statewide averages of teachers’ impressions of whether their school supplies were adequate, statewide averages of prekindergarten attendance, and statewide averages of class size. These averages obviously mask wide disparities within a state.</p>
<p>The RAND researchers insist that their study is superior because they factored in the average school resources for all the years that students were in school, a measure they find superior to studies that look only at current resources being spent on a student. This may be a worthy research innovation, though the average school resources available to a student from one year to the next do not change dramatically—unless the student moves, something that happens with surprising frequency. In 1995, 6 percent of the school-age population lived in a different state than they had in 1990; another 2.5 percent had been living outside the United States in 1990. These percentages vary widely among regions of the country. In the mountain states, as many as 15 percent of students had lived elsewhere in 1990. None of this movement was taken into account by the RAND study.</p>
<p class="tocheading"><strong>Where Are the Reforms?</strong></p>
<p>These weaknesses in the data were exacerbated once RAND tried to glean specific policy recommendations from its findings. For instance, RAND says that we should reduce class size in states with higher shares of disadvantaged students. RAND, however, never looks at whether disadvantaged students are in large classes, because it has averaged across all students in the state.</p>
<p>RAND also says that students perform better when teachers think that their supplies are adequate. This finding is plausible. If teachers have adequate materials, one would expect them to be more effective. But it suffers from the chicken-and-egg problem: We can’t be sure whether high-performing students make teachers feel better about their supplies, or whether the supplies themselves have a causal impact. And, of course, this subjective question means wildly different things to teachers in different schools and states.</p>
<p>Much the same can be said for the finding that low teacher mobility leads to higher student performance. Do high levels of teacher mobility lower student performance, or does low performance increase the chances that teachers will move on? One simply cannot tell from the kind of data with which RAND was working.</p>
<p>The authors admit that it would be preferable to have data from the schools that students actually attended. But they claim that using statewide data allows them to consider the fact that the states, not local school boards, are the ultimate political entities responsible for public education within their boundaries. Only by looking at states as a whole can one incorporate the panoply of state policies that may influence school achievement, RAND says. To be sure, statewide analyses can provide accurate estimates of the impact of school resources—but only if the analyst includes within the statistical model all the factors that affect student performance and, in the standard linear regression model generally favored by RAND, if these factors have a constant, additive effect on student achievement. In other words, if the same amount of class size reduction has similar effects on those originally in very large classes and those originally in quite small classes, and if all other factors in the model work in the same constant, additive manner, then relying on state-level data can provide unbiased statistical estimates. But RAND itself argues that the impacts of resources on student performance are anything but constant and additive. Witness its conclusions on class size, where it finds that class-size reduction has its greatest effect in states with high shares of disadvantaged children. Witness also its finding that it is particularly important to reduce class sizes in states that begin with high average pupil-teacher ratios.</p>
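<p>The point about constant, additive effects can be made concrete with a toy example. The sketch below is purely illustrative: the two score functions, the class sizes, and the state names are invented for this purpose and are not drawn from RAND’s data or models. It compares two hypothetical states that have the same average class size but a different spread of class sizes.</p>

```python
def linear_score(class_size):
    """Hypothetical constant, additive effect: every extra pupil costs 0.1 points."""
    return 50.0 - 0.1 * (class_size - 15)


def nonlinear_score(class_size):
    """Hypothetical non-constant effect: crowding hurts more in larger classes."""
    return 50.0 - 0.02 * (class_size - 15) ** 2


def state_averages(class_sizes, score_fn):
    """Return (average class size, average score) for one state."""
    n = len(class_sizes)
    avg_size = sum(class_sizes) / n
    avg_score = sum(score_fn(size) for size in class_sizes) / n
    return avg_size, avg_score


# Two invented states with identical average class size (20 pupils).
STATE_A = [20, 20, 20, 20]  # every class is the same size
STATE_B = [15, 15, 25, 25]  # half small classes, half large classes

for label, score_fn in (("linear", linear_score), ("nonlinear", nonlinear_score)):
    size_a, score_a = state_averages(STATE_A, score_fn)
    size_b, score_b = state_averages(STATE_B, score_fn)
    print(f"{label}: A=({size_a}, {score_a:.2f})  B=({size_b}, {score_b:.2f})")
```

<p>When the effect is linear, the two states have the same average score, so the statewide average class size carries all the relevant information. When the effect varies with class size, the two states differ in average score even though their average class sizes are identical, and a regression on state averages has no way to attribute that gap correctly.</p>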
<table border="0" cellpadding="5" align="center">
<tbody>
<tr>
<td bgcolor="#eeeeee"><strong><span style="color: navy">Scholars have been unable to detect any consistent, positive relationship between increased spending and student learning.</span></strong></td>
</tr>
</tbody>
</table>
<p>Finally, while the motivation of the entire study was to investigate the role and effect of different state policies, the only policies RAND’s researchers actually built into their main statistical models were differences in per-pupil spending, student-teacher ratios, and other resource variables. Except in an ad hoc fashion, RAND overlooked state efforts to establish accountability in the form of standards and testing and the wide variance in teacher certification requirements. The researchers themselves claim that these policies are important—in fact, they even suggest that such policies explain why Texas students perform better than California’s—yet they didn’t include variations in these policies in the models they constructed, except by creating “fixed effects” models that have so few independent observations that their results can’t survive rigorous statistical tests. In other words, RAND’s analysis failed to include the precise variables that the study itself claims are key.</p>
<p class="tocheading"><strong>Overturned?</strong></p>
<p>The RAND study’s authors want to convince people that they have identified the most effective interventions and that outcomes are improving as a result of past reforms. If true, the authors argue, then there is no need to consider more fundamental changes in the education system’s structure or incentives. In order to make this case, the authors must prove that most earlier studies of the impact of school resources on student achievement should be disregarded.</p>
<p>Most scholars believe that studies that look at the impact of resources available to individual schools and specific school districts should be given the heaviest weight because they are the most precise. These studies are also the least likely to find that per-pupil expenditures, teacher pay, or class sizes make a difference.</p>
<p>The studies most likely to find that school resources have a positive effect rely on statewide data, like RAND’s. In this sense, RAND simply repeats an already well-known finding: that if you rely on imprecise statewide data and if you ignore all other aspects of state educational policy, you will often find that average statewide school spending and class size have at least a minor effect on student performance. But as mentioned previously, these studies have a serious methodological limitation: They rely on average results obtained from large, heterogeneous units that differ from one another in many ways other than the amounts they spend on schools.</p>
<p>RAND claims that only by looking at statewide data can you include the impact of statewide policies. Yet statewide studies have not yet found a way of including information about these policies in their statistical analyses. As a result, it is difficult to place more weight on these findings than on those that look at individual schools and school districts.</p>
<p>Again, this is not to say that schools don’t matter. On the contrary, value-added studies find that teacher quality has a major impact on student performance. If we could find ways of keeping good teachers in the classroom—perhaps by giving these successful teachers the additional compensation it would take to encourage them to make teaching a lifelong career—then we could probably boost student performance significantly.</p>
<table border="0" cellpadding="5" align="center">
<tbody>
<tr>
<td bgcolor="#eeeeee"><strong><span style="color: navy">The RAND study is startling in its conformity to conventional wisdom. A huge, $50 billion annual increase in spending would yield a trivial two-point rise in test scores.</span></strong></td>
</tr>
</tbody>
</table>
<p>But the authors of the RAND study take exception to value-added research. They claim that value-added studies that measure gains from one point in time to the next fail to account for the fact that “two students can have [similar] pretest scores and similar schooling conditions during a grade and still emerge with different posttest scores influenced by different earlier schooling conditions.” Put simply, Suzie may learn more than Johnny in 3rd grade not because Suzie had the better teacher that year but because she may have had a better education the previous year, even though this was not reflected in her 2nd grade test score. Since value-added studies usually don’t incorporate a student’s entire educational history, their results, according to the RAND study, may be biased in some unknown direction.</p>
<p>RAND, however, doesn’t provide any persuasive evidence that this is the case either in its own study or from other studies. Of course, one cannot rule out the possibility that gains in a particular year may somehow be influenced by events in the past. But RAND’s critique of value-added studies comes back to haunt its own research. If its critique is valid, then RAND’s own results are just as flawed as the results of the studies RAND criticizes. If earlier school conditions are important and affect the impact of current resources on student achievement, then one cannot assume constant, additive effects across all students in the state—the RAND researchers’ own methodology. Instead it is necessary to know the specific paths of resources to the individual students in the state and to incorporate that information into the statistical analysis. In other words, the very arguments the authors use to make the case for the superiority of their estimates over the hundreds of previous estimates again undermine their own analysis.</p>
<p>The RAND researchers also try to bolster their methodology by referring to the Project STAR (Student-Teacher Achievement Ratio) experiment, which involved a substantial reduction in class size (from an average of 24 students to an average of 16 students) in Tennessee. The study has received a great deal of attention, in part because it is one of the few evaluations of school resources based on random assignment of students to test policy effects while controlling for other conditions, a method that is generally thought to be a high-quality research design. However, the findings from the study are often misunderstood and misinterpreted, and RAND’s scholars have only added to the confusion.</p>
<p>In essence, the Tennessee study shows that students in substantially smaller classes in their first year of schooling (whether kindergarten or 1st grade) perform better than those remaining in classes of larger size. No similar benefits were observed for students in older grades, however. Those in the smaller kindergarten classes maintained the same higher achievement level that they had realized in kindergarten.</p>
<p>The STAR study, while methodologically superior to the RAND study, has its own limitations. The principle of random assignment was potentially compromised in several ways, and no student test information was obtained before assigning students to “control” and “experimental” groups. As a result, it is unclear how much the study, as implemented, deviated from a random-assignment design. Since almost all the gains from small-class assignment were registered in the initial year, it is possible that even these small “gains” were apparent rather than real.</p>
<table border="0" cellpadding="5" align="center">
<tbody>
<tr>
<td bgcolor="#eeeeee"><strong><span style="color: navy">RAND’s interpretation of its results far exceeds the normal bounds of inference, suggesting that the authors had a prior policy commitment.</span></strong></td>
</tr>
</tbody>
</table>
<p>But even if the STAR study doesn’t suffer from these implementation flaws—without baseline data we’ll never know one way or another—the study is not open to the inferences made by the RAND researchers. First, RAND assumed that the STAR study demonstrates that class-size reduction is effective in multiple grades when in fact it demonstrates, at most, that a very large reduction in class size has positive effects only in the first year of schooling. After that, the initial effects only manage to survive—they do not continue to increase even when the student remains in much smaller classes. Yet RAND uses these results to justify its policy recommendation to lower class size throughout the elementary school years.</p>
<p>Second, the RAND authors try to validate their own problematic methodology by claiming that their estimates of the effects of class size reduction are essentially the same as those obtained from the STAR study. But assessing the validity of studies by their answers violates all scientific principles. Generally speaking, a study’s validity depends on the scientific merits of its methodology, not the results it obtains. And even if one were to accept RAND’s claim to validity by virtue of its match with the results of another study, this claim applies only to the class-size findings.</p>
<p class="tocheading"><strong>Conclusions</strong></p>
<p>RAND’s claims to have overturned conventional research wisdom are highly problematic. The report draws sweeping conclusions from average statewide data for just 44 states. The analysis of these data is subject to significant analytical error. The authors leave out of the statistical equations factors that they themselves insist are of critical importance. Claiming that only state-level analysis can take state policies into account, the researchers then leave key state policies out of their most crucial equations.</p>
<p>Worse, the interpretation of the results far exceeds the normal bounds of inference, thereby suggesting that the authors had a policy commitment that shaped their handling of the material.</p>
<p>But let’s take the RAND study at its word. If we do, we would conclude that, in general, education expenditures have little effect on student performance, that increasing teacher pay yields no effect, that the effects of class-size reduction depend very much on the state in which it is implemented, and that monies should be set aside so that teachers who say they need them have more materials. The study also asserts that the strong accountability systems in Texas and North Carolina led to particularly spectacular student achievement gains in the early to mid-1990s. This is not necessarily a bad policy agenda. But one can hardly cite the RAND study as scientific evidence that it is the correct one. The conclusions reached by the RAND authors are based more on their personal sense of plausibility than on results from high-quality data subject to properly specified statistical equations.</p>
<p>–<em><a href="http://www.hoover.org/bios/hanushek.html">Eric A. Hanushek</a> is a senior fellow at the Hoover Institution at Stanford University and a research associate of the National Bureau of Economic Research. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/deconstructing-rand/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Selective Reporting</title>
		<link>http://educationnext.org/selective-reporting/</link>
		<comments>http://educationnext.org/selective-reporting/#comments</comments>
		<pubDate>Wed, 19 Jul 2006 22:51:24 +0000</pubDate>
		<dc:creator>Chester E. Finn, Jr.</dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3384531</guid>
		<description><![CDATA[Quality Counts 2001, A Better Balance: Standards, Tests, and the Tools to Succeed by the editors of Education Week ]]></description>
			<content:encoded><![CDATA[<p><img src="http://educationnext.org/files/ednext20013_69.gif" border="0" alt="" width="200" height="175" align="right" /></p>
<p><strong>Quality Counts 2001, A Better Balance: Standards, Tests, and the Tools to Succeed</strong><br />
by the editors of <em>Education Week</em><br />
<em>Editorial Projects in Education, 2001.</em></p>
<p>In just five years, <em>Education Week</em>&#8217;s high-profile annual compilation <em>Quality Counts (QC)</em> has emerged as perhaps the K-12 education field&#8217;s most prominent source, besides the publications of the federal government, of statistical information, particularly at the state level. The reporters and editors of <em>Education Week</em>, which modestly styles itself &#8220;American Education&#8217;s Newspaper of Record,&#8221; prepare <em>QC</em>, with generous subsidy from the Pew Charitable Trusts. Appearing each January, <em>QC</em> typically runs to a whopping 200 folio-size pages.</p>
<p>Each successive edition of <em>QC</em> includes some familiar measures, drops some old categories, and adds some newly developed ones, the latter tied mostly to the year&#8217;s policy theme. This year&#8217;s theme was attaining &#8220;A Better Balance&#8221; between academic standards and tests on the one hand, and what the editors term &#8220;the tools to succeed&#8221; on the other. (In 2000, the theme was teachers; in 1999, accountability.) Besides thousands of numbers, <em>QC</em> features dozens of interpretive essays by <em>Education Week</em> reporters and editors&#8211;thus raising the dual specters of selective statistics and biased journalism.</p>
<p>We have no reason to doubt the bona fides of the editors, researchers, and advisors who choose the numbers and pen the essays. They presumably yearn to be interesting, timely, relevant, and influential. They want to get noticed and buzzed about. They want to sell copies, please their advertisers, gratify their donors, and ensure that next year&#8217;s edition is eagerly awaited (and chockablock with ads). If their report had no message, no conclusions, and no edge, it would be less noticed.</p>
<p>However, <em>Quality Counts</em>&#8217;s numbers and essays certainly do not get treated as neutral entries in a wholly academic sweepstakes. In today&#8217;s education policy wars, for better or worse, no choice of a fact can be deemed wholly neutral. Facts are also weapons. Which ones you select matters a great deal. If, for example, you seek to convey to readers a sense of teacher salaries, it matters whether you report beginning salaries or those at the top rung; whether the focus is on the mean or the median; whether fringe benefits as well as cash wages are included; and whether, for perspective, teacher salaries are set alongside the earnings of bus drivers or neurosurgeons. (Teacher salaries didn&#8217;t appear in this year&#8217;s <em>QC</em>, but in 2000 average teacher salaries, adjusted for the cost of living, were reported, though not counted as part of each state&#8217;s &#8220;grade.&#8221;)</p>
<p class="heading"><strong>Framing the Question</strong></p>
<p>Subjectivity begins, of course, with the selection and framing of the theme itself. In choosing this year&#8217;s &#8220;Better Balance,&#8221; for example, the editors signaled that something is awry in the existing balance between the &#8220;hard&#8221; elements of standards-based reform (namely the academic standards, assessments, and interventions that make up a state&#8217;s accountability system) and such &#8220;soft&#8221; components as teacher training, instructional materials, and classroom environment.</p>
<p>Concern about this balance is as old as the standards movement itself. For at least a dozen years, a debate has raged in Washington and in state capitals over what the profession generally calls &#8220;opportunity to learn&#8221; standards, or OTL. This concern is often captured by the aphorism &#8220;It&#8217;s not fair to hold students accountable for learning things they&#8217;ve never been taught.&#8221; According to OTL doctrine, policymakers mustn&#8217;t attend solely to standards and results. They should also concern themselves with the education system&#8217;s ability to ensure that those being held accountable have ample opportunities and resources to attain the desired results.</p>
<p>Reasonable, yes? Sure&#8211;but only up to a point. It&#8217;s a fact that education policymakers cannot confine themselves to goals and results. They also need to be reasonably confident that the available resources and institutional arrangements have a fighting chance of producing the desired outcomes&#8211;that salaries are high enough to draw talented applicants, that school districts can provide students with up-to-date textbooks and technology.</p>
<p>However, it is easy to lose one&#8217;s focus on results while bogging down in resource arguments. That seems to be just fine with those who are nervous about accountability in the first place. OTL is the chief means by which yesterday&#8217;s fixation on school inputs and services reasserts itself in today&#8217;s era of results-based education. Opportunity to learn&#8211;or what <em>QC</em> terms &#8220;the tools to succeed&#8221;&#8211;can become a handy, even virtuous, excuse for not holding anyone to account for actually teaching or learning anything, or at least for justifying mediocrity. There is always some inadequacy or shortcoming to be found somewhere in the vastness of the K-12 delivery system, not to mention the varied problems the kids bring to school. Hence, as one starts down the path of &#8220;balance,&#8221; a reason can readily be found to rationalize unsatisfactory outcomes or to defer the day when results actually count.</p>
<p>The education profession has persuaded itself that all the inputs must be exactly right before any results should count for students, much less for those who teach them and lead their schools. Consequences for adults in the education system are politically touchy anyway, so OTL-type excuses for skirting them are particularly welcome. In statewide accountability systems, the notion of &#8220;cracking down&#8221; on the kids is widespread. But we look far and wide before finding any teachers or principals in serious jeopardy. It&#8217;s as if only the soldiers and not the officers are being held to account for winning or losing the battle. Whenever someone suggests accountability for the educators, the furor that follows combines OTL concerns (What if the teachers didn&#8217;t have enough professional development? What if there was high turnover among their pupils?) with moral indignation and invocation of seniority rules, tenure laws, and contractual rights.</p>
<p class="heading"><strong>Captured by the System</strong></p>
<p>The editors of <em>Education Week</em> have succumbed to OTL-type reasoning, more vividly in 2001 than in the preceding four editions of <em>QC</em>. &#8220;States,&#8221; they now write, &#8220;must balance policies to reward and punish performance with the resources needed for students and schools to meet higher expectations.&#8221; The fundamental message of <em>QC 2001</em> is that such &#8220;balance&#8221; is lacking and needs to be developed.</p>
<p>Thus <em>Quality Counts 2001</em> succors those made uneasy by standards-based reform and high-stakes testing. In so doing, it partakes of the central assumptions of the education profession itself and risks sliding over the edge into being a professional trade journal for educators, like, say, <em>Phi Delta Kappan</em> or <em>Educational Leadership</em>, rather than a watchdog on behalf of the broader American public.</p>
<p>Consider the report&#8217;s &#8220;Executive Summary.&#8221; The reader need penetrate only to paragraph three to find the caution lights flashing about standards and tests. The first paragraph reports that states have been trying hard to raise academic standards and that the public supports this effort. The second paragraph says that slow progress is being made. Then comes the big But. Paragraph three warns that, without a &#8220;better balance,&#8221; all this progress is in jeopardy, together with the life prospects of &#8220;tens of thousands&#8221; of youngsters. Paragraph four then closes in for the policy kill:</p>
<blockquote><p><em>Specifically,</em> Quality Counts <em>found, state tests are overshadowing the standards they were designed to measure and could be encouraging undesirable practices in schools. Some tests do not adequately reflect the standards or provide a rich enough picture of student learning. And many states may be rushing to hold students and schools accountable for results without providing the essential support.</em></p></blockquote>
<p>The full report has three major sections. Part I consists of six essays by <em>Education Week</em> reporters and editors, based partly on surveys and polls. Part II is the annual state-by-state report card, full of charts and tables assigning grades and rankings to the fifty states on their level of student achievement, progress in adopting standards and accountability, efforts to improve teacher quality, their school climate, and the resources they devote to education. Finally, in part III, come 80 pages of individual state profiles.</p>
<p>The essays in part I are troubling on several counts, beginning with their main source of &#8220;data,&#8221; which is a survey of public school teachers&#8211;and no one else. Teachers&#8217; views on education warrant careful attention, of course, but teachers are certainly not the only affected party, and they&#8217;re among the most self-interested. To learn about foxes, one wouldn&#8217;t settle for polling only chickens. The results of this survey are predictable: protests about narrowing the curriculum, teaching to the test, inadequate professional development, unfairness toward disadvantaged and minority youngsters&#8211;and toward hard-working teachers themselves. To their credit, these essays also profile some states, districts, schools, and teachers that are responding constructively to standards and testing. But as one browses these pages to see whose opinions (besides teachers&#8217;) are taken seriously, it becomes clear that most of the interviews and quotations come from critics and doubters within the education profession. Where are the comments from legislators, employers, or college admissions officers? The key essay on testing, for example, written by <em>QC</em> uber-editor Lynn Olson, quotes five teachers, ten academics, one parent, and two policymakers. The overwhelming majority of these comments are negative or skeptical toward high-stakes testing.</p>
<p>The trappings of objectivity and scholarly rigor are certainly present in part II, the report card: endless charts, elaborate footnotes, and long methodological explanations written in tiny type. Here reportorial selectivity yields to subtler decisions about which data to include and how to interpret them. Project research coordinator Ulrich Boser boasts in the report card&#8217;s introduction (none too subtly entitled &#8220;Pressure without Support&#8221;) that the tables are based on the &#8220;most comprehensive to date&#8221; survey of &#8220;state policies that aim to hold schools and students responsible for results and build their capacity to reach academic standards.&#8221; In their effort to be contemporary, the researchers omit all sorts of long-term trends and patterns that might be even more revealing than the &#8220;very latest&#8221; data. For example, no effort is made to show the increase in public-school spending in America during the past 30 (or 50) years, the uses to which that money has been put, the steady reduction in class size, the huge increase in numbers of school employees, and the various trends in achievement that correlate almost not at all with any of these resource trends.</p>
<p><img src="http://educationnext.org/files/ednext20013_69fig1.gif" border="0" alt="" width="499" height="812" /></p>
<p>The data under the heading <em>student achievement</em> are fine. Its six subcategories are all based on states&#8217; National Assessment of Educational Progress (NAEP) scores in various subjects in grades 4 and 8. The key barometer throughout is what fraction of a state&#8217;s youngsters scored at or above &#8220;proficient&#8221; on the NAEP scale. In 4th grade reading in 1998, for example, scores ranged from a low of 17 percent in Hawaii to a high of 46 percent in Connecticut. Eleven states didn&#8217;t take part.</p>
<p>So far, so good. It&#8217;s exactly what one would want from a publication named <em>Quality Counts</em>: a nice, clear focus on academic results, namely student achievement, measured on the best yardstick available.</p>
<p>Turning to <em>standards and accountability</em>, we encounter three major subheadings, two of which (accounting for 70 percent of this grade) are also pretty solid. Under &#8220;standards,&#8221; states get points depending on the number of core subjects and levels of schooling for which they have &#8220;clear and specific&#8221; standards, as judged by the American Federation of Teachers. Under &#8220;accountability,&#8221; a state&#8217;s score depends on how many of five different ways it holds schools (not just kids!) accountable for their performance. All are reasonable things to look for, although the most important of them&#8211;&#8220;sanctions&#8221; for failing schools&#8211;can be found in just 14 states (including jurisdictions with plans to institute sanctions at some later date).</p>
<p>The &#8220;assessment&#8221; subheading is more problematic. Here a state can get full marks only if it uses five different kinds of test items, including &#8220;extended response&#8221; questions and &#8220;portfolios.&#8221; A state that relied on multiple-choice questions could not possibly do well here. This partakes of the view fashionable among educators that multiple-choice testing is inherently inadequate because it cannot be used to appraise anything but the most rudimentary of skills and factual recall-type knowledge. Of course that&#8217;s not so. A well-conceived multiple-choice question can probe deeply into a student&#8217;s command of complex cognitive skills, prowess at problem solving, and sophisticated knowledge of subject matter. To be sure, multiple-choice items cannot expose a student&#8217;s ability to write lucid prose or engage in original research, but they can go a long way toward revealing the sorts of things we want youngsters to know and be able to do. Moreover, they do so with great efficiency and speed, and they are low cost, flexible (computer-adapted items), and objective (with machine-based scoring).</p>
<p>Larger problems loom in the report card&#8217;s three remaining areas. In the section on <em>improving teacher quality</em>, a state&#8217;s grade depends in part on its embrace of some of the education profession&#8217;s trendier &#8220;reforms.&#8221; Rather than probing the skills and knowledge that a teacher imparts to her students, for example, <em>QC 2001</em> puts considerable weight on whether the state uses a &#8220;performance assessment&#8221; (including videotapes, portfolios, etc.) to appraise teachers. It also rewards states that give bonuses to teachers who have been certified by the National Board for Professional Teaching Standards. Unfortunately, we know from the work of economists Michael Podgursky and Dale Ballou and others that to date there is no hard evidence that being certified by the National Board translates into being an effective teacher.</p>
<p><em>QC</em> also tacitly privileges the conventional education-school path into the classroom, though it no longer rewards states for having their new teachers emerge from &#8220;nationally accredited&#8221; institutions. <em>QC 2001</em> does, however, assign points to states that require at least 12 weeks of practice teaching as part of a preparation program&#8211;not necessarily a bad thing, but limiting for states and districts that are experimenting with programs such as Teach for America and alternative pathways to certification. Indeed, <em>QC</em> grants no points to states with alternative-certification programs! (It did last year.)</p>
<p>The section on <em>school climate</em> has some good features. For example, a quarter of a state&#8217;s grade is based on having public-school choice and charter schools. Troubling, though, is the fact that 35 percent of the climate grade depends on having classes smaller than 25 pupils, which means that <em>QC</em> has taken sides in the great class-size debate, notwithstanding the rivers of doubt that Hoover Institution economist Eric Hanushek and others have poured on the notion that smaller classes are an efficient means of boosting achievement. The remaining 40 percent of a state&#8217;s climate grade addresses legitimate concerns such as classroom misbehavior, pupil tardiness, and the extent of parents&#8217; involvement in school. Unfortunately, those indicators depend on self-reporting by 8th graders. While we shouldn&#8217;t fault <em>QC</em>&#8217;s editors for the fact that these were the only such data they could find, we may wonder how reliable these numbers are.</p>
<p>The touchy topic of <em>resources</em> has two major subheads: adequacy and equity. Here is where one might most expect OTL doctrine to rule. Yet <em>QC 2001</em> is even more primitive, relying instead on dollars alone. A state&#8217;s grade on resource &#8220;adequacy&#8221; turns not on some calculus of what resources are needed to furnish its youngsters with an adequate education, but simply on how rapidly the state&#8217;s education spending is rising and how much of the state&#8217;s total worth is being devoted to education. This section might be called &#8220;quantity counts,&#8221; and it yields some curious results.</p>
<p>West Virginia, of all places, gets the highest grade here&#8211;a straight A&#8211;as it reportedly spent $8,322 per pupil on public education in 1999 and has been boosting its outlays faster than any other state and digging deeper than all but one. Yet West Virginia is at or below the national average on all the <em>QC</em> achievement scores, gets a D+ for standards and accountability, a C for teacher quality and a D+ for school climate. By contrast, Connecticut, which also spent more than $8,000 per pupil and which is in first or second place among the states on four of six NAEP scores (and eighth in the remaining two) clocks in with just a B- in &#8220;resource adequacy.&#8221; Adequate for what, one wonders. <em>Education Week</em>&#8217;s strange way of measuring adequacy lauds a state, like West Virginia, that has only recently begun raising its spending while punishing a state like Connecticut whose spending has been high for years. Likewise, West Virginia fares better than Connecticut because it is poorer; if both states spend exactly the same per pupil, West Virginia naturally winds up devoting more of its per-capita income to education.</p>
<p>The measure of resource &#8220;equity&#8221; is incomprehensible to anyone who has not specialized in school finance and earned a degree in statistics. Half of a state&#8217;s grade hinges on something called &#8220;state equalization effort&#8221;; the rest comprises still more obscure factors: the &#8220;wealth-neutrality score,&#8221; &#8220;relative inequality in spending per student among districts,&#8221; and something called the &#8220;McLoone Index.&#8221; Named for school finance analyst Eugene McLoone, it is &#8220;based on the assumption that if all the pupils in a state were lined up according to the amount their districts spend on them, perfect equity would be achieved if every district spent at least as much as was spent on the pupil smack in the middle of the distribution&#8230;. The ratio between what is currently spent by districts in the bottom half and what needs to be spent to achieve equity is the McLoone Index.&#8221;</p>
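<p>For readers who want to see the arithmetic, the quoted definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the district figures, and the choice to count only districts spending strictly below the median pupil&#8217;s level are hypothetical, not taken from <em>QC 2001</em>.</p>

```python
def mcloone_index(districts):
    """Illustrative McLoone Index.

    districts: list of (per_pupil_spending, enrollment) pairs.
    All figures passed in here are hypothetical examples.
    """
    ranked = sorted(districts)  # line pupils up by district spending
    total_pupils = sum(n for _, n in ranked)
    midpoint = total_pupils / 2

    # Spending on the pupil in the middle of the distribution.
    seen = 0
    median_spend = ranked[-1][0]
    for spend, n in ranked:
        seen += n
        if seen >= midpoint:
            median_spend = spend
            break

    # Assumption: "bottom half" means districts spending below the median.
    below = [(spend, n) for spend, n in ranked if median_spend > spend]
    if not below:
        return 1.0  # nobody is below the median: perfect equity

    actual = sum(spend * n for spend, n in below)
    needed = sum(median_spend * n for _, n in below)
    return actual / needed


# Three hypothetical districts of 100 pupils each.
print(round(mcloone_index([(4000, 100), (6000, 100), (9000, 100)]), 3))
# A single statewide system spending the same on every pupil.
print(mcloone_index([(7000, 50), (7000, 80)]))
```

<p>Under these assumptions, a state where every pupil receives the same amount scores a perfect 1.0 regardless of how much or how little is spent, or what its students learn.</p>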
<p>The equity upshot: Hawaii naturally wins, because its unified statewide school system spends the same amount on all students. Never mind that the Aloha State&#8217;s achievement scores are among the lowest in the land.</p>
<p>Something closer to objectivity reappears in the final section of <em>QC 2001</em>, where profiles of individual states are more balanced and informative than this reviewer expected. Each profile includes a report card recapping the state&#8217;s NAEP results and its letter grades reported in earlier pages. Each gives a few basic facts about school enrollments and demographics. Then each has an essay of a page or two about what&#8217;s going on in that state. The authors are <em>Education Week</em> reporters who seem to have been given a fairly free hand to frame a state&#8217;s story according to what they found interesting there and with whom they talked.</p>
<p>Most of the essays are sober, matter-of-fact accounts of recent doings on the education reform front. The Indiana essay is a model of that kind, as are those of Louisiana, Maine, and Delaware. Some report interesting information that national observers may not have known, such as Nebraska&#8217;s abiding love of locally selected tests and its rejection of statewide assessments. Some report that heated controversies&#8211;such as the uproar surrounding Florida&#8217;s voucher program&#8211;are cooling down. Even some places where recent developments could lend themselves to a reporter&#8217;s bias against testing don&#8217;t always produce the expected &#8220;spin.&#8221; The Massachusetts account, for example, is acceptably balanced, as are those for Colorado and high-profile Texas. There are occasional slips, however. The Ohio story, for one, tends to favor the views of those who are grumping about the state&#8217;s proficiency testing program.</p>
<p>On balance, however, this sprawling publication displays an unmistakable, albeit uneven, set of assumptions that align with the values, preferences, and biases of the education profession itself. It thus becomes more of a report to the <em>profession</em> on matters that interest people within the field than a report to the public about how well that field is serving the nation.</p>
<p>Perhaps we shouldn&#8217;t be surprised. Most of <em>Education Week</em>&#8217;s and <em>QC</em>&#8217;s subscribers, after all, are educators, and most of the advertisers are firms that want to sell things to educators. This inevitably tempts reporters, editors, and publishers to view the world through the lenses of readers within the field rather than outsiders who most want to know whether the system is performing as well as it should. &#8220;Give educators what they want to see&#8221; may never have been stated in planning meetings and editorial sessions. Possibly all that happened is that the authors and their advisors and supervisors have been so close to the K-12 education system for so long that they&#8217;ve lost perspective on it and its players. They may even suffer from a touch of the Stockholm syndrome, identifying with their oppressors&#8211;their customers, in <em>Education Week</em>&#8217;s case. Whatever the reason, the unhappy bottom line is that quality does not count quite as much as it should in <em>Quality Counts</em>.</p>
<p><em><a href="http://www.hoover.org/bios/finn.html">Chester E. Finn Jr.</a> is president of the Thomas B. Fordham Foundation, a senior fellow at the Manhattan Institute, and a visiting fellow at the Hoover Institution.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/selective-reporting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cooking the Questions?</title>
		<link>http://educationnext.org/cooking-the-questions/</link>
		<comments>http://educationnext.org/cooking-the-questions/#comments</comments>
		<pubDate>Mon, 17 Jul 2006 23:13:05 +0000</pubDate>
		<dc:creator>Terry M. Moe</dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[Standards, Testing, and Accountability]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3368361</guid>
		<description><![CDATA[The 33rd Annual Phi Delta Kappa/Gallup Poll of the Public’s Attitudes Toward the Public Schools]]></description>
			<content:encoded><![CDATA[<p class="tocheading">The 33rd Annual Phi Delta Kappa/Gallup Poll of the Public&#8217;s Attitudes Toward the Public Schools</p>
<p><strong>By Lowell C. Rose &amp; Alec M. Gallup</strong><br />
<em>Phi Delta Kappa International, 2001.</em></p>
<p>Support for vouchers is declining. Or so we are told by Phi Delta Kappa (PDK), the nation’s premier association of education professionals, whose annual poll (conducted by Gallup) is widely accepted as the definitive measure of where Americans stand on education issues.</p>
<p>On the surface, it might seem that PDK is just documenting the obvious. After all, two voucher initiatives (one in California, the other in Michigan) were roundly defeated in the 2000 elections, and since then the buzz in education circles has been that vouchers have dropped in popularity.</p>
<p>Much of this talk, however, is part of a public relations campaign being waged by the opponents of vouchers, whose aim is to persuade policymakers to stay away from the issue. Such a campaign is to be expected from the teacher unions and their allies, because this is the way the game of politics is played. My fear, though, is that PDK is actively participating in this spin campaign and has been for years.</p>
<p>I cannot read the minds of PDK’s researchers, and I do not want to accuse them of such a thing. But I’m convinced that one of two conclusions is justified: either PDK’s polls have purposely been designed to reflect negatively on the voucher issue, or its researchers have been careless in their design decisions, which I doubt.<br />
<a name="fig1"><img src="http://educationnext.org/files/ednext20021_70fig1.gif" border="0" alt="" hspace="2" vspace="2" width="220" height="473" align="right" /></a></p>
<p class="tocheading"><strong>PDK’s Key Measure</strong></p>
<p>From the 1970s until 1991, PDK measured voucher support with a survey item that defined vouchers as a government-funded program allowing parents to choose among public, private, and parochial schools. After support rose to 50 percent (with 39 percent opposed) in 1991, PDK abruptly dropped this item in favor of a new one. The new question read: “Do you favor or oppose allowing students and parents to choose a private school to attend at public expense?” This question, first asked in 1993, gave results that were strikingly more negative: only 24 percent expressed support (see <a href="#fig1">Figure 1</a>). Indeed, it indicated that even private school parents were opposed to vouchers, a result no expert would be prepared to believe.</p>
<p>Why such different “facts”? Research has long shown that most Americans are poorly informed about public policy and don’t have well-developed views on most issues. Recent polls have shown the same for vouchers. This does not mean that Americans can’t connect the voucher issue to their own values and beliefs. But it does mean that, because they come to any survey with little information, they will be quite sensitive to information contained within the survey itself, especially to the specific wording of questions and the order in which they are asked. This information determines how the issue is “framed.” And the framing, in turn, influences which (of many possible) values and beliefs get activated in people’s minds, and thus how people respond.</p>
<p>All public-opinion researchers are well aware of this. And they all know that, if public opinion is to be well measured on an issue, the issue must be framed with great care. The framing should provide respondents with enough information to give them a good sense of what the issue is about. The information also needs to be balanced, so that respondents are not pushed to see the issue in a positive or a negative light.</p>
<p>PDK’s “at public expense” item does not even come close to meeting these basic criteria. The central purpose of a voucher program is to expand the choices available to all qualifying parents, especially those who now have kids in public schools. But the PDK item does absolutely nothing to convey this information. It says nothing about choice, nothing about public school parents’ being eligible to participate. Instead, it focuses entirely on private school parents and asks respondents whether the government ought to be subsidizing them. Vouchers are presented, in effect, as a special-interest program for an exclusive group.</p>
<p>This is bad enough. But PDK’s researchers compound the bias with the way they choose to inform survey respondents that vouchers are funded by the government. They could have said just that, or they could have found some other neutral way of wording it. Instead they settled on the phrase “at public expense”—which is implicitly pejorative and begs for a negative reaction.</p>
<p>By scientific standards, PDK’s “at public expense” question is a poor measure of voucher support. It should never have seen the light of day. Nevertheless, PDK not only adopted this item as its own, but has persisted in using it in every annual survey but one since 1993—with results that, predictably, are on the low end of what we would expect, given the results of better-worded polls on the subject.</p>
<p class="tocheading"><strong>PDK’s Second Measure</strong></p>
<p>Interestingly, soon after the “at public expense” item made its appearance, PDK introduced a second question to measure support for vouchers. This one is actually informative and neutral, precisely the kind of item that should have been used all along. As such, it gives the appearance that PDK is seriously trying to get a valid measure of voucher support. It reads, “A proposal has been made that would allow parents to send their school-age children to any public, private, or church-related school they choose. For those parents choosing nonpublic schools, the government would pay all or part of the tuition. Would you favor or oppose this proposal in your state?”</p>
<p>Year after year, this second item shows higher support for vouchers than the “at public expense” item does. In 2001, for instance, the “at public expense” item produced a support level of 34 percent, while the second item showed support to be 44 percent. In 1999 the comparable figures were 41 percent and 51 percent, and in 1996 they were 36 percent and 43 percent (see <a href="#fig2">Figure 2</a>).</p>
<p>These higher scores from the second support item, however, do not find their way into PDK’s press releases. The media annually turn to PDK as the official source of data on how Americans view the voucher issue, and PDK provides the lower scores from its “at public expense” question. These are the results that show up in the newspapers, on TV, and on the desks of policymakers.</p>
<p>For readers who want to dig deeper, the higher scores from the well-worded item can be extracted from PDK’s longer published report. But even these more believable scores must be interpreted with caution. The reason is that when the survey is actually administered to respondents, PDK always asks the well-worded question immediately <em>after</em> the “at public expense” question. This means that the voucher issue is already negatively framed before respondents get to the well-worded question, so its scores are also likely to be biased downward.</p>
<p>It doesn’t take a rocket scientist to see that there is a problem here. The effects of question ordering are well understood, and competent researchers always think carefully about where each question is placed. It is hard to believe that PDK’s researchers are somehow oblivious to all this and are suppressing the higher scores by accident.</p>
<p><a name="fig2"><img src="http://educationnext.org/files/ednext20021_70fig2.gif" border="0" alt="" width="450" height="288" /></a></p>
<p class="tocheading"><strong>The Gallup Experiment</strong></p>
<p>In January 2001, researchers at Gallup carried out an experiment that was written up and posted on the Internet. Noting that measured support for vouchers seems to vary with question wording, they took the two PDK items as their models and submitted a version of each to separate samples of respondents. Would these items produce different levels of support, and how different would they be?</p>
<p>This exercise was not as enlightening as it could have been. The Gallup researchers did not retain the identical wording of each PDK item, but instead tried to improve the questions by tinkering with words they thought might be sources of bias. In the process, the phrase “at public expense” was actually eliminated from the “at public expense” item—an obvious improvement, but one that compromises the experiment.</p>
<p>The findings, however, are still eye-opening. The new version of the “at public expense” question asked, “Would you vote for or against a system giving parents government-funded school vouchers to pay for tuition at a private school?” This wording at least retained its focus on private school parents. The result: 48 percent in favor, 47 percent against. The second PDK item became the following: “Would you vote for or against a system giving parents the option of using government-funded school vouchers to pay for tuition at the public, private, or religious school of their choice?” This item suggests that the purpose of the program is to expand choices for a broader population of parents. The result: 62 percent in favor.</p>
<p>A few comments. First, PDK’s claim that Americans are turned off by vouchers is simply untrue. When the purpose of a voucher program is well conveyed, most Americans respond positively, in this case by a very big margin. Second, if this experiment is any indication, the well-worded item on PDK’s own survey <em>is</em> downwardly biased, despite its reasonable wording. When it is not placed immediately after the “at public expense” question, it produces higher support scores. Third, Gallup’s well-worded question produces a support score (62 percent) that is 14 percentage points higher than the special-interest item’s score (48 percent). The gap would likely have been bigger still if Gallup had retained the pejorative phrase “at public expense.” A reasonable measure of voucher support, fairly tested, gives <em>much</em> higher support scores than the “at public expense” item does (see <a href="#fig3">Figure 3</a>).</p>
<p>The Gallup experiment was available to PDK researchers well before they conducted their 2001 survey. One would think that, in light of this information, objective researchers would have modified their survey, or at least discussed the problems of measurement and interpretation that the Gallup experiment clearly raises. But nothing like this occurred. They designed their items as they always had, and when the results were in, they presented the “at public expense” findings to the media as hard evidence that Americans don’t like vouchers.<br />
<a name="fig3"><img src="http://educationnext.org/files/ednext20021_70fig3.gif" border="0" alt="" hspace="2" vspace="2" width="240" height="498" align="right" /></a></p>
<p class="tocheading"><strong>Other Voucher Surveys</strong></p>
<p>Over the years, many organizations have carried out surveys that ask questions about vouchers. Their surveys just haven’t received as much attention as PDK’s have.</p>
<p>It would be nice if these studies could somehow yield a single, coherent perspective on the voucher issue, but comparing them with any precision is a tricky business. Each has its own voucher item, its own ordering of questions, and its own range of topics being covered (which usually go well beyond education). All of these differences are likely to influence the results on voucher support. When these influences are combined with chance fluctuations due to sampling error—inherent in all surveys, regardless of how carefully they are designed—it is often impossible to tell exactly why surveys yield different results.</p>
<p>Even so, there is helpful information here. An important pattern in these studies is that voucher questions usually come in two types, which mirror the types we’ve been discussing. The first focuses attention on government subsidies for private school parents. The only real departure from PDK’s approach is that government funding is described in some neutral way, without the pejorative “at public expense”—a phrase that no one but PDK is inclined to use. The second type is worded to suggest that vouchers would expand choices for parents generally and that parents with children in public schools would be part of the program.</p>
<p>Not surprisingly, questions of the second type tend to produce much higher support scores than items of the first type do. Furthermore, they show that a majority of Americans tend to express support for the central purpose of a voucher program. The scores jump around from study to study, for all the reasons I’ve noted, and we are wise not to read too much meaning into particular findings or comparisons. On average, though, the existing studies tend to confirm—many times over—what we already know based on the PDK and Gallup results.</p>
<p>One issue needs addressing, however. Rather often, the survey questions used by these other organizations have been of the first type, which focus on government subsidies to private school parents. The researchers behind these surveys are presumably objective and competent. Why would they word their questions in ways that fail to convey the central purpose of a voucher program?</p>
<p>Again, I can’t read their minds, but here is my best guess. Sponsoring groups like CBS, NBC/<em>Wall Street Journal</em>, and ABC/<em>Washington Post</em> attempt to measure public opinion on a great variety of issues: presidential popularity, gun control, abortion, foreign policy, and many more. Education is but a small part of this, and the voucher issue is just a part of education. However well trained these researchers may be in survey methodology, they cannot be expected to have a nuanced understanding of each and every issue. As a result, they may sometimes adopt wording that seems perfectly acceptable, but that misses the mark.</p>
<p>In my view, that is what’s happening here. After all, the special-interest wording says something about vouchers that is quite true: a voucher program would indeed provide government subsidies to parents who go private. Moreover, this description is simple and short, properties that researchers value as they economize on survey time. So I’m not surprised that items of the first type have proved popular with general polling organizations. The only problem is that this simple, straightforward approach fails to capture the very purpose of a voucher program.</p>
<p>Given their diverse responsibilities, these researchers can be cut some slack. Their measures are inappropriate, but I suspect they would change them if they gained new perspective on the issue. I can’t say the same for PDK’s researchers, however. They are responsible, every year, for putting together a survey that deals entirely with education. They know the voucher issue only too well, and they have chosen to measure it in a way that is guaranteed to elicit low numbers.</p>
<p class="tocheading"><strong>Has Support for Vouchers Declined?</strong></p>
<p>Now let’s return to the issue with which this essay began, the issue of whether there has been a marked drop-off in voucher support over the past few years. PDK claims to have discovered such a downturn. Here is an excerpt from its 2001 press release:</p>
<blockquote><p><em>It is clear that the decade of the ’90s saw support for the use of public funds for parents and students to use in attending private and church-related schools increase, peak, and then begin what has become a significant decline. Support in this area was at 24 percent in 1993, climbed to 44 percent in 1997 and 1998, and has since dropped to the current level of 34 percent.</em></p></blockquote>
<p>The figures, of course, are taken from PDK’s “at public expense” item, which is a terrible measure to begin with and consistently gives low scores. Even so, this item has been asked in identical form over time, and the fact that the findings seem to fit a pattern is at least interesting. The question is, does this pattern stand up to scrutiny?</p>
<p>The first point to keep in mind is that survey results are likely to fluctuate from year to year by chance alone. Annual shifts of a few percentage points in either direction may be quite meaningless, and analysts have to resist the temptation to overinterpret. Especially in a short time series, random fluctuations can sometimes look like patterns or significant events, when in fact nothing about public opinion has changed.</p>
<p>Based on other surveys, there does seem to have been a real (nonrandom) increase in support from the early 1990s to the mid-1990s, as PDK claims. But beyond the mid-1990s, the evidence does not suggest any pattern at all. In particular, it does not suggest that vouchers have gone into “significant decline” in recent years.</p>
<p>Gallup’s own surveys are telling. They are well worded and so should give good measures of how people respond to the basic purposes of a voucher program. Yet there is no indication, by these measures, that support has budged much over the past five years. In 1996 Gallup tested support for vouchers on two separate occasions, and each time came up with support scores of 59 percent. In 2000 it surveyed opinion using the identical question, resulting in support of 56 percent. And in 2001—the experiment—it used a virtually identical question and found support to be 62 percent (see <a href="#fig4">Figure 4</a>). All this is consistent (given normal sampling error) with the notion that public opinion has not changed—and it is surely inconsistent with PDK’s claim that vouchers have gone into “significant decline” in the past few years.</p>
<p>Consider another example. In two recent NBC/<em>Wall Street Journal</em> polls, respondents were asked to choose between the following positions. “Position A: Government should give parents more educational choices by providing taxpayer-funded vouchers to help pay for private or religious schools. Position B: Government funding should be limited to children who attend public schools.” In 1999 the results were 47 percent in favor of vouchers, 47 percent against. But in 2000, at a time when vouchers were supposedly dropping like a stone in popularity, the results were 56 percent in favor and 38 percent against (see <a href="#fig4">Figure 4</a>).</p>
<p>I could provide more examples of vouchers’ increasing in popularity over the past few years. I could also find examples of vouchers’ declining somewhat in popularity. All of this, however, is quite normal and precisely what we ought to expect if the underlying reality hasn’t changed much. Due to sampling error alone, some numbers will go up and others will go down. But the fluctuations probably don’t mean much of anything.</p>
<p><a name="fig4"><img src="http://educationnext.org/files/ednext20021_70fig4.gif" border="0" alt="" width="450" height="338" /></a></p>
<p class="tocheading"><strong>PDK Changes the Survey</strong></p>
<p>Such chance fluctuation is the norm for voucher surveys in general. The recent plunge in PDK’s own measures of voucher support, however, cannot be chalked up to sampling error alone. There is a concrete reason, I believe, why PDK’s results have so dramatically gone south. The reason is simply this: PDK has recently <em>changed</em> its survey.</p>
<p>Before 2000, PDK followed a format in which respondents were asked to give the public schools a grade from A to F and then were presented with the two voucher items. In 2000, however, PDK altered the survey in a way that any competent researcher would expect to be consequential. Respondents were asked the same A to F grading questions, but then—immediately before the key voucher items—they were asked five additional questions that surely had important framing effects.</p>
<p>The first two essentially set up a dichotomy between vouchers and the public schools. The second of them asks, “Which one of these two plans would you prefer—improving and strengthening the existing public schools, or providing vouchers for parents to use in selecting and paying for private and/or church-related schools?” PDK is thus clearly suggesting to respondents that people who support public education—as most Americans do—cannot at the same time support vouchers. From a framing standpoint, this is a killer. It is also factually incorrect. Most activists in the voucher movement <em>are</em> dedicated to improving the public schools, and they see vouchers as a powerful means of effecting improvement through greater choice and competition.</p>
<p>The next three new items are also problematic. Two focus attention on a long list of public-school ideals. The third contrasts parental choice with other “possibilities”—like rigorous academic standards and competent teachers—again giving the impression that they are alternatives to vouchers rather than (as is in fact the case) entirely complementary.</p>
<p>There is little doubt, in my view, that the introduction of these five new items just prior to the usual voucher items produced a more negative framing of the voucher issue and encouraged the lower support scores that became PDK’s findings in 2000. The same sort of thing happened in 2001, except in that year PDK’s researchers changed the survey again. This time, they eliminated three of the five lead-in items, and included just the two killer items: the ones that portray vouchers as antithetical to public education (see <a href="#table1">Table 1</a>). Whether this lead-in is more or less negative in its framing than the 2000 lead-in is unclear. I suspect it is more negative, because it is so simple and forceful and avoids all the distractions of the other three items. In either event, it is surely more negative than the original framing from 1999 and before, and there can be no surprise that it again led to lower support scores.</p>
<p>The most reasonable conclusion, therefore, is that the “significant decline” in voucher support—loudly proclaimed by PDK and reported by the media as fact—is an artificial phenomenon of PDK’s own making. The important changes didn’t occur in public opinion. They occurred in the design of PDK’s survey—a factor that, needless to say, is under the conscious control of PDK’s own researchers.<br />
<a name="table1"></a></p>
<table border="0" cellspacing="0" cellpadding="5" bgcolor="#eeeeee">
<tbody>
<tr>
<td>
<p class="tocheading"><strong>Molding Public Opinion (Table 1)</strong></p>
<p><em>It’s no wonder that the public’s support for vouchers was so low on Phi Delta Kappa’s most recent opinion survey. The survey asked a series of questions that framed education reform as a false trade-off between vouchers and fixing the public school system.</em><br />
<strong>Voucher Questions from the 33rd Annual Phi Delta Kappa/Gallup Poll of the Public’s Attitudes Toward the Public Schools, 2001</strong></p>
<p class="tocheading">Framing Questions</p>
<p><strong>Survey Question #5:</strong> In order to improve public education in America, some people think the focus should be on reforming the existing public school system. Others believe the focus should be on finding an alternative to the existing public school  system.  Which approach do you think is preferable—reforming the existing public school system or finding an alternative to the existing public school system?<br />
<strong>Survey Question #6:</strong> Which one of these two plans would you prefer—improving and strengthening the existing public schools or providing vouchers for parents to use in selecting and paying for private and/or church-related schools?</p>
<p class="tocheading">Biased Question</p>
<p><strong>Survey Question #7:</strong> Do you favor or oppose allowing students and parents to choose a private school to attend at public expense?</p>
<p class="tocheading">Well-worded Question</p>
<p><strong>Survey Question #8:</strong> A proposal has been made that would allow parents to send their school-age children to any public, private, or church-related school they choose. For those parents choosing nonpublic schools, the government would pay all or part of the tuition. Would you favor or oppose this proposal in your state?</p></td>
</tr>
</tbody>
</table>
<p class="tocheading"><strong>Conclusion</strong></p>
<p>I do not want to believe that PDK is using its survey to further its own political agenda. But what is the alternative? That PDK’s researchers have simply been careless in their design decisions? That these decisions have by sheer accident led to lower support scores for vouchers? That the most biased of these scores have unwittingly been urged on the media as good-faith evidence of American public opinion? This is a lot to swallow. It is much more reasonable to believe that PDK’s researchers are competent at their jobs and that they have not been making one mistake after another.</p>
<p>Whatever the explanation, one conclusion is sure. With the public relations campaign against vouchers in full swing, it is important for people who want the facts about public opinion to look askance at this most official of all education surveys. On the voucher issue, its findings are not to be believed.</p>
<p>Having said this, I don’t want to leave the impression that, when the truth of the matter is revealed, Americans turn out to be wild about vouchers. Granted, well-worded survey items do show majority support for the idea. This is very important. But in a population that is poorly informed about public policy, the positions people take on surveys are often soft—and complicated—and need to be evaluated with care.</p>
<p>This is what I have tried to do in my recent book, <em>Schools, Vouchers, and the American Public</em>. One of its central themes is that Americans are on both “sides” of the voucher issue at once: they like the public school system <em>and</em> they are positively inclined toward vouchers. In their minds (and mine), there is no reason one can’t be supportive of both, no reason vouchers can’t coexist with and promote a healthy public school system. Problems arise, however, when people are told—as they are during initiative campaigns, through massive media blitzes by the teacher unions—that vouchers will destroy the public schools. Faced with such draconian framing, and responding to the uncertainty that it creates, many people back away from vouchers and embrace the current system. They don’t want to lose what they have.<br />
Support for vouchers, then, is a complex matter that cannot be summed up in simple survey numbers. I suspect, moreover, that this complexity may take a still different form over the next year or so, as the tragedy of September 11 generates an upswing in support for government institutions, along with an increasing aversion to change and uncertainty—all of which could show up (temporarily) in the way people respond to survey items about vouchers.</p>
<p>Whatever the future holds, however, the challenge for those of us who are studying this issue—regardless of which side we’re on—is to understand what Americans are really thinking. This means insisting on well-designed surveys. And on facts that can be believed.</p>
<p><em>–Terry M. Moe is a professor of political science at Stanford University and a senior fellow at the Hoover Institution.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/cooking-the-questions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Waiting for Utopia</title>
		<link>http://educationnext.org/waitingforutopia/</link>
		<comments>http://educationnext.org/waitingforutopia/#comments</comments>
		<pubDate>Mon, 17 Jul 2006 20:26:32 +0000</pubDate>
		<dc:creator> </dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3366751</guid>
		<description><![CDATA[It&#8217;s easy to tell when someone is in the grip of a Big Idea That Explains Everything. Tunnel vision sets in; every analysis, whatever the topic, becomes an occasion for the grand theory to appear. Evidence is read and supplied selectively, in such a way that the theory remains unscathed. Skepticism is deployed selectively as [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://educationnext.org/files/ednext20022_73a.jpg" border="0" alt="" width="250" height="253" align="right" /><br />
It&#8217;s easy to tell when someone is in the grip of a Big Idea That Explains Everything. Tunnel vision sets in; every analysis, whatever the topic, becomes an occasion for the grand theory to appear. Evidence is read and supplied selectively, in such a way that the theory remains unscathed. Skepticism is deployed selectively as well. Findings that comport with the Big Idea are held to a relaxed standard, while the work of critics is subjected to withering scrutiny.</p>
<p>Richard Rothstein, author of the <em>New York Times</em>&#8217;s widely read &#8220;Lessons&#8221; column, a weekly commentary on education issues, frequently exhibits these symptoms. His Big Idea is that economic forces, especially inequality and poverty, largely determine the outcome of American social projects, including attempts at education reform. The effect of this obsession is twofold. First, his writings display a factual carelessness, suggesting that details hardly matter if one possesses a higher truth. Second, nearly every engagement with issues of schooling, testing, standards, and teaching becomes an occasion to reassert the primacy of the economic factor.</p>
<p>Rothstein believes that most contemporary criticism of the public schools is misplaced. The main problems lie not with the schools, he claims, but with the injustices associated with the American economic system. It isn&#8217;t a lack of competition in the public school system, antiquated hiring and compensation systems, or a dearth of solid research on educational methods that depresses student achievement; it&#8217;s an economic system that allows for large differences in income and wealth. As a result, to Rothstein&#8217;s mind, education reform won&#8217;t be effective without far broader social reforms.</p>
<p class="tocheading"><strong>The Sky Isn&#8217;t Falling</strong></p>
<p>Rothstein is committed to the view that no crisis exists in American education, that all the critics are merely Cassandras trying to scare the public into accepting their pet reforms. His statistical gymnastics are used most often in the service of defending current school practices against most reform proposals, usually by denying the need for reform in the first place. Thus Rothstein&#8217;s explanation for falling SAT scores is not a decline in education quality, but an increase in the number and the diversity of test takers. Says Rothstein, in an August 2000 column: &#8220;Interpreting the SAT is more complex than it seems. SAT trends would reflect school quality changes only if every 18-year-old took the test. Not all do. Average scores are affected by who takes the SAT. If only the brightest seniors take it, averages are higher. If more lower-ranked seniors aspire to college and take the test, this could indicate better performance by schools, but still depress the average.&#8221;</p>
<p>The claim that slipping scores result from a changed demographic (and hence could even be good news) has surfaced repeatedly in the writings of education commentators such as Gerald Bracey, but it is demonstrably false. <em>Washington Post</em> economics columnist Robert Samuelson summarized the matter in a 1994 column by noting: &#8220;The change in the student population <em>preceded </em>the drop in test scores. Between 1951 and 1963, the number of test takers went from 81,000 to nearly 1 million; test scores rose slightly.&#8221; Moreover, the percentage of test takers remained relatively constant between 1972 and 1984 (see <a href="#fig1">Figure 1</a>). There were still a million test takers in 1985, the first year in which test scores showed a small uptick after 19 years of decline. Scores have been flat or slightly improved since then, with math scores returning to their levels of 30 years ago, but failing to reach their mid-1960s apex.</p>
<p><a name="fig1"></a><br />
Changes in the composition of the test-taking pool don&#8217;t explain the decline in test scores either. Studies by the Educational Testing Service and others have shown, in the words of Robert Samuelson, that &#8220;the main declines occurred among whites and could not be explained by changes in student&#8217;s gender, economic class, or parental education.&#8221; This analysis was seconded by Harvard sociologist Christopher Jencks, who pointed out that the SAT scores of advantaged white males have also exhibited a steep decline.</p>
<p>Yet Rothstein, exactly one year later, parroted his earlier claims. His reaction to the release of the 2001 scores, which showed no improvement over the previous year and hence were termed &#8220;stable&#8221; by the College Board, was to write, &#8220;Stable in this case does not mean unimproved. Hidden in the data is more hopeful news than most people would expect. These tests are voluntary. If only high achievers take them, average scores mean one thing. But if a broader range of students takes them, the results must be interpreted differently. The number taking the tests has in fact grown a lot. . . . It is remarkable that averages gained at all while the test-taking base was expanding.&#8221;</p>
<p>Again, in an October 2001 column, the familiar refrain reappeared in a rebuttal of the 1983 <em>Nation at Risk</em> report. The authors, we are told, &#8220;misunderstood the decline in test scores. College Board results had dropped, but that was due to the growth in college-going ambition. The SAT was no longer taken only by top students, and so average scores of test takers naturally fell.&#8221;</p>
<p>In short, in three successive bombing runs, Rothstein failed even to acknowledge, much less refute, any evidence that would undermine his assertions. This suggests an opinion resistant to the complexities of the issue.</p>
<p>Having dismissed claims that the achievement of American students has declined over time, Rothstein turns to the other major source of worry: the woeful performance of American students on international comparisons of educational achievement. Take a May 2001 column, in which Rothstein manages to convert lead into silver. He starts by facing the facts. He concedes that on the Third International Mathematics and Science Study, &#8220;Our 8th graders scored below their peers in almost every other industrial nation that took part. Students in Japan and Korea ranked near the top. In math, average American 8th graders would have scored below the 25th percentile in Japan or Korea.&#8221; But don&#8217;t worry, because the kids certainly aren&#8217;t: &#8220;The study also asked students if they liked math and science. . . . here Japanese and Koreans scored at the very bottom. . . . In the United States, 35 percent felt positively about math and 32 percent about science, more than in almost every other industrial nation.&#8221; A Japanese scholar is also invoked to assure us that his countrymen do &#8220;not attach great importance to students&#8217; rankings because the exams measure skills valued by the old education system, not the new.&#8221; In fact, Rothstein concludes, the dour Japanese want to emulate our schools because of our &#8220;zest for living.&#8221;</p>
<p>In a July 2001 column, Rothstein tells his readers not to fret over data from the National Assessment of Educational Progress showing that two-thirds of American 4th graders can&#8217;t read above a basic level, because &#8220;on an international survey of reading ability, American 4th graders scored higher than pupils everywhere except Finland.&#8221; In other words, international comparisons are apparently valid when they corroborate Rothstein&#8217;s fundamental beliefs, but easily dismissed when they reflect poorly on the American education system.</p>
<p class="tocheading"><strong>Poverty&#8217;s Pull</strong></p>
<p class="tocheading">For Rothstein, even if there were a crisis in American education, it certainly wouldn&#8217;t be the schools&#8217; fault. Any problems that do exist are the result of social inequality. Concerned about low SAT scores? Rothstein insists that &#8220;the best predictor of test scores has always been students&#8217; social class.&#8221; Thus low scores are to be expected from disadvantaged youngsters.</p>
<p>Rothstein returns to this theme time and again. For instance, his December 2001 column highlighted a study by Eric Dearing of the Harvard Graduate School of Education. Dearing examined the &#8220;income-to-needs&#8221; ratio of families below the poverty line and showed that when income increased (roughly $4,500 per year over three years), very young children performed better on tasks that asked them to identify colors, shapes, and letters (skills considered important in school readiness).</p>
<p>This was only a small-scale study, but that didn&#8217;t stop Rothstein from drawing the far-fetched conclusion that &#8220;educators have a stake in promoting a federal income policy that focuses on immediate income support for the unemployed, because this in itself could make instruction more effective.&#8221; There&#8217;s nothing wrong with arguing that the unemployed should receive more generous benefits, but this has hardly anything to do with education; it&#8217;s certainly not a conclusion based on solid evidence, which Rothstein is always sure to demand of those who don&#8217;t agree with him.</p>
<p>Besides, most of the serious research points in the opposite direction. For example, the University of Chicago&#8217;s Susan Mayer undertook a far more comprehensive analysis of the relationship of income to school achievement in a 1997 book, <em>What Money Can&#8217;t Buy</em>. She examined nearly 17,000 records in two massive data sets in her search for the true effects of income. The study is important for its methodological sophistication and its conclusions, which take us beyond the traditional left-right political axis regarding welfare programs and the causes of poverty.</p>
<p>Mayer showed that income per se is not a consequential factor in children&#8217;s performance. Beyond providing the ability to satisfy basic needs like food and shelter, income is not a necessary, much less a sufficient, explanation of children&#8217;s academic achievement. Mayer found that a supportive family structure (a stable, two-parent home), a culture of learning within the family and neighborhood, and natural abilities were much more important than income. Given these factors, income can certainly help people achieve their ends. In their absence, however, income is largely inconsequential.</p>
<p>These findings would seem to present the perfect opportunity for Rothstein to flex his critical muscles by rebutting Mayer&#8217;s scholarship, or at least addressing it. Nevertheless, he seems to have ignored her study entirely, despite the fact that her findings run directly counter to his Big Idea.</p>
<p>What&#8217;s strange is that Rothstein certainly isn&#8217;t shy about attacking his ideological opponents. In his first column of 2001, he tackled a much-publicized Heritage Foundation study of 21 schools that exhibit both high poverty rates and high test scores. This was a direct challenge to Rothstein&#8217;s Big Idea: that there is an immutable inverse relationship between poverty and student achievement. How would Rothstein assail the study? Let&#8217;s try invoking ideology first. Heritage is &#8220;conservative,&#8221; we read, so its report &#8220;as a whole is enveloped in such contempt for most public education that its valid messages are lost.&#8221; Then let&#8217;s supplement the Big Idea. &#8220;The report&#8217;s biggest flaw is its assumption that poverty alone defines the problems of the low-income school. . . . Rather, schools with consistently low scores typically have children for whom poverty per se is only one problem. They also suffer from crime-ridden neighborhoods, broken families, parental stress, inadequate housing, and poor health.&#8221;</p>
<p>Rothstein&#8217;s maneuver here is a tactical retreat. Heretofore, income has served for him like an engine&#8217;s governor, an upper limit on a school&#8217;s (or individual&#8217;s) capacity to perform. But Heritage located schools that perform well despite their low-income student body. Rothstein&#8217;s way out is to declare the study myopic, saying that it focuses exclusively on income, when other criteria, not examined in the study, are clearly the relevant factors. However, this maneuvering is a bit circular, considering that inadequate housing, bad neighborhoods, and poor health are usually direct proxies for poverty. Still, in an effort to avoid a contradiction, Rothstein makes an important acknowledgment: successful children, he says, come from &#8220;stable homes and parents [who are] regularly employed.&#8221; Hence, Rothstein suggests, if only the study had asked proper questions, instead of obsessing over income, it would have found those factors (like parents) that really contributed to the schools&#8217; academic success. Here it sounds as if he agrees with Mayer, when it suits his ideological purposes.</p>
<p class="tocheading"><strong>The Bell Tolls</strong></p>
<p>So fixated is Rothstein on the determinative role of income that he actually proposes a sliding scale for learning standards. It is inherently unfair, he has argued on several occasions, to judge all schools and students by the same academic standards. Because of their disadvantages, low-income students should not be expected to compete against the affluent. He recommends adjusting the expectations for the schools located within a given neighborhood based on the area&#8217;s average income.</p>
<p>Rothstein broached this argument in a November 1999 column entitled, &#8220;Does Social Class Matter in School?&#8221; We are in no suspense about Rothstein&#8217;s answer, of course: the poor, because they are poor, should not be expected to learn as much. Poor children are stuck in shoddy day care, while &#8220;typical middle class parents raise their children differently,&#8221; providing them their own &#8220;head start.&#8221; As Rothstein puts it, we must resist the &#8220;dangerous myth&#8221; that &#8220;all children, regardless of academic background, can achieve to the same high standards if only schools demand it.&#8221; Accordingly, &#8220;schools with privileged children should be termed failing if they test only at the 70th percentile. . . . A school with many poor children, scoring at the 35th percentile, could be highly successful, though it tests below average.&#8221; Substituting race for class in such an argument would make it perilously near to the views that were attributed to Charles Murray and Richard Herrnstein in their controversial book <em>The Bell Curve</em>.</p>
<p>Rothstein actually invoked Murray&#8217;s controversial theories in a December 1999 column, presumably because he found this framework useful: &#8220;We simply cannot set one standard applicable to all,&#8221; he writes, since that would be &#8220;statistically foolish.&#8221; The foolishness derives from the observation that human performance is often normally distributed, which means that some are necessarily one or more standard deviations above or below the average. It follows for Rothstein that differing income levels call for differing academic standards.</p>
<p>Most schools do something like this by &#8220;tracking&#8221; students, albeit by previous academic performance, not income. Rothstein, however, denounces this form of relativism, apparently because it is not income-based. For him, tracking is based on another &#8220;dangerous myth,&#8221; since the practice of taking &#8220;poor children . . . assigned to tracks and taught less challenging lessons&#8221; assumes that &#8220;disadvantaged children could not learn.&#8221; So you can offer all students, rich or poor, the same curriculum; you just can&#8217;t test to see whether they&#8217;ve all mastered it. Evidently the only solution is a classless society.</p>
<p>The danger is that, while we await socioeconomic utopia, Rothstein would expect nothing better from the schools themselves. Indeed, his answers to the problem of disparities in student achievement often involve placing limits on high-achieving children rather than improving the education of low-achieving children. Consider homework. In a May 2001 column, Rothstein lamented the fact that teachers are assigning more homework, which is said to be &#8220;up 50 percent in the last two decades.&#8221; This is a problem, Rothstein believes, because it &#8220;may increase the gap between students from middle-class and low-income homes. With growing inequality now a greater danger than middle-class pupils&#8217; inadequate achievement, policies that widen learning differences should be avoided.&#8221;</p>
<p>Rothstein cites an academic authority to reinforce the claim, quoting University of North Carolina professor Eugene Brooks, who says, &#8220;Because of homework, schools either consciously or unconsciously reproduce social inequality. It can be avoided only if teachers take over homework supervision from parents.&#8221; That&#8217;s a somewhat breathtaking mission for the school: reducing the impact of social class on learning by expunging parents from the equation, since they are unequal in their degree of helpfulness. It is apparently better for all youngsters to languish in dreary study halls (presumably reducing the amount of time left for instruction) than to take the risk that one mother might help her child learn faster than another. Rothstein concludes, &#8220;It is unconscionable for educators to exacerbate inequality by assigning homework without first ensuring [that social programs] are in place.&#8221; Perhaps Rothstein ought to teach 5th grade. With no homework, no testing, and grades based on their &#8220;zest for living&#8221; rather than on their achievement in math and science, his students would love him!</p>
<p class="tocheading"><strong>Hungry for Facts</strong></p>
<p class="tocheading">Though Rothstein&#8217;s column is nominally about education, regular readers find themselves spending as much time rooting about in sundry social programs, organized around the theme that those who would fix schools by attending directly to schooling are misguided. &#8220;By focusing only on schools, government may waste money trying to fix academic problems that it could have prevented in the first place at less expense.&#8221; For example, if schoolchildren falter on tests, the reason must be malnutrition. In a January 2000 column, he writes, &#8220;Forced to spend more, poor families often raid food budgets to pay rent. Children then suffer nutritionally, compounding cognitive problems.&#8221;</p>
<p>He amplified this theme in August 2001, asking, &#8220;What is the most efficient way to raise low-income pupils&#8217; achievement? . . . Improving nutrition might bring a bigger test-score gain. . . . Undernutrition found in the United States affects performance. . . . Reducing hunger that causes low test scores may be accomplished more easily than the various unproved educational reforms commonly advanced. A higher minimum wage that helps low-wage parents feed their children could be one important step in ensuring that no child is left behind. And those wanting to narrow academic gaps can . . . contribute to the nation&#8217;s overstressed food banks. This too would amount to real education reform.&#8221;</p>
<p>A claim that malnutrition might affect cognition and learning is not unreasonable. But there is little credible evidence that a large hunger problem exists in America. Activist groups that routinely assert that American children suffer malnutrition are actually measuring something they term &#8220;food insecurity,&#8221; a condition that usually results more from poor household management than from scarce resources.</p>
<p>Rothstein has shown himself to be as gullible to the claims of activists as the popular media. When a 1990s survey was said to show that there were 12 million hungry children, NBC&#8217;s Tom Brokaw quickly transformed that claim into &#8220;12 million American children are malnourished.&#8221; On CBS, Dan Rather went further, announcing, &#8220;A startling number of American children [are] in danger of starvation.&#8221; These preposterous facts were generated by the Food Research and Action Center (FRAC) and rely on some statistical sleight of hand. The surveys don&#8217;t measure nutrition, but instead ask about feelings of &#8220;insecurity&#8221; concerning food, measuring the percentage of households that reported &#8220;difficulty getting enough nutritious, safe food at all times, in a socially acceptable way.&#8221; This approach has drawn scorn from such commentators as the <em>New Republic</em>&#8217;s Mickey Kaus and Johns Hopkins professor George Graham, who said of these numbers, &#8220;A lot of what the activists are calling hunger is just absolute rubbish . . . irresponsible people making irresponsible claims.&#8221; Kaus was more succinct: &#8220;The whole project oozes phoniness.&#8221;</p>
<p>In fact, the larger food-related problem besetting American children today is not hunger but obesity. Indeed, one reads in one of Rothstein&#8217;s November 2000 columns, here citing the U.S. Surgeon General, that &#8220;with the percentage of overweight children doubling in the last 30 years, school kids are less healthy.&#8221;</p>
<p>In the obesity column, Rothstein is defending mandatory phys ed, which he worries could be sacrificed to increased academic demands. He wonders, &#8220;Is all this (obesity) an unintended consequence of raising academic standards? Are schools squeezing in more math by eliminating gym?&#8221; This was not the first time he made this claim. In a November 1999 column, Rothstein, fretting that schools are being carried away by standards and testing, wrote, &#8220;Test scores are rising. But more teenagers have fatty diets. . . .  adolescent obesity has soared. Neither children nor our economy will prosper if graduates have higher scores, but are overweight.&#8221; Surely Rothstein can make better arguments against testing than the shaky hypothesis that it leads to obesity. Are testing programs really taking the place of physical education? Do existing phys ed programs make any difference in children&#8217;s weight? Here Rothstein is engaging in pure conjecture.</p>
<p>Continuing his seeming obsession with the relationship between food and cognition, Rothstein writes, in a March 2001 column, &#8220;Surgeon General David Satcher reported . . . that more than a third of poor children have untreated dental cavities. Pupils taking tests with toothaches are unlikely to score as well as those undistracted by pain.&#8221; In other words, we&#8217;re unfairly expecting teachers and schools to administer tests before society has ensured adequate dental care for everyone. But are most cavities even associated with toothaches? Does Rothstein have any evidence that toothaches impair performance? And with all this social engineering, when will we ever teach reading?</p>
<p class="tocheading"><strong>Double Standards</strong></p>
<p class="tocheading">Rothstein&#8217;s own assertions are rarely backed by solid research or even by decent facts. However, when he is confronted with research that counters his worldview, the methodological gloves come off. Consider his approach to voucher research. If poor families were given vouchers redeemable at the schools of their choice, and the achievement of some students rose, it would call into question Rothstein&#8217;s notion that income is the master variable. We therefore ought to expect him to resist such research. And that is exactly what he has done.</p>
<p>Confronted with studies that show black students making gains as a result of being given vouchers (full disclosure: these studies were led mainly by Paul Peterson, this journal&#8217;s editor-in-chief), Rothstein turns to Stanford education professor Martin Carnoy to build a strict critique. Proper experiments should be double-blind, Carnoy says, as in drug trials. This cannot be done with kids who know what school they&#8217;re in. Moreover, some students left the voucher program after one year, and some declined to participate at all. Rothstein writes: &#8220;Researchers cannot know if leavers differ significantly from stayers.&#8221; The gains reported were &#8220;small&#8221; and &#8220;inconsistent.&#8221;</p>
<p>There are many problems with this critique. A study doesn&#8217;t need to be double-blind in order to be valid; many studies in the medical field are not double-blind. Moreover, the researchers used well-regarded statistical techniques to control for the fact that some students left the program after one year or didn&#8217;t participate at all. The broader problem is that Rothstein fails to use this standard, or anything remotely like it, when tackling research that favors his Big Idea. The information that he routinely asserts with confidence, such as studies supporting the primacy of economic factors, represents inferences based on, if anything, methodological procedures that were far less rigorous than those used in empirical work on vouchers. The impression offered is of one straining at gnats while swallowing camels.</p>
<p>There is more, much more. The pattern persists from week to week. The Big Idea is defended at all costs. Critics are dismissed. Economics and social class explain just about everything, except where they don&#8217;t. America&#8217;s self-styled newspaper of record has but one regular commentary on education, and its author is gripped by a worldview that often blinds him to evidence and confounds the rules of logic.</p>
<p><em>-David W. Murray is the former director of the Washington-based Statistical Assessment Service, a nonpartisan organization that examines the use of quantitative research by the media.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/waitingforutopia/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dodging the Questions</title>
		<link>http://educationnext.org/dodging-the-questions/</link>
		<comments>http://educationnext.org/dodging-the-questions/#comments</comments>
		<pubDate>Mon, 17 Jul 2006 17:35:36 +0000</pubDate>
		<dc:creator>Terry M. Moe</dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3365396</guid>
		<description><![CDATA[Somehow I expected more. When I challenged Phi Delta Kappa (PDK) and Gallup&#8217;s claim that they had discovered a &#8220;significant decline&#8221; in voucher support, I figured they would respond with detailed justifications of their procedures and findings. But they haven&#8217;t done that. Their response reads more like an exercise in public relations than a serious [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://educationnext.org/files/ednext20023_77a.gif" border="0" alt="" hspace="2" vspace="2" width="200" height="275" align="right" /><br />
Somehow I expected more. When I challenged Phi Delta Kappa (PDK) and Gallup&#8217;s claim that they had discovered a &#8220;significant decline&#8221; in voucher support, I figured they would respond with detailed justifications of their procedures and findings. But they haven&#8217;t done that. Their response reads more like an exercise in public relations than a serious attempt to deal with the issues I&#8217;ve raised.<br />
Lowell Rose and Alec Gallup, writing the official response for PDK, begin by setting out their annual data from 1993 through 2001 on support for school vouchers. As anyone can see, the data seem to show that support for vouchers has &#8220;significantly declined&#8221; in recent years. The authors note that, while I think the first of their two questions is biased, I regard the second as well worded. Yet this second item documents the same trend as the first: that voucher support has dropped considerably.<br />
This is a useful lead-in for them because it allows them to show, visually, that their numbers really do go down. But however obvious the trend might look, it does not prove a thing. The reason we are having this debate is that the data are in dispute—and, if I am right, are not to be believed.</p>
<p>I do happen to think their second question is well worded. But that doesn&#8217;t mean I think the support scores it generates are valid. In fact, I think they are <em>not</em> valid—and I say so quite clearly in the article and give my reasons. The key problem is that PDK/Gallup changed the survey in recent years, and these internal changes—not changes in public opinion—are probably responsible for the drop-off in numbers.</p>
<p>Rose and Gallup ignore all this. Right from the start, they fail to deal with or even acknowledge the argument I actually make.</p>
<p><img src="http://educationnext.org/files/ednext20023_77fig1.gif" border="0" alt="" width="600" height="598" /></p>
<p class="tocheading"><strong>Shifting Measures</strong></p>
<p>I began my article by noting that from the 1970s until 1991 PDK measured support for vouchers with a question that portrayed vouchers as a government-financed program allowing parents to choose among public, private, and parochial schools. By 1991 public support for vouchers had risen to 50 percent (with 39 percent opposed) on this measure—whereupon it was unceremoniously dumped by PDK in favor of a new question. This one asked, &#8220;Do you favor or oppose allowing students and parents to choose a private school to attend at public expense?&#8221; The findings from this new measure were—surprise!—astonishingly more negative. In 1991 it yielded a support level of just 26 percent; in 1993, an abysmal 24 percent.</p>
<p>It is only reasonable to ask, Why did PDK dump its traditional voucher item? Why did it embrace the &#8220;at public expense&#8221; item as an alternative? And, as both items are allegedly measuring support for vouchers, what accounts for their strikingly different results?</p>
<p>Rose and Gallup ignore the whole thing, thus providing no objective rationale to dispel the unavoidable suspicion that the &#8220;at public expense&#8221; item may have been favored precisely because it gives much lower support scores. They had a chance to provide such a rationale here, and they chose not to.</p>
<p><img src="http://educationnext.org/files/ednext20023_77fig2.gif" border="0" alt="" width="350" height="407" /></p>
<p class="tocheading"><strong>The &#8220;At Public Expense&#8221; Item</strong></p>
<p>I go on in the article to give reasons why the &#8220;at public expense&#8221; item is a downwardly biased measure of voucher support. Most people are poorly informed about vouchers (and other policy issues as well) and are thus heavily influenced by the wording and ordering of survey questions. Questions must give people enough information to know what the policy issue is about, and the information must be balanced. The &#8220;at public expense&#8221; item fails on these counts.</p>
<p>First, the central purpose of a voucher program is to expand choices for all parents, especially those with kids currently in public schools, but the PDK item doesn&#8217;t convey this information. Instead, it cryptically asks whether parents should be allowed to send their kids to private schools at public expense—which, especially for ill-informed respondents, tends to frame vouchers (implicitly, through the images it evokes) as a special-interest program for private school parents rather than as a program of expanded choice for parents generally. Second, the phrase &#8220;at public expense&#8221; is a pejorative way of referring to government financing, likely to elicit more negative responses still.</p>
<p>Here Rose and Gallup have an opportunity to argue that their &#8220;at public expense&#8221; item is not negatively biased, that it is indeed an excellent measure, preferable to others PDK might have used. But they don&#8217;t do this. Instead, they begin by recounting my brief definition of the central purpose of a voucher program (which they don&#8217;t dispute) and by claiming that it is inappropriate to convey such information in a survey question: &#8220;The purpose of an opinion poll is to survey public opinion based on the information the public has at the time; it is <em>not</em> to educate the public.&#8221;</p>
<p>This claim may sound authoritative, but it actually makes no sense. If it were true, PDK would be best off simply asking its survey respondents, &#8220;Do you support school vouchers?&#8221; Period. No information about what vouchers are. No attempt to educate. No nothing. But of course PDK doesn&#8217;t do that, nor does any other polling organization. The reason is that most Americans wouldn&#8217;t know what to make of such a bare-bones question. They <em>need</em> some kind of information to let them know what a voucher program is so that they can get their bearings and express an opinion. In practice, whether Rose and Gallup want to admit it or not, the &#8220;at public expense&#8221; item <em>is</em> an attempt to provide respondents with information that tells them something about vouchers. What Rose and Gallup should be arguing is not that survey items have no business conveying basic information, but rather that the information their &#8220;at public expense&#8221; item does provide is entirely appropriate and unbiased. This is an argument they never make.</p>
<p>Instead they offer a lame diversion. They note that, in their 1997 poll, they tested whether a change in wording to &#8220;at government expense&#8221; would lead to different results. And it did. The responses were more positive toward vouchers (by 4 percent). They go on to speculate that, had they changed the wording to &#8220;at taxpayer expense,&#8221; the shift in results would have been negative. They embraced the &#8220;at public expense&#8221; item, they say, because it &#8220;represents an effort to chart a middle course and avoid bias&#8221;—which presumably justifies their choice of measures.</p>
<p>What kind of logic is this? There are <em>many</em> ways that support for vouchers might be measured and, within each, many ways that the element of government financing can be worded: &#8220;government-funded,&#8221; &#8220;publicly funded,&#8221; &#8220;with government paying all or part of the tuition,&#8221; and so on. The question is not whether &#8220;at public expense&#8221; is preferable to &#8220;at government expense&#8221; or &#8220;at taxpayer expense&#8221;—phrases that are themselves pejorative in tone. The question is whether it is preferable to all the other, more neutrally worded possibilities. Which they never even consider. So once again, Rose and Gallup avoid the real issues here. Their justification is no justification at all.</p>
<p>Although they don&#8217;t tell us why the &#8220;at public expense&#8221; item is a good measure, they do offer a concrete reason for continuing to ask it year after year without modification. Their answer: &#8220;to preserve the trend line the question had established.&#8221; If this were so important to them, however, why did they dump their traditional voucher item in the first place? It provided a long series of data stretching back to the early 1970s and represented the best single source of information—anywhere—on the historical development of public attitudes toward vouchers. Yet this time series was abruptly ended, and its value entirely lost, when PDK/Gallup shifted to the &#8220;at public expense&#8221; item. It&#8217;s a little hard to accept, then, that this latter item has been continued over the years in order to &#8220;preserve the trend line.&#8221; Its very adoption ended the best trend line we had.</p>
<p class="tocheading"><strong>The Second Item</strong></p>
<p>Since 1995, the PDK/Gallup poll has included a second voucher item in addition to the &#8220;at public expense&#8221; item. This one, interestingly, is well worded and does convey the essence of a voucher program, asking respondents to consider a proposal that &#8220;would allow parents to send their school-age children to any public, private, or church-related school they choose.&#8221; It is a mystery why PDK thought it needed such an item, given its faith in the &#8220;at public expense&#8221; item and given its belief that no information about the purpose of vouchers (expanded choice) should be provided.</p>
<p>Be this as it may, we should expect this item to yield higher voucher support scores than the negatively biased &#8220;at public expense&#8221; item. And it does, consistently, year after year. Yet it doesn&#8217;t yield support scores that are as high as one might expect, given the findings of other surveys. A possible explanation, I argue, is that this item is always asked immediately <em>after </em>the &#8220;at public expense&#8221; item on the PDK survey—and thus after the voucher issue is already framed in a negative way. Responses to the second item, then, may be downwardly biased as well.</p>
<p>Rose and Gallup have nothing to say about this. Instead, they lead us into a Kafkaesque world of bureaucratic obfuscation. They say that I misunderstand the relationship between PDK and the Gallup Organization. While I talk in my article as though PDK has researchers that design surveys, the researchers and design decisions are all Gallup&#8217;s. By shifting the responsibility in this way, Rose and Gallup have a rationale for not responding to the basic points at issue and (implicitly) kicking the ball over to the Gallup Organization, which has &#8220;exclusive responsibility.&#8221; Yet Gallup does not pick up the ball and run with it. In fact, the official Gallup response, written by senior editor David Moore, is all of three paragraphs long, and has nothing to say about most of the issues.</p>
<p>I am not privy to inside information about how PDK and Gallup design their annual survey and cannot know who is responsible for what. But I do know this. The rejoinder by Lowell Rose and Alec Gallup was submitted to <em>Education Next </em>as the official response of PDK. Yet Alec Gallup is the cochairman of the Gallup Organization. His name is not only on this official PDK response; it has also been on every PDK/Gallup annual poll since 1986. And PDK and Gallup have teamed up to produce these polls every year for the past 34 years. In the final analysis, it doesn&#8217;t make any difference who is responsible for the design decisions. The decisions were made, and these authors—Gallup in particular—are in a position to know how and why the decisions were made as they were. As is David Moore.</p>
<p>Getting back to substance: Why is the well-worded question always asked after the &#8220;at public expense&#8221; item, and what are the likely consequences? Rose and Gallup are silent, but Moore offers a response (if ever so brief). He admits that placing the well-worded item after the &#8220;at public expense&#8221; item could downwardly bias the survey responses. But he says that the initial ordering (in 1996) was determined &#8220;as a matter of chance,&#8221; and that since then the ordering has been kept intact &#8220;to protect the integrity of the trend data they produce. Otherwise, any changes in results could be due to a change in the order of the questions, not to any real change in public opinion.&#8221; (Remember this quote, folks.)</p>
<p>Moore is right to say that the ordering of questions is crucial. Precisely because this is so, however, it is hard to fathom why Gallup would leave the ordering of these two survey items to chance. In 1993 the &#8220;at public expense&#8221; item (asked alone) yielded a support score of 24 percent. Just a year later, in 1994, the well-worded item (asked alone) yielded a support score of 45 percent. This is a huge difference, and it clearly shows that the &#8220;at public expense&#8221; item portrays vouchers in a far more negative light than the well-worded item does. Asking one item right after the other on the same survey, then, could well affect the way people respond to whichever item comes second. I would think, in light of this, Gallup researchers would have pondered the question-ordering issue long and hard—and considered, as well, not putting the two on the same survey, or at least asking them in different sections of the survey, far apart. The notion that they just flipped a coin, and that the coin flip happened to favor putting the negative item immediately before the well-worded item, is difficult to accept.</p>
<p class="tocheading"><strong>Changes in the Survey</strong></p>
<p>The issues I&#8217;ve discussed thus far are important to an evaluation of the &#8220;facts&#8221; generated by PDK&#8217;s surveys. Still, they cannot explain why both the well-worded item and the &#8220;at public expense&#8221; item would register a sudden drop in voucher support over the last few years. Even if the measures are biased, the fact that their content and ordering have remained constant over the whole time period means that something else must have happened—some change—to cause scores to plummet in recent years. According to PDK, this something is that public opinion has changed: Americans are much less sympathetic toward vouchers than before. What PDK doesn&#8217;t say, however, is that <em>the survey itself was changed</em>—and in ways that almost surely caused their findings on voucher support to shift downward.</p>
<p>Here again, the issue is one of question ordering. Before the 2000 survey, the two voucher items were immediately preceded on the survey by items asking respondents to grade the public schools from A to F. But in 2000, PDK injected five additional items between the grading items and the two voucher items. These new items dramatically changed the lead-in to the voucher items in several ways. Most important, they portrayed vouchers as a stark alternative to the public school system and implied that people who support the public schools—which, of course, most people do—cannot at the same time support vouchers. In 2001 PDK compounded the confusion by changing the survey again, deleting three of the five items, but keeping the two items that portray vouchers as being opposed to public schooling.</p>
<p>Any competent researcher would expect such changes in question ordering to affect responses to the voucher items. It is obvious, moreover, that the effects should be negative, and that they could easily account for the &#8220;significant decline&#8221; that PDK attributes to public opinion in 2000 and 2001. I should note that they cannot account for the fact that the &#8220;at public expense&#8221; item (but not the other item) seems to begin its decline in 1999, a year before the survey was changed. But it would be a mistake to attribute much significance to this. The PDK/Gallup survey, like all surveys, is subject to sampling error; even if there were no change in public opinion whatsoever, its support scores could randomly vary by several percentage points from year to year. Indeed, the &#8220;margin of error&#8221; for this survey is plus or minus 4 percent. There is little basis, then, for asserting that a decline clearly began in 1999. There is a strong basis, however, for <em>expecting </em>a &#8220;significant decline&#8221; after that point, simply due to the survey&#8217;s change in question ordering.</p>
<p>This is perhaps the key point in my entire argument, and it should have been the prime focus of the PDK and Gallup responses. But it wasn&#8217;t. After asserting early on that the Gallup Organization is &#8220;continuously alert to the possibility of &#8216;question-order bias,&#8217;&#8221; Rose and Gallup tell a story about how the new questions found their way onto the survey, but remarkably, they have nothing to say about why these items were placed immediately before the voucher items in 2000 and 2001, and do not even acknowledge that doing so threatened to bias the voucher responses.<br />
Even more remarkable is Moore&#8217;s official response for Gallup. As I noted earlier, he explicitly acknowledges that changes in question ordering can lead to changes in results (recall the quotation I asked you to remember), and he uses this fact to explain why Gallup continued to place the &#8220;at public expense&#8221; item before the well-worded item year after year. Yet by changing the lead-in questions to these voucher items, Gallup had done exactly what Moore said should not be done. Egregiously. There can be no justification for this and, faced with a hopeless contradiction, Moore&#8217;s response is simply to ignore it. On this most pivotal of issues, he has nothing at all to say.</p>
<p>He does, at least, have something to say about a related issue. In my article, I argued that other surveys, including surveys by Gallup itself, do not support the &#8220;significant decline&#8221; thesis. Moore ignores the surveys carried out by other organizations, but he does make a point of rejecting my argument about Gallup, saying, &#8220;Apart from the trends in the PDK/Gallup polls, the Gallup Organization has asked no other series of voucher questions repeatedly during the 1990s.&#8221;</p>
<p>This sounds definitive, as Moore is the spokesman for Gallup, and Gallup must know its own surveys. Yet his statement is carefully parsed. It is true that Gallup did not ask a &#8220;series&#8221; of questions &#8220;repeatedly&#8221; over this entire time period. However, in 1996 Gallup asked the same well-worded voucher question on two separate occasions, leading in both cases to support scores of 59 percent. It asked the same question again in 2000, producing a support score of 56 percent. And it asked an almost-identical question in 2001, producing a support score of 62 percent. These data were generated by Gallup&#8217;s own surveys, just as I claimed, and they are clearly <em>not</em> compatible with the &#8220;significant decline&#8221; thesis. They suggest, as I think the larger body of survey evidence does, that public support for vouchers probably has not changed much over the past five years.</p>
<p>Moore should have recognized the internal conflict within Gallup&#8217;s own data, seen the incompatibility as an important (and genuinely interesting) issue, and dealt with it. But he chose to avoid the whole thing.</p>
<p class="tocheading"><strong>The Gallup Experiment</strong></p>
<p>In January 2001 Gallup&#8217;s own researchers conducted an experiment to see if question wording affects survey results on the voucher issue. They began with the two PDK items as models, made a few modifications (in which they dropped the &#8220;at public expense&#8221; phrase from one item, not surprisingly, but kept its focus on subsidies for going private), and presented each item to <em>separate </em>samples of respondents. The results: voucher support was 48 percent as measured by the modified &#8220;at public expense&#8221; item, but 62 percent (the figure I used above) as measured by the well-worded item, whose content was virtually the same as before.</p>
<p>This experiment is obviously of direct relevance here. It shows that there is a huge difference between the two measures when they are asked separately. It shows that Gallup&#8217;s own researchers think the phrase &#8220;at public expense&#8221; ought to be dropped. It suggests that support for vouchers may be much higher than PDK has been claiming and that something seems to be suppressing responses to the well-worded question on the PDK survey. It contradicts the claim that support for vouchers has &#8220;significantly declined.&#8221; And the entire experiment was a Gallup operation.</p>
<p>Yet Rose, Gallup, and Moore don&#8217;t even acknowledge that this experiment was ever carried out. Worse, they assert that there are no Gallup surveys, aside from the PDK/Gallup poll, that are of any relevance here, and that I have misled everyone. What can they possibly be thinking?</p>
<p class="tocheading"><strong>Conclusion</strong></p>
<p>The issues I&#8217;ve raised are objective issues of survey methodology, having to do with the content and ordering of questions. PDK and Gallup should have responded by taking my charges one by one and dealing with them in a serious, scientific way. This is normal procedure in any area of research: researchers do studies, other researchers respond with objectively based criticisms, the former respond in turn, and through such back-and-forth exchanges the research community—and society—moves toward a better understanding of the world.</p>
<p>PDK and Gallup have shown little interest in participating in such a process. I am heartened that they plan to carry out a test to explore one of the issues: whether the second voucher item is biased downward by being asked right after the &#8220;at public expense&#8221; item. But this is a small step, and not really critical to our debate. It does nothing to explore the bias of the &#8220;at public expense&#8221; item itself. Most important, it does nothing to determine whether the changes they made to their survey in 2000 and 2001 biased their results.</p>
<p>All in all, PDK and Gallup simply haven&#8217;t dealt with the issues. Their responses here are largely vacuous. Even so, our exchange will have been worth it if PDK and Gallup design their future surveys with greater sensitivity to the need for justifiable measures and methods, and with some sense that, if they don&#8217;t, people will call them to account. That is the way all other researchers live their professional lives. PDK and Gallup should have to do the same.</p>
<p><em>-Terry Moe is a professor of political science at Stanford University and a senior fellow at the Hoover Institution.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/dodging-the-questions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Voucher Research Controversy</title>
		<link>http://educationnext.org/voucherresearchcontroversy/</link>
		<comments>http://educationnext.org/voucherresearchcontroversy/#comments</comments>
		<pubDate>Thu, 06 Jul 2006 20:12:39 +0000</pubDate>
		<dc:creator>Paul E. Peterson</dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3288426</guid>
		<description><![CDATA[New looks at the New York City evaluation ]]></description>
			<content:encoded><![CDATA[<table border="0" cellspacing="0" cellpadding="1" width="300" align="right">
<tbody>
<tr>
<td><img src="http://educationnext.org/files/ednext20042_pete1.jpg" border="0" alt="" width="290" height="244" /></td>
</tr>
</tbody>
</table>
<p><strong><span class="tocheading"><br />
&#8220;Principal Stratification Approach to Broken Randomized Experiments: A Case Study of School Choice Vouchers in New York City,&#8221; &#8220;Comment,&#8221; and &#8220;Rejoinder&#8221;</span></strong><br />
<span style="font-family: arial,helvetica,sans-serif"><em>By John Barnard, Constantine E. Frangakis, Jennifer L. Hill, and Donald B. Rubin; &#8220;Comment&#8221;<br />
by Alan Krueger and Pei Zhu</em><br />
<em><br />
Journal of the American Statistical Association, June 2003.</em></span></p>
<p><strong><span class="tocheading">Another Look at the New York City School Voucher Experiment</span></strong></p>
<p><span style="font-family: arial,helvetica,sans-serif"><em>By Alan Krueger and Pei Zhu</em><br />
<em>Presented at the National Press Club, April 2003</em>.</span></p>
<p>In <em>The Education Gap: Vouchers and Urban Schools </em>(Brookings, 2002), we and our colleagues reported that attending a private school had no discernible impact, positive or negative, on the test scores of non-African-American students participating in school voucher programs in Washington, D.C., New York City, and Dayton, Ohio. But after one, two, and three years in New York City, and after two years in Washington and Dayton, significantly positive impacts for African-Americans were observed.</p>
<p>Our results came from randomized field trials, which are generally thought to be the gold standard for research on human subjects. In such studies, subjects are randomly assigned to treatment and control groups by means of a lottery. In the best of worlds, researchers are able to collect information on the subjects&#8217; characteristics before the lottery begins, enabling them to confirm that the lottery, in fact, worked as intended. If the treatment and control groups are similar at the beginning of the study, any differences between the two groups that emerge over time can be attributed to the programmatic intervention (in the case at hand, using a voucher to switch from a public to a private school). The results reported in this article are thus to be understood as the difference between the test scores of students who used vouchers to attend a private school and those of their public school peers who would have used a voucher had they been offered one.</p>
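<p>The logic of a lottery-based comparison can be illustrated with a minimal simulation. This is only a sketch under simplifying assumptions, not the evaluation&#8217;s actual procedure: the student data, the 5-point effect, and the function names <code>run_lottery</code> and <code>mean_gap</code> are all hypothetical, and the sketch ignores noncompliance and attrition.</p>

```python
import random
from statistics import mean


def run_lottery(student_ids, n_offers, seed=0):
    """Randomly pick n_offers students for treatment; the rest are controls."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return set(shuffled[:n_offers]), set(shuffled[n_offers:])


def mean_gap(scores, treated, control):
    """Treatment-group mean minus control-group mean for one set of scores."""
    return mean(scores[s] for s in treated) - mean(scores[s] for s in control)


# Hypothetical data: every student has a baseline percentile score.
rng = random.Random(1)
students = range(1000)
baseline = {s: rng.uniform(0, 100) for s in students}

treated, control = run_lottery(students, n_offers=500)

# Simulated outcome: follow-up equals baseline plus an ASSUMED 5-point
# effect for treated students (an illustration, not a finding).
follow_up = {s: baseline[s] + (5.0 if s in treated else 0.0) for s in students}

# Balance check: the baseline gap should be small if the lottery worked.
print("baseline gap: ", round(mean_gap(baseline, treated, control), 2))
# The follow-up gap is the baseline gap plus the assumed effect.
print("follow-up gap:", round(mean_gap(follow_up, treated, control), 2))
```

<p>In this sketch the baseline gap serves as the check that the lottery worked as intended; without baseline scores, that check is unavailable and the follow-up gap has to be taken on faith.</p>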
<p>Despite the strength of our evaluation&#8217;s design, the findings have not been without controversy. Specifically, two secondary analyses of the New York City data have recently been published, with widely diverging results. One study, conducted by a group of distinguished statisticians, John Barnard, Constantine Frangakis, Jennifer Hill, and Donald Rubin (hereinafter referred to as Barnard), has confirmed our first-year results but has been virtually ignored in the public media. The other, by Princeton economists Alan Krueger and Pei Zhu, has contradicted our results and twice received favorable coverage in the <em>New York Times</em>, where Krueger is an occasional columnist.</p>
<p>From the standpoint of pure innovation and analytical rigor, Barnard has produced the more impressive piece. As befits an article published in the nation&#8217;s leading statistics journal, it introduces new statistical techniques to deal with problems that often emerge in randomized field trials: 1) missing data (for instance, not all students who initially joined the study participated in the follow-up testing sessions), and 2) noncompliance (some students, for example, refused the vouchers that were offered to them).</p>
<p>It remains to be seen whether the statisticians&#8217; proposed innovation becomes more widely used. At its current stage of technical development, it permits the examination of effects only after one year. Also, in using the technique, Barnard opted to restrict their analysis to those families with only one child participating in the voucher program.</p>
<p>Despite differences in statistical approach and in the selection of students to be included in the analysis, Barnard&#8217;s findings are largely consistent with those we reported. While we estimated that, after one year, African-American students scored 7 percentile points higher on the math portion of the Iowa Test of Basic Skills than their peers in public schools, Barnard reports impacts of 6 percentile points for African-American students from low-performing public schools. (Almost all the African-American students came from schools with average test scores below the district mean; the few that did not had almost identical average impacts, but the number of available observations was too small to recover precise estimates.)</p>
<p>By contrast, Krueger and Zhu concluded, &#8220;The provision of vouchers in New York City probably had no more than a trivial effect on the average test performance of participating black students.&#8221; This conclusion rests primarily on three methodological decisions that distinguish their research from both our study and that of Barnard:</p>
<p>• We and Barnard let the mother&#8217;s ethnicity define the student&#8217;s ethnicity, while Krueger and Zhu defined a student as African-American if either parent was African-American.</p>
<p>• We and Barnard considered the results for only those students in grades 1-4, almost all of whom took achievement tests before the lottery. This provided us with what are known as &#8220;baseline test scores&#8221; that can be used to obtain more precise estimates of program effects. By contrast, Krueger and Zhu also included a large number of kindergartners for whom no baseline test scores were available.</p>
<p>• We and Barnard always adjusted the data to account for students&#8217; baseline test scores in estimating our results. Krueger and Zhu, in their preferred results, as presented in their &#8220;Comment&#8221; on Barnard, exclude these baseline test scores.</p>
<p>All three of these alterations to the research strategy must be made in order to obtain results that differ substantially from those that we and Barnard obtained. Using any one or two of these different strategies does not generate appreciably different results.</p>
<p class="tocheading"><strong>How to Define African-American</strong></p>
<p>Let&#8217;s consider Krueger and Zhu&#8217;s decision to classify students as African-American if either parent was African-American. Krueger and Zhu regard this decision as a key reason why they obtained results different from ours.</p>
<p>To understand the issue, bear in mind that because many of the students were very young, their ethnic backgrounds were ascertained from information provided in questionnaires filled out by the adults who accompanied them to the testing sessions. These adults were asked to report the ethnicity of the student&#8217;s mother and, separately, the student&#8217;s father. They could assign parents to one of nine categories, five of which are: Black/African-American (non-Hispanic); White (non-Hispanic); Puerto Rican; Dominican; and Other Hispanic. Classifying a child&#8217;s ethnicity is usually straightforward, because both parents are of the same background. In cases where parents were not of the same ethnicity, we classified the child by the mother&#8217;s ethnicity, simply because most children lived with their mothers, 74 percent of whom were single parents. Sixty-seven percent of the students lived with only their mother, compared with just 2 percent who lived with only their father. Mothers accompanied 84 percent of children to testing sessions; in 94 percent of the cases, the accompanying adult claimed to be a caretaker of the child.</p>
<p>Given the fact that these children tended to live with their mothers (and, often, not with their fathers), the decision to link the child&#8217;s ethnicity to the mother&#8217;s appears perfectly sensible. Alternatively, one might classify students as African-American only if both parents are African-American or if the child&#8217;s primary parental caretaker (usually the mother, but on a few occasions the father) is African-American.</p>
<p>Eschewing these alternatives, Krueger and Zhu used a unique classification scheme. They identified students of mixed heritage as African-American as long as either the mother or the father was African-American. If the mother was white but the father was African-American, the child was defined as &#8220;black, non-Hispanic.&#8221; Even if a child had a Hispanic mother and an African-American father, Krueger and Zhu still classified the child as &#8220;black, non-Hispanic.&#8221; Unless one departs from the standard practice of using mutually exclusive categories, students could not be classified as Hispanic or white if either parent was African-American. Krueger and Zhu defend this classification scheme on the grounds that it is &#8220;symmetrical.&#8221; But symmetry is hardly the word for a scheme that classifies Hispanics, whites, and African-Americans according to different principles.</p>
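<p>The two classification rules can be written out side by side. The following is a hedged sketch, not code from either research team: the function names are hypothetical, the example families are invented, and the fallback to the mother&#8217;s category when neither parent is African-American is an assumption made only so the second rule returns a value in every case.</p>

```python
BLACK = "Black/African-American (non-Hispanic)"


def by_mother(mother, father):
    """Rule described for the original study: use the mother's category."""
    return mother


def by_either_parent(mother, father):
    """Rule attributed to Krueger and Zhu: African-American if either parent is."""
    if BLACK in (mother, father):
        return BLACK
    # ASSUMPTION: the article does not say how other mixed cases were handled.
    return mother


# Invented (mother, father) pairs for illustration only.
families = [
    (BLACK, BLACK),
    ("White (non-Hispanic)", BLACK),
    ("Dominican", BLACK),
    (BLACK, "Puerto Rican"),
    ("Puerto Rican", "Dominican"),
]

for mother, father in families:
    print(
        f"mother={mother}; father={father}; "
        f"by_mother={by_mother(mother, father)}; "
        f"by_either_parent={by_either_parent(mother, father)}"
    )
```

<p>The rules disagree only for children whose father, but not mother, is African-American. That is also where the asymmetry described above shows up: under the second rule, one African-American parent overrides a Hispanic or white parent, but never the reverse.</p>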
<p>Nevertheless, not much turns on how one defines a child&#8217;s ethnicity. Regardless of one&#8217;s definition, impacts after three years that range between 7 and 8 percentile points are observed for African-Americans in New York City (see Figure 1). If one classifies a student&#8217;s ethnicity by the mother&#8217;s (the approach we prefer), the effects are 8 percentile points; if one uses either the mother or the father (the approach favored by Krueger and Zhu) the effects are 7 percentile points, a result that is not significantly different from the one originally reported. By itself, altering the definition of a child&#8217;s ethnicity provides no basis whatsoever for concluding that effects disappear.</p>
<p style="text-align: center"><img class="aligncenter" src="http://educationnext.org/files/ednext20042_petefig1.gif" border="0" alt="" width="639" height="322" /></p>
<p class="tocheading"><strong>Students without Baseline Test Scores</strong></p>
<p>Figure 1 presents results for students with baseline test-score information: the first bar reports impacts for the definition of African-American originally used, the latter three bars for alternative definitions. The figure&#8217;s results are based on analyses that exclude from the study all kindergartners, none of whom were tested at baseline. Also excluded are the 10 percent of the students in grades 1-4 who were sick, who refused to take the test, or whose tests were lost in the administrative process.</p>
<p>Krueger and Zhu object to the exclusion of any students from the study, claiming that this constitutes the &#8220;most important&#8221; deficiency of our analysis, as well as that of Barnard. But even when all students are included in the analysis, African-American students who attended private schools scored significantly higher than their public school peers (see Figure 2).</p>
<p>Nonetheless, it is problematic to include students in a study if you don&#8217;t know what their achievement level was at the beginning. How well students perform on a test at, say, age seven, is tightly connected to how well they will do at age eight, nine, or ten. In fact, the correlations between baseline and follow-up test scores in New York consistently hover around 0.7. By comparison, the correlations between mother&#8217;s level of education and follow-up scores were only about 0.1.</p>
<p>Restricting the study to those students for whom baseline test scores are available affords a check on whether the lottery worked as intended and whether any problems arose downstream. For these students, all looks fine on both accounts.</p>
<p>When including all students, even those lacking baseline test scores, one can only hope that the two groups are similar with respect to this critical characteristic. Nonetheless, Krueger and Zhu defend their inclusion on the grounds that &#8220;because assignment to treatment status was random . . . a simple comparison of means between treatments and controls without conditioning on baseline scores provides an unbiased estimate of the average treatment effect.&#8221; This claim, says Barnard, &#8220;is simply false.&#8221;</p>
<p>If not quite false, the claim is at least dubious, because there were many ways for the treatment and control groups to become unbalanced. For example, about a third of the students did not remain in the study into the third year. That is a fairly standard rate of attrition for this kind of research protocol, but it raises concerns that the treatment and control groups might have lost students with different baseline test scores. For this reason, we limited our analysis to those students for whom baseline scores were available, and hence for whom we were able to verify that the treatment and control groups did not become unbalanced.</p>
<p>But perhaps something else is to be gained from including all students, regardless of whether baseline information was available. Krueger and Zhu suggest that by adding these cases one can generalize findings to another grade level (kindergartners). Unfortunately, this is a hazardous generalization, given the fact that the results for kindergartners were significantly different from those for the older students. African-American students in grades 1-4 scored significantly higher if they attended private school, a result observed in all three years of the study. The results for kindergartners, meanwhile, were considerably more erratic; the effect of attending a private school for three years was a negative 13.9 percentile points. In the absence of baseline scores, we don&#8217;t know whether the findings for kindergartners are genuine or simply the result of errors in the administrative process.</p>
<p>Krueger and Zhu also note that their inclusion of all students in the sample generates more precise estimates. But gains in precision obtained by increasing the number of students observed will be offset by losses associated with failing to control for baseline test scores. One can assess the extent to which these competing forces balance each other by comparing the estimates&#8217; standard errors: the smaller the errors, the more precise the estimate. As it turns out, the standard errors are larger, not smaller, when estimating statistical models that include all students but do not control for baseline test scores.</p>
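<p>The trade-off between a larger sample and a baseline control can be approximated with a textbook formula. The sketch below assumes two equal-sized groups and a simple linear adjustment for baseline scores. Only the 0.7 correlation comes from the figures cited above; the sample sizes and the standard deviation are hypothetical, and this is not the model estimated in any of the studies discussed.</p>

```python
import math


def se_of_effect(n_students, outcome_sd, baseline_corr=0.0):
    """Approximate standard error of a treatment-control difference in means.

    Assumes two equal-sized groups. When baseline_corr is nonzero, assumes a
    linear adjustment for a baseline score that correlates with the outcome
    at that level, which shrinks the unexplained standard deviation.
    """
    residual_sd = outcome_sd * math.sqrt(1.0 - baseline_corr ** 2)
    return 2.0 * residual_sd / math.sqrt(n_students)


# Hypothetical numbers; only the 0.7 correlation is taken from the article.
sd = 20.0
with_baseline = se_of_effect(1000, sd, baseline_corr=0.7)
bigger_sample_unadjusted = se_of_effect(1500, sd)

# Factor by which an unadjusted sample must grow to match the adjusted one.
break_even_growth = 1.0 / (1.0 - 0.7 ** 2)

print(f"1,000 students, adjusted for baseline: SE = {with_baseline:.3f}")
print(f"1,500 students, no adjustment:         SE = {bigger_sample_unadjusted:.3f}")
print(f"sample growth needed to break even:    x{break_even_growth:.2f}")
```

<p>Under these assumptions, the adjusted estimate from 1,000 students has a standard error of about 0.90, against about 1.03 for the unadjusted estimate from 1,500 students. The unadjusted sample would have to nearly double before the extra students made up for dropping the baseline control.</p>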
<p>A compromise strategy, suggested by Krueger and Zhu, includes all students and adjusts for baseline test scores whenever possible. This analytic approach generates more precise estimates, the results from which are presented in Figure 2. But since these analyses also introduce risks of bias (principally by including the kindergartners for whom no baseline scores were available), the results in Figure 2 are inferior to the results provided in Figure 1. Nonetheless, they still reveal significantly positive effects of attending private schools on African-American test scores. In other words, even if one includes kindergartners in the study, as Krueger and Zhu recommend, the essentials of our original finding remain intact.</p>
<p style="text-align: left"><img class="aligncenter" src="http://educationnext.org/files/ednext20042_petefig2.gif" border="0" alt="" width="639" height="333" /><br />
Krueger and Zhu have not accepted these findings, however. Instead, they have said that they cannot obtain equivalent results when they attempt to conduct an analysis identical to ours. But this claim is misleading. In fact, Krueger and Zhu&#8217;s results, available by correspondence, hardly differ from ours. As the first set of columns in Figure 3 shows, among students with baseline test scores, we both find that the estimated year three private school impact is 8.4 percentile points for all African-Americans (as defined by the mother&#8217;s ethnicity, our preferred definition). And, as shown in the second set of columns in Figure 3, we both find an impact of 7.6 percentile points for African-Americans when using Krueger and Zhu&#8217;s preferred definition of African-American (students whose mother or father is African-American). Moreover, when students without baseline scores are added to the analysis, they obtain results that are, once again, virtually indistinguishable from ours (see the last two sets of columns in Figure 3).</p>
<p style="text-align: left">In other words, Krueger and Zhu also now report consistently positive results for African-Americans, regardless of how ethnicity is defined, even when kindergartners are included in the analysis, as long as baseline scores (and only baseline scores) are taken into account in the statistical estimation of programmatic effects.</p>
<p style="text-align: center"><img class="aligncenter" src="http://educationnext.org/files/ednext20042_petefig3.gif" border="0" alt="" width="640" height="325" /></p>
<p class="tocheading"><strong>To Ignore or Not to Ignore Baseline Test Scores</strong></p>
<p>Neither changing the definition of African-American nor adding students for whom baseline test scores are missing appreciably changes the results we originally reported. To get different results, still a third methodological step is required. Krueger and Zhu argue that, to avoid a biased estimate, one must ignore baseline test scores, even for those students for whom these are available. But if including baseline scores introduced bias, the magnitude of the effect would change substantially. It does not. Adding baseline scores shifts estimated effects by less than half a percentile point.</p>
<p>Not only is no bias introduced, but including baseline test scores has the advantage of yielding more precise results, allowing researchers to reach firmer conclusions about the efficacy of a programmatic intervention. Estimated impacts from models that control for baseline scores are significant at the .05 level (using the two-tail test), while the less-precise results in models that do not control for baseline scores are significant at only the .10 level, using a one-tail test (a significance level below the threshold Krueger and Zhu find acceptable).</p>
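<p>The gap between the two significance standards mentioned here can be made concrete. The sketch below assumes the test statistic is approximately standard normal, which is an assumption for illustration and not a description of the models actually estimated; the function names are hypothetical.</p>

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_tail_p(z):
    """Probability of a statistic at least as large as z (upper tail only)."""
    return 1.0 - STD_NORMAL.cdf(z)


def two_tail_p(z):
    """Probability of a statistic at least as extreme as z in either direction."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


# Smallest z that clears each standard under the normal approximation.
z_at_one_tail_10 = STD_NORMAL.inv_cdf(0.90)
z_at_two_tail_05 = STD_NORMAL.inv_cdf(0.975)

print(f"z needed for .10, one-tail: {z_at_one_tail_10:.3f}")
print(f"z needed for .05, two-tail: {z_at_two_tail_05:.3f}")
print(f"two-tail p at the one-tail .10 cutoff: {two_tail_p(z_at_one_tail_10):.3f}")
```

<p>Under this approximation, a result that only just reaches the .10 level on a one-tail test has a two-tail p-value of about .20, which is a far weaker standard than .05 on a two-tail test.</p>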
<p>In sum, Krueger and Zhu take three methodological steps to generate results that are not statistically significant: 1) changing the definition of the group to be studied, 2) adding students without baseline test scores, and 3) ignoring the available information on baseline test scores, even though this yields less precise results.</p>
<p>By contrast, Barnard agreed with our decisions to: 1) use the mother&#8217;s ethnicity as the basis for defining the child&#8217;s, 2) focus on those students in the grades for which baseline scores were available for most students, and 3) control for baseline scores, whenever possible. Using pioneering statistical techniques, Barnard reports similar findings, while Krueger and Zhu venture far afield to uncover contrary ones.</p>
<p class="tocheading"><strong>The Value of Randomization</strong></p>
<p>Given differences of opinion among researchers, it is easy to jump to the conclusion that randomized field trials are not the gold standard they are thought to be. If social scientists can reach opposite conclusions from the same data set, then research, even from randomized field trials, may do little to inform policy debates.</p>
<p>We take a different view. In New York, the results reported by all parties are consistently positive; only the magnitude of the effects and the level of statistical significance fluctuate. Furthermore, different statistical techniques generate roughly equivalent results. Moreover, the results that we report are consistent with past research on public and private schools. More than 25 years ago, James Coleman and his colleagues found that attending a private school was more beneficial for black students than for whites, as measured by test scores. More recently, Princeton economist Cecilia Rouse, after reviewing the research literature, concluded that &#8220;the overall impact of private schools is mixed, [but] it does appear that Catholic schools generate higher test scores for African-Americans.&#8221; Another literature review, conducted by economists Jeffrey Grogger and Derek Neal, found few clear-cut gains for white students, while &#8220;urban minorities in Catholic schools fare much better than similar students in public schools.&#8221;</p>
<p>Controversies surrounding randomized experiments can nonetheless be reduced by collecting baseline data on the outcome variable of greatest interest (in this case, students&#8217; test-score performance). In the absence of this information, experiments devolve into endless arguments over whether random assignment actually occurred and whether the two groups being compared are genuinely equivalent. Consider, for example, the recent skepticism directed toward Tennessee&#8217;s Project STAR study, a randomized field trial on class size that failed to collect baseline test-score data.</p>
<p>Still, whether or not one restricts the analysis to those cases where baseline test scores were available, results are clear. In New York, private-school attendance positively affected the test scores of African-American students, but not those of any other ethnic group. For this reason, we think that the evidence from New York continues to support the conclusion, also reached in a wide variety of earlier studies, that disadvantaged African-American students living in urban environments benefit from private schooling.</p>
<p><em>-Paul E. Peterson, the editor-in-chief of </em>Education Next<em>, and William G. Howell are professors at Harvard University. They are the principal authors of </em>The Education Gap: Vouchers and Urban Schools <em>(Brookings, 2002). To view the unabridged version of this article, log on to www.educationnext.org.</em></p>
<img src="http://educationnext.org/?ak_action=api_record_view&id=3288426&type=feed" alt="" />]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/voucherresearchcontroversy/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Reframing the Mind</title>
		<link>http://educationnext.org/reframing-the-mind/</link>
		<comments>http://educationnext.org/reframing-the-mind/#comments</comments>
		<pubDate>Fri, 30 Jun 2006 22:50:47 +0000</pubDate>
		<dc:creator> </dc:creator>
				<category><![CDATA[Check the Facts]]></category>
		<category><![CDATA[Curriculum]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3261311</guid>
		<description><![CDATA[Howard Gardner and the theory of multiple intelligences ]]></description>
			<content:encoded><![CDATA[<p><img src="http://educationnext.org/files/ednext20043_18.jpg" border="0" alt="" hspace="2" vspace="2" width="280" height="360" align="right" /><strong><span class="tocheading">Frames of Mind: The Theory of Multiple Intelligences</span></strong><em><br />
(Basic Books, 1983)</em></p>
<p><strong><span class="tocheading">Multiple Intelligences: The Theory into Practice</span></strong><em><br />
(Basic Books, 1993)</em></p>
<p><strong><span class="tocheading">Intelligence Reframed: Multiple Intelligences for the 21st Century</span></strong><em><br />
(Basic Books, 1999)</em></p>
<p><em>By Howard Gardner</em></p>
<p><em>Checked by Daniel T. Willingham</em></p>
<p>What would you think if your child came home from school and reported that the language-arts lesson of the day included using twigs and leaves to spell words? The typical parent might react with curiosity tinged with suspicion: Is working with twigs and leaves supposed to help my child learn to spell? Yes, according to Thomas Armstrong, author of <em>Multiple Intelligences in the Classroom</em>, especially if your child is high in &#8220;naturalist&#8221; intelligence&#8211;one of eight distinct intelligences that Harvard University scholar Howard Gardner claims to have identified. However, if your child possesses a high degree of what Gardner terms &#8220;bodily-kinesthetic&#8221; intelligence, Armstrong suggests associating movement with spelling. For example, a teacher might try to connect sitting with consonants and standing with vowels.</p>
<p>Armstrong is far from alone in placing faith in Gardner&#8217;s theory of &#8220;multiple intelligences.&#8221; Gardner&#8217;s ideas have been a significant force in education for the past 20 years&#8211;significant enough that they bear close study. How does the scientific community regard the theory of multiple intelligences, and what impact should the theory have on education?</p>
<p class="tocheading"><strong>Central Claims</strong></p>
<p>Gardner first proposed his theory in 1983. Since then, it has undergone incremental but not fundamental change, including the addition of one intelligence (bringing the total to eight), the rejection of others, and consideration of the theory&#8217;s applications. The theory rests on three core claims:</p>
<p>• Gardner says that most psychometricians, those who devise and interpret tests as a way of probing the nature of intelligence, conceive of intelligence as unitary. In <em>Intelligence Reframed</em>, Gardner&#8217;s most recent restatement of his general theory, he writes, &#8220;In the ongoing debate among psychologists about this issue, the psychometric majority favors a general intelligence perspective.&#8221;</p>
<p>This is not an accurate characterization of the position taken by most psychometricians. As will be shown, the vast majority regard intelligence not as a single unified entity, but as a multifaceted phenomenon with a hierarchical structure.</p>
<p>• <em><strong>There are multiple, independent intelligences</strong>.</em> There are three parts to this claim, and it is important to appreciate all three. First, Gardner offers a new definition of <em>intelligence</em>, describing it as &#8220;a biopsychological potential to process information that can be activated in a cultural setting to solve problems or create products that are of value in a culture.&#8221; Previous definitions were limited to cognition or thought; one was intelligent to the extent that one could solve problems and adapt effectively to one&#8217;s environment using thinking skills. Gardner self-consciously broadens the definition to include effective use of the body and thinking skills relevant to the social world. He also extends the functionality of intelligence to include the crafting of useful products, not just the solving of problems. Second, Gardner claims to have identified some (but not all) of the several types of intelligence, which I describe below. Third, he claims that these multiple intelligences operate independently of one another.</p>
<p>• <em><strong>The multiple intelligences theory has applications to education</strong>. </em>Gardner has been careful to say that he has proposed a scientific theory that should not be mistaken for a prescription for schooling. He makes clear that the educational implications of children&#8217;s possessing multiple intelligences can and should be drawn, but he believes that many possible curricula and methods could be consistent with the theory. The sole general implication he supports is that children&#8217;s minds are different, and an education system should take account of those differences, a point developed in diverse ways by his many followers.</p>
<p class="tocheading"><strong>One Intelligence or Many?</strong></p>
<p>Let&#8217;s evaluate each of Gardner&#8217;s claims in turn, beginning with how psychometricians view intelligence. In the early 20th century, many psychometricians did in fact think of intelligence as a unitary trait, just as Gardner now claims. The thinking at that time was articulated by Charles Spearman, who suggested that a single factor (he called it <em>g</em>, for <em>general</em>) underlay all intelligent behavior. If you had a lot of <em>g,</em> you were smart; if you didn&#8217;t, you weren&#8217;t. However, by the 1930s some researchers (notably Louis L. Thurstone) were already arguing for a multifaceted view of intelligence. One might be intelligent in the use of words, for example, but unintelligent mathematically. From the 1950s on, many psychometricians proposed hierarchical models, which may be thought of as a mixture of the single-factor and multiple-factor models. Except for a few holdouts, most psychologists now favor the hierarchical model.</p>
<p>How can one use data from tests of cognitive ability to evaluate the number of intelligences? A straightforward approach entails administering a number of separate tests thought to rely on different hypothesized intelligences. Suppose tests 1 and 2 are different tests of verbal ability (for example, vocabulary and spelling), and tests 3 and 4 are different tests of mathematical ability. If there is one intelligence, <em>g</em>, then <em>g</em> should support performance on all four tests, as shown in diagram A of Figure 1. A high score on test 1 would indicate that the test-taker is high in <em>g,</em> and he or she should perform well on all of the other tests.</p>
<p><img src="http://educationnext.org/files/ednext20043_18fig1.gif" border="0" alt="" width="516" height="288" /></p>
<p>Suppose, however, that there are two intelligences&#8211;one verbal and one mathematical, as shown in diagram B of Figure 1. In that case, a high score on test 1 would predict a high score on test 2, but would tell us nothing about the individual&#8217;s performance on the math tests, 3 and 4. Performance on those tests would depend on mathematical intelligence, which is separate and independent of verbal intelligence.</p>
<p>The data support neither of these views. To continue with our hypothetical example, the data show that all of the test scores, 1 through 4, are somewhat related to one another, which is consistent with the existence of <em>g</em>. But scores from tests of math ability are more related to one another than they are to verbal scores; the same goes for verbal scores. A hierarchical model, shown in diagram C of Figure 1, fits this pattern. In this model, <em>g</em> influences both mathematical and verbal cognitive processes, so performance on math and verbal tests will be somewhat related. But mathematical competence is supported not just by <em>g,</em> but by the efficacy of a mathematical intelligence that is separate and independent of a verbal intelligence. That&#8217;s why math scores are more related to each other than they are to verbal scores. It also explains how it is possible for someone to be quite good in math, but just mediocre verbally. This logic applies not only to the restricted example used here (math and verbal) but also to a broad spectrum of tests of intellectual ability.</p>
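<p>The three diagrams make different predictions about how test scores should correlate, and a small calculation shows the contrast. The sketch below is a hedged illustration: it assumes each test score is a weighted sum of <em>g</em>, one domain factor (verbal or mathematical), and noise, all independent with unit variance, with each score scaled to variance 1. The loadings are hypothetical and are not estimates from any data set.</p>

```python
def implied_correlations(g_loading, domain_loading):
    """Correlations implied by score = g_loading*g + domain_loading*domain + noise.

    Assumes g, the verbal and math factors, and the noise terms are mutually
    independent with unit variance, and that each score has total variance 1.
    Returns (same-domain correlation, cross-domain correlation).
    """
    same_domain = g_loading ** 2 + domain_loading ** 2
    cross_domain = g_loading ** 2
    return same_domain, cross_domain


# Hypothetical loadings chosen only to contrast the three diagrams.
models = {
    "A: single g": implied_correlations(0.7, 0.0),
    "B: independent intelligences": implied_correlations(0.0, 0.7),
    "C: hierarchical": implied_correlations(0.5, 0.5),
}

for name, (same, cross) in models.items():
    print(f"{name}: same-domain r = {same:.2f}, cross-domain r = {cross:.2f}")
```

<p>With these loadings, model A predicts the same correlation between any two tests, model B predicts no correlation between verbal and math tests, and only model C yields the observed pattern: every pair correlates, with same-domain pairs (0.50) correlating more strongly than cross-domain pairs (0.25).</p>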
<p>The hierarchical view of intelligence received a strong boost from a landmark review of the published data collected over the course of 60 years from some 130,000 people around the world. That massive review, performed by the late University of North Carolina scholar John Carroll, concluded that the hierarchical view best fits the data. Researchers still debate the exact organization of the hierarchy, but there is a general consensus around the hierarchical view of intelligence. Thus Gardner&#8217;s first claim&#8211;that most psychometricians believe that intelligence is unitary&#8211;is inaccurate.</p>
<p class="tocheading"><strong>What Are the Intelligences?</strong></p>
<p>Gardner&#8217;s second claim is that individuals possess at least eight independent types of intelligence. The following list includes a definition of each along with examples Gardner has provided of professions that draw heavily on that particular intelligence.</p>
<p>• <em><strong>Linguistic</strong>:</em> facility with verbal materials (writer, attorney).</p>
<p>• <em><strong>Logico-mathematical</strong>:</em> the ability to use logical methods and to solve mathematical problems (mathematician, scientist).</p>
<p>• <em><strong>Spatial</strong>:</em> the ability to use and manipulate space (sculptor, architect).</p>
<p>• <em><strong>Musical</strong>:</em> the ability to create, perform, and appreciate music (performer, composer).</p>
<p>• <em><strong>Bodily-kinesthetic</strong>:</em> the ability to use one&#8217;s body (athlete, dancer).</p>
<p>• <em><strong>Interpersonal</strong>:</em> the ability to understand others&#8217; needs, intentions, and motivations (salesperson, politician).</p>
<p>• <em><strong>Intrapersonal</strong>:</em> the ability to understand one&#8217;s own motivations and emotions (novelist, therapist with self-insight).</p>
<p>• <em><strong>Naturalist</strong>:</em> the ability to recognize, identify, and classify flora and fauna or other classes of objects (naturalist, cook).</p>
<p>Gardner claims that everyone has all eight intelligences to some degree, but each individual has his or her own pattern of stronger and weaker intelligences. Gardner also argues that most tasks require more than one intelligence working together. For example, the conductor of a symphony obviously uses musical intelligence, but also must use interpersonal intelligence as a group leader and bodily-kinesthetic intelligence to move in a way that is informative to the orchestra. The claim of separate and independent intelligences is, of course, central to Gardner&#8217;s theory. How do we know that these intelligences are independent?</p>
<p>It is important to bear in mind that the hierarchical model described in the previous section is not a theory, but a <em>pattern of data</em>. It is a description of how test scores are correlated. A theory of intelligence must be consistent with these data; the pattern of data is not itself a theory. For example, the data do not tell us what <em>g</em> is or how it works. The data tell us only that there is <em>some </em>factor that contributes to many intellectual tasks, and if your theory does not include such a factor, it is inconsistent with existing data. Gardner&#8217;s theory has that problem.</p>
<p>Setting <em>g </em>aside, the claim of independence among the eight intelligences is also a problem. Data collected over the past 100 years consistently show that performances on intellectual tasks are correlated. Even if Gardner&#8217;s theory did not include some general factor, it should at least provide a way to account for this correlation. The theory did not, and it was widely criticized for this failure. In some later writings, Gardner has said that he questions the explanatory power of <em>g</em>, not whether it exists&#8211;in other words, he doubts whether <em>g</em> makes much of a contribution to abilities Gardner deems important. He has also deemphasized the importance in his theory of whether the intelligences are truly independent.</p>
<p>Let&#8217;s allow, then, that the intelligences Gardner has identified are not independent, but that there are a number of distinguishable (but correlated) intellectual capabilities in addition to <em>g</em>. Has Gardner done a good job of cataloguing them? It is instructive to examine the criteria by which Gardner determines whether an ability is an intelligence. The criteria are shown in the table below.</p>
<p><img src="http://educationnext.org/files/ednext20043_18chart.gif" border="0" alt="" width="590" height="305" /></p>
<p>Gardner&#8217;s eight criteria appear to be quite rigorous: the psychometric criterion described in the previous section and seven others that span different domains of investigation. But Gardner weakens them by demanding that only a majority be satisfied, and some are rather easy to satisfy. The psychometric criterion is the most rigorous of the eight, but Gardner has largely ignored it. The remaining criteria are so weak that they cannot restrain a researcher with a zest for discovering new intelligences.</p>
<p>For instance, a <em>humor intelligence</em> and a <em>memory intelligence</em> certainly meet a majority of the criteria. Humor and memory can be used to solve problems and create valued products in many cultures and so meet Gardner&#8217;s definition of <em>intelligence</em>. Both can be isolated by brain damage, each has a distinct developmental history, and there is evidence for the psychological separability of each. Some individuals show exceptional memory or sense of humor but no other remarkable mental abilities. The evolutionary plausibility of each intelligence is easy to defend as well. Humor would certainly be adaptive in a social species such as ours, and the adaptive nature of memory should be self-evident.</p>
<p>By these criteria I am also prepared to defend an <em>olfactory intelligence</em> and a <em>spelling intelligence</em> and to subdivide Gardner&#8217;s spatial intelligence into <em>near-space intelligence</em> and <em>far-space intelligence</em>, thus bringing the total number of intelligences to 13. (Gardner, for reasons that are not clear to me, excludes sensory systems as potential intelligences, but not action systems such as bodily-kinesthetic.)</p>
<p>The issue of criteria by which new intelligences are posited is crucial, and it is in the selection of criteria that Gardner has made a fundamental mistake. Gardner&#8217;s criteria make sense if one assumes extreme modularity in the mind, meaning that the mind is a confederation of largely independent, self-sufficient processes. Gardner argues that neuroscience bears out this assumption, but that is an oversimplification.</p>
<p>For example, suppose that mathematical and spatial intelligence have the structure depicted in Figure 2, where each letter represents a cognitive process. Mathematical reasoning requires the cognitive processes A through E. Spatial reasoning requires the processes B through F. Are math and spatial reasoning separate?</p>
<p><img src="http://educationnext.org/files/ednext20043_18fig2.gif" border="0" alt="" width="336" height="306" /></p>
<p>Most people would agree that they are not identical, but they are largely overlapping and don&#8217;t merit being called separate. By Gardner&#8217;s criteria, however, they likely would be. If we assume that each process (A through F) is localized in a different part of the brain, then if the part of the brain supporting process A were damaged, math ability would be compromised, but spatial ability would not, so the brain criterion would be met. If process A or process F had a different developmental progression than the others, the developmental criterion would be met. If A and F differ in their need for attentional resources, the experimental psychological criterion would be met. The criteria that Gardner mentions can be useful, but they do not signal <em>necessarily </em>separate systems. In fact, the one criterion that Gardner has routinely ignored&#8211;the psychometric&#8211;is the one best suited to the question posed: Are cognitive processes underlying a putative intelligence independent of other cognitive processes?</p>
<p>Gardner&#8217;s second claim&#8211;that he has described multiple, independent varieties of intelligence&#8211;is not true. Intellectual abilities are correlated, not independent. Distinguishable abilities do exist, but Gardner&#8217;s description of them is not well supported.</p>
<p class="tocheading"><strong>Should Theory Become Practice?</strong></p>
<p>For the educator this debate may be, as Shakespeare wrote, sound and fury, signifying nothing. What matters is whether and how the theory inspires changes in teaching methods or curriculum. The extent to which multiple intelligence ideas are applied is difficult to determine because few hard data exist to describe what teachers actually do in the classroom. Even statements of schools&#8217; missions are of limited usefulness, although dozens of schools claim to center their curriculum on the theory. An administrator might insert multiple intelligences language in an effort to seem progressive. Or an administrator&#8217;s enthusiasm may be sincere, but if the teachers are not supportive, the classroom impact will be minimal.</p>
<p>We are left with indirect measures. Textbooks for teachers in training generally offer extensive coverage of the theory, with little or no criticism. Furthermore, the ready availability of multiple intelligences classroom materials (books, lesson plans, and activities) leaves the impression that there is a market for such materials. The applications they suggest generally fall into two broad categories: curricular expansion and pedagogical stratagem.</p>
<p>Curricular expansion suggests that schools should appeal to all of the intelligences. Some educators have called for a more inclusive approach that does not glorify any one of the intelligences at the expense of the others. The theory has also been viewed as providing a pedagogical stratagem&#8211;namely, to teach content by tapping all of the intelligences. For example, to help students learn punctuation, a teacher might have them form punctuation marks with their bodies (bodily-kinesthetic intelligence), assign an animal sound to each punctuation mark (naturalist intelligence), and sort sentences according to the required punctuation (logico-mathematical intelligence). The motive may be that students will most enjoy or appreciate the material when it is embedded in an intelligence that is their strength. In this sense, intelligences may be translatable. The student who is linguistically weak but musically strong may improve his spelling through a musical presentation.</p>
<p>Gardner has criticized both ideas. Regarding curriculum, Gardner argues that the goals of education should be set independently of the multiple intelligences theory, and the theory should be used to help reach those goals. In other words, he does not believe that status as an &#8220;intelligence&#8221; necessarily means that that intelligence should be schooled. This objection is doubly true if you doubt that Gardner has categorized the intelligences correctly.</p>
<p>On the subject of pedagogy, Gardner sees no benefit in attempting to teach all subjects using all of the intelligences. He also expresses concern that some educators have a shallow understanding of what it takes to really engage an intelligence. Gardner writes, &#8220;It may well be easier to remember a list if one sings it (or dances to it). However, these uses of the &#8216;materials&#8217; of an intelligence are essentially trivial. What is not trivial is the capacity to think musically.&#8221; It is therefore surprising that Gardner wrote the preface for Thomas Armstrong&#8217;s book, <em>Multiple Intelligences in the Classroom</em>, which includes many such trivial ideas, such as singing spellings and spelling with leaves and twigs, as mentioned earlier. In the preface Gardner says that Armstrong provides &#8220;a reliable and readable account of my work.&#8221; The inconsistency in Gardner&#8217;s views is difficult to understand, but I believe he is right in calling some applications trivial.</p>
<p>Gardner also writes that intelligences are not fungible; the individual low in logico-mathematical intelligence but high in musical intelligence cannot somehow substitute the latter for the former and understand math through music. An alternative presentation may serve as a helpful metaphor, but the musically minded student must eventually use the appropriate representation to understand math. Gardner is on solid ground here. There is no evidence that subject-matter substitution is possible.</p>
<p>Gardner offers his own ideas of how multiple intelligences theory might be applied to education. Teachers should introduce a topic with different <em>entry points</em>, each of which taps primarily one intelligence. For example, the narrational entry point uses a story (and taps linguistic intelligence), whereas the logical entry point encourages the use of deductive logic in first thinking about a topic. Entry points are designed to intrigue the student via a presentation in an intelligence that is a particular strength for him or her. Gardner also believes that a thorough understanding of a topic is achieved only through multiple representations using different intelligences. Hence significant time must be invested to approach a topic from many different perspectives, and topics should be important enough to merit close study.</p>
<p>How effective are Gardner&#8217;s suggested applications? Again, hard data are scarce. The most comprehensive study was a three-year examination of 41 schools that claim to use multiple intelligences. It was conducted by Mindy Kornhaber, a long-time Gardner collaborator. The results, unfortunately, are difficult to interpret. The study reported that standardized test scores increased in 78 percent of the schools, but it failed to indicate whether the increase in each school was statistically significant. If the increases were not significant, the figure means little, because we would expect scores to increase in about half the schools by chance alone. Moreover, there was no control group, and thus no basis for comparison with other schools in the same districts. Furthermore, there is no way of knowing to what extent changes in a school are due to the implementation of ideas of multiple intelligences rather than, for example, the energizing thrill of adopting a new schoolwide program, new statewide standards, or some other unknown factor.</p>
<p>What is perhaps most surprising about Gardner&#8217;s view of education is that it is not more surprising. Many experienced educators probably suspected that different materials (songs, stories) engage different students and that sustained study using different materials engenders deep knowledge.</p>
<p class="tocheading"><strong>Multiple Talents</strong></p>
<p>One may wonder how educators got so confused by Gardner&#8217;s theory. Why do they believe that intelligences are interchangeable or that all intelligences should be taught? The answer is traceable to the same thing that made the theory so successful: the naming of various abilities as <em>intelligences</em>.</p>
<p>Why, indeed, are we referring to musical, athletic, and interpersonal skills as <em>intelligences</em>? Gardner was certainly not the first psychologist to point out that humans have these abilities. Great intelligence researchers&#8211;Cyril Burt, Raymond Cattell, Louis Thurstone&#8211;discussed many human abilities, including aesthetic, athletic, musical, and so on. The difference was that they called them talents or abilities, whereas Gardner has renamed them intelligences. Gardner has pointed out on several occasions that the success of his book turned, in part, on this new label: &#8220;I am quite confident that if I had written a book called &#8216;Seven Talents&#8217; it would not have received the attention that <em>Frames of Mind </em>received.&#8221; Educators who embraced the theory might well have been indifferent to a theory outlining different talents&#8211;who didn&#8217;t know that some kids are good musicians, some are good athletes, and they may not be the same kids?</p>
<p>Gardner protests that there is no reason to differentiate&#8211;he would say aggrandize&#8211;linguistic and logico-mathematical intelligences by giving them a different label; either label will do, but they should be the same. He has written, &#8220;Call them all &#8216;talents&#8217; if you wish; or call them all &#8216;intelligences.&#8217;&#8221; By this Gardner means that the mind has many processing capabilities, of which those enabling linguistic, logical, and mathematical thought are just three examples. There is no compelling reason to &#8220;honor&#8221; them with a special name, in his view.</p>
<p>Gardner has ignored, however, the connotation of the term <em>intelligence</em>, which has led to confusion among his readers. The term <em>intelligence</em> has always connoted the kind of thinking skills that make one successful in school, perhaps because the first intelligence test was devised to predict likely success in school; if it was important in school, it was on the intelligence test. Readers made the natural assumption that Gardner&#8217;s new intelligences had roughly the same meaning and so drew the conclusion that if humans have a type of intelligence, then schools should teach it.</p>
<p>It is also understandable that readers believed that some of the intelligences must be at least partially interchangeable. No one would think that the musically talented child would necessarily be good at math. But refer to the child as possessing &#8220;high musical intelligence,&#8221; and it&#8217;s a short step to the upbeat idea that the mathematics deficit can be circumvented by the intelligence in another area&#8211;after all, both are intelligences.</p>
<p>In the end, Gardner&#8217;s theory is simply not all that helpful. For scientists, the theory of the mind is almost certainly incorrect. For educators, the daring applications forwarded by others in Gardner&#8217;s name (and of which he apparently disapproves) are unlikely to help students. Gardner&#8217;s applications are relatively uncontroversial, although hard data on their effects are lacking. The fact that the theory is an inaccurate description of the mind makes it likely that the more closely an application draws on the theory, the less likely the application is to be effective. All in all, educators would likely do well to turn their time and attention elsewhere.</p>
<p><em>-Daniel T. Willingham is a professor of psychology at the University of Virginia.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/reframing-the-mind/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Gray Lady Wheezing</title>
		<link>http://educationnext.org/grayladywheezing/</link>
		<comments>http://educationnext.org/grayladywheezing/#comments</comments>
		<pubDate>Fri, 30 Jun 2006 17:05:48 +0000</pubDate>
		<dc:creator>William Howell</dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3258826</guid>
		<description><![CDATA[The AFT hoodwinks the Times ]]></description>
			<content:encoded><![CDATA[<p><em>F. Howard Nelson, Bella Rosenberg, and Nancy Van Meter, &#8220;Charter School Achievement on the 2003 National Assessment of Educational Progress,&#8221; American Federation of Teachers, August 2004</em></p>
<p><em>Diana Jean Schemo, &#8220;Nation&#8217;s Charter Schools Lagging Behind, U.S. Test Scores Reveal,&#8221; </em>New York Times<em>, August 17, 2004, page A1</em></p>
<p>It is not unusual for interest groups to issue reports that further their own political agendas&#8211;and to muddle the facts in the process. For this reason, newspapers generally ignore them, treat them with great skepticism, or make sure they properly vet the research with independent observers.</p>
<p>Not so in the case of the study of charter schools leaked by the American Federation of Teachers (AFT) to the <em>New York Times</em>, which then placed it in the right-hand column of the front page of its August 17 edition&#8211;a slot typically reserved for the day&#8217;s biggest story. Headlined &#8220;Nation&#8217;s Charter Schools Lagging Behind, U.S. Test Scores Reveal,&#8221; the story sent shock waves through the charter school movement and left more than a few education reformers scrambling for cover.</p>
<p>Using the data tool on the National Center for Education Statistics website, the authors of the AFT study called up some basic numbers on the performance of students from a nationally representative sample of charter schools. Their conclusion: &#8220;Charter schools are underperforming.&#8221; Their evidence: data from the National Assessment of Educational Progress (NAEP), often called the nation&#8217;s report card, showing students in charter schools doing less well than students in other public schools nationally, as well as in a small number of more focused comparisons.</p>
<p>The <em>Times</em> had a field day with the news. The AFT&#8217;s findings, the paper reported, &#8220;dealt a blow to supporters of the charter school movement, including the Bush administration&#8221;&#8211;a blow made all the more powerful (and credible) by the fact that the AFT had &#8220;historically supported charter schools.&#8221; Amy Stuart Wells, a sociology professor at Columbia University Teachers College, was quoted as saying the data were &#8220;really, really important&#8221; as they &#8220;confirm what a lot of people who study charter schools have been worried about.&#8221; Would that it were that simple.</p>
<p>Where do we begin to sort out the outlandish claims of the AFT study&#8211;and those made by others on its behalf? For starters, saying the AFT has historically supported charter schools is like saying that the Chicago Cubs are historically a World Series champion baseball team. While technically true (legendary AFT president Albert Shanker helped introduce the concept in a 1988 speech to the National Press Club), the union&#8217;s position on the issue has changed so markedly that it is now one of the staunchest opponents of charter schools around the nation. In recent years the AFT has criticized charter schools in a series of reports, of which August&#8217;s was only the latest and best publicized.</p>
<p>But hardly the most sophisticated. Indeed, on a methodological level, the AFT analyses are sufficiently pedestrian to be laughable. And most mainstream newspapers around the country&#8211;once the <em>Times</em> had made it the story of the hour&#8211;had the good sense to present a more critical view of the study&#8217;s import. In the title and lead paragraph of its coverage, <em>USA Today</em> noted that &#8220;achievement [is] not so simply measured&#8221; and that critics had already pointed out that &#8220;the report is hardly a fair look at whether charter schools help kids improve.&#8221; The <em>Seattle Times</em> quoted University of Washington researcher Mary Beth Celio&#8217;s dismissal of the study as &#8220;one of the most unsophisticated, low-level analyses I&#8217;ve ever seen.&#8221; The editorial board at the <em>Chicago Tribune</em> went further, deeming the AFT findings &#8220;about as new as a lava lamp, as revelatory as an old sock, and as significant as a belch.&#8221;</p>
<p class="tocheading"><strong>A Flawed Report</strong></p>
<p>What&#8217;s wrong with the study? The basic problem is straightforward: raw comparisons showing charter school students scoring lower than public school students on standardized tests may simply reflect the fact that charter schools serve students in low-performing districts with high concentrations of poor and minority children. Many states allow charter schools to form only where students are having difficulties, and many charter schools are then asked to accept the most challenging of students. Any credible analysis of their effectiveness must account for these facts on the ground.</p>
<p>Indeed, if the AFT believes its own findings, it must also concede that private religious schools outperform public schools (see Figure 1). According to the same NAEP data that are the basis for the new AFT study, religious private schools outperformed the public schools nationwide by between 9 and 17 points, a gap at least as large as the public school-charter school difference that the AFT&#8211;with considerable help from the <em>Times</em>&#8211;is trumpeting. On past occasions, the AFT has objected vehemently to interpreting such findings as evidence that religious schools are superior on the grounds that they attract an especially able group of students. But for charter schools, it seems, the problem of selection effects need only be addressed in the most superficial of ways.</p>
<p align="center"><img style="border: 0pt none;margin-left: 95px;margin-right: 95px" src="http://educationnext.org/files/ednext20051_74fig1.gif" border="0" alt="" width="499" height="446" /></p>
<p>The authors&#8217; sole strategy to &#8220;enhance the fairness of the analysis&#8221; was to look separately at students in 14 categories, including those from six different states, those who qualified for the federal free-lunch program (and those who didn&#8217;t), those from different ethnic backgrounds, and those living inside and outside a central city. As a strategy to control for the background characteristics that differentiate students in charter and traditional public schools, this approach is feeble. At best it can eliminate the effects of differences with respect to one background characteristic at a time. But it may not even be effective for that purpose if, for instance, the students eligible for a free lunch who attend charter schools come from even poorer families than eligible students in traditional public schools.</p>
<p>Even so, in most of the comparisons holding just one characteristic constant, the performance differences between charter and traditional public school students attenuate to the point of statistical insignificance. Twenty-one of the 28 comparisons the AFT conducted using 4th-grade average scale scores are statistically insignificant. As previous research has found that ethnic differences in achievement are large, it is especially noteworthy that all comparisons within ethnic groups in the NAEP charter school data cut against the AFT&#8217;s overall conclusions. The small differences that remain when looking separately at white, African-American, or Hispanic children are all statistically insignificant&#8211;a fact that is not apparent in either the <em>Times</em> story&#8217;s text or the tables that accompanied it.</p>
<p>But do any of these comparisons&#8211;within ethnic groups or otherwise&#8211;tell us anything meaningful about the quality of traditional public, charter, or religious private schools? Not a bit.</p>
<p>Plainly, to account adequately for the influences of a child&#8217;s family, home environment, and community on his or her learning capacity, one must do much more than look separately at students grouped by free-lunch status, ethnicity, or school location. At a minimum, it is essential to gather detailed data on students&#8217; background characteristics and to put them to good use. Control variables now standard in education research include parents&#8217; education and marital status, household income, and the quality of learning resources in the home, to name but a few. And rather than using aggregate comparisons within subgroups to eliminate the effects of differences in one background characteristic at a time, as the AFT has done, the influence of all of these factors must be addressed simultaneously.</p>
<p>But all this may just scratch the surface. As schools of choice, charters are likely to attract students who are not doing well in their traditional public schools. Moreover, many charter schools explicitly target &#8220;at-risk&#8221; students. Both of these facts would lead you to expect students in charter schools to perform at a low level even after taking into account their observable background characteristics.</p>
<p>Ideally, one would therefore study charter schools in the context of a randomized field trial, assigning students randomly to attend either a charter or a traditional public school, gathering data on their performance at baseline, and tracking their progress over time. In the absence of that possibility, it is vital to use data from multiple years to track the learning trajectory of students in both charter and traditional public schools.</p>
<p>Yet another critical flaw in the AFT&#8217;s analysis is its failure to account for the length of time that a charter school has been in place&#8211;a factor known to affect any school&#8217;s performance. Having just hired new staff and teachers, implemented new curricula, and acquired building facilities, new schools often face considerable start-up problems. Almost one-third of the charter schools nationwide were less than two years old when the 2003 NAEP was administered, raising doubts about whether even meaningful findings about charter school performance would apply when more of them are well established.</p>
<p>Encouragingly, research on charter schools using more reliable methods to gauge school quality is under way. Nonetheless, it will be some time before definitive conclusions about the merits of one of the nation&#8217;s most prominent, and popular, reform strategies can be drawn. In the meantime, the AFT&#8217;s study does not even amount to a good interim report.</p>
<p class="tocheading"><strong>Why All the Fuss?</strong></p>
<p>Given all of these problems, why would the <em>Times</em> see fit to bestow instant credibility on the AFT study by granting it glowing, page-one coverage? While we have no special insight into the motives of the newspaper&#8217;s editorial staff, the coverage itself suggests two factors that are important.</p>
<p>The first concerns alleged chicanery by the U.S. Department of Education, which, reported the <em>Times</em>, had buried the flawed charter school findings in &#8220;mountains of data . . . released without public announcement.&#8221; According to the authors of the AFT study, &#8220;a combination of intuition, prior knowledge, considerable digging, and luck&#8221; was required just to locate the data. Such sleuthing makes for dramatic storytelling&#8211;for the next best thing to doing it oneself, in the newspaper business, is reporting (exclusively, one hopes, so you can break the news) on someone else&#8217;s discovery of a cover-up.</p>
<p>As Bella Rosenberg, one of the report&#8217;s three authors, explained to the press, &#8220;Analyses are always welcome, but first things first. . . . Surely the interests of children are better served by timely and straightforward information about whether charter school performance measures up to the claims made for it.&#8221; In a letter to the <em>Times</em>, educational psychologist Howard Gardner praised the AFT for its act of public service in issuing the study and then asserted that the Department of Education&#8217;s decision not to highlight the findings was ideologically driven: &#8220;If the results had been positive, the Education Department would doubtless have heralded them. Across the policy spectrum, the pattern of the administration is all too clear: Call for evidence-based results, tout them when supportive, hide them when not, spin them when possible.&#8221;</p>
<p>Perhaps. But we draw a slightly different conclusion. Timeliness and transparency are important, but bad information is worse than none. And uncovering misleading information and presenting it out of context does a greater disservice to the &#8220;interests of children&#8221; than the Department of Education&#8217;s decision not to issue a report that does not control for student background characteristics. From this perspective, the AFT study and the <em>Times</em>&#8217;s breathless coverage of it only made a bad situation worse.</p>
<p>The second probable reason for the prominent attention the <em>Times</em> gave the study stems from the fact that charter schools represent one of several remedies for schools deemed chronically failing under George W. Bush&#8217;s No Child Left Behind Act. (Other remedies include replacing much of the school&#8217;s staff or turning its operations over to the state or to a private company.) Thus the story&#8217;s import was magnified by the politics of education reform: it suggested a flaw in the Bush administration&#8217;s game plan. The very next day, the lead <em>Times</em> editorial heralded the report as &#8220;a devastating setback&#8221; to the Bush administration&#8217;s education program.</p>
<p>Ironically, however, it is not at all clear that political cleavages over charter schools follow strictly partisan lines. Indeed, federal financial support for the charter school movement has its origins in the Clinton era. Democratic presidential candidate John Kerry was an enthusiastic supporter of charter schools. And while Secretary of Education Rod Paige was a vocal proponent of charter schools, President Bush said hardly a word about charters on the campaign trail&#8211;nor, for that matter, did he say much about them from the White House.</p>
<p class="tocheading"><strong>What the NAEP Data Do Tell Us</strong></p>
<p>While the statistics on the nation&#8217;s charter schools currently available from the NAEP are not at all useful for assessing these schools&#8217; effectiveness, they do offer, for the first time, a glimpse of the makeup of a nationally representative sample of the students who attend them. As a result, one important fact about charter schools now appears incontrovertible: they are not bastions of wealth and privilege.</p>
<p>As Figure 2 shows, almost 62 percent of the roughly 3,000 4th graders in the NAEP charter school sample attend a school located in a central city, compared with just 32 percent of NAEP 4th graders in traditional public schools. Roughly 33 percent of the charter school students are African-American, compared with only 18 percent of the public school students. Fifty-four percent of elementary charter school students qualify for free or reduced-price lunch programs, compared with 46 percent of public school students. The analogous differences for the 8th graders tested by the NAEP are even more pronounced, perhaps reflecting the fact that a large number of middle and high school charters target at-risk students.</p>
<p align="center"><img src="http://educationnext.org/files/ednext20051_74fig2.gif" border="0" alt="" width="699" height="402" /></p>
<p>Given the conditions under which states and districts accept charter schools, the language of their mandates, and the characteristics of families most eager for alternatives to traditional public schools, these differences can hardly come as a surprise. For the foreseeable future, charter schools are likely to serve high concentrations of poor and underprivileged students. What remains unclear is how much they can do for this population. Sadly&#8211;and despite the impression given by the gray lady of American journalism&#8211;the AFT study tells us nothing about that.</p>
<p><em>William G. Howell is an assistant professor of government at Harvard University. Martin R. West is a research fellow at the Program on Education Policy and Governance at Harvard University and the research editor of </em>Education Next<em>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/grayladywheezing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>No Distortion Left Behind</title>
		<link>http://educationnext.org/nodistortionleftbehind/</link>
		<comments>http://educationnext.org/nodistortionleftbehind/#comments</comments>
		<pubDate>Fri, 30 Jun 2006 16:50:20 +0000</pubDate>
		<dc:creator>Andy Rotherham</dc:creator>
				<category><![CDATA[Check the Facts]]></category>

		<guid isPermaLink="false">http://content.hks.harvard.edu/educationnext/?p=3258736</guid>
		<description><![CDATA[The New York Times education columnist gets it wrong ]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s stipulate that the No Child Left Behind Act (NCLB), the federal education law signed by President Bush in January of 2002, is a complicated piece of legislation. The law&#8217;s official conference report runs to 1,080 pages and covers a host of issues, many not even related to the law&#8217;s central thrust. But let&#8217;s also stipulate that many, many other laws&#8211;from taxes to environmental regulation&#8211;are no less challenging to understand and interpret, which is why journalists at the nation&#8217;s best news outlets often have areas of specific expertise.</p>
<p>So, is it asking too much to expect those in the media charged with writing about education and NCLB to make some effort to describe them accurately? And shouldn&#8217;t we expect one of the nation&#8217;s most visible and influential education journalists to get it right?</p>
<p>I&#8217;m sympathetic to the myriad challenges that journalists face, but NCLB&#8217;s heft and the political battles around it are no excuse for someone like Michael Winerip, who writes the weekly &#8220;On Education&#8221; column for the <em>New York Times</em> (he is currently on a sabbatical to write a book), to distort the law into a vague semblance of reality. Take his September 24, 2003, column, for instance. Under the headline &#8220;On Front Lines, Casualties Tied to New U.S. Law,&#8221; Winerip reported that NCLB funding shortfalls were &#8220;devastating&#8221; for New York City. But he neglected to mention that the city had received more than $260 million in new dollars for poor students <em>alone</em> under NCLB in the previous two years (see &#8220;<a href="http://educationnext.org/who-got-the-raw-deal-in-gotham/">Who Got the Raw Deal in Gotham</a>?&#8221; page 72).</p>
<p>It is especially problematic when the distortion is in the nation&#8217;s putative &#8220;newspaper of record.&#8221; As former <em>Los Angeles Times </em>education writer Richard Colvin, who now heads the Hechinger Institute on Education and the Media, says, Winerip&#8217;s &#8220;On Education&#8221; is &#8220;agenda setting&#8221; because of its influence on policymakers: &#8220;Editors across the country read what&#8217;s in that column and it informs their decisions.&#8221;</p>
<p>Winerip&#8217;s misleading writing about NCLB is particularly surprising because he has produced an impressive body of important journalism in a career of more than 20 years at the <em>Times</em>, reporting on everything from city politics in the 1980s to vital national issues in the 1990s. In 2000, Winerip was part of a <em>Times</em> team that produced a Pulitzer Prize-winning series on race in America. Many of his education columns are outstanding because of his clear eye for important subjects, including those that go unnoticed. For instance, he is one of the few reporters writing about the difficulties gay students face in school. And he can turn out delightfully quirky and interesting profiles, like his April 21, 2004, column about an 81-year-old woman returning to law school alongside hypercompetitive 20-somethings.</p>
<p class="tocheading"><strong>Obsessed with NCLB</strong></p>
<p>But the very skills that produced this career of important work are missing in his writing about NCLB, which, needless to say, is an enormously important issue in education right now. In fact, from his first &#8220;On Education&#8221; column (January 8, 2003) to his last before going on sabbatical (May 26, 2004), Winerip devoted 15 columns&#8211;23 percent of his total&#8211;to NCLB, which he opined (October 1, 2003), &#8220;may go down in history as the most unpopular piece of education legislation ever created.&#8221;</p>
<p>It&#8217;s hard to know why Winerip became a knee-jerk NCLB-basher. But it&#8217;s not hard to see that he is one; nor need one be a partisan of the Bush administration (which I certainly am not) to grasp that his stance is costing the <em>Times</em> a chance to engage constructively in the debate about the law. There <em>are</em> problems with NCLB that scream for thoughtful explanation, rigorous attention, and public concern. This is not surprising with a federal law as complicated as NCLB. Any writer seeking to focus on it enjoys what the military would describe as a &#8220;target rich environment&#8221; and could make a genuine contribution to the debate on school improvement by engaging in a discussion of those liabilities. Winerip, however, has managed to avoid almost completely an accurate description of the act&#8217;s most important issues.</p>
<p>A good example is a February 19, 2003, column about Gonzales Elementary School in Tolleson, Arizona, a school that Winerip implies had been designated, because of NCLB, as &#8220;underperforming.&#8221; The <em>Times</em> columnist described a Kafkaesque series of hoops the school&#8217;s principal had to jump through because of the designation, including devising a detailed school improvement plan. But Winerip conflated the state&#8217;s accountability system and NCLB. The school would have received the same designation and been required to take the same steps in the absence of NCLB, a fact that Winerip omitted, while writing, &#8220;Unfortunately, last year the 5th grade did not make adequate yearly progress on the state competency exams. And that&#8217;s all it takes under the great new federal law.&#8221; Incidentally, whatever steps the school took seem to have worked. It did fine the following year under both Arizona&#8217;s accountability system and No Child Left Behind.</p>
<p class="tocheading"><strong>Are the Feds to Blame?</strong></p>
<p>Routinely rushing to lay blame for local administrative upset at the feet of the feds, Winerip frequently fails to differentiate between state and federal requirements, much less offer readers information to understand why states differ from one another in their policies. For instance, in an April 28, 2004, column, Winerip described a school in Florida as unfairly penalized by NCLB, but he failed to mention that the school reported low overall test scores and had significant achievement gaps between white and minority students. He tartly noted that Texas (was the Lone Star state chosen to make a political point?) operates under different rules than Florida. But he failed to mention that Texas officials simply designed a different accountability plan, and that Florida could have done the same thing.</p>
<p>Likewise, in a September 3, 2003, column examining the differences between state and federal accountability systems, Winerip looked at North Carolina, where, he said, some schools that were doing just fine under the state&#8217;s previous accountability system were now being flagged as needing improvement under NCLB. He cited this as evidence of the folly of NCLB. What <em>Times</em> readers were not told, however, was that before NCLB, North Carolina, like almost every state, did not hold schools accountable for the performance of various subgroups, like minorities and special-needs students. Thus schools with good overall scores often had distinct groups of students that lagged far behind&#8211;a glaring inequity that the new federal initiative was explicitly designed to detect, but information Winerip conveniently left out of his story.</p>
<p>Winerip returned to North Carolina in his October 8, 2003, column, this time to bemoan the arbitrary demographic subgroup size of 40 (the minimum number of students that must be in the subgroup for the school to be held accountable for that group). But he failed to tell readers that it was the state, not the feds, that chose that number, that many states have even smaller subgroup sizes (and some larger, too), or that many advocates for poor, minority, and disabled students want small subgroup sizes in order to ensure that students are not lost in overall averages. Instead, without any context, Winerip readers are again left to conclude, on the columnist&#8217;s authority, that there is something arbitrary, if not downright crooked, about all these federal rules.</p>
<p class="tocheading"><strong>Missing the Point</strong></p>
<p>Moreover, the inequity teased out by the subgroup rule is the core difference between NCLB and the 1994 reauthorization of the Elementary and Secondary Education Act. Understanding this is integral to making sense of the apparent conflict between state and federal systems. But Winerip referred only in passing to a &#8220;totally different federal formula.&#8221; Regrettably, through all the NCLB trashing, Winerip never described these issues so that lay readers, almost certainly unfamiliar with this context, could at least understand the current environment, let alone learn about the complexity of the law and the challenges of designing policy for our decentralized system of schooling.</p>
<p>It&#8217;s not that Winerip doesn&#8217;t understand this context himself. Two of his most interesting columns focus on the challenges of the racial achievement gap. Yet inexplicably he never makes the leap from the problems he eloquently describes to NCLB&#8217;s intent, or potential, to help ferret out and rectify those same racial inequities. Similarly, No Child Left Behind&#8217;s left-leaning supporters like the Education Trust, Citizens&#8217; Commission on Civil Rights, and Council of the Great City Schools, which are all concerned about the achievement gap, do not enter the Winerip conversation either.</p>
<p>Context is also lacking in his September 3 column, where he noted, &#8220;The federal system uses a single yearly proficiency goal&#8211;for North Carolina, 68 percent of students reading on grade level this year&#8211;and requires all schools to make that number.&#8221; In fact, the &#8220;safe harbor&#8221; provisions in NCLB mean that all schools do not have to meet fixed targets across the board each year, but only make some improvement in order to make adequate yearly progress. Certainly, provisions like this are somewhat arcane. But a reporter with Winerip&#8217;s skills could surely make them understandable for readers.</p>
<p>Besides, North Carolina&#8217;s relatively high bar is the result of a now somewhat problematic NCLB provision known as the &#8220;20 percent rule,&#8221; which was inserted into the law&#8217;s &#8220;adequate yearly progress&#8221; provisions. The concern was that some states would be starting with such low percentages of minority students at grade level that just requiring that as a starting point would subject the law to ridicule for having embarrassingly low standards. So, to ensure rigor, the alternative formula figured a state&#8217;s starting point as a function of 20 percent of the state&#8217;s grade-level population of students. In practice the provision has allowed some laggard states, at least initially, to get off easier than states that have been doing the right thing. Thus, paradoxically, many states that have been working to improve their school systems have more schools identified as failing to make adequate yearly progress under NCLB than trailing states. Though this is a complicated issue, it is a classic example of the trade-offs that federal policymakers routinely face and one that a thoughtful education writer with a national platform seemingly could explore for readers.</p>
<p class="tocheading"><strong>No Room for Subtlety</strong></p>
<p>But, unfortunately, this sort of nuance finds no home in Winerip&#8217;s NCLB writing. He criticizes the federal law for basing school accountability on a single year&#8217;s test scores and holding schools accountable for the performance of transient students. In fact, NCLB does not require states to base school accountability on a single year&#8217;s test scores, but instead allows scores to be averaged over multiple years and permits states to use various statistical tools to help ensure the validity of those numbers. Nor does the law hold schools accountable for recent transfers. Similarly, Winerip wrote several times about testing disabled students, blaming requirements about assessments for special-needs students on the &#8220;Washington Brain Trust.&#8221; Yet, as mentioned earlier, he neglects to point out that many groups representing disabled students want these students included in state accountability systems. He also fails to share with <em>Times</em> readers any perspective on why it can be important to do so to help ensure that these students receive a quality education. Moreover, while criticizing NCLB for requiring new assessments for profoundly retarded students, Winerip makes little effort to explain what alternative assessments for such students would entail, to point out that there was ongoing regulatory debate about how best to design the federal policy in this area, or even to note that these students are a small fraction of the overall special-education population (most of whom benefit from access to mainstream standards).</p>
<p>Through all this, Winerip himself doesn&#8217;t do what policymakers must do every day&#8211;offer solutions to, or ideas about, vexing problems. Aren&#8217;t any NCLB problems being solved somewhere? What are the alternatives? What&#8217;s the most creative and serious thinking about addressing equity problems through means other than NCLB? Instead, it&#8217;s all griping and disparagement. Granted, he&#8217;s a columnist, not a policymaker, but many columnists regularly offer policy ideas, particularly in the midst of continuous slashing criticisms.</p>
<p>Whether bias matters in a news analysis column like this is debatable and, obviously, for the <em>Times</em> to decide. Hechinger&#8217;s Colvin argues that while news analysis does not require the same balance as a news article and can have a viewpoint, &#8220;that perspective needs to be backed up with reporting and context.&#8221; Thomas Toch, a longtime education journalist now working as writer-in-residence at the National Center on Education and the Economy, goes further. Toch notes the importance of the venue, but says Winerip should &#8220;be given a lot of leeway to interpret the education landscape as he sees it, to exert his voice in the debate.&#8221; Either way, there is a difference between viewpoint or voice and a relentlessly misleading presentation.</p>
<p>It&#8217;s important to note that the problem is not editorializing in the news coverage. In fact, the <em>Times</em>&#8217;s editorial position is largely supportive of NCLB. Instead, the problem with Winerip&#8217;s NCLB columns is that they often turn on incomplete, outlandishly selective, and even inaccurate presentations of the facts.</p>
<p>In the end it is beside the point to parse the motives of Winerip&#8217;s anti-NCLB mania. There are plenty of excellent education journalists out there who could offer readers less slash and burn and more nuance and context about a debate as important as this. In fact, Samuel Freedman, who has been writing &#8220;On Education&#8221; during Winerip&#8217;s book leave, is a terrific example. His writing is varied and thought-provoking, but also apparently without ideological blind spots and tendentious selective presentation. In other words, he brings to the page everything an important column like this should regularly deliver.</p>
<p><em>-Andrew J. Rotherham is director of education policy at the Progressive Policy Institute in Washington, D.C., and editor of www.eduwonk.com.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://educationnext.org/nodistortionleftbehind/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
