There’s been a bit of a kerfuffle recently in the edu-Twittersphere, since Marc Tucker suggested that civil rights leaders ought to reconsider their support for annual testing. Kati Haycock and Jonah Edelman wrote impassioned responses, which Tucker has just dismissed as not responding to his substantive arguments. He ends with this paragraph:
The facts ought to count for something. What both of these critiques come down to is an assertion that I don’t have any business urging established leaders of the civil rights community to reconsider the issue, that I simply don’t understand the obvious—that annual accountability testing is essential to justice for poor and minority students, that anyone who thinks otherwise must be in the pocket of the teachers unions. Well, it is not obvious. Indeed, all the evidence says it is not true. And anyone who knows me knows that I am in no one’s pocket. I know the leaders of the civil rights community to be people of great integrity. They aren’t in anyone’s pocket, either. I think they want what is best for the people they represent. And I do not think that is annual testing.
I think Mr. Tucker greatly overstates the evidence in his initial post, so I’m going to do my best to give a very brief and direct response to the substantive arguments he makes there. I do this not to defend Haycock and Edelman (whom I do not really know), but to defend the policy, which I believe is unfairly maligned in Tucker’s posts.
Let me start by saying that I am generally in favor of annual testing, though I am probably not as fervid in that support as some others in the “reform” camp. I do not believe that annual accountability testing is essential to justice for poor and minority students, but I do think high-quality tests at reasonable intervals would almost certainly be beneficial to them.
Okay, here goes.
1) In his initial post, Marc Tucker says,
First of all, the data show that, although the performance of poor and minority students improved after passage of the No Child Left Behind Act, it was actually improving at a faster rate before the passage of the No Child Left Behind Act.
That link is to a NAEP report that indeed provides descriptive evidence supporting Tucker’s point. However, there are at least two peer-reviewed articles using NAEP data that show positive causal impacts of NCLB using high-quality quasi-experimental design, one on fourth grade math achievement only and the other on fourth and eighth grade math achievement and (suggestively) fourth grade reading. The latter is, to my eye, the most rigorous analysis that yet exists on this topic. There is a third article that uses cross-state NAEP data and does not find an impact, but again the most recent analysis by Wong seems to me to be the most methodologically sophisticated of the lot and, therefore, the most trustworthy. I think if Tucker wants to talk NAEP data, he has to provide evidence of this quality that supports his position of “no effect” (or even “harm,” as he appears to be suggesting). Is there a quality analysis using a strong design that shows a negative impact on the slope of achievement gains caused by NCLB? I do not know of one.
I should also note that there are beaucoup within-state studies of the impacts of accountability policies that use regression discontinuity designs and find causal impacts. For instance: in North Carolina, in Florida, and in Wisconsin. In short: I don’t see any way to read the causal literature on school accountability and conclude that it has negative impacts on student achievement. I don’t even see any way to conclude it has neutral impacts, given the large number of studies finding positive impacts relative to those with strong designs that find no impacts.
2) Next, Tucker says:
Over the 15-year history of the No Child Left Behind Act, there is no data to show that it contributed to improved student performance for poor and minority students at the high school level, which is where it counts.
Here I think Marc is moving the goalposts a bit. Is high school performance of poor and minority students the target? Then I guess we may as well throw out all the above-cited studies. I know of no causal studies that directly investigate the impact on this particular outcome, so I think the best he’s got is the NAEP trends. And sure, trends in high school performance are relatively flat.
I’m not one to engage in misNAEPery, however, so I wouldn’t make too much of this. Nor would I make too much of the fact that high school graduation rates have increased for all groups (meaning tons more low-performing students who in days gone by would have dropped out are still around to take the NAEP in 12th grade, among other things). But I would make quite a bit of the fact that the above-cited causal studies obviously also apply to historically underserved groups (that is, while they rarely directly test the impact of accountability on achievement gaps, they very often test the impacts for different groups and find that all groups see the positive effects). And I would also note some evidence from North Carolina of direct narrowing effects on black-white gaps.
3) Next, we have:
Many nations that have no annual accountability testing requirements have higher average performance for poor and minority students and smaller gaps between their performance and the performance of majority students than we do here in the United States. How can annual testing be a civil right if that is so?
There’s not much to say about this. It’s not based on any study I know of, certainly none that would suggest a causal impact one way or the other. But he’s right that we’re relatively alone in our use of annual testing, and therefore that many higher-achieving nations don’t have annual testing. They also don’t have many other policies that we have, so I’m not sure what’s to be learned from this observation.
4) Now he moves on to claim:
It is not just that annual accountability testing with separate scores for poor and minority students does not help those students. The reality is that it actually hurts them. All that testing forces schools to buy cheap tests, because they have to administer so many of them. Cheap tests measure low-level basic skills, not the kind of high-level, complex skills most employers are looking for these days. Though students in wealthy communities are forced to take these tests, no one in those communities pays much attention to them. They expect much more from their students. It is the schools serving poor and minority students that feed the students an endless diet of drill and practice keyed to these low-level tests. The teachers are feeding these kids a dumbed down curriculum to match the dumbed down tests, a dumbed down curriculum the kids in the wealthier communities do not get.
This paragraph doesn’t have links, probably because it’s not well supported by the existing evidence. Certainly you hear this argument all the time, and I believe it may well be true that schools serving poor kids have worse curricula or more perverse responses to tests (even some of my own work suggests different kinds of instructional responses in different kinds of schools). But even if we grant that this impact is real, the literature on achievement effects certainly does not suggest harm. And the fact that graduation rates are skyrocketing certainly does not suggest harm. If he’s going to claim harm, he has to provide clear, compelling evidence of harm. This ain’t it. And finally here, a small point. I hate when people say schools are “forced” to do anything. States, districts, and schools were not forced to buy bad tests before. They have priorities, and they have prioritized cheap and fast. That’s a choice, not a matter of force.
5) Next, Tucker claims:
Second, the teachers in the schools serving mainly poor and minority kids have figured out that, from an accountability standpoint, it does them no good to focus on the kids who are likely to pass the tests, because the school will get no credit for it. At the same time, it does them no good to focus on the kids who are not likely to pass no matter what the teacher does, because the school will get no credit for that either. As a result, the faculty has a big incentive to focus mainly on the kids who are just below the pass point, leaving the others to twist in the wind.
I am certainly familiar with the literature cited here, and I don’t dispute any of it. Quite the contrary, I acknowledge the conclusion that the students who are targeted by the accountability system see the greatest gains. This has been shown in many well-designed studies, such as here, here, here, and here. But this is an argument about accountability policy design, not about annual testing. It simply speaks to the need for better accountability policies. For instance, suppose we thought the “bubble kids” problem was a bad one that needed solving. We could solve it tomorrow: simply create a system where all that matters is growth. Voila, no bubble kids! Of course there would be tradeoffs to that decision, so probably some mixture is better.
6) Then Tucker moves on to discuss the teaching force:
Not only is it true that annual accountability testing does not improve the performance of poor and minority students, as I just explained, but it is also true that annual accountability testing is making a major contribution to the destruction of the quality of our teaching force.
There’s no evidence for this. I know of not a single study that suggests that there is even a descriptive decrease in the quality of our teaching force in recent years. Certainly not one with a causal design of any kind that implicates annual accountability testing. And there is recent evidence that suggests improvements in the quality of the workforce, at least in certain areas such as New York and Washington.
7) Next, he takes on the distribution of teacher quality:
One of the most important features of these accountability systems is that they operate in such a way as to make teachers of poor and minority students most vulnerable. And the result of that is that more and more capable teachers are much less likely to teach in schools serving poor and minority students.
It is absolutely true that the lowest quality teachers are disproportionately likely to serve the most disadvantaged students. But I know of not a single piece of evidence that this is caused by (or even made worse by) annual testing and accountability policies. My hunch is that this has always been true, but that’s just a hunch. If Tucker has evidence, he should provide it.
8) The final point is one that hits close to home:
Applications to our schools of education are plummeting and deans of education are reporting that one of the reasons is that high school graduates who have alternatives are not selecting teaching because it looks like a battleground, a battleground created by the heavy-handed accountability systems promoted by the U.S. Department of Education and sustained by annual accountability testing.
As someone employed at a school of education, I can say the first clause here is completely true. And we’re quite worried about it. But again, I know of not a single piece of even descriptive evidence that suggests this is due to annual accountability testing. Annual accountability testing has been around for well over a decade. Why would the impact be happening right now?
I think these are the main arguments in Tucker’s piece, and I have provided evidence or argumentation here that suggests that not one of them is supported by the best academic research that exists today. Perhaps the strongest argument of the eight is the second one, but again I know of no quality research that attributes our relative stagnation on 12th grade NAEP to annual accountability testing. That does not mean Tucker is wrong. But it does mean that he is the one who should bear the burden of providing evidence to support his positions, not Haycock and Edelman. I don’t believe he can produce such evidence, because I don’t believe it exists.
 I think it’s almost universally a bad idea to tell civil rights leaders what to do.
9 thoughts on “A (quick, direct, 2000 word) response to Tucker on testing”
I mentioned on Twitter that I think qualitative data would add nuance to this conversation. I was particularly thinking in terms of #4. I’m not sure about Tucker’s “cheap tests” argument specifically, but the broader concern about the lowering of expectations in low-performing schools serving mostly poor kids of color has been well substantiated. (Granted, Tucker did not provide links). Perhaps saying schools are “forced” to do this is not the best phrasing, but I think saying they have “choice” may be an oversimplification. Low-performing schools have incentives to avoid sanctions (e.g., closure) for the students’ sake; qualitative studies suggest that most ground-level stakeholders do not perceive they have much choice. From the perspective of many educators in struggling schools, raising test scores– whether that translates to buying bad tests or subjecting students to low-level test prep– is the only option.
Qualitative and mixed methods research highlights ways that these tensions manifest in struggling schools and the negative implications for poor kids of color in particular. Scholars like Diamond, Spillane, Darling Hammond, Valenzuela and others have raised equity concerns associated with high-stakes testing, often related to differential instructional quality and opportunities to learn. http://www.tcrecord.org.libproxy1.usc.edu/library/content.asp?contentid=11569
Click to access 0013124511431569.full.pdf
There’s also some evidence that college-going needs are sacrificed, particularly for low-income Latina/o students:
These are potential challenges that are not captured in the achievement effects literature because they are the unintended consequences of schools’ efforts to increase scores. So I think considering qualitative data is important to make sense of what is going on equity-wise. This is not to say that we shouldn’t collect yearly assessment data by any means– just that what we mean by “harm” when we talk about low-income students of color in the current policy context is not entirely clear-cut. As you mentioned in your blog with respect to the bubble kids issue, many of these problems are products not of testing per se but rather accountability policy designs. People like Tucker oppose testing because of the policies associated with test scores. It seems to me that any debate about civil rights shouldn’t be about whether we test, but what we do with the scores.
What is a “descriptive decrease” (in teacher quality)? Haven’t heard that term before.
Ha, that’s a term I coined. It means a decrease of any kind in the data, even one that’s not assumed to be causally related to annual testing.
You acknowledged the instructional responses to testing, such as changes to curricula. I have found the literature on this to be quite clear–schools have generally narrowed curricula, with less time for arts instruction (other non-tested subjects, too, but I study arts education policy). Would you not consider this “harm”?
There’s no question I’d prefer a less narrowed curriculum. That said, even NAEP in non-core subjects (e.g., geography, social studies) has shown, at worst, stability (and in many grades/subjects, continued growth) since the advent of NCLB. So I guess I’d say if I had evidence that kids’ music and arts knowledge had been harmed, then I would view this as a harm in the same way that I view growth in math and reading achievement as a benefit. But I have not seen that evidence.
And also, Tucker didn’t make that point, so I didn’t really address it in my post. That point is quite a bit more defensible than most of the points he did make.
[…] offered a response to the points made by Haycock and Edelman. But those, in turn, were then addressed thoroughly, honestly, directly, and completely by Morgan Polikoff of University of Southern […]
[…] to give my two cents). Third, it’s a trend that actively worries me as someone who believes research clearly shows that tests and accountability have been beneficial overall. I don’t really see much […]
[…] For instance, here’s a few summaries I’ve written on testing and accountability, and here’s a nice review chapter. These all conclude, rightly, that […]