Testing my patience

PBS is out with a truly awful report on testing/opt out/Common Core. You can watch it here and read one takedown here.

I’m not going to do a full takedown, but I’ll highlight a few points that weren’t made by Will Ragland.

  1. Hagopian says testing is a multi-billion dollar industry. That’s true but overwrought and misleading. We have 50 million kids in school–spend $20 a kid per year and you’re at a billion. Yes, we spend billions on evaluating how well kids are learning. But that’s far less than 1% of our total education dollars, in exchange for some evaluation of how our system is doing. Seems like a perfectly reasonable amount to me (if anything, it’s too little, and our limited spending on assessment has contributed to some of the poor-quality tests we’ve seen over the years). Saving that far-less-than-1% wouldn’t do anything meaningful to reduce class sizes or boost teacher salaries or whatever else Hagopian would like us to do, even if we cut testing expenses to zero.
  2. There’s an almost farcically absurd analogy that testing proponents think a kid with hypothermia just needs to have his temperature taken over and over again, whereas teachers just know to wrap the kid in a blanket. First of all, given horrendous outcomes for many kids, it seems like at least a handful of educators (or perhaps more accurately, the system as a whole) have neglected their blanketing duties more often than we’d care to note. Second, these test data are used in dozens of ways to help support and improve schools, especially in states that have waivers (which, admittedly, Washington is not).
  3. Complaining about a test-and-punish philosophy in Washington State is pretty laughable, since there’s no exit exam for kids [CORRECTION: there appears to be some new exit exam requirements being rolled out in the state, though students did not opt out of these exams; apologies that I did not catch these earlier; I was referring to old data], no high-stakes teacher evaluation, and less accountability for schools than there was during the NCLB era (though parents did get a letter about their school’s performance …). Who, exactly, is being punished, and how?
  4. Finally, the report lumps together Common Core with all kinds of things that are not related to Common Core, such as the 100+ standardized test argument and the MAP test. Common Core says literally nothing at all about testing, and it certainly doesn’t have anything to do with a district-level benchmark test.
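The back-of-the-envelope arithmetic in point 1 can be checked directly. The $20-per-student figure comes from the text; the roughly $600 billion total for annual K-12 spending is my own rough assumption, used here purely for illustration:

```python
# Testing spend as a share of total K-12 spending.
# The $20/student figure is from the post; the $600B total is an
# assumed ballpark, not an official number.
students = 50_000_000          # roughly 50 million K-12 students
per_student_testing = 20       # annual testing cost per student, in dollars
total_k12_spending = 600e9     # assumed total annual K-12 spending, in dollars

testing_spend = students * per_student_testing    # $1.0 billion
share = testing_spend / total_k12_spending        # fraction of all spending

print(f"${testing_spend / 1e9:.1f}B on testing, {share:.2%} of total spending")
```

Under those assumptions, testing comes to about 0.17% of total spending, consistent with the "far less than 1%" claim above.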

It shouldn’t be too much to ask for a respected news organization to get very basic details about major education policies that have existed for 4+ years correct. Instead, we get misleading, unbalanced nonsense that will contribute to the tremendous levels of misinformation we see among voters about education policy.

Friends don’t let friends misuse NAEP data

At some point in the next few weeks, the results from the 2015 administration of the National Assessment of Educational Progress (NAEP) will be released. I can all but guarantee you that the results will be misused and abused in ways that scream misNAEPery. My warning in advance is twofold. First, do not misuse these results yourself. Second, do not share or promote the misuse of these results by others who happen to agree with your policy predilections. This warning applies of course to academics, but also to policy advocates and, perhaps most importantly of all, to education journalists.

Here are some common types of misused or unhelpful NAEP analyses to look out for and avoid. I think this is pretty comprehensive, but let me know in the comments or on Twitter if I’ve forgotten anything.

  • Pre-post comparisons involving the whole nation or a handful of individual states to claim causal evidence for particular policies. This approach is used by both proponents and opponents of current reforms (including, sadly, our very own outgoing Secretary of Education). Simply put, while it’s possible to approach causal inference using NAEP data, that’s not accomplished by taking pre-post differences in a couple of states and calling it a day. You need to have sophisticated designs that look at changes in trends and levels and that attempt to poke as many holes as possible in their results before claiming a causal effect.
  • Cherry-picked analyses that focus only on certain subjects or grades rather than presenting the complete picture across subjects and grades. This is most often employed by folks with ideological agendas (using 12th grade data, typically), but it’s also used by prominent presidential candidates who want to argue their reforms worked. Simply put, if you’re going to present only some subjects and grades and not others, you need to offer a compelling rationale for why.
  • Correlational results that look at levels of NAEP scores and particular policies (e.g., states that have unions have higher NAEP scores, states that score better on some reformy charter school index have lower NAEP scores). It should be obvious why correlations of test score levels are not indicative of any kinds of causal effects given the tremendous demographic and structural differences across states that can’t be controlled in these naïve analyses.
  • Analyses that simply point to low proficiency levels on NAEP (spoiler alert: the results will show many kids are not proficient in all subjects and grades) to say that we’re a disaster zone and a) the whole system needs to be blown up or b) our recent policies clearly aren’t working.
  • (Edit, suggested by Ed Fuller) Analyses that primarily rely on percentages of students at various performance levels, instead of using the scale scores, which are readily available and provide much more information.
  • More generally, “research” that doesn’t even attempt to account for things like demographic changes in states over time (hint: these data are readily available, and analyses that account for demographic changes will almost certainly show more positive results than those that do not).

Having ruled out all of your favorite kinds of NAEP-related fun, what kind of NAEP reporting and analysis would I say is appropriate immediately after the results come out?

  • Descriptive summaries of trends in state average NAEP scores, not just across two NAEP waves but across multiple waves, grades, and subjects. These might be used to generate hypotheses for future investigation but should not (ever (no really, never)) be used naively to claim some policies work and others don’t.
  • Analyses that look at trends for different subgroups and the narrowing or closing of gaps (while noting that some of the category definitions change over time).
  • Analyses that specifically point out that it’s probably too early to examine the impact of particular policies we’d like to evaluate and that even if we could, it’s more complicated than taking 2015 scores and subtracting 2013 scores and calling it a day.

The long and the short of it is that any stories that come out in the weeks after NAEP scores are released should be, at best, tentative and hypothesis-generating (as opposed to definitive and causal effect-claiming). And smart people should know better than to promote inappropriate uses of these data, because folks have been writing about this kind of misuse for quite a while now.

Rather, the kind of NAEP analysis that we should be promoting is the kind that’s carefully done, that’s vetted by researchers, and that’s designed in a way that brings us much closer to the causal inferences we all want to make. It’s my hope that our work in the C-SAIL center will be of this type. But you can bet our results won’t be out the day the NAEP scores hit. That kind of thoughtful research designed to inform rather than mislead takes more than a day to put together (but hopefully not so much time that the results cannot inform subsequent policy decisions). It’s a delicate balance, for sure. But everyone’s goal, first and foremost, should be to get the answer right.

Any way you slice it, PDK’s results on Common Core are odd

I’ve written previously about recent polling on Common Core, noting that PDK/Gallup’s recent poll result on that topic is way out of whack with what other polls have found. One common argument you hear to explain this result is that PDK has a different wording than other polls. I always found this argument a little suspect, because I doubted that such hugely disparate results could be explained by the PDK wording (which, to me, seems relatively neutral).

In the 2015 PACE/USC Rossier poll, we designed questions to test the impact of Common Core poll wording on respondents’ views toward the standards. Specifically, we split our sample of 2400 four ways, randomly assigning each group one of four Common Core questions.

  1. To what extent do you approve or disapprove of the Common Core State Standards? (neutral wording)
  2. To what extent do you support or oppose having teachers in your community use the Common Core State Standards to guide what they teach? (PDK)
  3. As you may know, over the past few years states have been deciding whether or not to implement the Common Core State Standards, which are national standards for reading, writing, and math. In the states that have these standards, they will be used to hold public schools accountable for their performance. To what extent do you support or oppose the use of the Common Core Standards in California? (Education Next)
  4. A version of a PACE/USC Rossier legacy question that provides a pro- and an anti- CCSS explanation and asks respondents to pick one.

This design allows us to explicitly compare the results from wordings used in multiple national polls, and it also allows us to compare California-specific results to national figures. So, what did we learn?

First, we learned that the Education Next and PDK wordings did indeed affect the results those polls obtained. Using the Education Next wording, we saw support leading opposition 52/29. In contrast, using both the neutral (26/31) and PDK (24/27) wordings, we saw support trailing opposition [1]. Clearly, how you word the question affects what results you get.

But second, we saw that the PDK results almost certainly cannot be entirely explained by question wording. To see how we reached this conclusion, consider the difference between the support we observed using the Education Next question and the results they saw: 52/29 vs. 49/35. Those results are quite close–just a few points difference on both support and opposition–and the difference is likely attributable to the fact that California voters are more liberal than national averages and the state has seen less Common Core controversy than some others.

In contrast, our results using the PDK wording are wildly different from the results PDK reported: 24/27 vs. 24/54. Those results are substantially different in two main ways. First, many more people offered a response to this question on the PDK poll than on our poll, suggesting more people feel informed enough to opine in their sample (probably marginal people who know quite little about the topic). Second, while the proportion supporting is the same, the proportion opposing is twice as high (!) in the PDK poll sample.

How could it be that our results differed from EdNext’s by just a few points but differed from PDK’s by 27 points? I think these results suggest that question wording alone cannot fully explain these differences. So what are the possible explanations? I see two most likely:
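One way to see the opine-rate point concretely: given support and opposition percentages, the remainder is the share who offered no opinion. A quick sketch using the figures quoted above (the helper function is mine, for illustration):

```python
# Share of respondents who offered no opinion, given support/oppose
# percentages. Figures are those quoted in the post.
def no_opinion(support: float, oppose: float) -> float:
    """Residual share (in percentage points) who didn't pick a side."""
    return 100 - support - oppose

ours_pdk_wording = no_opinion(24, 27)   # PACE/USC Rossier, using PDK's wording
pdk_reported = no_opinion(24, 54)       # PDK's own reported result

print(ours_pdk_wording, pdk_reported)   # 49 vs. 22: far more people opined in PDK's sample
```

Identical wording, yet less than half as many non-responses in PDK’s sample: another hint that something beyond wording is going on.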

First, it’s possible there’s something wrong with PDK’s sample or pollster. Though Gallup has a strong national reputation, they’ve been criticized in the past by some notable polling experts. It could be that those problems are occurring here, too.

Second, it’s possible that something about the ordering of questions is affecting support on the PDK poll. In particular, PDK asked 9 questions about standardized tests before they got to the Common Core question (at least, to the extent that I can discern their ordering from their released documentation). In contrast, we asked neutral right track/wrong track questions about the governor, the president, and California schools, and Education Next asked about support for schools, topics covered in schools, and school spending. Perhaps that ordering had something to do with the results.

Either way, I think these results add further support to the conclusion that PDK’s results (certainly on Common Core, but probably in general) shouldn’t be taken as gospel. Quite the contrary: they’re an outlier, and their results should be treated as such until they demonstrate findings more in line with what we know about public opinion.

[1] I wasn’t expecting PDK to be as close to the neutral result as they were.

The more people know about Common Core, the less they know about Common Core

Today marks the release of the second half of the PACE/USC Rossier poll on education, our annual barometer on all things education policy in California [1]. This half focuses on two issues near and dear to my heart: Common Core and testing. Over the coming days and weeks I’ll be parsing out some interesting tidbits I’ve uncovered in analyzing results from this year’s poll.

The first finding worth mentioning has to do with Common Core support and knowledge. We’ve all read arguments like “The more people know about Common Core, the less they like it”. For instance, we see that claim from NPR, Republican legislators, and hackish tea party faux-news sites. This claim is generally based on the finding from several polls that people who say they know more about the standards are less likely to support them (or, more generally, on the trend that reported knowledge has increased over time, as has opposition). It turns out, however, that this may not be as true as you think.

To test knowledge of Common Core, we first asked people to tell us how much they know about the Common Core (a lot, some, a little, nothing at all). Then, we asked them a series of factual and opinion questions about the standards, to test whether they really did know as much as they said they did. The results were quite illuminating.

It turns out that people who said they knew a lot about Common Core were actually the group most likely to report misconceptions about the standards, and the group with the worst net misconception index (correct conceptions minus misconceptions, so a more negative number means a more misinformed group). For instance, 51.5% of people who said they knew “a lot” about Common Core incorrectly said it was false that Common Core included only math and ELA standards. In contrast, just 31.7% of this group answered this statement correctly (for a net misconception index of -20). For people who reported knowing only a little about the standards, the net index was just -11 (33% misconception, 22% correct conception).

Another area on which Common Core-“knowledgeable” people were more likely to be incorrect was in agreeing that Common Core required more testing than previous state standards. 57% of this group wrongly said this was true, while just 31% correctly said it was false (net misconceptions -26). All groups had net misconceptions on this item, but the margin was -19 for the “some” knowledge group, -16 for the “a little” group, and -11 for the “none” group.

In terms of raw proportions of misinformed individuals, the “a lot” of knowledge group is also the most misinformed group about the Obama administration’s role in creating the standards and the federal government’s role in requiring adoption.
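The net index used in these comparisons is simple arithmetic: correct responses minus misconceptions, in percentage points, with negative values meaning the group is net misinformed. A quick sketch using the percentages reported above:

```python
# Net misconception index: correct responses minus incorrect ones,
# in percentage points (negative = net misinformed).
# Percentages are those reported in the post.
def net_index(correct_pct: float, misconception_pct: float) -> float:
    return correct_pct - misconception_pct

# "Common Core includes only math and ELA standards" item
a_lot = net_index(31.7, 51.5)    # respondents claiming "a lot" of knowledge
a_little = net_index(22, 33)     # respondents claiming "a little"

print(round(a_lot), a_little)    # -20 and -11
```

The self-described experts come out worse on the index than the self-described novices, which is the whole point.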

In short, yes, individuals who say they know more about the standards are less likely to support the standards. But, as it turns out, that’s not because they actually know more (they don’t). Rather, it’s likely because they “know” things that are false and that are almost certainly driving their opposition.

So the next time you see someone claiming that “the more people know about Common Core, the less they like it,” feel free to correct them.

[1] Part 1 of this year’s poll was already released–it focused on Local Control Funding and overall attitudes toward education in California. You can read more about it here.

A brief post on a disappointing brief

Over the weekend, the Network for Public Education put out a brief published by Julian Vasquez Heilig and others titled “Should Louisiana and the RSD receive accolades for being last and nearly last?” The release of this brief was presumably timed to coincide with the 10-year anniversary of Hurricane Katrina, a horrific disaster that killed many and destroyed much of New Orleans. This report comes on the heels of several other NOLA-related publications, such as:

  • Doug Harris’s brief on the achievement gains post-Katrina, which uses longitudinal statewide student-level data and (to my knowledge) the most advanced quasi-experimental techniques one could use given existing data limitations. That report concludes “We are not aware of any other districts that have made such large improvements in such a short time.”
  • CREDO’s urban charters report that uses sophisticated matching methods that, while not uncontroversial in some quarters, are at least a very reasonable attempt to solve the inherent problems in charter/traditional public comparisons (you can read a defense of their methods here. I wouldn’t say I’m a strong enough methodologist to really adjudicate these issues, but it’s worth noting that Dr. Vasquez Heilig has, in the past, promoted findings from CREDO as producing causal estimates when it was convenient for him to do so). That report concludes that charters in New Orleans outperform students in traditional public schools by about one-tenth of a standard deviation.

The new brief is notable because it finds, across multiple measures, that Louisiana/New Orleans/The RSD are woefully underperforming. Here’s their conclusion, for example:

In summary, the NAEP scores have risen in reading and math, but Louisiana’s ranking relative to the nation has remained the same in math and dropped one spot in reading. The new NAEP research in this brief shows that Louisiana charter schools perform worse than any other state when compared to traditional schools. This finding is highly problematic for the conventional narrative of charter success in Louisiana and the RSD. Also, the RSD dropout, push out, and graduation rates are of concern— placing last and nearly last in the state. After ten years of education reform it is a disappointment that only 5% of RSD students score high enough on AP tests to get credit. The review of data also demonstrates that neither the Louisiana ACT nor RSD ACT scores are positive evidence of success.

In conclusion, the national comparative data suggest that there is a dearth of evidence supporting a decade of test-score-driven, state-takeover, charter-conversion model as being implemented in New Orleans. The predominance of the data suggest that the top-down, privately controlled education reforms imposed on New Orleans have failed. The state and RSD place last and nearly last in national and federal data. These results do not deserve accolades.

Now, I am not an expert on New Orleans, nor do I have a particular horse in this race. I want what’s best for the kids of New Orleans. And I want research that helps us decide what’s working for New Orleans’ kids and what’s not. Unfortunately, I don’t think the NPE brief helps us in that regard. In fact, the brief provides no evidence whatsoever about the effects of New Orleans’ reforms (and certainly less than is provided by the ERA and CREDO studies (and others)). [1] I’m not going to do a full-scale critique here, but I will point out a few ways in which the report is fatally flawed.

  • Probably the most obvious issue is that the NAEP data that the authors use are not suited to an analysis of NOLA’s performance. New Orleans is NOT one of the districts that participates in NAEP’s Trial Urban District Assessment, so the statewide results from Louisiana tell us nothing at all about New Orleans’ performance. The Louisiana charter sample is not necessarily from New Orleans, anyway (as the authors point out, 30% of LA’s charters are outside New Orleans). This alone would be fatal for an analysis seeking to understand the effectiveness of NOLA reforms. And it would be puzzling to conduct such a study, given the obvious data limitations and the evidence we already have from studies that have access to superior data. But setting it aside …
  • The design of the study is not appropriate for any investigation that seeks to have high internal validity (that is, for us to trust the authors’ conclusions about cause and effect). If we were to take the authors’ analysis as implying cause and effect, that would be logically equivalent to simply ranking the states on their 2013-2003 gains and recommending whatever education policies were in place in the top-gaining states. As it happens, other folks (including our very own Secretary of Education) have done that already, arguing that Tennessee’s and DC’s policies were the ones we should be adopting. And they were wrong to do that, too. There’s even a term for this kind of highly questionable use of NAEP scores to make policy recommendations–misNAEPery. To be sure, there are appropriate uses of NAEP data to attempt to answer cause and effect questions (e.g., here and here). But these use much more sophisticated econometric techniques and go through dozens of falsification exercises, which doesn’t appear to have been done here.
  • The authors seem to use only data from a couple of time points, for some reason ignoring all the other years of data that exist. Given all of the things that happened, both in Louisiana and in other states, between 2003 and 2013, it is inappropriate to simply take the difference in scores and attribute it to the impact of “reform.”
  • The authors need to create strong comparison groups from observational data, but instead they resort to simple regression-based approaches. These would only create a fair charter/public comparison if the models adequately controlled for all observed and unobserved factors that contribute to the charter enrollment decision and affect the outcome. The very limited number of statistical controls in their model (e.g., only racial-ethnic composition and poverty at the school level) are almost certainly not up to the task. Observational charter studies using limited controls simply do not produce results that are consistent with studies using more advanced methods.
  • The authors seem to ignore the other literature on this topic, such as the two studies cited above. At a minimum, they could offer justification for why their methods should be preferred over ERA’s and CREDO’s methods (perhaps they did not offer such a justification because none exists).

There are many other critiques, and I’m sure others will make them. The main conclusion is that this report provides literally zero information about the causal effect of the New Orleans reforms on student outcomes, and should therefore be ignored. There are lots of hard questions about New Orleans reforms, and we need good evidence on them. Even folks who are far from reform-friendly agree that this brief provides no such evidence.

EDIT: If you want to see Dr. Vasquez Heilig’s response, in which he essentially acknowledges his report has zero internal validity, check it out here.

[1] That this kind of work would be promoted by groups that claim to “[use] research to inform education policy discussions & promote democratic deliberation” is a shame, and a topic for another conversation.

Common Core goes postmodern

A quick post today.

Mike Petrilli tweeted about the new IES standards center (of which I am a part) at Jay Greene and Rick Hess, asking them if they might be convinced of CCSS effectiveness by the results of such a study. To be clear, the study design is the same as several previously published analyses of the impact of NCLB, which are published in top journals and are widely cited. We are simply using CITS designs to look at the causal impact of CCSS adoption and then exploring the possible mediating factor of state implementation.
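For readers unfamiliar with the design, what follows is a deliberately simplified sketch of the comparative logic behind a CITS-style analysis: instead of a naive pre-post difference, the estimate comes from how adopting states' post-adoption scores depart from the trajectory of non-adopters. Everything below (the simulated scores, the adoption year, the two-group setup) is invented for illustration, and this stripped-down version, a shared trend plus a level shift, omits the slope-change terms a full CITS would include.

```python
import numpy as np

# Toy comparative interrupted time series logic on simulated data:
# compare post-adoption changes for "adopter" states against
# non-adopters, rather than taking a naive pre-post difference.
rng = np.random.default_rng(0)

years = np.arange(2003, 2016, 2)             # biennial NAEP-style waves
adoption_year = 2011                         # invented adoption date

rows = []
for state in range(20):
    treated = state < 12                     # 12 adopters, 8 comparison states
    for t in years:
        post = t >= adoption_year
        score = (240 + 0.5 * (t - 2003)                    # shared secular trend
                 + (3.0 if treated and post else 0.0)      # true "effect" to recover
                 + rng.normal(0, 1))                       # noise
        rows.append((t - 2003, float(treated), float(post), score))

data = np.array(rows)
time, treat, post = data[:, 0], data[:, 1], data[:, 2]
y = data[:, 3]

# Design matrix: intercept, trend, treated, post, and the treated x post
# interaction, whose coefficient estimates the adoption effect.
X = np.column_stack([np.ones_like(y), time, treat, post, treat * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
effect = beta[-1]
print(f"estimated adoption effect: {effect:.2f} points")
```

Even in this cartoon version, the comparison-group trend is what lets the model separate the adoption effect from the secular improvement every state was experiencing, which is precisely what a naive pre-post difference cannot do.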

Jay responded: “No. Low N and choosing and implementing CC are endogenous.”

Rick agreed: “Nah, the methodology on the link isn’t compelling- which fuels my skepticism. As Jay said: low n, endogeneity. Ugh.”

I’m fine with the attitudes expressed here, so long as they are taken to their logical conclusion, which is that we cannot ever know the impact of Common Core adoption or implementation (in which case why are we still talking about it?). I don’t see how, if the best-designed empirical research can’t be trusted, we can ever hope to know whether Common Core has had any impacts at all. So if Jay and Rick believe that, by all means.

I suspect, however, that Jay and Rick don’t believe that. For starters, they’ve routinely amplified work with methodological problems at least as serious as those in our yet-to-be-conducted work. In that case, however, the findings (standards don’t matter much) happened to agree with their priors.

Furthermore, both have written repeatedly about the negative impacts of Common Core. For instance, Common Core implementation causes opt out. Common core implementation is causing a retreat on standards and accountability. Common Core implementation is causing restricted options for parents. Common Core implementation is causing the crumbling of teacher evaluation reform. [1] How can we know any of these things are caused by Common Core if even the best-designed causal research can’t be trusted?

The answer is we can’t. So Rick and Jay (and others who have made up their minds that a policy doesn’t work before it has even been evaluated) should take a step back, let research run its course, and then decide if their snap judgments were right. Or, they should conclude that no research on this topic can produce credible causal estimates, in which case they should stop talking about it. I’ll end with a response from Matt Barnum, which I think says everything I just said, but in thousands fewer characters:

“So are people (finally) acknowledging that their position on CC is non-falsifiable?”


[1] Note: I believe at least some of these claims may be true. But that’s not hypocritical, because I’m not pretending to believe there is no truth with regard to the impact of Common Core.

On Common Core, can two polls this different both be right?

It’s everyone’s favorite time of year! No, not Christmas (though this lapsed Jew increasingly finds the Christmas season enchanting). It’s Education Poll Season!

A few weeks ago we had Education Next’s annual poll. Yesterday was Phi Delta Kappan/Gallup. And over the next couple weeks there will be results from the less heralded but no-less-awesome poll put out by USC Rossier and Policy Analysis for California Education [1]. It’s great that all of these polls come out at once because:

  1. It’s so easy to directly compare the results across the polls (at least when they ask similar enough questions).
  2. It’s so easy to spot hilariously (and presumably, maliciously) bad poll-related chicanery.

In today’s analysis, I’m going to discuss results from these and other polls pertaining to public support for the Common Core standards. I’ve done a little of this in the past, but I think there are important lessons to be learned from the newest poll results.

Finding 1. Support for the Common Core is probably decreasing. Education Next asked about Common Core in the same way in consecutive years. Last year they found a 54/26 margin in favor; this year it was 49/35. PDK asked about Common Core last year and found 60% opposed versus 33% in favor; this year it was 54% opposed versus 24% in favor. In both cases the opposition margin has increased, though not by much in PDK. The PACE/USC Rossier poll will add to this by tracking approval using the same questions we have used in previous years.

Finding 2. Voters still don’t know much about Common Core. In PDK, 39% of voters reported having heard just a little or not at all about Common Core (I’m also counting “don’t know” here, which seems to me to have a very similar meaning to “not at all”). In Education Next, 58% of respondents did not know whether Common Core was being implemented in their district, an even more direct test of knowledge. While neither poll this year asked respondents factual questions about the standards to gauge misconceptions, I’m quite confident misconceptions remain widespread given what polls found last year. The PACE/USC Rossier Poll will add to this by testing the prevalence of a variety of misconceptions about the standards.

Finding 3. Folks continue to like almost everything about Common Core other than the name. For instance, Education Next finds that voters overwhelmingly support using the same standardized test in each state (61/22), which aligns with the federal government’s efforts in supporting the consortia to build new assessments. Voters also are quite favorable toward math and reading standards that are the same across states (54/30). Finally, PDK finds that voters are much more likely to say their state’s academic standards are too low (39%) than too high (6%), which supports the decisions states are making with respect to new Common Core cut scores.

Finding 4. It seems likely that the wording of Common Core questions matters for the support level reported, but we don’t have enough good evidence to say for sure. Education Next was criticized last year for the wording of their Common Core question, which was

As you may know, in the last few years states have been deciding whether or not to use the Common Core, which are standards for reading and math that are the same across the states. In the states that have these standards, they will be used to hold public schools accountable for their performance. Do you support or oppose the use of the Common Core standards in your state?

The question was criticized for invoking accountability, which most folks are in favor of. Because the folks at Education Next are savvy and responsive to criticism, they tested the effect of invoking accountability, asking the same question but without the “In the states …” sentence, and found support fell to 40/37. Though PDK was criticized last year for their question, they appear to have stuck with the same questionable item. The PACE/Rossier poll directly tests both the 2014 PDK and Education Next questions, plus two other support/opposition questions, in order to clearly identify the impact of question wording on support.

Finding 5. As compared to every other reasonably scientific poll I’ve seen that asks about Common Core, PDK produces the most extreme negative results. Here are all the polls I have found from the last two years and their support/opposition numbers (sorted in order from most to least favorable):

  • Public Policy Institute of California 2014 (CA): 69/22 (+47)
  • Education Next 2014: 54/26 (+28)
  • NBC News 2014: 59/31 (+28)
  • Public Policy Institute of California 2015 (CA): 47/31 (+16)
  • Education Next 2015: 49/35 (+14)
  • Friedman Foundation 2015: 40/39 (+1)
  • University of Connecticut 2014: 38/44 (-6)
  • PACE/USC Rossier 2014 (CA): 38/44 or 32/41, depending on question (-6, -9)
  • Louisiana State University 2015 (LA): 39/51 (-12)
  • Monmouth University 2015 (NJ): 19/37 (-18)
  • Times Union/Siena College 2014 (NY): 23/46 (-23)
  • Fairleigh Dickinson 2015: 17/40 (-23)
  • PDK 2014: 33/60 (-27)
  • PDK 2015: 24/54 (-30)

Only one other national poll in the past two years comes within 20 points (!) of the negative margin found by PDK – anything else that’s that negative comes out of a state that’s had particularly chaotic or controversial implementation. Now, it could be that PDK’s results are right and everyone else’s are wrong, but when you stack them up with the others it sure looks like there’s something strange in those findings. It might be the question wording (again, since PACE/USC Rossier is using their exact wording, we can test this), but my guess is it’s something about the sample or the questions they ask before this one. This result just seems too far outside the mainstream to be believed, in my opinion.
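For anyone who wants to reproduce the ranking above, the margins are simple arithmetic on the support/opposition figures (the two-question PACE/USC Rossier result is omitted here for simplicity):

```python
# Margin = support - oppose, sorted most favorable first.
# Numbers are the poll figures listed in the post.
polls = {
    "PPIC 2014 (CA)": (69, 22),
    "Education Next 2014": (54, 26),
    "NBC News 2014": (59, 31),
    "PPIC 2015 (CA)": (47, 31),
    "Education Next 2015": (49, 35),
    "Friedman Foundation 2015": (40, 39),
    "University of Connecticut 2014": (38, 44),
    "LSU 2015 (LA)": (39, 51),
    "Monmouth 2015 (NJ)": (19, 37),
    "Siena 2014 (NY)": (23, 46),
    "Fairleigh Dickinson 2015": (17, 40),
    "PDK 2014": (33, 60),
    "PDK 2015": (24, 54),
}

margins = {name: support - oppose for name, (support, oppose) in polls.items()}
for name, margin in sorted(margins.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {margin:+d}")
```

PDK 2015 lands at the bottom at -30, a full 24 points below the next most negative national poll other than Fairleigh Dickinson.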

Finding 6. The usual suspects of course pounced on the PDK poll to score points. Randi Weingarten used the results on Twitter to make some point about toxic testing (the use of a buzzphrase like that is a pretty clear sign that your analysis isn’t so serious). At the opposite end of the spectrum (which, increasingly, is the same end of the spectrum), Neal McCluskey said the results showed Common Core was getting clobbered (though, to his credit, he questioned the strange item wording and also wrote about Education Next last week, albeit in a somewhat slanted way).

So there we have it. Common Core support is down. But if you don’t call it Common Core and you ask people what they want, they want something very Common Core-like. They still haven’t heard much about Common Core, and most of what they think they know is wrong. And they almost certainly aren’t as opposed as PDK finds them to be. That’s the state of play on Common Core polling as of now. Our poll, coming out in a couple weeks, will address some of the major gaps described above and contribute to a deeper understanding of the support for the standards.

[1] Disclosure: Along with David Plank and Julie Marsh, I’m one of the three main architects of this poll.

Testing the tests

It’s been radio silence here for a couple weeks. Sorry about that. The truth is, all of my projects are coming to a head between now and mid-September, and there just hasn’t been any time. Today I thought I’d give a quick update about one of the projects I haven’t mentioned much, and which I think will probably make a big splash later this year.

Obviously, I’m a fan of standards. I think we have good evidence that students’ opportunities to learn are all too often driven more by where they live or which teacher they happened to draw than by what we know (or at least think we know) about what they need to know in order to be successful in life. These opportunity-to-learn gaps exist not just across states, but within states, within districts, within schools, and very likely within classrooms. I also think, as a practical matter, that it’s remarkably inefficient for there to be 3 million unique interpretations of what’s important for kids to know. So I think standards are an important starting point for solving some of these problems.

Of course, I also think curriculum matters, and that’s why I’m studying textbooks and talking with teachers and district leaders about how they’re thinking about curriculum in this brave new world brought to us by the interwebs. Curricula bring standards to life and help concretize what can otherwise be sometimes frustratingly abstract language in standards. My hope is that students have equal access to good curriculum materials that offer faithful interpretations of the standards–whether that is the case remains to be seen.

The third leg of this little instructional tripod is the tests. The tests are intended to reinforce the content messages of the standards and to give teachers and parents accurate feedback about students’ performance. They’re also often used to make decisions about schools, (slightly less often) teachers, and (much less often) students. Now, we’ve known for quite a while that our tests weren’t that good. They’ve been made on the cheap, testing low-level skills using primarily (or exclusively) multiple-choice items. And they haven’t offered useful feedback to anyone because results have generally arrived too late. The result is that the tests we’ve had have undermined the standards, rather than supporting them, leading to more reductionist responses from educators.

There are promising signs that the tests are looking better. The federal government pumped a large amount of money into the consortia, and both PARCC and SBAC brought on the best of the best to help them build better tests. At the same time, other experienced players have gotten into the game, such as ACT. These tests are competing against each other, and they’re also competing against the best of the old state tests, such as the Massachusetts MCAS. And they’re doing it for little to no more money than the mediocre state tests we had for years. While everything I’ve heard and read suggests these tests are indeed a pretty substantial step forward from what they’re replacing, states are in full retreat mode, tossing them aside left and right. To date, while there have been promising hints about the new tests, there just hasn’t been the kind of deep analysis of them that you might have hoped for if your goal was ensuring the best evidence about the quality of the tests got into the hands of policymakers.

It’s in this context that I’m working with the Thomas B. Fordham Institute and HumRRO on a study evaluating these four assessments–PARCC, SBAC, ACT Aspire, and MCAS. We’re bringing together expert reviewers (educators and content experts) from around the country starting tomorrow for an intensive review of these tests’ content (their actual forms!), documentation, transparency, and accessibility. Later evaluations will examine their technical psychometric evidence. Our methodology was developed by the Center for Assessment and reviewed extensively by the project teams and by experts in measurement and assessment over the last year. It is based on the CCSSO’s Criteria for High Quality Assessments.

I’m so excited for this study starting tomorrow, not just because I’ll get my hands on real student test forms in a way that few folks have been fortunate enough to do, but also because it’ll be the first study to directly compare these tests to each other and against the research-based framework for what a good test should look like. I also happen to think the report, whatever it finds, is going to be useful to policymakers in states nationwide.

In the end, I’m sure that none of these tests will come out looking perfect. It’s my hope that they’ll all be strong along most dimensions so that we can say to states “these would be good choices if your goal was giving students a fair test that adequately covered the standards and gave teachers the right kinds of instructional messages.” If they end up looking no better than what we had before, it will further erode the already tenuous support the standards have among educators and the public, and it will likely do serious damage to the hope that a standards-based reform can really improve opportunity to learn for our kids.

Observations from abroad

For the last two weeks I have been traveling in Europe (not that I’d call it a “vacation”—it was more like “slightly less work than usual, but in a series of lovely, historic cities surrounded by 13th century churches”). While I try not to talk shop with the strangers I meet while traveling, it often ends up coming up. And basically regardless of where these strangers are from—on this trip I talked to folks from France, Belgium, the Netherlands, and Australia—their reaction to the structure of the U.S. education system is the same: it makes no sense.

Now, this could be because of how I describe things, or it could be that they sense my position on these matters and agree so as not to be disagreeable, or it could be because I only attract like-minded socialists when having conversations with strangers. But when I describe, for instance, our set of 50 state standards under No Child Left Behind (or the fact that a few decades before that we didn’t even have state standards to speak of), our 10,000+ school districts each operating with their own policies and procedures, or the fierce resistance to even the slightest effort to create more uniformity in our systems in order to improve equity, they uniformly respond with incredulity.

Of course it makes no sense to have different math standards in every state (let alone every district or every school, which is what many want). Of course that kind of system exacerbates inequality rather than ameliorates it. Of course our system is wildly, hopelessly inefficient. It’s just so obvious to them, as it should be to all of us. They also tend to think our testing system is odd—especially the fact that our tests mostly have stakes for schools and perhaps teachers but not students [1].

Anyway, there’s no great revelation here, just something I have noticed repeatedly when I talk to people from around the world. It doesn’t mean that they’re necessarily right and we should make our system into France’s, but it does underline the already serious questions in my mind about the possibility of systemically improving a system that is structured as ours is. And of course, none of this changes the fact that the folks I talk to all love America (at least to visit) and recognize that, even though we have many problems, we’re still a unique and important nation that profoundly influences the rest of the world (especially culturally).

Glad to be home, and back to my regularly scheduled blogging.

[1] The other two things that are most obvious to everyone outside the country that I talked to are a) Obama has been a great President and we don’t give him the credit he deserves, and b) the single clearest example of our craziness as a nation is our gun issues.

Research you should read: On the distribution of teachers

Today’s installment of “Research you should read” comes to us from Educational Researcher. The paper is “Uneven playing field? Assessing the teacher quality gap between advantaged and disadvantaged students,” and it’s by Dan Goldhaber and colleagues. This is a beautifully done analysis that accomplishes several goals:

  1. It quantifies the degree of teacher sorting based on multiple teacher characteristics, including both input (e.g., credentials) and output (e.g., estimates of effectiveness) measures.
  2. It examines that sorting across multiple indicators of student disadvantage.
  3. It does (1) and (2) for an entire state.
  4. It identifies the sources of the inequitable distribution (e.g., is it mostly due to between-school or within-school sorting?).

The results are intensely sobering, if not at all surprising:

We demonstrate that in Washington state elementary school, middle school, and high school classrooms, virtually every measure of teacher quality—experience, licensure exam score, and value-added estimates of effectiveness—is inequitably distributed across every indicator of student disadvantage—free/reduced-price lunch status (FRL), underrepresented minority (URM), and low prior academic performance (the sole exception being licensure exam scores in high school math classrooms).

In short, poor kids, kids of color, and low-achieving kids systematically get access to lower quality teachers, any way you define “quality” [1].

The authors also note that most of the sorting is between schools and between districts, rather than within schools, at least for most of these measures. This is also not surprising, but it of course makes addressing this problem all the more difficult. It’s one thing to reassign teachers within schools (though even that is probably much easier said than done). It’s an entirely different thing to find ways to redistribute teachers across schools or districts without raising the hackles of the broad swath of the electorate that wants government to keep its hands off the public education system.

There are undoubtedly many causes of this (frankly, abhorrent) set of findings. The authors list or suggest several:

  • Higher-quality teachers are more likely to leave districts serving more disadvantaged kids, likely because of both pay and working conditions.
  • Existing pay structures create little incentive to work in more disadvantaged settings (often it’s the opposite–the more disadvantaged districts pay less than the tonier suburban districts).
  • Student teaching may contribute to sorting, with the most advantaged districts snatching up the most qualified candidates.
  • Collective bargaining agreements often give more senior teachers preference in terms of teaching assignments, which they use to make within-district transfers from more to less disadvantaged schools.
  • School leaders may give their best or most experienced teachers within-school preferences in terms of teaching assignments.

These are not easily remedied, but certainly there are policy innovations that might help. The most obvious is that we should pay teachers who teach in more disadvantaged settings more, not less. This certainly is true between districts, but it ought to be true within districts as well. The authors cite evidence that these bonuses can induce desirable behaviors. Another is that we really need to work on the underlying challenges of working in more disadvantaged schools, including working conditions. Several recent studies have shown the powerful influence of working conditions on teachers’ employment decisions and their improvement as professionals.

I do not know whether state or federal policymakers should get involved in this issue. As a big government guy who is concerned about the way our school system treats those who are most disadvantaged, my inclination is to say yes. My hope is that some states can lead the way, creating new laws and systems that, at a minimum, make it equally likely that a poor kid and a rich one in a public school can get access to a good teacher. The status quo on this issue clearly is not working for our most disadvantaged kids.

[1] Of course there could be some other undefined measure of quality that’s not distributed this way, but I’ve not seen any evidence of that.