Friends don’t let friends misuse NAEP data

At some point in the next few weeks, the results from the 2015 administration of the National Assessment of Educational Progress (NAEP) will be released. I can all but guarantee you that the results will be misused and abused in ways that scream misNAEPery. My warning in advance is twofold. First, do not misuse these results yourself. Second, do not share or promote the misuse of these results by others who happen to agree with your policy predilections. This warning applies of course to academics, but also to policy advocates and, perhaps most importantly of all, to education journalists.

Here are some common types of misused or unhelpful NAEP analyses to look out for and avoid. I think this is pretty comprehensive, but let me know in the comments or on Twitter if I’ve forgotten anything.

  • Pre-post comparisons involving the whole nation or a handful of individual states to claim causal evidence for particular policies. This approach is used by both proponents and opponents of current reforms (including, sadly, our very own outgoing Secretary of Education). Simply put, while it’s possible to approach causal inference using NAEP data, that’s not accomplished by taking pre-post differences in a couple of states and calling it a day. You need to have sophisticated designs that look at changes in trends and levels and that attempt to poke as many holes as possible in their results before claiming a causal effect.
  • Cherry-picked analyses that focus only on certain subjects or grades rather than presenting the complete picture across subjects and grades. This is most often employed by folks with ideological agendas (using 12th grade data, typically), but it’s also used by prominent presidential candidates who want to argue their reforms worked. Simply put, if you’re going to present only some subjects and grades and not others, you need to offer a compelling rationale for why.
  • Correlational results that look at levels of NAEP scores and particular policies (e.g., states that have unions have higher NAEP scores, states that score better on some reformy charter school index have lower NAEP scores). It should be obvious why correlations of test score levels are not evidence of any kind of causal effect, given the tremendous demographic and structural differences across states that these naïve analyses cannot control for.
  • Analyses that simply point to low proficiency levels on NAEP (spoiler alert: the results will show that large proportions of kids score below proficient in every subject and grade) to say that we’re a disaster zone and a) the whole system needs to be blown up or b) our recent policies clearly aren’t working.
  • (Edit, suggested by Ed Fuller) Analyses that primarily rely on percentages of students at various performance levels, instead of using the scale scores, which are readily available and provide much more information.
  • More generally, “research” that doesn’t even attempt to account for things like demographic changes in states over time (hint: these data are readily available, and analyses that account for demographic changes will almost certainly show more positive results than those that do not; a simple sketch of such an adjustment follows this list).
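To make that last point concrete, here is a minimal sketch of the kind of demographic standardization I have in mind: hold the subgroup mix fixed at the earlier wave and see what the score change looks like. The subgroup labels and numbers below are made up for illustration only; they are not NAEP figures.

```python
# A minimal sketch of a demographic standardization, with made-up numbers.
# Idea: hold subgroup composition fixed at the earlier wave so that a raw score
# change isn't confounded by shifts in who is taking the test.

# Hypothetical subgroup shares and mean scale scores for one state
shares_2013 = {"group_a": 0.55, "group_b": 0.30, "group_c": 0.15}
shares_2015 = {"group_a": 0.48, "group_b": 0.35, "group_c": 0.17}
means_2013 = {"group_a": 290.0, "group_b": 270.0, "group_c": 265.0}
means_2015 = {"group_a": 291.0, "group_b": 272.0, "group_c": 266.0}

def weighted_mean(means, shares):
    return sum(means[g] * shares[g] for g in means)

# Raw change: compare each wave using its own demographic mix
raw_change = weighted_mean(means_2015, shares_2015) - weighted_mean(means_2013, shares_2013)

# Adjusted change: apply the later wave's subgroup means to the earlier wave's
# shares, so only within-group score changes (not demographic shifts) contribute
adjusted_change = weighted_mean(means_2015, shares_2013) - weighted_mean(means_2013, shares_2013)

print(f"Raw change: {raw_change:+.1f} points")
print(f"Demographically adjusted change: {adjusted_change:+.1f} points")
```

In this toy example the raw statewide change is essentially flat while the composition-adjusted change is clearly positive, which is exactly the pattern you should expect in a state whose student population has grown more disadvantaged over the period.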

Having ruled out all of your favorite kinds of NAEP-related fun, what kind of NAEP reporting and analysis would I say is appropriate immediately after the results come out?

  • Descriptive summaries of trends in state average NAEP scores, not just across two NAEP waves but across multiple waves, grades, and subjects. These might be used to generate hypotheses for future investigation but should not (ever (no really, never)) be used naively to claim some policies work and others don’t.
  • Analyses that look at trends for different subgroups and the narrowing or closing of gaps (while noting that some of the category definitions change over time).
  • Analyses that specifically point out that it’s probably too early to examine the impact of particular policies we’d like to evaluate and that even if we could, it’s more complicated than taking 2015 scores and subtracting 2013 scores and calling it a day.

The long and the short of it is that any stories that come out in the weeks after NAEP scores are released should be, at best, tentative and hypothesis-generating (as opposed to definitive and causal effect-claiming). And smart people should know better than to promote inappropriate uses of these data, because folks have been writing about this kind of misuse for quite a while now.

Rather, the kind of NAEP analysis that we should be promoting is the kind that’s carefully done, that’s vetted by researchers, and that’s designed in a way that brings us much closer to the causal inferences we all want to make. It’s my hope that our work in the C-SAIL center will be of this type. But you can bet our results won’t be out the day the NAEP scores hit. That kind of thoughtful research designed to inform rather than mislead takes more than a day to put together (but hopefully not so much time that the results cannot inform subsequent policy decisions). It’s a delicate balance, for sure. But everyone’s goal, first and foremost, should be to get the answer right.

Any way you slice it, PDK’s results on Common Core are odd

I’ve written previously about recent polling on Common Core, noting that PDK/Gallup’s recent poll result on that topic is way out of whack with what other polls have found. One common argument you hear to explain this result is that PDK has a different wording than other polls. I always found this argument a little suspect, because I doubted that such hugely disparate results could be explained by the PDK wording (which, to me, seems relatively neutral).

In the 2015 PACE/USC Rossier poll, we designed questions to test the impact of Common Core poll wording on respondents’ views toward the standards. Specifically, we split our sample of 2400 four ways, randomly assigning each group one of four Common Core questions.

  1. To what extent do you approve or disapprove of the Common Core State Standards? (neutral wording)
  2. To what extent do you support or oppose having teachers in your community use the Common Core State Standards to guide what they teach? (PDK)
  3. As you may know, over the past few years states have been deciding whether or not to implement the Common Core State Standards, which are national standards for reading, writing, and math. In the states that have these standards, they will be used to hold public schools accountable for their performance. To what extent do you support or oppose the use of the Common Core Standards in California? (Education Next)
  4. A version of a PACE/USC Rossier legacy question that provides a pro- and an anti- CCSS explanation and asks respondents to pick one.

This design allows us to explicitly compare the results from wordings used in multiple national polls, and it also allows us to compare California-specific results to national figures. So, what did we learn?

First, we learned that the wordings used by Education Next and PDK did indeed affect the results they obtained. Using the Education Next wording, we saw support leading opposition 52/29. In contrast, using both the neutral (26/31) and PDK (24/27) wordings, we saw support trailing opposition [1]. Clearly, how you word the question affects what results you get.

But second, we saw that the PDK results almost certainly cannot be entirely explained by question wording. To see how we reached this conclusion, consider the difference between the support we observed using the Education Next question and the results they saw: 52/29 vs. 49/35. Those results are quite close–just a few points difference on both support and opposition–and the difference is likely attributable to the fact that California voters are more liberal than national averages and the state has seen less Common Core controversy than some others.

In contrast, our results using the PDK wording are wildly different from the results PDK reported: 24/27 vs. 24/54. Those results are substantially different in two main ways. First, many more people offered a response to this question on the PDK poll than on our poll, suggesting that more people in their sample feel informed enough to opine (probably marginal respondents who know quite little about the topic). Second, while the proportion supporting is the same, the proportion opposing is twice as high (!) in the PDK sample.
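As a rough back-of-the-envelope check on whether sampling noise could possibly account for a gap like that, here is a simple two-proportion z-test on the opposition figures. The sample sizes are my assumptions for illustration (roughly 600 respondents saw the PDK wording in our four-way split of 2,400; I am guessing on the order of 1,000 for PDK), not the actual Ns.

```python
from math import sqrt

# Could sampling error alone produce a 27-point gap in opposition between two
# polls using the same question wording? Sample sizes below are assumptions.
p1, n1 = 0.27, 600    # opposition in our PDK-wording arm (~1/4 of 2,400)
p2, n2 = 0.54, 1000   # opposition reported by PDK (assumed N)

# Two-proportion z-test with a pooled standard error
p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se
print(f"z = {z:.1f}")  # roughly 10, far beyond ~2, so noise alone can't explain the gap
```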

How could it be that our results differed from EdNext’s by just a few points but differed from PDK’s by 27 points? I think these results suggest that question wording alone cannot fully explain these differences. So what are the possible explanations? I see two as most likely:

First, it’s possible there’s something wrong with PDK’s sample or pollster. Though Gallup has a strong national reputation, they’ve been criticized in the past by some notable polling experts. It could be that those problems are occurring here, too.

Second, it’s possible that something about the ordering of questions is affecting support on the PDK poll. In particular, PDK asked 9 questions about standardized tests before they got to the Common Core question (at least, to the extent that I can discern their ordering from their released documentation). In contrast, we asked neutral right track/wrong track questions about the governor, the president, and California schools, and Education Next asked about support for schools, topics covered in schools, and school spending. Perhaps that ordering had something to do with the results.

Either way, I think these results add further support to the conclusion that PDK’s results (certainly on Common Core, but probably in general) shouldn’t be taken as the gospel. Quite the contrary; they’re an outlier, and their results should be treated as such until they demonstrate findings more in line with what we know about public opinion.


[1] I wasn’t expecting PDK to be as close to the neutral result as they were.

The more people know about Common Core, the less they know about Common Core

Today marks the release of the second half of the PACE/USC Rossier poll on education, our annual barometer on all things education policy in California [1]. This half focuses on two issues near and dear to my heart: Common Core and testing. Over the coming days and weeks I’ll be parsing out some interesting tidbits I’ve uncovered in analyzing results from this year’s poll.

The first finding worth mentioning has to do with Common Core support and knowledge. We’ve all read arguments like “The more people know about Common Core, the less they like it”. For instance, we see that claim from NPR, Republican legislators, and hackish tea party faux-news sites. This claim is generally based on the finding from several polls that people who say they know more about the standards are less likely to support them (or, more generally, the trend that reported knowledge has increased over time, as has opposition). It turns out, however, that this may not be as true as you think.

To test knowledge of Common Core, we first asked people to tell us how much they know about the Common Core (a lot, some, a little, nothing at all). Then, we asked them a series of factual and opinion questions about the standards, to test whether they really did know as much as they said they did. The results were quite illuminating.

It turns out that people who said they knew a lot about Common Core were actually the most likely group to report misconceptions about the standards, and the group with the largest net misconceptions (here, the percentage answering an item correctly minus the percentage endorsing the misconception, so more negative means more misinformed). For instance, 51.5% of people who said they knew “a lot” about Common Core incorrectly said it was false that Common Core includes only math and ELA standards. In contrast, just 31.7% of this group correctly answered this statement (for a net misconception index of -20). For people who reported knowing only a little about the standards, the net misconception index was just -11 (33% misconception, 22% correct conception).

Another area where Common Core-“knowledgeable” people were more likely to be wrong was the claim that Common Core requires more testing than previous state standards. 57% of this group wrongly said this was true, while just 31% correctly said it was false (net misconceptions -26). All groups had net misconceptions on this item, but the margin was -19 for the “some” knowledge group, -16 for the “a little” group, and -11 for the “none” group.
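For readers who want the arithmetic behind the index spelled out, here is a tiny sketch using the figures reported above; the function name is mine.

```python
# "Net misconceptions" here is just the percentage answering an item correctly
# minus the percentage endorsing the misconception (more negative = more misinformed).
def net_misconception(pct_correct, pct_misconception):
    return pct_correct - pct_misconception

# "Common Core includes only math and ELA standards" (true)
print(net_misconception(31.7, 51.5))  # "a lot" group: about -20
print(net_misconception(22.0, 33.0))  # "a little" group: -11

# "Common Core requires more testing than previous state standards" (false)
print(net_misconception(31.0, 57.0))  # "a lot" group: -26
```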

In terms of raw proportions of misinformed individuals, the “a lot” of knowledge group is also the most misinformed group about the Obama administration’s role in creating the standards and the federal government’s role in requiring adoption.

In short, yes, individuals who say they know more about the standards are less likely to support the standards. But, as it turns out, that’s not because they actually know more (they don’t). Rather, it’s likely because they “know” things that are false and that are almost certainly driving their opposition.

So the next time you see someone claiming that “the more people know about Common Core, the less they like it,” feel free to correct them.


[1] Part 1 of this year’s poll was already released–it focused on Local Control Funding and overall attitudes toward education in California. You can read more about it here.

A brief post on a disappointing brief

Over the weekend, the Network for Public Education put out a brief authored by Julian Vasquez Heilig and others titled “Should Louisiana and the RSD receive accolades for being last and nearly last?” The release of this brief was presumably timed to coincide with the 10-year anniversary of Hurricane Katrina, a horrific disaster that killed many and destroyed much of New Orleans. This report comes on the heels of several other NOLA-related publications, such as:

  • Doug Harris’s brief on the achievement gains post-Katrina, which uses longitudinal statewide student-level data and (to my knowledge) the most advanced quasi-experimental techniques one could use given existing data limitations. That report concludes “We are not aware of any other districts that have made such large improvements in such a short time.”
  • CREDO’s urban charters report that uses sophisticated matching methods that, while not uncontroversial in some quarters, are at least a very reasonable attempt to solve the inherent problems in charter/traditional public comparisons (you can read a defense of their methods here. I wouldn’t say I’m a strong enough methodologist to really adjudicate these issues, but it’s worth noting that Dr. Vasquez Heilig has, in the past, promoted findings from CREDO as producing causal estimates when it was convenient for him to do so). That report concludes that students in New Orleans charters outperform their peers in traditional public schools by about one-tenth of a standard deviation.

The new brief is notable because it finds, across multiple measures, that Louisiana/New Orleans/The RSD are woefully underperforming. Here’s their conclusion, for example:

In summary, the NAEP scores have risen in reading and math, but Louisiana’s ranking relative to the nation has remained the same in math and dropped one spot in reading. The new NAEP research in this brief shows that Louisiana charter schools perform worse than any other state when compared to traditional schools. This finding is highly problematic for the conventional narrative of charter success in Louisiana and the RSD. Also, the RSD dropout, push out, and graduation rates are of concern— placing last and nearly last in the state. After ten years of education reform it is a disappointment that only 5% of RSD students score high enough on AP tests to get credit. The review of data also demonstrates that neither the Louisiana ACT nor RSD ACT scores are positive evidence of success.

In conclusion, the national comparative data suggest that there is a dearth of evidence supporting a decade of test-score-driven, state-takeover, charter-conversion model as being implemented in New Orleans. The predominance of the data suggest that the top-down, privately controlled education reforms imposed on New Orleans have failed. The state and RSD place last and nearly last in national and federal data. These results do not deserve accolades.

Now, I am not an expert on New Orleans, nor do I have a particular horse in this race. I want what’s best for the kids of New Orleans. And I want research that helps us decide what’s working for New Orleans’ kids and what’s not. Unfortunately, I don’t think the NPE brief helps us in that regard. In fact, the brief provides no evidence whatsoever about the effects of New Orleans’ reforms (and certainly less than is provided by the ERA and CREDO studies (and others)). [1] I’m not going to do a full-scale critique here, but I will point out a few ways in which the report is fatally flawed.

  • Probably the most obvious issue is that the NAEP data that the authors use are not suited to an analysis of NOLA’s performance. New Orleans is NOT one of the districts that participates in NAEP’s Trial Urban District Assessment, so the statewide results from Louisiana tell us nothing at all about New Orleans’ performance. The Louisiana charter sample is not necessarily from New Orleans, anyway (as the authors point out, 30% of LA’s charters are outside New Orleans). This alone would be fatal for an analysis seeking to understand the effectiveness of NOLA reforms. And it would be puzzling to conduct such a study, given the obvious data limitations and the evidence we already have from studies that have access to superior data. But setting it aside …
  • The design of the study is not appropriate for any investigation that seeks to have high internal validity (that is, for us to trust the authors’ conclusions about cause and effect). If we were to take the authors’ analysis as implying cause and effect, that would be logically equivalent to simply ranking the states on their 2013-2003 gains and recommending whatever education policies were in place in the top-gaining states. As it happens, other folks (including our very own Secretary of Education) have done that already, arguing that Tennessee’s and DC’s policies were the ones we should be adopting. And they were wrong to do that, too. There’s even a term for this kind of highly questionable use of NAEP scores to make policy recommendations–misNAEPery. To be sure, there are appropriate uses of NAEP data to attempt to answer cause and effect questions (e.g., here and here). But these use much more sophisticated econometric techniques and go through dozens of falsification exercises, which doesn’t appear to have been done here.
  • The authors seem to use only data from a couple time points, for some reason ignoring all the other years of data that exist. Given all of the things that happened, both in Louisiana and in other states, between 2003 and 2013, it is inappropriate to simply take the difference in scores and attribute it to the impact of “reform.”
  • The authors would need to create strong comparison groups from observational data, but instead they resort to regression-based approaches. These would only create a fair charter/public comparison if the models adequately controlled for all observed and unobserved factors that contribute to the charter enrollment decision and affect the outcome. The very limited set of statistical controls in their model (e.g., only racial-ethnic composition and poverty at the school level) is almost certainly not up to the task. Observational charter studies using limited controls simply do not produce results that are consistent with studies using more advanced methods (the simulated example after this list illustrates why).
  • The authors seem to ignore the other literature on this topic, such as the two studies cited above. At a minimum, they could offer justification for why their methods should be preferred over ERA’s and CREDO’s methods (perhaps they did not offer such a justification because none exists).
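To illustrate the comparison-group point, here is a generic simulation (not a re-analysis of the brief's data) of how a regression with only coarse controls can land far from the truth when an unobserved factor drives both charter enrollment and outcomes. Everything below is simulated, and in real data the bias could run in either direction.

```python
import numpy as np
import statsmodels.api as sm

# Simulated students: an unobserved factor ("motivation") raises both the chance
# of enrolling in a charter and the outcome; the true charter effect is zero.
rng = np.random.default_rng(0)
n = 5000
poverty = rng.uniform(0, 1, n)              # observed control
motivation = rng.normal(0, 1, n)            # unobserved selection factor
charter = (0.8 * motivation - 0.5 * poverty + rng.normal(0, 1, n) > 0).astype(float)
score = 0.0 * charter - 1.0 * poverty + 0.7 * motivation + rng.normal(0, 1, n)

# Naive model: controls only for poverty, akin to coarse school-level controls
X_naive = sm.add_constant(np.column_stack([charter, poverty]))
print(sm.OLS(score, X_naive).fit().params[1])   # estimated "effect" lands well away from 0

# Model that (implausibly, with real data) also observes the selection factor
X_full = sm.add_constant(np.column_stack([charter, poverty, motivation]))
print(sm.OLS(score, X_full).fit().params[1])    # now close to the true effect of 0
```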

There are many other critiques, and I’m sure others will make them. The main conclusion is that this report provides literally zero information about the causal effect of the New Orleans reforms on student outcomes, and should therefore be ignored. There are lots of hard questions about New Orleans reforms, and we need good evidence on them. Even folks who are far from reform-friendly agree that this brief provides no such evidence.

EDIT: If you want to see Dr. Vasquez Heilig’s response, in which he essentially acknowledges his report has zero internal validity, check it out here.


[1] That this kind of work would be promoted by groups that claim to “[use] research to inform education policy discussions & promote democratic deliberation” is a shame, and a topic for another conversation.

Common Core goes postmodern

A quick post today.

Mike Petrilli tweeted about the new IES standards center (of which I am a part) at Jay Greene and Rick Hess, asking them if they might be convinced of CCSS effectiveness by the results of such a study. To be clear, the study design is the same as several previously published analyses of the impact of NCLB, which are published in top journals and are widely cited. We are simply using comparative interrupted time series (CITS) designs to look at the causal impact of CCSS adoption and then exploring the possible mediating factor of state implementation.
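For readers unfamiliar with the approach, here is a bare-bones sketch of what a CITS regression looks like. This is the generic design, not our center's actual specification; the adopter list, adoption year, and data below are simulated just so the example runs.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated state-by-year panel standing in for NAEP-style outcomes
rng = np.random.default_rng(1)
states = [f"S{i:02d}" for i in range(40)]
adopters = set(states[:25])              # hypothetical adopting states
years = list(range(2003, 2016, 2))       # biennial waves
policy_year = 2011                       # hypothetical adoption year

rows = []
for s in states:
    base = rng.normal(250, 8)
    for y in years:
        treat = int(s in adopters)
        post = int(y >= policy_year)
        score = base + 0.5 * (y - 2003) + 1.5 * treat * post + rng.normal(0, 2)
        rows.append({"state": s, "year": y, "adopter": treat, "score": score})
df = pd.DataFrame(rows)
df["rel_year"] = df["year"] - policy_year
df["post"] = (df["year"] >= policy_year).astype(int)

# The CITS estimates are the post-adoption changes in level (adopter:post) and
# slope (adopter:post:rel_year) for adopters, measured against comparison states'
# own deviations from their pre-period trends.
model = smf.ols(
    "score ~ rel_year + post + post:rel_year"
    " + adopter + adopter:rel_year + adopter:post + adopter:post:rel_year",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["state"]})
print(model.summary())
```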

Jay responded: “No. Low N and choosing and implementing CC are endogenous.”

Rick agreed: “Nah, the methodology on the link isn’t compelling- which fuels my skepticism. As Jay said: low n, endogeneity. Ugh.”

I’m fine with the attitudes expressed here, so long as they are taken to their logical conclusion, which is that we cannot ever know the impact of Common Core adoption or implementation (in which case why are we still talking about it?). I don’t see how, if the best-designed empirical research can’t be trusted, we can ever hope to know whether Common Core has had any impacts at all. So if Jay and Rick believe that, by all means.

I suspect, however, that Jay and Rick don’t believe that. For starters, they’ve routinely amplified work with methodological problems at least as serious as those in our yet-to-be-conducted work. In that case, however, the findings (standards don’t matter much) happened to agree with their priors.

Furthermore, both have written repeatedly about the negative impacts of Common Core. For instance, Common Core implementation causes opt out. Common core implementation is causing a retreat on standards and accountability. Common Core implementation is causing restricted options for parents. Common Core implementation is causing the crumbling of teacher evaluation reform. [1] How can we know any of these things are caused by Common Core if even the best-designed causal research can’t be trusted?

The answer is we can’t. So Rick and Jay (and others who have made up their minds that a policy doesn’t work before it has even been evaluated) should take a step back, let research run its course, and then decide if their snap judgments were right. Or, they should conclude that no research on this topic can produce credible causal estimates, in which case they should stop talking about it. I’ll end with a response from Matt Barnum, which I think says everything I just said, but in thousands fewer characters:

“So are people (finally) acknowledging that their position on CC is non-falsifiable?”

Apparently.


[1] Note: I believe at least some of these claims may be true. But that’s not hypocritical, because I’m not pretending to believe there is no truth with regard to the impact of Common Core.

On Common Core, can two polls this different both be right?

It’s everyone’s favorite time of year! No, not Christmas (though this lapsed Jew increasingly finds the Christmas season enchanting). It’s Education Poll Season!

A few weeks ago we had Education Next’s annual poll. Yesterday was Phi Delta Kappan/Gallup. And over the next couple weeks there will be results from the less heralded but no-less-awesome poll put out by USC Rossier and Policy Analysis for California Education [1]. It’s great that all of these polls come out at once because:

  1. It’s so easy to directly compare the results across the polls (at least when they ask similar enough questions).
  2. It’s so easy to spot hilariously (and presumably, maliciously) bad poll-related chicanery.

In today’s analysis, I’m going to discuss results from these and other polls pertaining to public support for the Common Core standards. I’ve done a little of this in the past, but I think there are important lessons to be learned from the newest poll results.

Finding 1. Support for the Common Core is probably decreasing. Education Next asked about Common Core in the same way in consecutive years. Last year they found a 54/26 margin in favor; this year it was 49/35. PDK asked about Common Core last year and found 33% support to 60% opposition; this year it was 24/54. In both cases opposition has gained ground, though not by much in PDK. The PACE/USC Rossier poll will add to this by tracking approval using the same questions we have used in previous years.

Finding 2. Voters still don’t know much about Common Core. In PDK, 39% of voters reported having heard just a little about Common Core or nothing at all (I’m also counting “don’t know” here, which seems to me to have a very similar meaning to “not at all”). In Education Next, 58% of respondents did not know whether Common Core was being implemented in their district, an even more direct test of knowledge. While neither poll this year asked respondents factual questions about the standards to gauge misconceptions, I’m quite confident misconceptions are still widespread given what polls found last year. The PACE/USC Rossier Poll will add to this by testing the prevalence of a variety of misconceptions about the standards.

Finding 3. Folks continue to like almost everything about Common Core other than the name. For instance, Education Next finds that voters overwhelmingly support using the same standardized test in each state (61/22), which aligns with the federal government’s efforts in supporting the consortia to build new assessments. Voters also are quite favorable toward math and reading standards that are the same across states (54/30). Finally, PDK finds that voters are much more likely to say their state’s academic standards are too low (39%) than too high (6%), which supports the decisions states are making with respect to new Common Core cut scores.

Finding 4. It seems likely that the wording of Common Core questions matters for the support level reported, but we don’t have enough good evidence to say for sure. Education Next was criticized last year for the wording of their Common Core question, which was

As you may know, in the last few years states have been deciding whether or not to use the Common Core, which are standards for reading and math that are the same across the states. In the states that have these standards, they will be used to hold public schools accountable for their performance. Do you support or oppose the use of the Common Core standards in your state?

The question was criticized for invoking accountability, which most folks are in favor of. Because the folks at Education Next are savvy and responsive to criticism, they tested the effect of invoking accountability, asking the same question but without the “In the states …” sentence, and found that support fell to 40/37. Though PDK was criticized last year for their question, they appear to have stuck with the same questionable item. The PACE/Rossier poll directly tests both the 2014 PDK and Education Next questions, plus two other support/opposition questions, in order to clearly identify the impact of question wording on support.

Finding 5. As compared to every other reasonably scientific poll I’ve seen that asks about Common Core, PDK produces the most extreme negative results. Here are all the polls I have found from the last two years and their support/opposition numbers (sorted in order from most to least favorable):

  • Public Policy Institute of California 2014 (CA): 69/22 (+47)
  • Education Next 2014: 54/26 (+28)
  • NBC News 2014: 59/31 (+28)
  • Public Policy Institute of California 2015 (CA): 47/31 (+16)
  • Education Next 2015: 49/35 (+14)
  • Friedman Foundation 2015: 40/39 (+1)
  • University of Connecticut 2014: 38/44 (-6)
  • PACE/USC Rossier 2014 (CA): 38/44 or 32/41, depending on question (-6, -9)
  • Louisiana State University 2015 (LA): 39/51 (-12)
  • Monmouth University 2015 (NJ): 19/37 (-18)
  • Times Union/Siena College 2014 (NY): 23/46 (-23)
  • Fairleigh Dickinson 2015: 17/40 (-23)
  • PDK 2014: 33/60 (-27)
  • PDK 2015: 24/54 (-30)

Only one other national poll in the past two years comes within 20 points (!) of the negative margin found by PDK – anything else that’s that negative comes out of a state that’s had particularly chaotic or controversial implementation. Now, it could be that PDK’s results are right and everyone else’s are wrong, but when you stack them up with the others it sure looks like there’s something strange in those findings. It might be the question wording (again, since PACE/USC Rossier is using their exact wording, we can test this), but my guess is it’s something about the sample or the questions they ask before this one. This result just seems too far outside the mainstream to be believed, in my opinion.

Finding 6. The usual suspects of course pounced on the PDK poll to score points. Randi Weingarten used the results on Twitter to make some point about toxic testing (the use of a buzzphrase like that is a pretty clear sign that your analysis isn’t so serious). At the opposite end of the spectrum (which, increasingly, is the same end of the spectrum), Neal McCluskey said the results showed Common Core was getting clobbered (though, to his credit, he questioned the strange item wording and also wrote about Education Next last week, albeit in a somewhat slanted way).

So there we have it. Common Core support is down. But if you don’t call it Common Core and you ask people what they want, they want something very Common Core-like. They still haven’t heard much about Common Core, and most of what they think they know is wrong. And they almost certainly aren’t as opposed as PDK finds them to be. That’s the state of play on Common Core polling as of now. Our poll, coming out in a couple weeks, will address some of the major gaps described above and contribute to a deeper understanding of the support for the standards.


[1] Disclosure: Along with David Plank and Julie Marsh, I’m one of the three main architects of this poll.

Testing the tests

It’s been radio silence here for a couple weeks. Sorry about that. The truth is, all of my projects are coming to a head between now and mid-September, and there’s just not been any time. Today I thought I’d give a quick update about one of the projects I haven’t mentioned much, and which I think will probably make a big splash later this year.

Obviously, I’m a fan of standards. I think we have good evidence that students’ opportunities to learn are all too often driven more by where they live or which teacher they happen to draw than by what we know (or at least think we know) about what they need to know in order to be successful in life. These opportunity-to-learn gaps exist not just across states, but within states, within districts, within schools, and very likely within classrooms. I also think as a practical matter that it’s remarkably inefficient for there to be 3 million unique interpretations of what’s important for kids to know. So I think standards are an important starting point for solving some of these problems.

Of course, I also think curriculum matters, and that’s why I’m studying textbooks and talking with teachers and district leaders about how they’re thinking about curriculum in this brave new world brought to us by the interwebs. Curricula bring standards to life and help concretize what can otherwise be sometimes frustratingly abstract language in standards. My hope is that students have equal access to good curriculum materials that offer faithful interpretations of the standards–whether that is the case remains to be seen.

The third leg of this little instructional tripod is the tests. The tests are intended to reinforce the content messages of the standards and to give teachers and parents accurate feedback about students’ performance. They’re also often used to make decisions about schools, (slightly less often) teachers, and (much less often) students. Now, we’ve known for quite a while that our tests weren’t that good. They’ve been made on the cheap, testing low-level skills using primarily (or exclusively) multiple-choice items. And they haven’t offered useful feedback to anyone because results have generally arrived too late. The result is that the tests we’ve had have undermined the standards, rather than supporting them, leading to more reductionist responses from educators.

There are promising signs that the tests are looking better. The federal government pumped a large amount of money into the consortia, and both PARCC and SBAC brought on the best of the best to help them build better tests. At the same time, other experienced players have gotten into the game, such as ACT. These tests are competing against each other, and they’re also competing against the best of the old state tests, such as Massachusetts’ MCAS. And they’re doing it for little to no more money than the mediocre state tests we had for years. While everything I’ve heard and read suggests these tests are indeed a pretty substantial step forward from what they’re replacing, states are in full retreat mode, tossing them aside left and right. To date, while there have been promising hints about the new tests, there just hasn’t been the kind of deep analysis of them that you might have hoped for if your goal was ensuring the best evidence about the quality of the tests got into the hands of policymakers.

It’s in this context that I’m working with the Thomas B. Fordham Institute and HumRRO on a study evaluating these four assessments–PARCC, SBAC, ACT Aspire, and MCAS. We’re bringing together expert reviewers (educators and content experts) from around the country starting tomorrow for an intensive review of these tests’ content (their actual forms!), documentation, transparency, and accessibility. Later evaluations will examine their technical psychometric evidence. Our methodology was developed by the Center for Assessment and reviewed extensively by the project teams and by experts in measurement and assessment over the last year. It is based on the CCSSO’s Criteria for High Quality Assessments.

I’m so excited for this study starting tomorrow, not just because I’ll get my hands on real student test forms in a way that few folks have been fortunate enough to do, but also because it’ll be the first study to directly compare these tests to each other and against the research-based framework for what a good test should look like. I also happen to think the report, whatever it finds, is going to be useful to policymakers in states nationwide.

In the end, I’m sure that none of these tests will come out looking perfect. It’s my hope that they’ll all be strong along most dimensions so that we can say to states “these would be good choices if your goal was giving students a fair test that adequately covered the standards and gave teachers the right kinds of instructional messages.” If they end up looking no better than what we had before, it will further erode the already tenuous support the standards have among educators and the public, and it will likely do serious damage to the hope that a standards-based reform can really improve opportunity to learn for our kids.

Observations from abroad

For the last two weeks I have been traveling in Europe (not that I’d call it a “vacation”—it was more like “slightly less work than usual, but in a series of lovely, historic cities surrounded by 13th century churches”). While I try not to talk shop with the strangers I meet while traveling, it often ends up coming up. And basically regardless of where these strangers are from—on this trip I talked to folks from France, Belgium, the Netherlands, and Australia—their reaction to the structure of the U.S. education system is the same: it makes no sense.

Now, this could be because of how I describe things, or it could be that they sense my position on these matters and agree so as to not be disagreeable, or it could be because I only attract like-minded socialists when having conversations with strangers. But when I describe, for instance, our set of 50 state standards under No Child Left Behind (or the fact that a few decades before that we didn’t really even have state standards to speak of), our more than 10,000 school districts each operating with their own policies and procedures, or the fierce resistance to even the slightest effort to create more uniformity in our systems in order to improve equity, they uniformly respond with incredulity.

Of course it makes no sense to have different math standards in every state (let alone every district or every school, which is what many want). Of course that kind of system exacerbates inequality rather than ameliorates it. Of course our system is wildly, hopelessly inefficient. It’s just so obvious to them, as it should be to all of us. They also tend to think our testing system is odd—especially the fact that our tests mostly have stakes for schools and perhaps teachers but not students [1].

Anyway, there’s no great revelation here, just something I have noticed repeatedly when I talk to people from around the world. It doesn’t mean that they’re necessarily right and we should make our system into France’s, but it does underline the already serious questions in my mind about the possibility of systemically improving a system that is structured as ours is. And of course, none of this changes the fact that the folks I talk to all love America (at least to visit) and recognize that, even though we have many problems, we’re still a unique and important nation that profoundly influences the rest of the world (especially culturally).

Glad to be home, and back to my regularly scheduled blogging.


[1] The other two things that are most obvious to everyone outside the country that I talked to are a) Obama has been a great President and we don’t give him the credit he deserves, and b) the single clearest example of our craziness as a nation is our gun issues.

Research you should read: On the distribution of teachers

Today’s installment of “Research you should read” comes to us from Educational Researcher. The paper is “Uneven playing field? Assessing the teacher quality gap between advantaged and disadvantaged students,” and it’s by Dan Goldhaber and colleagues. This is a beautifully done analysis that accomplishes several goals:

  1. It quantifies the degree of teacher sorting based on multiple teacher characteristics, including both input (e.g., credentials) and output (e.g., estimates of effectiveness) measures.
  2. It examines that sorting across multiple indicators of student disadvantage.
  3. It does (1) and (2) for an entire state.
  4. It identifies the sources of the inequitable distribution (e.g., is it mostly due to between-school or within-school sorting?).

The results are intensely sobering, if not at all surprising:

We demonstrate that in Washington state elementary school, middle school, and high school classrooms, virtually every measure of teacher quality—experience, licensure exam score, and value-added estimates of effectiveness—is inequitably distributed across every indicator of student disadvantage—free/reduced-price lunch status (FRL), underrepresented minority (URM), and low prior academic performance (the sole exception being licensure exam scores in high school math classrooms).

In short, poor kids, kids of color, and low-achieving kids systematically get access to lower quality teachers, any way you define “quality” [1].

The authors also note that most of the sorting is between schools and between districts, rather than within schools, at least for most of these measures. This is also not surprising, but it of course makes addressing this problem all the more difficult. It’s one thing to reassign teachers within schools (though even that is probably much easier said than done). It’s an entirely different thing to find ways to redistribute teachers across schools or districts without raising the hackles of the broad swath of the electorate who wants government to get their hands off the public education system.
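For the curious, here is one simple way to think about splitting a gap like this into between-school and within-school pieces. It is a sketch with made-up data and column names, not the decomposition the authors actually use.

```python
import pandas as pd

# Made-up example: one row per student, with the student's FRL status and a
# quality measure (say, years of experience) for the teacher they were assigned.
df = pd.DataFrame({
    "school_id":       ["A", "A", "A", "A", "B", "B", "B", "B"],
    "frl":             [0,   0,   0,   1,   1,   1,   1,   0],
    "teacher_quality": [9.0, 8.0, 8.0, 7.0, 3.0, 4.0, 4.0, 6.0],
})

def gap(quality, frl):
    # disadvantaged minus advantaged; negative means disadvantaged kids get less
    return quality[frl == 1].mean() - quality[frl == 0].mean()

total_gap = gap(df["teacher_quality"], df["frl"])

# Give every student their school's average teacher quality: any gap that remains
# reflects which schools students attend (between-school sorting); the remainder
# of the total gap comes from sorting within schools.
school_mean = df.groupby("school_id")["teacher_quality"].transform("mean")
between_gap = gap(school_mean, df["frl"])
within_gap = total_gap - between_gap

print(f"total gap: {total_gap:.2f}")
print(f"between-school piece: {between_gap:.2f}")
print(f"within-school piece: {within_gap:.2f}")
```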

There are undoubtedly many causes of this (frankly, abhorrent) set of findings. The authors list or suggest several:

  • Higher-quality teachers are more likely to leave districts serving more disadvantaged kids, likely because of both pay and working conditions.
  • Existing pay structures create little incentive to work in more disadvantaged settings (often it’s the opposite–the more disadvantaged districts pay less than the tonier suburban districts).
  • Student teaching may contribute to sorting, with the most advantaged districts snatching up the most qualified candidates.
  • Collective bargaining agreements often give more senior teachers preference in terms of teaching assignments, which they use to make within-district transfers from more to less disadvantaged schools.
  • School leaders may give their best or most experienced teachers within-school preferences in terms of teaching assignments.

These are not easily remedied, but certainly there are policy innovations that might help. The most obvious is that we should pay teachers who teach in more disadvantaged settings more, not less. This certainly is true between districts, but it ought to be true within districts as well. The authors cite evidence that these bonuses can induce desirable behaviors. Another is that we really need to work on the underlying challenges of working in more disadvantaged schools, including working conditions. Several recent studies have shown the powerful influence of working conditions on teachers’ employment decisions and their improvement as professionals.

I do not know whether state or federal policymakers should get involved in this issue. As a big government guy who is concerned about the way our school system treats those who are most disadvantaged, I’m inclined to say yes. My hope is that some states can lead the way, creating new laws and systems that, at a minimum, make it equally likely that a poor kid and a rich one in a public school can get access to a good teacher. The status quo on this issue clearly is not working for our most disadvantaged kids.


[1] Of course there could be some other undefined measure of quality that’s not distributed this way, but I’ve not seen any evidence of that.

What the marriage equality ruling REALLY means for education

Rick Hess is out with an analysis of the implications of the Supreme Court’s ruling in Obergefell v. Hodges, last week’s landmark ruling that legalized same-sex marriage nationwide. While I think Rick is generally thoughtful, and he tells me that he is personally not opposed to marriage equality, this is among the more hysterical (in all senses of that word) posts I’ve read with respect to any education issue by someone as prominent as Rick. I hate to fall back on the same old technique of parsing each line of other people’s writing, but this piece simply demands that treatment. Suffice it to say that I think his piece is stunningly paranoid, and while I suspect a few of his post-apocalyptic fantasies about schools post-#LoveWins may come to fruition, most will not (and the ones that will are things that absolutely should happen and will benefit children).

He starts:

Like fascists, Communists, and boy-band producers, the American Left has always believed it could fine-tune human nature if it could only “get ’em while they’re young.” That’s why the Left works so hard to impose its will on schools and universities.

I mean, I don’t know where to begin. Yes, of course liberal people try to persuade young folks that our positions are better than conservative positions (as, in fact, they are). Conservatives do this too. What’s your point? And on the issue of marriage equality, the conservative position has lost very, very badly, and that’s with virtually no school-based indoctrination that I can think of (if anyone’s to blame for this extremely positive outcome, it’s probably the media).

As John Dewey, America’s high priest of educational progressivism, explained in 1897, the student must “emerge from his original narrowness” in order “to conceive of himself” as a cog in the larger social order.

I don’t know what this means. But it sounds spooky.

Last week’s gay-marriage ruling will yield a new wave of liberal efforts to ensure that schools do their part to combat wrong-headed “narrowness.” Justice Anthony Kennedy’s sweeping 5–4 decision in Obergefell v. Hodges opened by declaring, “The Constitution promises liberty to all within its reach, a liberty that includes certain specific rights that allow persons, within a lawful realm, to define and express their identity.” Kennedy took pains to opine that marriage “draws meaning from related rights of childrearing, procreation, and education.” In finding that the Fourteenth Amendment secures the right to “define and express [one’s] identity,” the Obergefell majority has issued a radical marker. (If gay marriage had been established by democratic process, things might have played out in a more measured manner.)

This “democratic process” thing is a canard, plain and simple. First off, public support was already on our side. Second off, that’s not how you do equal rights. You don’t wait around and let people vote to see if a minority gets a fair shake. Couples in 13 states were waiting. Suppose one of them dropped dead while we were waiting around for the majority to give them their rights, and as a result they were denied spousal estate benefits. I guess the answer from the Alito camp is “fuck ’em,” but I think most people would say that it’s abhorrent to sit on our hands and wait until the majority decides it’s finally time to grant people their constitutional rights.

Justice Samuel Alito predicted, “Today’s decision . . . will be used to vilify Americans who are unwilling to assent to the new orthodoxy,” and “they will risk being labeled as bigots and treated as such by governments, employers, and schools.” Alito is almost assuredly right, and that poses serious questions for schools and colleges.

Alito is indeed right. I’ve been calling opponents of same-sex marriage bigots for a long time, because that’s a bigoted view. Though I don’t generally call them that to their faces, because I find that’s not a strong debate tactic. We all have the right to express ourselves, but we don’t have the right to be absolved of the consequences of that expression. If I said I thought interracial marriage shouldn’t be allowed, I would rightly be called a bigot. So, yes, people and businesses and states that do things that are bigoted will probably see negative responses.

At the collegiate level, the implications are pretty clear — especially for religious institutions. Christian colleges are going to find their nonprofit tax status under assault unless they agree to embrace gay marriage. (The relevant precedent is the 1983 Supreme Court ruling that enabled the IRS to strip Bob Jones University’s tax-exempt status because of the school’s ban on interracial dating.)

Well, yes, institutions that take federal funds and use them to violate the Constitution shouldn’t get those funds anymore. I doubt there will be a huge rush to hold gay weddings in Bob Jones’ chapel, but I certainly could be wrong. If there is, and if a religious institution decides it can’t avoid violating my constitutional rights, then that institution should be prosecuted. That’s how this works (though I’m no Constitutional scholar).

Policies regarding “family housing,” employee benefits, use of chapels for marriages — all will come under fire. And then we’ll start getting to questions of readings, campus programs, and curriculum, where familiar First Amendment rights will clash with the new Fourteenth Amendment right to “define and express [one’s] identity.” For religious colleges stripped of their nonprofit status, many — if not most — will be compelled to close their doors. (It’s safe to say that plenty of progressives would regard this development as a bonus).

I am agnostic on whether any particular college stays open or is closed. I don’t think it’s likely that the federal government will suddenly become closely involved in colleges’ readings or curricula–is that something that happens now?

More broadly, the Chronicle of Higher Education reports that gay-rights advocates believe the decision will “help them move on to other issues, such as access to higher education and mental-health concerns for young LGBTQ students of color and transgender students of color.” Shane Windmeyer, executive director of Campus Pride, said, “I’m hopeful we can now say we won one game; now the next game is looking at trans rights, how we treat queer people of color, especially first-generation LGBTQ students of color.”

Mental health care for students?!? The horror!

LBGT crusaders are also pushing for big changes in K–12 public schooling. Education Week’s legal-affairs reporter noted that the decisions “holds various implications for the nation’s schools, including in the areas of employee benefits, parental rights of access, and the effect on school atmosphere for gay youths.” I can’t say with certainty what’s coming. But here are four things to watch for. Educators have long celebrated “diversity.” Now they can expect heightened pressure to do more, and to ensure that nothing stymies a student’s “identity.” When a tiny handful of social crusaders complain that this play feels too stereotypically masculine or that those stories don’t include enough LGBT students, they’re going to pull Obergefell out of their pocket. Things will prove particularly contentious in history, where a dearth of gay marriages and nontraditional families will invite creative efforts to “balance” things out.

This is some really bogeyman stuff. I guess it’s bad if we have a curriculum that represents the diversity of our students? I fail to see how a ruling that my love is the same as Rick’s will have ripple effects in terms of causing schools to make dramatic curriculum changes in favor of more gay inclusion. That trend is probably already happening in liberal places. But again, even if that did happen, it’s almost certainly a good thing, especially for gay kids who are more likely to be bullied and commit suicide. Still, I’d bet that in the vast majority of non-ultra-liberal places, nothing like this will come to pass any time soon.

School leaders have judged that American flag T-shirts are unacceptably provocative when worn on Cinco de Mayo. Clothing and artifacts perceived as hostile to another’s “defined and expressed” identity, such as badges of religiosity, may well come under the closest of scrutiny. After all, the Court has long held that freedom of speech and religion may be circumscribed in educational settings. Now, protestations on behalf of free expression and free speech can be answered with Fourteenth Amendment claims.

If he’s talking about a student wearing something that displays a cross, I can’t see that coming under any more fire than it would have pre-Obergefell. If he’s talking about a shirt that says “homosexuality is not okay,” then yes, that shouldn’t be worn in a school (just as we would not allow a student to wear a shirt that says “women belong in the kitchen,” “men are rapists” or “white people are racists”). Getting those things out of schools will absolutely make schools a better place for children.

Expect demands for schools to amp up their efforts to feature “nontraditional” families in all kinds of contexts. Schools may be scrutinized for the mixture of families that wind up in posters, brochures, student art displays, instructional materials, and the rest. Failure to include a satisfactory percentage of gay parents (or other nontraditional family groupings) may be judged evidence of a hostile environment.

The first sentence is probably true, though a continuation of existing trends (have you seen advertising recently? Big corporations were ALL OVER this ruling, very clearly showing the business community thinks this decision was the right one [perhaps for their bottom line, but whatever]). The rest of this is absurd. Companies naturally want to include images of diversity on their products because, you know, we’re a really diverse country (and it will probably also result in better sales). There will not be a gay family gestapo that goes around counting posters with straight vs. gay couples in them.

And casual language will have to change. Teachers may instinctively ask a volunteer father about his wife or mention mothers and fathers; when they do, it won’t be long until a sensitive parent decides that this kind of “heteronormativity” is an unconstitutional violation of their identity. Pity the poor assistant principal who knows two parents are attending a meeting and mistakenly asks the woman sitting in the office if her “husband” is running late — rather than asking about her “spouse.” In the wrong circumstances, that could be a career-ender. Minimizing such mistakes means schools will soon be at pains to replace the terminology of “moms and dads” with that of genderless dyads.

Yes, language will slowly change, as people stop assuming things about other people. That’s a good thing, obviously. Speaking as someone who gets asked about my wife all the time (no, really, and I think I’m about a 12 on the Kinsey scale), I can tell you it doesn’t bother me in the slightest when it happens. I simply respond with “My husband does XYZ,” and the person realizes that I’m, in fact, married to man. And it’s no big deal, because I’m an adult and people make assumptions. No one’s going to get fired because they accidentally use the H word rather than the W word. Now if they respond with “Oh, you’re married to a man? That’s disgusting, you sinful pervert,” and if I have authority over their job, then yes, it might be a problem for them. But otherwise, come on. This is again totally paranoid and simply will not happen in any reasonable number of cases (and certainly not more than the number of gay people who are discriminated against in employment every day in this country, because that’s perfectly legal in the majority of states).

America’s principals, superintendents, and school boards generally don’t have a lot of stomach for waging these fights. Even those who hate being bullied don’t want the exhausting slog or public criticism. Far more likely is that they’ll pack it in, lending Justice Kennedy’s rhetorical flourishes a practical import even he may not have imagined.

Translation: “Bigoted people will realize that being bigoted and suffering the consequences probably isn’t worth it, so they’ll be less bigoted or just internalize their bigotry.” Another positive outcome! And, actually, I would be almost certain that support for this decision is higher among educators than the general public, as I think the vast majority of educators do not hold bigoted views.

The long and the short of it is that there’s really no “there” there with any of this stuff. Most of it simply will not happen, and the stuff that will happen will make our schools better for kids. And more to the point, what the consequences are for schools are mostly irrelevant to the merits of the case. And on that, we are moving toward consensus–that my marriage is the same as Rick’s. Hopefully the readers at NRO will soon join the 60% of us who already know that to be true.