The more people know about Common Core, the less they know about Common Core

Today marks the release of the second half of the PACE/USC Rossier poll on education, our annual barometer on all things education policy in California [1]. This half focuses on two issues near and dear to my heart: Common Core and testing. Over the coming days and weeks I’ll be parsing out some interesting tidbits I’ve uncovered in analyzing results from this year’s poll.

The first finding worth mentioning has to do with Common Core support and knowledge. We’ve all read arguments like “The more people know about Common Core, the less they like it.” We see that claim from NPR, Republican legislators, and hackish tea party faux-news sites. It is generally based on the finding from several polls that people who say they know more about the standards are less likely to support them (or, more generally, on the trend that reported knowledge has increased over time, as has opposition). It turns out, however, that this may not be as true as you think.

To test knowledge of Common Core, we first asked people to tell us how much they know about the Common Core (a lot, some, a little, nothing at all). Then, we asked them a series of factual and opinion questions about the standards, to test whether they really did know as much as they said they did. The results were quite illuminating.

It turns out that people who said they knew a lot about Common Core were actually the most likely group to report misconceptions about the standards, and the group with the highest level of net misconceptions (correct conceptions minus misconceptions, so a negative value means misconceptions dominate). For instance, 51.5% of people who said they knew “a lot” about Common Core incorrectly said it was false that Common Core includes only math and ELA standards. In contrast, just 31.7% of this group answered the statement correctly (for a net misconception index of -20). For people who reported knowing only a little about the standards, net misconceptions were just -11 (33% misconception, 22% correct conception).
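To make the arithmetic concrete, here is a minimal sketch (in Python, with the percentages above hard-coded) of how the net misconception index works for the math-and-ELA item:

```python
# Net misconception index = % correct conception - % misconception,
# so a negative value means misconceptions outnumber correct answers.
# Percentages are those reported above for the "math and ELA only" item.
responses = {
    "a lot":    {"misconception": 51.5, "correct": 31.7},
    "a little": {"misconception": 33.0, "correct": 22.0},
}

for group, pct in responses.items():
    net = pct["correct"] - pct["misconception"]
    print(f"{group:>8}: net misconception index = {net:.0f}")
# output: "a lot" = -20, "a little" = -11
```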

Another area where Common Core-“knowledgeable” people were more likely to be incorrect was the claim that Common Core requires more testing than previous state standards. 57% of this group wrongly said this was true, while just 31% correctly said it was false (net misconceptions -26). All groups had net misconceptions on this item, but the margin was -19 for the “some” knowledge group, -16 for the “a little” group, and -11 for the “none” group.

In terms of raw proportions of misinformed individuals, the “a lot” of knowledge group is also the most misinformed group about the Obama administration’s role in creating the standards and the federal government’s role in requiring adoption.

In short, yes, individuals who say they know more about the standards are less likely to support the standards. But, as it turns out, that’s not because they actually know more (they don’t). Rather, it’s likely because they “know” things that are false and that are almost certainly driving their opposition.

So the next time you see someone claiming that “the more people know about Common Core, the less they like it,” feel free to correct them.


[1] Part 1 of this year’s poll was already released–it focused on Local Control Funding and overall attitudes toward education in California. You can read more about it here.


Common Core goes postmodern

A quick post today.

Mike Petrilli tweeted about the new IES standards center (of which I am a part) at Jay Greene and Rick Hess, asking them whether they might be convinced of CCSS effectiveness by the results of such a study. To be clear, the study design is the same as several previously published analyses of the impact of NCLB, which are published in top journals and widely cited. We are simply using comparative interrupted time series (CITS) designs to look at the causal impact of CCSS adoption and then exploring the possible mediating factor of state implementation.
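For readers unfamiliar with the design, here is a rough sketch of what a state-by-year CITS specification can look like. This is my own illustration on simulated data, not the actual study’s model or variables:

```python
# Minimal CITS sketch: compare pre/post level and trend shifts for
# CCSS-adopting states against non-adopting states. Illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for s in range(40):
    adopter = s < 30                      # most states adopted CCSS
    for y in range(2003, 2016):
        post = int(y >= 2010)             # illustrative adoption year
        score = (250 + 0.5 * (y - 2003)   # common secular trend
                 + 2.0 * adopter * post   # hypothetical adoption effect
                 + rng.normal(0, 3))
        rows.append(dict(state=s, year=y - 2003, adopter=int(adopter),
                         post=post, score=score))
df = pd.DataFrame(rows)

# The adopter:post term is the CITS estimate of a level shift at adoption;
# year:adopter:post captures a change in slope after adoption.
model = smf.ols("score ~ year * adopter * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["state"]})
print(model.params.filter(like="adopter"))
```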

Jay responded: “No. Low N and choosing and implementing CC are endogenous.”

Rick agreed: “Nah, the methodology on the link isn’t compelling- which fuels my skepticism. As Jay said: low n, endogeneity. Ugh.”

I’m fine with the attitudes expressed here, so long as they are taken to their logical conclusion, which is that we cannot ever know the impact of Common Core adoption or implementation (in which case, why are we still talking about it?). I don’t see how, if the best-designed empirical research can’t be trusted, we can ever hope to know whether Common Core has had any impacts at all. So if Jay and Rick believe that, by all means.

I suspect, however, that Jay and Rick don’t believe that. For starters, they’ve routinely amplified work with methodological problems at least as serious as those in our yet-to-be-conducted work. In that case, however, the findings (standards don’t matter much) happened to agree with their priors.

Furthermore, both have written repeatedly about the negative impacts of Common Core. For instance: Common Core implementation causes opt-out. Common Core implementation is causing a retreat on standards and accountability. Common Core implementation is causing restricted options for parents. Common Core implementation is causing the crumbling of teacher evaluation reform. [1] How can we know any of these things are caused by Common Core if even the best-designed causal research can’t be trusted?

The answer is we can’t. So Rick and Jay (and others who have made up their minds that a policy doesn’t work before it has even been evaluated) should take a step back, let research run its course, and then decide if their snap judgments were right. Or, they should conclude that no research on this topic can produce credible causal estimates, in which case they should stop talking about it. I’ll end with a response from Matt Barnum, which I think says everything I just said, but in thousands fewer characters:

“So are people (finally) acknowledging that their position on CC is non-falsifiable?”

Apparently.


[1] Note: I believe at least some of these claims may be true. But that’s not hypocritical, because I’m not pretending to believe there is no truth with regard to the impact of Common Core.

On Common Core, can two polls this different both be right?

It’s everyone’s favorite time of year! No, not Christmas (though this lapsed Jew increasingly finds the Christmas season enchanting). It’s Education Poll Season!

A few weeks ago we had Education Next’s annual poll. Yesterday was Phi Delta Kappan/Gallup. And over the next couple weeks there will be results from the less heralded but no-less-awesome poll put out by USC Rossier and Policy Analysis for California Education [1]. It’s great that all of these polls come out at once because:

  1. It’s so easy to directly compare the results across the polls (at least when they ask similar enough questions).
  2. It’s so easy to spot hilariously (and presumably, maliciously) bad poll-related chicanery.

In today’s analysis, I’m going to discuss results from these and other polls pertaining to public support for the Common Core standards. I’ve done a little of this in the past, but I think there are important lessons to be learned from the newest poll results.

Finding 1. Support for the Common Core is probably decreasing. Education Next asked about Common Core in the same way in consecutive years. Last year they found a 54/26 margin in favor; this year it was 49/35. PDK asked about Common Core last year and found 60/33 opposition (opposed/in favor); this year it was 54/24. In both cases the margin has moved against the standards, though not by much in PDK. The PACE/USC Rossier poll will add to this by tracking approval using the same questions we have used in previous years.

Finding 2. Voters still don’t know much about Common Core. In PDK, 39% of voters reported having heard just a little or nothing at all about Common Core (I’m also counting “don’t know” here, which seems to me to mean much the same as “not at all”). In Education Next, 58% of respondents did not know whether Common Core was being implemented in their district, an even more direct test of knowledge. Neither poll this year asked respondents factual questions about the standards to gauge misconceptions, but I’m quite confident misconceptions are still high given what polls found last year. The PACE/USC Rossier Poll will add to this by testing the prevalence of a variety of misconceptions about the standards.

Finding 3. Folks continue to like almost everything about Common Core other than the name. For instance, Education Next finds that voters overwhelmingly support using the same standardized test in each state (61/22), which aligns with the federal government’s efforts in supporting the consortia to build new assessments. Voters also are quite favorable toward math and reading standards that are the same across states (54/30). Finally, PDK finds that voters are much more likely to say their state’s academic standards are too low (39%) than too high (6%), which supports the decisions states are making with respect to new Common Core cut scores.

Finding 4. It seems likely that the wording of Common Core questions matters for the support level reported, but we don’t have enough good evidence to say for sure. Education Next was criticized last year for the wording of their Common Core question, which was:

As you may know, in the last few years states have been deciding whether or not to use the Common Core, which are standards for reading and math that are the same across the states. In the states that have these standards, they will be used to hold public schools accountable for their performance. Do you support or oppose the use of the Common Core standards in your state?

The question was criticized for invoking accountability, which most folks are in favor of. Because the folks at Education Next are savvy and responsive to criticism, they tested the effect of invoking accountability, asking the same question but without the “In the states …” sentence, and found support fell to 40/37. Though PDK was criticized last year for their question, they appear to have stuck with the same questionable item. The PACE/Rossier poll directly tests both the 2014 PDK and Education Next questions, plus two other support/opposition questions, in order to clearly identify the impact of question wording on support.
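For what it’s worth, here is a hedged sketch of how one might test whether that wording gap (49% support with the accountability sentence vs. 40% without) is larger than chance, assuming the two versions were asked of separate random half-samples. The sample sizes below are made up; only the support percentages come from the Education Next results discussed above:

```python
# Hedged sketch: two-proportion z-test for a question-wording experiment.
# Support rates come from the Education Next results discussed above;
# the half-sample sizes are hypothetical placeholders.
from statsmodels.stats.proportion import proportions_ztest

n_with, n_without = 2000, 2000                 # hypothetical split-sample sizes
support = [int(0.49 * n_with), int(0.40 * n_without)]
nobs = [n_with, n_without]

z, p = proportions_ztest(count=support, nobs=nobs)
print(f"z = {z:.2f}, p = {p:.4f}")             # is the 9-point gap larger than chance?
```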

Finding 5. Compared to every other reasonably scientific poll I’ve seen that asks about Common Core, PDK produces the most extreme negative results. Here are all the polls I have found from the last two years and their support/opposition numbers, sorted from most to least favorable (a quick sketch of how the net margins are tabulated follows the list):

Public Policy Institute of California 2014 (CA): 69/22 (+47)

Education Next 2014: 54/26 (+28)

NBC News 2014: 59/31 (+28)

Public Policy Institute of California 2015 (CA): 47/31 (+16)

Education Next 2015: 49/35 (+14)

Friedman Foundation 2015: 40/39 (+1)

University of Connecticut 2014: 38/44 (-6)

PACE/USC Rossier 2014 (CA): 38/44 or 32/41, depending on question (-6, -9)

Louisiana State University 2015 (LA): 39/51 (-12)

Monmouth University 2015 (NJ): 19/37 (-18)

Times Union/Siena College 2014 (NY): 23/46 (-23)

Fairleigh Dickinson 2015: 17/40 (-23)

PDK 2014: 33/60 (-27)

PDK 2015: 24/54 (-30)
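The net margins in parentheses are simply support minus opposition. Here is a quick sketch of the tabulation, omitting the PACE/USC Rossier entry because it reported two question versions:

```python
# Net margin = % support - % oppose; sort from most to least favorable.
polls = {
    "Public Policy Institute of California 2014 (CA)": (69, 22),
    "Education Next 2014": (54, 26),
    "NBC News 2014": (59, 31),
    "Public Policy Institute of California 2015 (CA)": (47, 31),
    "Education Next 2015": (49, 35),
    "Friedman Foundation 2015": (40, 39),
    "University of Connecticut 2014": (38, 44),
    "Louisiana State University 2015 (LA)": (39, 51),
    "Monmouth University 2015 (NJ)": (19, 37),
    "Times Union/Siena College 2014 (NY)": (23, 46),
    "Fairleigh Dickinson 2015": (17, 40),
    "PDK 2014": (33, 60),
    "PDK 2015": (24, 54),
}
for name, (support, oppose) in sorted(polls.items(),
                                      key=lambda kv: kv[1][1] - kv[1][0]):
    print(f"{name}: {support}/{oppose} ({support - oppose:+d})")
```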

Only one other national poll in the past two years comes within 20 points (!) of the negative margin found by PDK – anything else that’s that negative comes out of a state that’s had particularly chaotic or controversial implementation. Now, it could be that PDK’s results are right and everyone else’s are wrong, but when you stack them up with the others it sure looks like there’s something strange in those findings. It might be the question wording (again, since PACE/USC Rossier is using their exact wording, we can test this), but my guess is it’s something about the sample or the questions they ask before this one. This result just seems too far outside the mainstream to be believed, in my opinion.

Finding 6. The usual suspects of course pounced on the PDK poll to score points. Randi Weingarten used the results on Twitter to make some point about toxic testing (the use of a buzzphrase like that is a pretty clear sign that your analysis isn’t so serious). At the opposite end of the spectrum (which, increasingly, is the same end of the spectrum), Neal McCluskey said the results showed Common Core was getting clobbered (though, to his credit, he questioned the strange item wording and also wrote about Education Next last week, albeit in a somewhat slanted way).

So there we have it. Common Core support is down. But if you don’t call it Common Core and you ask people what they want, they want something very Common Core-like. They still haven’t heard much about Common Core, and most of what they think they know is wrong. And they almost certainly aren’t as opposed as PDK finds them to be. That’s the state of play on Common Core polling as of now. Our poll, coming out in a couple weeks, will address some of the major gaps described above and contribute to a deeper understanding of the support for the standards.


[1] Disclosure: Along with David Plank and Julie Marsh, I’m one of the three main architects of this poll.

Testing tradeoffs

Life is a series of tradeoffs. Perhaps nowhere in education is that clearer than in assessment policy.

What brings this to mind are Motoko Rich’s and Catherine Gewertz’s recent articles about scoring Common Core tests. I think both of these articles are good, and they both illustrate some of the challenges of doing what we’re trying to do at scale. But it’s also clear that some anti-test folks are using these very complicated issues as fodder for their agendas, and that’s disappointing (if totally expected). Here are some of the key quotes from Motoko’s article, and the tradeoffs they illustrate.

On Friday, in an unobtrusive office park northeast of downtown here, about 100 temporary employees of the testing giant Pearson worked in diligent silence scoring thousands of short essays written by third- and fifth-grade students from across the country. There was a onetime wedding planner, a retired medical technologist and a former Pearson saleswoman with a master’s degree in marital counseling. To get the job, like other scorers nationwide, they needed a four-year college degree with relevant coursework, but no teaching experience. They earned $12 to $14 an hour, with the possibility of small bonuses if they hit daily quality and volume targets.

Tradeoff: We think we want teachers to be involved in the scoring of these tests (presumably because we believe there is some special expertise that teachers possess) [1]. But teachers cost more than $12 to $14 an hour, and we’re in an era where every dollar spent on testing is endlessly scrutinized, so we have to instead use some educated people who are not teachers.

At times, the scoring process can evoke the way a restaurant chain monitors the work of its employees and the quality of its products. “From the standpoint of comparing us to a Starbucks or McDonald’s, where you go into those places you know exactly what you’re going to get,” said Bob Sanders, vice president of content and scoring management at Pearson North America, when asked whether such an analogy was apt.

Tradeoff: We have a huge system in this country, and we want results that are comparable across schools. But comparability in a large system requires some degree of standardization, and standardization at that level of scale requires processes that look, well, standardized and corporate.

For exams like the Advanced Placement tests given by the College Board, scorers must be current college professors or high school teachers who have at least three years of experience teaching the subject they are scoring.

Tradeoff: We want to test everyone. This means the volume for scoring is tremendously larger than for the AP exams (about 12 million test takers vs. about 1 million), which again means we may not be able to find enough teachers to do the work.

“You’re asking people still, even with the best of rubrics and evidence and training, to make judgments about complex forms of cognition,” Mr. Pellegrino said. “The more we go towards the kinds of interesting thinking and problems and situations that tend to be more about open-ended answers, the harder it is to get objective agreement in scoring.”

Tradeoff: We want more challenging, open-ended, complex tasks. But scoring those tasks at scale is harder to do reliably.

There are of course other big tradeoffs that aren’t highlighted in these articles. For instance:

  • The tradeoff between test cost and transparency–building items is very expensive, so releasing items and having to create new ones every year would add to test costs while enhancing transparency.
  • The tradeoff between testing time and the nature of the task–multiple choice items are quicker to complete, but they may not fully tap the kinds of skills we want to measure.
  • The tradeoff between testing time and the comprehensiveness of the assessment–shorter tests can probably give us a reasonable estimate of overall math and reading proficiency, but they will not give us the fine-grained, actionable data we might want to make instructional responses (and they might contribute to “narrowing the curriculum” if they repeatedly sample the same content).
  • The tradeoff between open-response items and fast scoring–multiple choice items, especially on computers, can be scored virtually instantaneously, whereas open-response items take time to score. So faster feedback may butt up against our desire for better items.
  • The tradeoffs associated with testing on computers–e.g., using money to purchase computers vs. other things, advantages of adaptive testing vs. needing to teach kids how to take tests on computers.

I will also note that this kind of reporting could, in my mind, be strengthened with more empirical evidence. For instance,

“Even as teachers, we’re still learning what the Common Core state standards are asking,” Ms. Siemens said. “So to take somebody who is not in the field and ask them to assess student progress or success seems a little iffy.”

Are teachers better scorers than non-teachers, or not? That’s an empirical question. I would be reasonably confident that Pearson has a good process in place for determining who the best scorers are from the standpoint of reliability. Some of the best scorers are teachers, and some are not.

Some teachers question whether scorers can grade fairly without knowing whether a student has struggled with learning difficulties or speaks English as a second language.

Is there evidence that the test scoring is biased against students with disabilities or ELLs, or not? That’s also an empirical question. Again I would guess that Pearson has in place a process to weed out construct-irrelevant variance to the maximum extent possible.
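If someone wanted to answer either question empirically, the natural starting point is comparing scorer groups on agreement with validated or expert scores. A minimal sketch using scikit-learn’s kappa implementation, with invented placeholder data:

```python
# Hedged sketch: compare scorer groups on agreement with expert/validation
# scores using weighted Cohen's kappa. All data below are invented.
from sklearn.metrics import cohen_kappa_score

expert_scores      = [2, 3, 1, 4, 2, 0, 3, 3, 1, 2]   # validation ratings
teacher_scores     = [2, 3, 1, 4, 2, 1, 3, 3, 1, 2]   # hypothetical teacher scorer
non_teacher_scores = [2, 3, 2, 4, 2, 0, 3, 2, 1, 2]   # hypothetical non-teacher scorer

for label, scores in [("teacher", teacher_scores),
                      ("non-teacher", non_teacher_scores)]:
    kappa = cohen_kappa_score(expert_scores, scores, weights="quadratic")
    print(f"{label}: weighted kappa vs. validation scores = {kappa:.2f}")
```

The same logic extends to the bias question: compare agreement (or score gaps) separately for students with disabilities and English learners, holding the validation scores fixed.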

Overall, I think it’s great that writers like Motoko and Catherine are tackling these challenging issues. But I hope it’s not lost on readers that, like everything in life, testing requires tradeoffs that are not easily navigated.


[1] It’s not obvious to me this is true, though it may well be. Regardless, scoring items would likely be a good professional development opportunity for teachers.

A (quick, direct, 2000 word) response to Tucker on testing

There’s been a bit of a kerfuffle recently in the edu-Twittersphere, since Marc Tucker suggested that civil rights leaders ought to reconsider their support for annual testing [1]. Kati Haycock and Jonah Edelman wrote impassioned responses, which Tucker has just dismissed as not responding to his substantive arguments. He ends with this paragraph:

The facts ought to count for something. What both of these critiques come down to is an assertion that I don’t have any business urging established leaders of the civil rights community to reconsider the issue, that I simply don’t understand the obvious—that annual accountability testing is essential to justice for poor and minority students, that anyone who thinks otherwise must be in the pocket of the teachers unions.  Well, it is not obvious. Indeed, all the evidence says it is not true. And anyone who knows me knows that I am in no one’s pocket. I know the leaders of the civil rights community to be people of great integrity.  They aren’t in anyone’s pocket, either. I think they want what is best for the people they represent. And I do not think that is annual testing.

I think Mr. Tucker greatly overstates the evidence in his initial post, so I’m going to do my best to give a very brief and direct response to the substantive arguments he makes there. I do this not to defend Haycock and Edelman (whom I do not really know), but to defend the policy, which I believe is unfairly maligned in Tucker’s posts.

Let me start by saying that I am generally in favor of annual testing, though I am probably not as fervid in that support as some others in the “reform” camp. I do not believe that annual accountability testing is essential to justice for poor and minority students, but I do think high-quality tests at reasonable intervals would almost certainly be beneficial to them.

Okay, here goes.

1) In his initial post, Marc Tucker says,

First of all, the data show that, although the performance of poor and minority students improved after passage of the No Child Left Behind Act, it was actually improving at a faster rate before the passage of the No Child Left Behind Act.

That link is to a NAEP report that indeed provides descriptive evidence supporting Tucker’s point. However, there are at least two peer-reviewed articles using NAEP data that show positive causal impacts of NCLB using high-quality quasi-experimental design, one on fourth grade math achievement only and the other on fourth and eighth grade math achievement and (suggestively) fourth grade reading. The latter is, to my eye, the most rigorous analysis that yet exists on this topic. There is a third article that uses cross-state NAEP data and does not find an impact, but again the most recent analysis by Wong seems to me to be the most methodologically sophisticated of the lot and, therefore, the most trustworthy. I think if Tucker wants to talk NAEP data, he has to provide evidence of this quality that supports his position of “no effect” (or even “harm,” as he appears to be suggesting). Is there a quality analysis using a strong design that shows a negative impact on the slope of achievement gains caused by NCLB? I do not know of one.

I should also note that there are beaucoup within-state studies of the impacts of accountability policies that use regression discontinuity designs and find causal impacts. For instance: in North Carolina, in Florida, and in Wisconsin. In short: I don’t see any way to read the causal literature on school accountability and conclude that it has negative impacts on student achievement. I don’t even see any way to conclude it has neutral impacts, given the large number of studies finding positive impacts relative to those with strong designs that find no impacts.

2) Next, Tucker says:

Over the 15-year history of the No Child Left Behind Act, there is no data to show that it contributed to improved student performance for poor and minority students at the high school level, which is where it counts.

Here I think Marc is moving the goalposts a bit. Is high school performance of poor and minority students the target? Then I guess we may as well throw out all the above-cited studies. I know of no causal studies that directly investigate the impact on this particular outcome, so I think the best he’s got is the NAEP trends. And sure, trends in high school performance are relatively flat.

I’m not one to engage in misNAEPery, however, so I wouldn’t make too much of this. Nor would I make too much of the fact that high school graduation rates have increased for all groups (meaning tons more low-performing students who in days gone by would have dropped out are still around to take the NAEP in 12th grade, among other things). But I would make quite a bit of the fact that the above-cited causal studies obviously also apply to historically underserved groups (that is, while they rarely directly test the impact of accountability on achievement gaps, they very often test the impacts for different groups and find that all groups see the positive effects). And I would also note some evidence from North Carolina of direct narrowing effects on black-white gaps.

3) Next, we have:

Many nations that have no annual accountability testing requirements have higher average performance for poor and minority students and smaller gaps between their performance and the performance of majority students than we do here in the United States.  How can annual testing be a civil right if that is so?

There’s not much to say about this. It’s not based on any study I know of, certainly none that would suggest a causal impact one way or the other. But he’s right that we’re relatively alone in our use of annual testing, and therefore that many higher-achieving nations don’t have annual testing. They also don’t have many other policies that we have, so I’m not sure what’s to be learned from this observation.

4) Now he moves on to claim:

It is not just that annual accountability testing with separate scores for poor and minority students does not help those students.  The reality is that it actually hurts them. All that testing forces schools to buy cheap tests, because they have to administer so many of them.  Cheap tests measure low-level basic skills, not the kind of high-level, complex skills most employers are looking for these days.  Though students in wealthy communities are forced to take these tests, no one in those communities pays much attention to them.  They expect much more from their students. It is the schools serving poor and minority students that feed the students an endless diet of drill and practice keyed to these low-level tests.  The teachers are feeding these kids a dumbed down curriculum to match the dumbed down tests, a dumbed down curriculum the kids in the wealthier communities do not get.

This paragraph doesn’t have links, probably because it’s not well supported by the existing evidence. Certainly you hear this argument all the time, and I believe it may well be true that schools serving poor kids have worse curricula or more perverse responses to tests (even some of my own work suggests different kinds of instructional responses in different kinds of schools). But even if we grant that this impact is real, the literature on achievement effects certainly does not suggest harm. And the fact that graduation rates are skyrocketing certainly does not suggest harm. If he’s going to claim harm, he has to provide clear, compelling evidence of harm. This ain’t it. And finally here, a small point. I hate when people say schools are “forced” to do anything. States, districts, and schools were not forced to buy bad tests before. They have priorities, and they have prioritized cheap and fast. That’s a choice, not a matter of force.

5) Next, Tucker claims:

Second, the teachers in the schools serving mainly poor and minority kids have figured out that, from an accountability standpoint, it does them no good to focus on the kids who are likely to pass the tests, because the school will get no credit for it. At the same time, it does them no good to focus on the kids who are not likely to pass no matter what the teacher does, because the school will get no credit for that either. As a result, the faculty has a big incentive to focus mainly on the kids who are just below the pass point, leaving the others to twist in the wind.

I am certainly familiar with the literature cited here, and I don’t dispute any of it. Quite the contrary: I acknowledge the conclusion that the students who are targeted by the accountability system see the greatest gains. This has been shown in many well-designed studies, such as here, here, here, and here. But this is an argument about accountability policy design, not about annual testing. It simply speaks to the need for better accountability policies. For instance, suppose we thought the “bubble kids” problem was a bad one that needed solving. We could solve it tomorrow–simply create a system where all that matters is growth. Voila, no bubble kids! Of course there would be tradeoffs to that decision, so probably some mixture is better.
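To illustrate the design point (not any particular state’s system), here is a toy contrast between a status metric, which only rewards moving students over a cut score, and a growth metric, which credits every student’s gains:

```python
# Toy contrast: a status (proficiency-rate) metric rewards pushing "bubble
# kids" over a cut score; a growth metric credits gains for every student.
# All numbers here are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
prior = rng.normal(300, 25, size=200)          # last year's scale scores
gains = rng.normal(10, 8, size=200)            # this year's student gains
current = prior + gains
CUT = 300                                      # hypothetical proficiency cut score

status_metric = np.mean(current >= CUT)        # only crossing the cut counts
growth_metric = np.mean(current - prior)       # every student's gain counts

print(f"proficiency rate: {status_metric:.0%}")
print(f"mean growth: {growth_metric:.1f} points")
```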

6) Then Tucker moves on to discuss the teaching force:

Not only is it true that annual accountability testing does not improve the performance of poor and minority students, as I just explained, but it is also true that annual accountability testing is making a major contribution to the destruction of the quality of our teaching force.

There’s no evidence for this. I know of not a single study that suggests that there is even a descriptive decrease in the quality of our teaching force in recent years. Certainly not one with a causal design of any kind that implicates annual accountability testing. And there is recent evidence that suggests improvements in the quality of the workforce, at least in certain areas such as New York and Washington.

7) Next, he takes on the distribution of teacher quality:

One of the most important features of these accountability systems is that they operate in such a way as to make teachers of poor and minority students most vulnerable.  And the result of that is that more and more capable teachers are much less likely to teach in schools serving poor and minority students.

It is absolutely true that the lowest quality teachers are disproportionately likely to serve the most disadvantaged students. But I know of not a single piece of evidence that this is caused by (or even made worse by) annual testing and accountability policies. My hunch is that this has always been true, but that’s just a hunch. If Tucker has evidence, he should provide it.

8) The final point is one that hits close to home:

Applications to our schools of education are plummeting and deans of education are reporting that one of the reasons is that high school graduates who have alternatives are not selecting teaching because it looks like a battleground, a battleground created by the heavy-handed accountability systems promoted by the U.S. Department of Education and sustained by annual accountability testing.

As someone employed at a school of education, I can say the first clause here is completely true. And we’re quite worried about it. But again, I know of not a single piece of even descriptive evidence that suggests this is due to annual accountability testing. Annual accountability testing has been around for well over a decade. Why would the impact be happening right now?

I think these are the main arguments in Tucker’s piece, and I have provided evidence or argumentation here that suggests that not one of them is supported by the best academic research that exists today. Perhaps the strongest argument of the eight is the second one, but again I know of no quality research that attributes our relative stagnation on 12th grade NAEP to annual accountability testing. That does not mean Tucker is wrong. But it does mean that he is the one who should bear the burden of providing evidence to support his positions, not Haycock and Edelman. I don’t believe he can produce such evidence, because I don’t believe it exists.


[1] I think it’s almost universally a bad idea to tell civil rights leaders what to do.

Monday Morning Alignment Critiques

As I’ve already written, one of my main research interests these days is the quality of textbooks and their alignment to standards. My recent work on this issue is among the first peer-reviewed studies (if not the first) to employ a widely used alignment technique to rate the alignment of textbooks with standards. While I think the approach I use is great (or else I wouldn’t use it), it’s certainly not perfect. There are many ways to determine alignment; all of them are flawed.
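For readers curious what an alignment rating actually computes, here is a minimal sketch of one widely used measure, Porter’s alignment index; I’m not claiming this is the exact procedure behind any of the specific ratings discussed below:

```python
# Sketch of Porter's alignment index, one widely used alignment metric.
# x and y are proportions of content in each topic-by-cognitive-demand
# cell for the two documents (e.g., a textbook and the standards).
import numpy as np

def porter_alignment(x, y):
    """Alignment = 1 - sum(|x_i - y_i|)/2, ranging from 0 (no overlap) to 1."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x, y = x / x.sum(), y / y.sum()      # normalize to proportions
    return 1 - np.abs(x - y).sum() / 2

# Toy example: a 2x3 content framework flattened into 6 cells.
textbook  = [0.30, 0.20, 0.10, 0.25, 0.10, 0.05]
standards = [0.15, 0.25, 0.15, 0.20, 0.15, 0.10]
print(round(porter_alignment(textbook, standards), 2))  # 0.8
```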

Of course, there are others in this space as well. The two biggest players, by far, are Bill Schmidt and EdReports [1]. Both are well funded and have released ratings of textbook alignment. EdReports’ ratings have recently come under fire from many directions, including both publishers and, now, the National Council of Teachers of Mathematics. NCTM released a pretty scathing open letter accusing EdReports of errors and methodological flaws, which was covered by Liana Heitin over at EdWeek.

I have three general comments about this response by NCTM.

The first is that there is no one right way to do an alignment analysis. While the EdReports “gateway” approach might not have been the method I’d have chosen, it seems to me a perfectly reasonable way to constrain the (very arduous) task of reading and rating a huge pile of textbooks. Perhaps they’d have gotten somewhat different results with a different method; who knows? But their results are generally in line with mine and Bill’s, so I highly doubt that their overall finding of mediocre alignment is driven by the method.

The second is that we need to always consider the other options when we’re evaluating criticisms like this. What kind of alignment information is out there currently? Basically you’ve got my piddly study of 7 books, Bill’s larger database, and EdReports [2]. Otherwise you have to either trust what the publisher says or come up with your own ratings. In that context, it’s not clear to me that EdReports is any worse than what else is available. And EdReports is almost certainly better than districts doing their own home-cooked analyses. The more information the better, I say.

The third point, and by far the most important, is that this kind of criticism is really not helpful in a time when schools and districts are desperate for quality information about curriculum materials. Schools and districts have been making decisions about these materials for years with virtually no information. Now we finally have some information (imperfect though it may be) and we’re nit-picking the methodological details? This completely misses the forest for the trees. If NCTM wants to be a leader here, they should be out in front on this issue offering their own evaluations to schools and districts. Otherwise it’s left to folks like EdReports or me to do what we can to fill this yawning gap by providing information that was needed years ago. Monday morning alignment critiques aren’t helpful. Actually getting in the game and giving educators information–that’d be a useful contribution.


[1] For the record, I participated in the webinar where EdReports’ results were released, but I have not been paid by them and don’t currently do any work with them.

[2] There’s probably other stuff out there I don’t know about.

The Impact of Common Core

It’s pretty much always a good idea to read Matt Di Carlo over at the Shankerblog. His posts are always thoughtful and middle-of-the-road, a refreshing antidote to the usual advocacy blather. His recent post about the purpose and potential impact of the Common Core is no exception.

Here’s where I agree with Matt:

  • That standards alone are probably unlikely to have large impacts on student achievement.
  • That advocates of the standards do a disservice when they project such claims.
  • That making definitive statements about the impact of Common Core on student outcomes will be hard (and, I would say, causal research is almost certainly not worth doing at this point in the implementation process).

Here’s where I don’t agree with Matt. I don’t agree that standards are not meant to boost achievement. I believe that they most certainly are meant to boost achievement. Standards are intended to improve the likelihood that students will have access to a quality curriculum and, through that, learn more and better stuff. It’s a pretty straightforward theory of action, actually. Something like:

Standards (+ other policies) → Improved, aligned instruction → Student achievement

And I think we have pretty decent evidence on this theory of action. For instance, my work and the work of others make it reasonably clear that standards can affect what and how teachers teach (albeit imperfectly). There’s a great deal of research on the very commonsense notion that what and how teachers teach affects what students learn (my study from last year notwithstanding). I’m not aware of any studies that draw the causal arrow directly from standards to achievement, but given the evidence on the indirect paths, I believe this may well be due to the weaknesses of the data and designs more than to the lack of an effect.
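To be concrete about what “evidence on the indirect paths” means, here is a hedged sketch of the simplest product-of-coefficients mediation estimate for that chain, on simulated data; it is illustrative, not any particular study’s model:

```python
# Illustrative mediation sketch for: standards exposure -> aligned
# instruction -> achievement. Data are simulated; the coefficients a and b
# multiply to give the indirect (mediated) effect.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
std_exposure = rng.normal(size=n)                       # e.g., implementation index
instruction = 0.5 * std_exposure + rng.normal(size=n)   # aligned instruction
achievement = 0.4 * instruction + rng.normal(size=n)
df = pd.DataFrame(dict(std=std_exposure, instr=instruction, ach=achievement))

a = smf.ols("instr ~ std", df).fit().params["std"]           # standards -> instruction
b = smf.ols("ach ~ instr + std", df).fit().params["instr"]   # instruction -> achievement
print(f"indirect effect (a*b) = {a * b:.2f}")
```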

That said, I fully echo Matt’s concerns about overstating the case for quality standards, and I hope advocates take this warning to heart. What we need is not over-hyped claims and shoddy analyses designed to show positive impacts [1]. What we need at this point is thoughtful studies of implementation and cautious, tentative investigations of early effects. These are just the kind of studies that we are seeking in the “special issue” of AERA Open that I’m curating. My hope is that this issue will provide some of the first quality evidence about implementation and effects, in order to inform course corrections and begin building the evidence base about this reform.


[1] Edited to add: We also don’t need garbage studies by Common Core opponents using equally shoddy methods to conclude the standards aren’t working.

Gathering textbook adoption data (or: shouldn’t this be easier?)

Suppose you set out to study the impact of textbooks on teacher practice and student learning. The only way to begin such a study would be to pull together data on which textbooks were used in which schools.

You’d think this would be easy to do. After all, we live in a data-driven culture, and you can find just about any bit of information about your local school via a few seconds on Google (or the state department of ed website).

Well, you’d be wrong.

As I mentioned last post, I have a couple grants to study textbook adoptions. These grants are concentrated in the five largest US states by population (CA, TX, NY, FL, IL). Of these, only Florida keeps track of textbook adoptions at the district level. The other four states, comprising roughly 4,000 school districts, do not keep track at all [1].

This means that if you want to know which textbooks are being used in these 4,000 districts, you have to ask people. As far as I know, there’s no other way to do it. So that’s what we’ve done. We created a beautiful website where district personnel can go to report their textbooks. Then we gathered contact information for district personnel in all these districts and sent them a series of emails inviting them to participate and offering a chance at a $500 incentive to do so.

Suffice it to say the response rate was not what we hoped, even after several rounds. So we’re moving on to round two. We’re sending state-specific open-records requests to every non-responsive school district in these states, pointing them to the website. And a couple of weeks after these requests arrive, a horde of USC undergraduate researchers will begin sending personalized emails and making phone calls to districts. Essentially, we hope to hammer all 3,000-ish non-California districts in our sample into submission.

I’m telling you all this not because it’s especially interesting (I probably should have picked a better topic for my early posts on the blog) but because it shows the absolutely absurd lengths one needs to go to in order to gather what should be a freely available, extremely basic piece of information about schools.

Of course my hope is that my projects are successful and that I can gather this information on almost all districts. But if I can’t, I at least hope I can convince some people that this is a piece of information we should be tracking. It costs essentially nothing to do, it does not endanger privacy in any way, and it’s very useful from both a research and an equity point of view.


[1] California actually does keep track to a certain extent; I’ll talk about the Golden State in a future post.

A textbook example of education research

One of my main research interests these days is the adoption and use of textbooks and other curriculum materials. Why would I possibly care about textbooks? Well, for starters, they’re incredibly cheap relative to other educational interventions, and they can have remarkably large causal effects (PDF) on student achievement. They also are just a skosh less politically treacherous than, say, radically altering teacher tenure policies.

This work began with a grant from an anonymous foundation to analyze the alignment of textbooks to the Common Core math standards. That investigation found overall weak alignment, with some common areas of misalignment across books (notably, they were excessively procedural relative to what’s in the standards) [1].

While that work was informative, it didn’t tell me much about who was using which textbooks, how, and to what effect. As a new set of standards rolls out, I’m guessing that curriculum materials may matter more than ever. So I set out to investigate these issues in a few different studies. The basic gist of this set of studies is to understand:

  • Which textbooks are being adopted in the core academic subjects in light of new standards?
  • What explains school and district textbook choices (qualitatively and quantitatively)?
  • How do teachers make use of textbooks in their teaching?
  • What are the impacts of textbook choices on student outcomes?

This work is funded by the National Science Foundation, the WT Grant Foundation (with co-PI Thad Domina), and by another anonymous foundation (with co-PI Cory Koedel).

In the coming days and months I’ll be talking quite a bit about this work and some of the lessons learned so far. The next post is going to highlight some of the things I’m learning as I’m trying to go through the (seemingly straightforward) task of simply gathering data on what textbooks schools and districts are using these days. Spoiler alert: it ain’t pretty.


[1] That work also identified some ways to make the process of analyzing textbooks (which turns out to be incredibly time- and labor-intensive) much simpler.