Actually, Parents Think Schools Are Doing a Good Job Handling COVID

Democrats are on their heels when it comes to education. There is an emerging narrative that school reopening issues during the COVID-19 pandemic hurt former Virginia Governor Terry McAuliffe in his 2021 bid to return to office and may do further damage to Democrats down the road. The argument goes that blue states—including Virginia—were too cautious in keeping schools closed for in-person learning over the course of the 2020-21 school year, and that this has angered parents who had to deal with the fallout from those closures.

There is no question that school closures during the pandemic were incredibly disruptive to families. The science also suggests that in-person schooling, especially with masking policies that are widely supported, has not had much of an effect on COVID transmission. And of course the number of children who have been hospitalized or died of COVID-19, while not zero, is extremely low by any reasonable standard (lower, for instance, than in a typical flu season). So I agree with proponents who say that schools in blue states and urban areas closed down for too long and should basically be open now with close to no exceptions.

But the reality is that this narrative falls apart because parents are simply not dissatisfied with the performance of educational institutions. If anything, parents are more satisfied than I think they should be given what happened last year. Survey after survey finds strong parent support for how schools handled COVID-19 and are handling the ongoing recovery. The most recent nationally representative data find that about 80 to 90% of parents are satisfied with how schools are handling education issues during the reopening. Parents are satisfied with how schools are making up for students’ learning loss (82% satisfied), how they are meeting children’s mental health needs (85%) and social needs (87%), and how they are ensuring their children are on track to graduate (89%).

What’s interesting about parent satisfaction with schools is that, in contrast to almost everything else these days, there are close to zero demographic differences in these attitudes. Republicans can try to make school closures a partisan issue, but this is unlikely to be successful when parents of virtually all races, income levels, and party affiliations simply don’t believe schools handled COVID poorly.

Whether or not that satisfaction is fully warranted, Democrats should be on offense about the performance of educational systems during COVID. Parents think schools did a mostly good job, and they certainly think that things now—when everyone is back for in-person learning—are going well.

There are a few important risks to be aware of, though. The clearest risk comes from future school closures. Study after study finds that the quality of online or hybrid learning was far lower than that of in-person learning, and that these options were therefore considerably less popular among parents. So protracted or highly inconvenient school closures going forward could indeed drive public opinion on school performance south. There is no appetite for further school closures, and there is widespread support for reasonable policies to keep in-school COVID risks small.

Another risk is that leaders, both in the Democratic Party and in local schools and districts, become complacent in the face of these results. There is not likely to be much pressure on schools and districts to take aggressive measures to address COVID-related educational effects, but these effects are very real. Recent assessment data show the bottom falling out of the student achievement distribution, with students falling far behind in elementary reading and mathematics. These are real problems that need real solutions, and pretending things are hunky-dory because parents are satisfied is a road to educational ruin.

A third risk is that there is some real dissatisfaction on other educational fronts, such as the debate over how to handle race in public schools and whether and how parents should be involved in school policy decisions about curriculum. These are probably places where Democrats are more vulnerable, so they shouldn’t assume that parents support the whole progressive education agenda just because they aren’t very upset about last year’s school closures.

So who are parents mad at, if not the Democrats or school leaders? I think the short answer is probably that parents aren’t a bloc that directs their anger consistently in one direction or another. Rather, I think parents—like many of us—mostly just want to move on and put the last 18 months behind them. The key thing is that their kids are back in school with their friends and teachers, learning reading and math and staying both safe and out of parents’ hair. State and district leaders—whether Democrat or Republican—should do everything they can to keep kids in school throughout this school year and beyond so that parents stay as satisfied as they are.


Letter to the CA State Board of Education

Now closed for signatures

Dear Dr. Kirst,

We write to you as researchers who study the design of school accountability systems and the construction of growth models. We read with interest the memorandum dated June 20, 2018, from Tom Torlakson and the CDE to you and the State Board. While we appreciate the analyses and effort underlying this memo, we have serious concerns about its claims and implications. Specifically, we believe the memo does not offer an accurate perspective on the strengths and limitations of various approaches to measuring schools’ contributions to student achievement. We are concerned that, if the State Board relies on this memo and keeps the current approach to measuring “change,” it will be producing and disseminating an inadequate measure that will give California school leaders and educational stakeholders incorrect information about school effectiveness. In this brief response, we outline what we view as the shortcomings of the memo and make specific recommendations for an alternative approach to measuring school effects on student achievement.

We read the memo as making three main arguments against a Residual Gain (RG) model:

  • The memo’s authors are concerned that the proposed RG model does not indicate how much improvement is needed to bring the average student up to grade-level standards, or whether achievement gaps are closing.
  • The memo’s authors are concerned that the RG model allows schools to show positive “Change” from the prior year to the current year even when their growth is negative. The memo’s authors say this is counterintuitive and will confuse educators.
  • The memo’s authors say that an RG model is volatile and should therefore not be used to make decisions, since decisions made in one year might be contradicted by the next year’s growth data.

Here, we respond to each of these concerns.

First, it is true that a residual gain model does not indicate how much improvement is needed to bring the average student up to standards. Models that attempt to indicate this are called “growth-to-proficiency” models. While perhaps appealing at first glance, these models do not measure school effectiveness. The reason is that they conflate the socioeconomic conditions of an area with “school effectiveness,” which is readily apparent when one considers how they work—namely, students in schools in high-poverty areas are much more likely to be below proficient, and thus required to make larger gains, than their peers who attend schools in low-poverty areas. Thus, growth-to-proficiency actually conveys very similar information to the state’s status measure (distance from level 3), which is not desirable for measuring the effectiveness of schools (Barnum, 2017). Building a growth measure around this metric would be largely redundant with the status measure and not in the spirit of measuring effectiveness in the first place.

Rather, what the state should aim for is a growth measure that comes as close as possible to capturing the true causal effect of schools on student achievement. For this goal, the most appropriate measures compare socioeconomically and demographically similar schools, and identify which schools produce students whose test scores improve the most (Barlevy and Neal, 2012; Ehlert et al., 2014, 2016).

Second, it is true that a residual gain model could provide different information from a “change” model that simply subtracts last year’s average score from this year’s. That is a feature of the system, not a bug. Among other reasons for this discrepancy, a “change” model does not adjust for changes in school composition. This is a problem with the “change” model and highlights an advantage of the residual gain model. There are many clear examples of how student growth models can be explained to educators and the general public, such as Castellano and Ho (2013). The state should follow these examples.
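The mechanics of a residual gain model are simple enough to show in a few lines. Below is a minimal sketch with simulated data; the variable names and the data-generating process are ours for illustration, not the state's actual specification. It regresses current scores on prior scores statewide, averages each school's residuals, and contrasts that with a naive "change" score that ignores who is enrolled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one row per student, with prior-year and
# current-year test scores and a school identifier.
n_students, n_schools = 5000, 50
school = rng.integers(0, n_schools, n_students)
prior = rng.normal(0, 1, n_students)
school_effect = rng.normal(0, 0.2, n_schools)  # the "true" effects we hope to recover
current = 0.7 * prior + school_effect[school] + rng.normal(0, 0.5, n_students)

# Residual gain: regress current scores on prior scores statewide,
# then average each school's residuals. A school's mean residual is
# its estimated contribution beyond what prior achievement predicts.
X = np.column_stack([np.ones(n_students), prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
residuals = current - X @ beta
school_rg = np.array([residuals[school == s].mean() for s in range(n_schools)])

# Contrast: a naive "change" score simply subtracts last year's school
# mean from this year's, with no adjustment for student composition.
school_change = np.array([
    current[school == s].mean() - prior[school == s].mean()
    for s in range(n_schools)
])
```

In this simulation the residual gain scores track the simulated school effects closely, while the raw "change" scores are contaminated by each school's prior achievement level.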

Third, it is true that residual gain and other growth models can fluctuate somewhat from year to year. However, the year-to-year correlations of such models are positive and of a reasonable magnitude[1], indicating they provide consistent information. As long as high stakes decisions are not made on a single year’s scores, some degree of fluctuation is acceptable. Moreover, there are simple ways to reduce the year-to-year score fluctuations; any number of scholars could assist the state with developing these (examples include relying on moving averages, adjusting statistical significance bands, and even using Bayesian inference).
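Two of the stabilizers named above, moving averages and Bayesian (shrinkage) methods, can be illustrated with a short sketch. The data here are synthetic and the two-year design is a deliberate simplification; a real implementation would be tuned to the state's data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical school growth scores for two consecutive years:
# a stable school component plus independent yearly noise.
n_schools = 200
true_effect = rng.normal(0, 1, n_schools)
year1 = true_effect + rng.normal(0, 1, n_schools)
year2 = true_effect + rng.normal(0, 1, n_schools)

# Stabilizer 1: a two-year moving average halves the noise variance,
# so averaged scores track the stable school component more closely.
moving_avg = (year1 + year2) / 2

# Stabilizer 2: empirical Bayes shrinkage pulls each noisy score
# toward the statewide mean in proportion to its estimated noise share.
noise_var = 1.0  # assumed known measurement-error variance
signal_var = max(year2.var() - noise_var, 1e-6)
shrink = signal_var / (signal_var + noise_var)
eb_year2 = shrink * (year2 - year2.mean()) + year2.mean()
```

The moving average correlates more strongly with the stable school component than any single year does, and the shrunken scores have lower mean squared error, which is exactly the year-to-year stability the memo's authors are asking for.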

Based on our understanding of the research literature and of the goals of California’s system, we recommend that the state adopt a growth model that disentangles the student composition of a school from that school’s measured efficacy. There are many ways to do this—Ehlert et al. (2014, 2016) provide an overview of the main ideas. Models that properly account for student circumstances offer the best combination of validity (i.e., the output is more likely to reflect schools’ causal effects on student achievement) and interpretability (i.e., the output can be described in ways that educators can understand). Such models have been used in other states and school districts, including California’s CORE districts[2] and the states of Arkansas, Missouri, Colorado, and New York.[3]

The state’s current “change” model is unacceptable – it profoundly fails the validity test, and therefore it does not accurately represent schools’ contribution to student achievement. Indeed, it is not clear what it represents at all.

Should you have questions about our recommendations, we would be happy to discuss them.


Morgan Polikoff, Associate Professor of Education, USC Rossier School of Education

Cory Koedel, Associate Professor of Economics and Public Policy, University of Missouri-Columbia

Andrew Ho, Professor of Education, Harvard Graduate School of Education

Douglas Harris, Professor of Economics, Tulane University

Dan Goldhaber, Professor, University of Washington

Thomas Kane, Walter H. Gale Professor of Education, Harvard Graduate School of Education

David Blazar, Assistant Professor, University of Maryland College Park

Eric Parsons, Assistant Research Professor of Economics, University of Missouri-Columbia

Martin R. West, Professor of Education, Harvard Graduate School of Education

Chad Aldeman, Principal, Bellwether Education

Richard C. Seder, Specialist, University of Hawaiʻi at Mānoa, Adjunct Assistant Professor, University of Southern California

Cara Jackson, Adjunct Faculty, American University

Aaron Tang, Acting Professor of Law, UC-Davis School of Law

David Rochman, Assessment and Evaluation Specialist, Orange County

Aime Black, Education consultant

Anne Hyslop, Education consultant


[2] The CORE district growth model conditions on student demographics, which we recommend for purposes of fairness and validity; however, a similar model that only conditions on prior achievement would be nearly as good and would be a dramatic improvement over what the state is currently using.

[3] More information about the Arkansas, Missouri, New York, and Colorado models can be found here:

Researcher recommendations on FERPA legislation

In partnership with the Data Quality Campaign, I have organized a Researcher Day on the Hill next week to talk to Hill staffers about data privacy, FERPA, and the importance of educational research. A great group of faculty from across the country, along with state and district policy leaders, is joining me to make the case that educational research needs good data and that these data can be properly safeguarded through policy.

Below is a letter that we are planning to share with staffers on that day. If you are interested in being a signatory, please email me, tweet at me, or comment on this post. Please share widely!

Dear [],

As researchers committed to supporting and improving student learning and protecting student privacy, we applaud the bipartisan work underway to update the Family Educational Rights and Privacy Act (FERPA). Education research and the data that enable it are incredibly powerful tools that help educators and policymakers understand and personalize learning; make good policy, practice, and funding decisions; and improve academic, life, and work outcomes.

Families, educators, and the public must be able to trust that student data is used ethically and protected. Well-designed FERPA improvements can help build that trust and ensure that schools, districts, and states are able to use data to improve learning and strengthen education without compromising student privacy.

With this balanced approach as our guide, we submit the following recommendations for strengthening the bipartisan Student Privacy Protection Act (H.R. 3157 – 114th Congress) before the measure is reintroduced for the 115th Congress’s consideration:

  • Enable states and districts to procure the research they need. The Every Student Succeeds Act’s evidence tiers provide new opportunities for states and districts to use data to better understand their students’ needs and improve teaching and learning. FERPA must continue to permit the research and research-practice partnerships that states and districts rely on to generate and act on this evidence. Section 5(c)(6)(C) should be amended to read “the purpose of the study is limited to improving student outcomes.” Without this change, states and districts would be severely limited in the research they can conduct.
  • Invest in state and local research and privacy capacity. States and districts need help to build their educators’ capacities to protect student privacy, including partnering effectively with researchers and other allies with legitimate educational reasons for handling student data. In many instances, new laws and regulations are not required to enhance privacy. Instead, education entities need help with complying with existing privacy laws, which are often complex. FERPA should provide privacy protection focused technical assistance, including through the invaluable Privacy and Technical Assistance Center, to improve stakeholders’ understanding of the law’s requirements and related privacy best practices.
  • Support community data and research efforts. In order to understand whether and how programs beyond school are successful, schools and community-based organizations like tutoring and afterschool programs need to securely share information about the students they serve. Harnessing education data’s power to improve student outcomes, as envisioned by the Every Student Succeeds Act, will require improvements to FERPA that permit schools and their community partners to better collaborate, including sharing data for legitimate educational purposes including conducting joint research.
  • Support evidence-use across the education and workforce pipeline. We recommend adding workforce programs to Section 5(c)(5)(A)(ii) and to the studies exception in Section 5(c)(6)(C). Just as leaders need to evaluate the efficacy of education programs based on workforce data, the country also needs to better understand the efficacy of workforce programs. FERPA should recognize the inherent connectivity between these areas to better meet student and worker needs.

We welcome the opportunity to speak about these issues and recommendations further.


Morgan Polikoff, Associate Professor, University of Southern California

Stephen Aguilar, Provost’s postdoctoral fellow, University of Southern California

Albert Balatico, K-12 public school teacher, Louisiana

Estela Bensimon, Professor and Director, Center for Urban Education at the University of Southern California

David Blazar, Assistant Professor, University of Maryland

Jessica Calarco, Assistant Professor, Indiana University

Edward Chi, PhD student, University of Southern California

Darnell Cole, Associate Professor, Co-Director, Center for Education, Identity & Social Justice, University of Southern California

Zoë Corwin, Associate Research Professor, University of Southern California

Danielle Dennis, Associate Professor, University of South Florida

Thurston Domina, Associate Professor, UNC Chapel Hill

Sherman Dorn, Professor, Arizona State University

Greg Garner, Educator, North Carolina

Chloe Gibbs, Assistant Professor, University of Notre Dame

Dan Goldhaber, Director, CEDR (Center for Education Data and Research), University of Washington

Nora Gordon, Professor, Georgetown University

Michael Gottfried, Associate Professor, UC Santa Barbara

Oded Gurantz, Stanford University

Scott Imberman, Associate Professor, Michigan State University

Todd Hausman, K-12 public school teacher, Washington state

Heather Hough, Executive Director, CORE-PACE Research Partnership, Policy Analysis for California Education

Derek A. Houston, Assistant Professor, University of Oklahoma

Ethan Hutt, Assistant Professor, University of Maryland

Sandra Kaplan, Professor of Clinical Education, University of Southern California

Adrianna Kezar, Professor & Co-director, Pullias Center for Higher Education, University of Southern California

Daniel Klasik, Assistant Professor, George Washington University

Sarah Winchell Lenhoff, Assistant Professor, Wayne State University

Michael Little, Doctoral Student, UNC Chapel Hill

Tattiya J. Maruco, Research Project Specialist, University of Southern California Pullias Center for Higher Education

Tod R. Massa, Director of Policy Analytics, State Council of Higher Education for Virginia

Katherine McKnight, Senior Manager, RTI International

Heather Mechler, Director of Institutional Analytics, University of New Mexico

Tatiana Melguizo, Associate Professor, University of Southern California

Sam Michalowski, Associate Provost of Institutional Research and Assessment, Fairleigh Dickinson University

Raegen T. Miller, Research Director, FutureEd at Georgetown University

Federick Ngo, Assistant Professor, University of Nevada Las Vegas

Laura Owen, Research Professor, American University

Lindsay Page, Assistant Professor, University of Pittsburgh

Elizabeth Park, PhD student, University of Southern California

John Pascarella, Associate Professor of Clinical Education, University of Southern California

Emily Penner, Assistant Professor, University of California Irvine

Julie Posselt, Assistant Professor, University of Southern California

David Quinn, Assistant Professor, University of Southern California

Jenny Grant Rankin, Lecturer, PostDoc Masterclass at University of Cambridge

Richard Rasiej, Visiting Research Scholar, University of Southern California

Macke Raymond, Director, CREDO at Stanford University

John Reyes, Director of Educational Technology, Archdiocese of Los Angeles

David M. Rochman, Program Specialist, Assessment & Evaluation, Orange County Department of Education

Andrew Saultz, Assistant Professor, Miami University

Gale Sinatra, Professor, University of Southern California

John Slaughter, Professor, University of Southern California

Julie Slayton, Professor of Clinical Education, University of Southern California

Aaron Sojourner, Associate Professor, University of Minnesota

Walker Swain, Assistant Professor, University of Georgia

William G. Tierney, Wilbur Kieffer Professor of Higher Education, University Professor & Co-director, Pullias Center for Higher Education, University of Southern California

Sean Tingle, Instructor, Arizona State University

James Ward, Dean’s Fellow in Urban Education Policy, University of Southern California

Rachel White, Postdoctoral Scholar, University of Southern California

Developing new measures of teachers’ instruction: Part 2

Cross posted from here. Co-authored with Hovanes Gasparian

One of the guiding questions for C-SAIL’s Measurement Study is, “How reliably can raters code the content of teachers’ assignments and assessments?”

We find that raters can code mathematics assignments quite reliably, but that they struggle to code English language arts (ELA) assignments. In this post, we discuss why we think this finding is important and what the implications are for our and others’ work.

Teacher surveys are the backbone of our FAST Program Study and reporting plans. In addition to teacher surveys, we planned to collect assignments and assessments in order to check the extent to which the survey reports match the actual materials on which students are evaluated. This portion of the Measurement Study is necessary for us to understand whether we can consistently analyze these materials to judge their alignment to standards.

Our analysis follows three previous studies of the reliability of content analysis procedures using the Surveys of Enacted Curriculum. Two of the studies (first, second) examined how reliably raters could code the content of state standards and assessments (in essence asking the same question as is discussed here, only with different documents). That work found that these analyses were generally fairly reliable (about .75 on a 0 to 1 scale, with 1 being perfect reliability) if four trained raters were used. The results looked better in mathematics than in English language arts. A third study examined the reliability of content analyses of entire mathematics textbooks, finding that they were incredibly reliable—often .99 or higher on the 0 to 1 scale, even for as few as two content analysts (all things equal, more raters = higher reliability).

This study hypothesized that the reasons math textbook analyses were so much more reliable than those of tests and standards were:

  • The length—all things equal, longer documents can be analyzed more reliably just like longer tests are more reliable than shorter ones.
  • The fact that the tasks in mathematics textbooks often measure quite discrete skills that are easier to code.

While the results of previous studies suggested raters could code both math and ELA documents reliably, we needed to update previous work for C-SAIL, both because we had modified the SEC tools (see previous post for more on this), and because teachers’ assignments and assessments are not as long as whole textbooks.

The procedures for this study were straightforward. We collected two weeks’ worth of assignments and assessments from 47 teachers—24 in English language arts (ELA) and 23 in mathematics. We had four trained content analysts analyze the set of materials for each teacher independently. Then we calculated the reliability using the same “generalizability theory” techniques we had used in the previous studies.
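For readers unfamiliar with generalizability theory, the core calculation can be sketched with a simplified one-way design (teachers crossed with raters). The data below are simulated and the design is a simplification of our actual analysis, but the logic is the same: partition score variance into a teacher component and a rater-error component, then project the reliability of an average over k raters.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical alignment scores: 4 trained raters each code the
# materials of 23 teachers. Score = teacher component + rater noise.
n_teachers, n_raters = 23, 4
teacher_mean = rng.normal(0.5, 0.1, n_teachers)
scores = teacher_mean[:, None] + rng.normal(0, 0.1, (n_teachers, n_raters))

# One-way G-study: estimate the variance components from the
# within-teacher and between-teacher mean squares.
ms_within = scores.var(axis=1, ddof=1).mean()
ms_between = n_raters * scores.mean(axis=1).var(ddof=1)
var_error = ms_within
var_teacher = max((ms_between - ms_within) / n_raters, 0.0)

def reliability(k):
    """Generalizability coefficient for scores averaged over k raters."""
    return var_teacher / (var_teacher + var_error / k)

# Reliability rises with the number of raters, as in the post:
# all else equal, more raters = higher reliability.
rhos = {k: reliability(k) for k in (1, 2, 3, 4)}
```

This is why adding a third or fourth analyst improves the numbers reported below: averaging over more raters shrinks the error term in the denominator.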

The results of our analyses were illuminating. In mathematics, just two weeks’ worth of assignments or assessments could be content analyzed quite reliably. The average reliability for two content analysts across the 23 teachers was .73, and that increased to .79 if three content analysts were used. Only 4 of the 23 math teachers had reliabilities below .70 when three analysts were used. In short, the results in mathematics were strong.

In ELA, the results were much weaker. The average reliability for two content analysts was .49, and it rose to only .57 with three content analysts. Of the 24 teachers, just 7 had reliabilities above .7 with three content analysts. In short, our raters struggled to achieve reliable content analyses in ELA on two weeks of assignments.

What do these results mean? It appears that it is straightforward to analyze mathematics materials—we now have evidence from tests, standards, textbooks, and teacher-created assignments/assessments that we can do this quite well. This means we can give good feedback to these teachers about their instruction based on relatively few raters.

In contrast, we were surprised at how weak the results were in ELA. Clearly, more work needs to be done in ELA to achieve reliability. Four strategies we could use to improve the reliability are:

  • Collecting assignments over a longer period (such as a full month).
  • Increasing the training we provide to content analysts.
  • Increasing the number of content analysts we use.
  • Simplifying the ELA content languages to make analysis easier.

We are also interested in your ideas. How do you think we could improve the reliability of ELA content analysis? Take a look at our ELA survey and let us know what you think via email or Twitter (@CSAILproject).

In future work, we plan to explore why the reliability of some teachers’ coded assignments/assessments was higher than others’. Was it something about the content of these documents that made reliable coding easier? Or was it merely that they were longer?

Finally, it is important to note that when we began planning the Measurement Study, we were expecting to include content analysis as part of the FAST Program Study. In particular, we were planning to collect some assignments and assessments from participating teachers every few weeks and to content analyze them to gauge their alignment to standards. As we further developed the FAST study, however, the study took a different direction. Thus, the work presented here is not directly connected to our ongoing intervention study, but it can inform other research on teachers’ instruction.

The Don’t Do It Depository

Cross posted from here.

We have known for quite a while that schools engage in all manner of tricks to improve their performance under accountability systems. These behaviors range from the innocuous—teaching the content in state standards—to the likely harmful—outright cheating.

A new study last week provided more evidence of the unintended consequences of another gaming behavior—reassigning teachers based on perceived effectiveness. Researchers Jason A. Grissom, Demetra Kalogrides and Susanna Loeb analyzed data from a large urban district and found that administrators moved the most effective teachers to the tested grades (3-6) and the least effective to the untested grades (K-2).

On the surface, this might seem like a strategy that would boost accountability ratings without affecting students’ overall performance. After all, if you lose 10 points in kindergarten but gain 10 in third grade, isn’t the net change zero?

In fact, the authors found that moving the least effective teachers to the earlier grades harmed students’ overall achievement, because those early grades simply matter more to students’ long-term trajectories. The schools’ gaming behaviors were having real, negative consequences for children.

This strategy should go down in the annals of what doesn’t work, a category that we simply don’t pay enough attention to. Over the past 15 years, there has been a concerted effort in education research to find out “what works” and to share these policies and practices with schools.

The best example of this is the push for rigorous evidence in education research through the Institute of Education Sciences and the What Works Clearinghouse. This may well be a productive strategy, but the WWC is chock full of programs that don’t seem to “work,” at least according to its own evidence standards, and I don’t think anyone believes the WWC has had its desired impact. (The former director of IES himself has joked that it might more properly be called the What Doesn’t Work Clearinghouse).

These two facts together led me to half-joke on Twitter that maybe states or the feds should change their approach toward evidence. Rather than (or in addition to) encouraging schools and districts to do good things, they should start discouraging them from doing things we know or believe to be harmful.

This could be called something like the “Don’t Do It Depository” or the “Bad Idea Warehouse” (marketing experts, help me out). Humor aside, I think there is some merit to this idea. Here, then, are a couple of the policies or practices that might be included in the first round of the Don’t Do It Depository.

The counterproductive practice of assigning top teachers to tested grades is certainly a good candidate. While we’re at it, we might also discourage schools from shuffling teachers across grades for other reasons, as recent research finds this common practice is quite harmful to student learning.

Another common school practice, particularly in response to accountability, is to explicitly prepare students for state tests. Of course, test preparation can range from teaching the content likely to be tested all the way to teaching explicit test-taking strategies (e.g., write longer essays because those get you more points). Obviously the latter is not going to improve students’ actual learning, but the former might. In any case, test preparation seems to be quite common, but there’s less evidence than you might think that it actually helps. For instance:

  • A study of the ACT (which Illinois administers statewide) found that test strategies and item practice did not improve student performance, but coursework did.
  • An earlier study in Illinois found that students exposed to more authentic intellectual work saw greater gains on the standardized tests than those not exposed to this content.
  • In the Measures of Effective Teaching Project, students were surveyed about many dimensions of the instruction they received and these were correlated with their teachers’ value-added estimates. Survey items focusing on test preparation activities were much more weakly related to student achievement gains than items focusing on instructional quality.
  • Research doesn’t even indicate that direct test preparation strategies such as those for the ACT or SAT are particularly effective, with actual student gains far lower than advertised by the test preparation companies.

In short, there’s really not great evidence that test preparation works. In light of this evidence, perhaps states or the feds could offer guidance on what kind of and how much test preparation is appropriate and discourage the rest.

Other activities or beliefs that should be discouraged include “learning styles,” the belief that individuals have preferred ways of learning such as visual vs. auditory. The American Psychological Association has put out a brief explainer debunking the existence of learning styles. Similarly, students are not digital natives, nor can they multitask, nor should they guide their own learning.

There are many great lists of bad practices that already exist; states or the feds should simply repackage them to make them shorter, clearer, and more actionable. They should also work with experts in conceptual change, given that these briefs will be directly refuting many strongly held beliefs.

Do I think this strategy would convince every school leader to stop doing counterproductive things? Certainly I do not. But this strategy, if well executed, could probably effect meaningful change in some schools, and that would be a real win for children at very little cost.

Using Research to Drive Policy and Practice

Cross-posted from here.

I’m excited to be joining the Advisory Board of Evidence Based Education, and I’m looking forward to contributing what I can to their important mission. In this post, I thought I’d briefly introduce myself and my research and talk about my philosophy for using research to affect policy and practice.

My research focuses on the design, implementation, and effects of standards, assessment, and accountability policies. Over my last seven years as an Assistant (now Associate) Professor at the University of Southern California Rossier School of Education, I have studied a number of issues in these areas, including:

  • The alignment of state assessments of student achievement with content standards;
  • The design of states’ school accountability systems;
  • The instructional responses of teachers to state standards and assessments; and
  • The alignment and impacts of elementary mathematics textbooks.

My current work continues in this vein, studying the implementation of new “college- and career-ready” standards and the adoption, use and effects of curriculum materials in the core academic subjects.

As is clear from the above links, I have of course published my research in the typical academic journals—this kind of publication is the coin of the realm for academics at research-focused institutions. And while I also find great intrinsic value in publishing in these venues, I know that I will not be fully satisfied if my work exists solely for the eyes of other academics.

When I joined an education policy PhD program in 2006, one of the key drivers of my decision was that I wanted to do work that was relevant to policy (at the very least—impact was an even more ideal goal). Unfortunately, while my PhD programs at Vanderbilt and Penn prepared me well for the rigors of academia, they did not equip me with the tools to drive policy or practice through my research. Those skills have developed over time, through trial and error and through advice from colleagues. Here are a few lessons I have learned that may be of use to others thinking of working to ensure that their research is brought to bear on policy and practice.

First, it goes without saying that research will not be useful to policymakers or practitioners if it is not on topics that are of interest to them. This means researchers should, at a minimum, conduct research on current policies (this means timeliness is paramount). Even better would be selecting research topics (or even conducting research) together with policymakers or practitioners. If the topics come from the eventual users, they are much more likely to use the results.

Second, even the best-designed research will not affect policy or practice if it is only published in peer-reviewed journals. Early in my academic career, I attended a networking and mentoring workshop with panels of leaders from DC. I had just come off publishing an article on an extremely new and relevant federal policy in a top education journal. The paper was short (5,000 words) and accessible, I thought, so surely it would be picked up and used by congressional staff or folks at the Department of Education. The peals of laughter from the panelists when I proposed that my work might matter in its current form certainly disabused me of the idea that the research-to-policy pipeline is an easy one.

Equipped with this knowledge, I began specifically writing and publishing in outlets that I thought would be more likely to reach the eyes of those in power. These include publishing articles in practitioner-oriented journals and magazines, briefs published for state and federal audiences, and even blog posts on personal and organization websites. Out of everything I’ve written, I think the piece that might have had the greatest impact is an open letter I wrote on my personal blog about the design of accountability systems under the new federal education law. This kind of writing is very different from the peer-reviewed kind, and specific training is needed—hopefully doctoral programs will begin to offer this kind of training (and universities will begin to reward this kind of engagement).

Third, networks are absolutely essential for research to be taken up. The best research, supported by the best nonacademic writing (blogs, briefs, etc.), will not matter if no one sees it. Getting your ideas in front of people requires the building of networks, and again this is something that must be done consciously. Networks can certainly be built through social media, and they can also be built by presenting research at policy and practice conferences, through media engagement, and through work with organizations like Evidence Based Education.

These are just a few of the ideas I have accumulated over time in my goal to bring my research to bear on current issues in policy and practice. I hope that my work with Evidence Based Education will allow me to contribute to their efforts in this area as well. Through our collaboration, I think we can continue to improve the production and use of quality evidence in education.

My remarks upon winning the AERA Early Career Award

This weekend in San Antonio I was honored to receive the AERA Early Career Award. I was truly and deeply grateful to have been selected for this award, especially given the many luminaries of education research who’ve previously received it. I hope that the next phase of my career continues to meaningfully affect education research, policy, and practice. Next year I will give a lecture where I will talk about my agenda so far and my vision for the next 10 years of my research.

Of course, I couldn’t have received this award without a great deal of support from family, friends, and colleagues. Here’s what I said in my 90-second remarks:

Thank you to the committee for this award, and to my colleagues Bill Tierney and Katharine Strunk for nominating me. I’m profoundly honored.

On June 8, 2006, I packed up my bags and left Chicago to start my PhD at Vanderbilt University. I’d applied to their MPP program, but someone on their admissions committee saw something promising in my application and they convinced me to do a PhD instead.

That moment in the admissions meeting turns out to have defined my life. Six days after I moved to Nashville I had dinner with a handsome southern gentleman who would later become my husband. At the same time, I started working on a couple of research projects led by my advisor Andy Porter and his wife and co-conspirator Laura Desimone, work for which I followed them from Vandy to Penn a year later. In many ways, Andy is like a father to me, and I owe much of my academic success to him.

Everything else, I owe to my mother, who raised my brother and me mostly alone through financial and personal struggles. She taught me that common sense and honesty are just as important as smarts and hard work, and she showed me how to lead a simple, uncluttered life.

Nothing I’ve accomplished since I started studying education policy has happened without my husband, Joel, by my side. He is truly my other half.

My goal as an academic is to produce research with consequence—to bring evidence to bear on the important education policy issues of our day. I’m fortunate to be at USC Rossier, a school that truly values impact and public scholarship and supports its junior faculty to do this kind of research. In these fraught times, we as a community of scholars committed to truth must always, as we say at USC, Fight On!

Thank you.

Let’s leave the worst parts of NCLB behind

This was originally posted at the Education Gadfly.

“Those who cannot remember the past are condemned to repeat it.” It turns out this adage applies not just to global politics, but also to state education policies, and groups on both the left and the right should take heed.

No Child Left Behind (NCLB) is among the most lamented education policies in recent memory, and few of NCLB’s provisions received as much scorn as its singular focus on grade-level proficiency as the sole measure of school performance. Researchers and practitioners alike faulted the fetishizing of proficiency for things like:

  • Encouraging schools to focus their attention on students close to the proficiency cut (the “bubble kids”) as opposed to all students, including high- and low-achievers.
  • Incentivizing states to lower their definitions of “proficiency” over time.
  • Resulting in unreliable ratings of school performance that were highly sensitive to the cut scores chosen.
  • Misrepresenting both school “effectiveness” (since proficiency is so highly correlated with student characteristics) and “achievement gaps” (since the magnitude of gaps again depends tremendously on where the proficiency cut is set).
  • Throwing away vast quantities of useful information by essentially turning every child into a 1 (proficient) or a 0 (not).

(For more details on these criticisms and links to relevant research, see my previous writing on this topic.)
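The last two criticisms can be illustrated with a small sketch. The scores and the proficiency cut below are hypothetical, chosen only to show how binarizing students into proficient/not-proficient both throws away information and makes ratings hypersensitive to where the cut is set:

```python
# Two hypothetical schools with very different score distributions.
school_a = [210, 240, 255, 260, 300]   # wide spread of scale scores
school_b = [248, 249, 251, 252, 253]   # clustered right at the cut

cut = 250  # illustrative proficiency cut score

def percent_proficient(scores, cut):
    """Collapse each student to 1 (proficient) or 0 (not), then average."""
    return 100 * sum(s >= cut for s in scores) / len(scores)

# Identical ratings despite very different distributions:
print(percent_proficient(school_a, cut))   # 60.0
print(percent_proficient(school_b, cut))   # 60.0

# Move the cut by a single point and school B's rating jumps:
print(percent_proficient(school_a, 249))   # 60.0
print(percent_proficient(school_b, 249))   # 80.0

# The average scale scores, by contrast, use all the information:
print(sum(school_a) / len(school_a))       # 253.0
print(sum(school_b) / len(school_b))       # 250.6
```

School B's rating swings by twenty points on a one-point change in the cut score, while School A's does not move at all; the averages are stable under any choice of cut.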

With some prodding from interested researchers and policy advocates, the Department of Education is allowing states to rectify this situation. Specifically, states now are permitted to use measures other than “percent proficient” for their measure of academic achievement under the Every Student Succeeds Act (ESSA). In previous posts, I recommended that the feds allow the use of performance indexes and average scale scores; performance indexes are now specifically allowed under the peer-review guidance the Department published a few weeks ago.

Despite this newfound flexibility, of the seventeen states with draft ESSA accountability plans, the Fordham Institute finds only six have moved away from percent proficient as their main measure of academic achievement. In fact, the Foundation for Excellence in Education is encouraging states to stay the course with percent proficient, arguing that it is an indicator that students will be on track for college or career success. While I agree with them that proficiency for an individual student is not a useless measure, it is an awful measure for evaluating whole schools.

Sticking with percent proficient is a terrible mistake that will doom states to many of the same issues they had under NCLB. I implore states that are still finalizing their ESSA accountability systems to learn from the past and choose better measures of school performance. Specifically, I make the following two recommendations:

  • No state should use “percent proficient” as a measure of academic achievement; all should use a performance index with a minimum of four levels for their status-based performance measures. The more levels in the index, the better it will be at accurately representing the average achievement of students in the school. States can continue reporting percent proficient on the side if compelled.
  • States should place as much emphasis as possible on measures of student growth, to draw attention to the schools most in need of improvement. Growth measures at least attempt to estimate the actual impact of schools on students; status measures do not. From among the array of growth measures, I recommend true value-added models or student growth percentiles (though I prefer value-added models for reasons described here). These are much better choices than “growth-to-proficiency” models, which do not estimate the impact of schools and again mostly measure who is enrolled.
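A performance index of the kind I recommend is easy to sketch. The four levels and their point values below are illustrative only (states choose their own levels and weights), but they show how an index separates schools that percent proficient treats as identical:

```python
# Illustrative point values for a four-level performance index.
WEIGHTS = {"below_basic": 0, "basic": 50, "proficient": 100, "advanced": 125}

def performance_index(level_counts):
    """Average points per student across the four performance levels."""
    total = sum(level_counts.values())
    points = sum(WEIGHTS[lvl] * n for lvl, n in level_counts.items())
    return points / total

# Both hypothetical schools are 40% proficient-or-above...
school_a = {"below_basic": 30, "basic": 30, "proficient": 30, "advanced": 10}
school_b = {"below_basic": 0, "basic": 60, "proficient": 40, "advanced": 0}

# ...but the index distinguishes them, crediting movement at every level:
print(performance_index(school_a))  # 57.5
print(performance_index(school_b))  # 70.0
```

Under percent proficient, moving a student from below basic to basic earns a school nothing; under the index, every level matters, which blunts the "bubble kids" incentive.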

While both EdTrust and the Foundation for Excellence in Education recommend growth-to-proficiency measures, these again may be acceptable for individual students; as measures of school performance, however, they simply do not approximate schools’ impacts.

Overall, the evidence on these issues is overwhelming. Educators and policymakers have complained about NCLB and “percent proficient” for as long as the policy has existed. With this evidence, and with the newfound flexibility under ESSA, there is no reason for any state to continue using percent proficient as a measure of school performance. Doing so in spite of our past experience all but ensures that many of NCLB’s worst problems will persist through the ESSA era.

Should California’s New Accountability Model Set the Bar for Other States?

This is a repost of something I published previously on the C-SAIL blog and at FutureEd.

California has released a pilot version of its long-awaited school and district performance dashboard under the federal Every Student Succeeds Act. The dashboard takes a dramatically different approach from prior accountability systems, signaling a sharp break with both the No Child Left Behind era and California’s past approach.

Not surprisingly, given the contentiousness of measuring school performance, it has drawn criticism (too many measures, a lack of clear goals and targets, the possibility for schools to receive high scores even with underperforming student groups) and praise (a fairer and more accurate summary of school performance, a reduced reliance on test scores).

I’m not exactly a neutral observer. Over the past year and a half, I played a role in the development of the dashboard as part of the state superintendent’s Accountability and Continuous Improvement Task Force, an advisory group that put forward many of the features in the new dashboard. In my view, both the dashboard’s supporters and its opponents are right.

The dashboard is clearly an intentional response to previous accountability systems’ perceived shortcomings in at least four ways:

  • California officials felt state accountability systems focused excessively on test scores under NCLB, to the neglect of other measures of school performance. In response, the new dashboard includes a wider array of non-test measures, such as chronic absenteeism and suspension rates.
  • There was a widespread, well-justified concern that prior accountability measures based primarily on achievement levels (proficiency rates) unfairly penalized schools serving more disadvantaged students and failed to reward schools for strong test score growth. (See a previous post for more on this.) In response, the new dashboard includes both achievement status and growth in its performance measures. And the state uses a more nuanced measure of status rather than merely the percent of students who are proficient.
  • California’s previous metric, the Academic Performance Index, boiled down a school’s performance to a single number on a 200-to-1000 scale. Officials creating the state’s new system believed this to be an unhelpful way to think about a school’s performance. In response, the new system offers dozens of scores but no summative rating.
  • There was near unanimity among the task force members (myself excluded), the State Board of Education, and the California Department of Education that NCLB-era accountability systems were excessively punitive, and that the focus should instead be on “continuous improvement,” rather than “test-and-punish.” As a result, the new California system is nearly silent on actual consequences for schools that don’t meet expectations.

For my money, the pilot dashboard has several valuable features. The most important of these is the focus on multiple measures of school performance. Test scores are important and should play a central role, but schools do much more than teach kids content, and we should start designing our measurement systems to be more in line with what we want schools to be doing. The pilot also rightly places increased emphasis on how much students learn over the course of a school year, regardless of where they start the year on the achievement spectrum. Finally, I appreciate that the state is laying out a theory of action for how California will improve its schools and aligning the various components of its policy systems with this theory.

Still, I have concerns about some of the choices made in the creation of the dashboard.

Most importantly, consequential accountability was left out of the task force conversation entirely. We were essentially silent on the important question of what to do when schools fail their students.

And while consequences for underperforming schools were a topic of discussion at the State Board of Education—and thus I am confident that the state will comply with federal and state law about identifying and intervening in the lowest-performing schools—I am skeptical that the state will truly hold persistently underperforming schools accountable in a meaningful way (e.g., through closure, staffing changes, charter conversion, or other consequences other than “more support”). The new dashboard does not even allow stakeholders to directly compare the performance of schools, diminishing any potential public accountability.

It was a poor decision to roll out a pilot system that makes it essentially impossible to compare schools. Parents want these tools precisely so they can compare schools, so omitting this functionality is a mistake. Some organizations, such as EdSource, have stepped in to fill this gap, but the state should have led this effort from the start.

And while the state does have a tool that allows for some comparisons within school districts, it is clunky and cumbersome compared to information systems in other states and districts available today. The most appropriate comparison tool might be a sort of similar-schools index that compares each school to other schools with similar demographics (the state used to have this). I understand the state has plans to address this issue when it revises the system; making these comparison tools clear and usable is essential.

Also, while I understand the rationale for not offering a single summative score for a school, I think that some level of aggregation would improve on what’s available now. For example, overall scores for growth and performance level might be useful, in addition to an overall attendance/culture/climate score. The correct number of scores may not be one (though there is some research suggesting that is a more effective approach), but it is unlikely to be dozens.

Finally, the website (which, again, is a pilot) is simply missing the supporting documents needed for a parent to meaningfully engage with and make sense of the data. There is a short video and PDF guide, but these are not adequate. There is also a lack of quality translation (Google Translate is available, but the state should invest in high quality translations given the diversity of California’s citizens). Presumably the documentation will be improved for the final version.

These lessons focus primarily on the transparency of the systems, but this is just one of several principles that states should attend to (which I have offered previously): Accountability systems should actually measure school effectiveness, not just test scores. They should be transparent, allowing stakeholders to make sense of the metrics. And they should be fair, not penalizing schools for factors outside their control. As states, including California, work to create new accountability systems under ESSA, they should use these principles to guide their decisions.

It is critically important to give states some leeway as they make decisions about accountability under ESSA, to allow them to develop their own theory of action and to innovate within the confines of what’s allowed by law. I am pleased that California has put forward a clear theory of action and is employing a wide array of measures to gauge school performance. However, when the dashboard tells us that a school is persistently failing to improve outcomes for its children, I hope the state is serious about addressing that failure. Otherwise, I am skeptical that the dashboard will meaningfully change California’s dismal track record of educational outcomes for its children.

We need a little patience

In the last year I’ve been doing a lot more blogging, and it’s sometimes hard for me to keep track of everything I’ve written. So I’m going to start reposting things here, in order to keep track. This is a repost of something I wrote for Fordham and for C-SAIL last week. So if you read it there, no need to read again!

It’s 2017, which means we’re in year six of the Common Core experiment. The big question that everyone wants the answer to is “Is Common Core working?” Many states seem poised to move in a new direction, especially with a new administration in Washington, and research evidence could play an instrumental role in helping states make the decision of whether to keep the standards, revise them, or replace them altogether. (Of course, it might also be that policymakers’ views on the standards are impervious to evidence.)

To my knowledge, there are two existing studies that try to assess Common Core’s impact on student achievement, both by Tom Loveless. These compare NAEP gains between states that did and did not adopt Common Core, or compare states based on an index of the quality of their implementation of the standards. Both studies find, in essence, no effects of the standards, and the media have covered them from that angle. The C-SAIL project, on which I am co-principal investigator, is also examining a related question (in our case, the impact of college- and career-readiness standards in general, including, but not limited to, the Common Core standards).

There are many challenges with doing this kind of research. A few of the most serious are:

  1. The need to use sophisticated quasi-experimental methods, since experimental methods are not available.
  2. The limited array of outcome variables available, since NAEP (which is not perfectly aligned to the Common Core) is really the only assessment that has the national comparability required and many college and career outcomes are difficult to measure.
  3. The difficulty of pinpointing when implementation actually began, since states varied so much in the timing of related policies like assessment and textbook adoptions.

Thus, it is not obvious when will be the right time to evaluate the policy, and with what outcomes.
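The basic logic of the adopter/non-adopter comparisons is worth making concrete. The actual studies are far more careful than this, and every number below is hypothetical, but a minimal difference-in-differences sketch looks like:

```python
# Hypothetical NAEP-style average scores, before and after adoption.
pre  = {"adopters": 282.0, "non_adopters": 284.0}
post = {"adopters": 284.5, "non_adopters": 286.0}

gain_adopters     = post["adopters"] - pre["adopters"]          # 2.5
gain_non_adopters = post["non_adopters"] - pre["non_adopters"]  # 2.0

# The non-adopters' gain stands in for what adopters would have
# gained anyway; the difference of the differences is the estimate.
did_estimate = gain_adopters - gain_non_adopters
print(did_estimate)  # 0.5
```

The three challenges above all bite here: the "pre" and "post" windows are ambiguous when implementation timing varies (challenge 3), and NAEP is the only outcome with the needed comparability (challenge 2), so the estimate depends heavily on choices the researcher must defend (challenge 1).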

Policymakers want to effect positive change through policy, and they often need to make decisions on a short cycle—after all, they often make promises in their elections, and it behooves them to show evidence that their chosen policies are working in advance of the next round of elections. The consequence is that there is a high demand for rapid evidence about policy effects, and the early evidence often contributes overwhelmingly to shaping the narrative about whether policies are working or not.

Unfortunately, there are more than a handful of examples where the early evidence on a policy turned out to be misleading, or where a policy seemed to have delayed effects. For example, the Gates Foundation’s small school reforms were widely panned as a flop in early reviews relying on student test scores, but a number of later rigorous studies showed (sometimes substantial) positive effects on outcomes such as graduation and college enrollment. It was too late, however—the initiative had already been scrapped by the time the positive evidence started rolling in.

No Child Left Behind acquired quite a negative reputation over its first half dozen years of implementation. Its accountability policies were seen as poorly targeted (they were), and it was labeled as encouraging an array of negative unintended consequences. These views quickly became well established among both researchers and policymakers. And yet, a series of recent studies have shown meaningful effects of the law on student achievement, which has done precisely zero to change public perception.

There are all manner of policies that may fit into this category to a greater or lesser extent. A state capacity building and technical assistance policy implemented in California was shelved after a few years, but evaluations found the policy improved student learning. Several school choice studies have found null or modest effects on test scores only to turn up impacts on longer-term outcomes like graduation. Even School Improvement Grants and other turnaround strategies may qualify in this category—though the recent impact evaluation was neutral, several studies have found positive effects and many have found impacts that grow as the years progress (suggesting that longer-term evaluations may yet show effects).

How does this all relate back to Common Core and other college- and career-readiness standards? There are implications for both researchers and policymakers.

For researchers, these patterns suggest that great care needs to be taken in interpreting and presenting the results of research conducted early in the implementation of Common Core and other policies. This is not to say that researchers should not investigate the early effects of policies, but rather that they should be appropriately cautious in describing what their work means. Early impact studies will virtually never provide the “final answer” as to the effectiveness of any given policy, and researchers should explicitly caution against the interpretation of their work as such.

For policymakers, there are at least two implications. First, when creating new policies, policymakers should think about both short- and long-term outcomes that are desired. Then, they should build into the law ample time before such outcomes can be observed (i.e., ensuring that decisions are not made before the law can have its intended effects). Even if this time is not explicitly built into the policy cycle, policymakers should at least be aware of these issues and adopt a stance of patience toward policy revisions. Second, to the extent that policies build in funds or plans for evaluation, these plans should include both short- and long-term evaluations.

Clearly, these suggestions run counter to prevailing preferences for immediate gratification in policymaking, but they are essential if we are to see sustained improvement in education. At a minimum, this approach might keep us from declaring failure too soon on policies that may well turn out to be successful. Since improvement through policy is almost always a process of incremental progress, failing to learn all the lessons of new policies may hamstring our efforts to develop better policies later. Finally, jumping around from policy to policy likely contributes to reform fatigue among educators, which may even undermine the success of future unrelated policies. In short, regardless of your particular policy preferences, there is good reason to move on from the “shiny object” approach to education policy and focus instead on giving old and seemingly dull objects a chance to demonstrate their worth before throwing them in the policy landfill.