Letter to the CA State Board of Education

Now closed for signatures

Dear Dr. Kirst,

We write to you as researchers who study the design of school accountability systems and the construction of growth models. We read with interest the memorandum dated June 20, 2018, from Tom Torlakson and the CDE to you and the State Board. While we appreciate the analyses and effort underlying this memo, we have serious concerns about its claims and implications. Specifically, we believe the memo does not offer an accurate perspective on the strengths and limitations of various approaches to measuring schools’ contributions to student achievement. We are concerned that, if the State Board relies on this memo and keeps the current approach to measuring “change,” it will be producing and disseminating an inadequate measure that gives California school leaders and educational stakeholders incorrect information about school effectiveness. In this brief response, we outline what we view as the shortcomings of the memo and make specific recommendations for an alternative approach to measuring school effects on student achievement.

We read the memo as making three main arguments against a Residual Gain (RG) model:

  • The memo’s authors are concerned that the proposed RG model does not indicate how much improvement is needed to bring the average student up to grade-level standards, or whether achievement gaps are closing.
  • The memo’s authors are concerned that the RG model allows schools to show positive “Change” from the prior year to the current year even while making negative growth. The memo’s authors say this is counterintuitive and will confuse educators.
  • The memo’s authors say that an RG model is volatile and should therefore not be used to make decisions, since decisions made in one year might be contradicted by the next year’s growth data.

Here, we respond to each of these concerns.

First, it is true that a residual gain model does not indicate how much improvement is needed to bring the average student up to standards. Models that attempt to indicate this are called “growth-to-proficiency” models. While perhaps appealing at first glance, these models do not measure school effectiveness. The reason is that they conflate the socioeconomic conditions of an area with “school effectiveness,” which is readily apparent when one considers how they work—namely, students in schools in high-poverty areas are much more likely to be below proficient, and thus required to make larger gains, than their peers who attend schools in low-poverty areas. Thus, growth-to-proficiency actually conveys very similar information to the state’s status measure (distance from level 3), which is not desirable for measuring the effectiveness of schools (Barnum, 2017). Building a growth measure around this metric would be largely redundant with the status measure and not in the spirit of measuring effectiveness in the first place.

Rather, what the state should aim for is a growth measure that comes as close as possible to capturing the true causal effect of schools on student achievement. For this goal, the most appropriate measures compare socioeconomically and demographically similar schools, and identify which schools produce students whose test scores improve the most (Barlevy and Neal, 2012; Ehlert et al., 2014, 2016).

Second, it is true that a residual gain model could provide different information from a “change” model that simply subtracts last year’s average score from this year’s. That is a feature of the system, not a bug. Among other reasons for this discrepancy, a “change” model does not adjust for changes in school composition. This is a problem with the “change” model and highlights an advantage of the residual gain model. There are many clear examples of how student growth models can be explained to educators and the general public, such as Castellano and Ho (2013). The state should follow these examples.
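To make the contrast concrete, here is a minimal sketch of the core idea behind a residual gain model: regress this year’s scores on last year’s and keep the residual, so each school is compared against schools that started in the same place. This is an illustration of the general technique, not California’s proposed model, and every number below is invented.

```python
# Sketch of a residual gain calculation for five hypothetical schools.
# The residual is the distance between a school's actual mean score and
# the score predicted from its prior-year mean via a simple regression.

def residual_gains(prior, current):
    n = len(prior)
    mean_p = sum(prior) / n
    mean_c = sum(current) / n
    # Ordinary least squares slope and intercept by hand.
    slope = sum((p - mean_p) * (c - mean_c) for p, c in zip(prior, current)) \
            / sum((p - mean_p) ** 2 for p in prior)
    intercept = mean_c - slope * mean_p
    # Residual = actual score minus the score predicted from last year.
    return [c - (intercept + slope * p) for p, c in zip(prior, current)]

prior   = [2400, 2450, 2500, 2550, 2600]  # hypothetical school means, year 1
current = [2435, 2445, 2540, 2555, 2640]  # hypothetical school means, year 2
gains = residual_gains(prior, current)
```

Note that a school can post a positive residual gain even when its raw mean fell, as long as it outperformed schools with the same starting point — which is exactly the “counterintuitive” property the memo objects to.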

Third, it is true that residual gain and other growth models can fluctuate somewhat from year to year. However, the year-to-year correlations of such models are positive and of a reasonable magnitude[1], indicating that they provide consistent information. As long as high-stakes decisions are not made on a single year’s scores, some degree of fluctuation is acceptable. Moreover, there are simple ways to reduce year-to-year score fluctuations, and any number of scholars could assist the state in developing them (examples include relying on moving averages, adjusting statistical significance bands, and even using Bayesian inference).
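The simplest of these stabilizers, a trailing moving average of a school’s yearly growth scores, can be sketched in a few lines. The growth scores below are invented for illustration:

```python
# A trailing moving average: each year is averaged with up to
# (window - 1) preceding years, damping single-year noise.

def moving_average(scores, window=3):
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

yearly = [0.30, -0.10, 0.25, 0.05, -0.20]  # hypothetical growth scores
smoothed = moving_average(yearly)
```

The smoothed series swings far less from year to year than the raw series, at the cost of responding more slowly to genuine changes in a school’s performance — a trade-off the state would want to weigh explicitly.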

Based on our understanding of the research literature and of the goals of California’s system, we recommend that the state adopt a growth model that disentangles the student composition of a school from that school’s measured efficacy. There are many ways to do this—Ehlert et al. (2014, 2016) provide an overview of the main ideas. Models that properly account for student circumstances offer the best combination of validity (i.e., the output is more likely to reflect schools’ causal effects on student achievement) and interpretability (i.e., the output can be described in ways that educators can understand). Such models have been used in other states and school districts, including California’s CORE districts[2] and the states of Arkansas, Missouri, Colorado, and New York.[3]

The state’s current “change” model is unacceptable: it profoundly fails the validity test, and therefore does not accurately represent schools’ contributions to student achievement. Indeed, it is not clear what it represents at all.

Should you have questions about our recommendations, we would be happy to discuss them.


Morgan Polikoff, Associate Professor of Education, USC Rossier School of Education

Cory Koedel, Associate Professor of Economics and Public Policy, University of Missouri-Columbia

Andrew Ho, Professor of Education, Harvard Graduate School of Education

Douglas Harris, Professor of Economics, Tulane University

Dan Goldhaber, Professor, University of Washington

Thomas Kane, Walter H. Gale Professor of Education, Harvard Graduate School of Education

David Blazar, Assistant Professor, University of Maryland College Park

Eric Parsons, Assistant Research Professor of Economics, University of Missouri-Columbia

Martin R. West, Professor of Education, Harvard Graduate School of Education

Chad Aldeman, Principal, Bellwether Education

Richard C. Seder, Specialist, University of Hawaiʻi at Mānoa, Adjunct Assistant Professor, University of Southern California

Cara Jackson, Adjunct Faculty, American University

Aaron Tang, Acting Professor of Law, UC-Davis School of Law

David Rochman, Assessment and Evaluation Specialist, Orange County

Aime Black, Education consultant

Anne Hyslop, Education consultant

[1] https://faculty.smu.edu/millimet/classes/eco7321/papers/koedel%20et%20al%202015.pdf

[2] The CORE district growth model conditions on student demographics, which we recommend for purposes of fairness and validity; however, a similar model that only conditions on prior achievement would be nearly as good and would be a dramatic improvement over what the state is currently using.

[3] More information about the Arkansas, Missouri, New York, and Colorado models can be found here: http://www.arkansased.gov/public/userfiles/ESEA/Documents_to_Share/School%20Growth%20Explanation%20for%20ES%20and%20DC%20111017.pdf;



Researcher recommendations on FERPA legislation

In partnership with the Data Quality Campaign, I have organized a Researcher Day on the Hill next week to talk to Hill staffers about data privacy, FERPA, and the importance of educational research. A great group of faculty from across the country, along with state and district policy leaders, is joining me to make the case that educational research needs good data and that these data can be properly safeguarded through policy.

Below is a letter that we are planning to share with staffers on that day. If you are interested in being a signatory, please email me, tweet at me, or comment on this post. Please share widely!

Dear [],

As researchers committed to supporting and improving student learning and protecting student privacy, we applaud the bipartisan work underway to update the Family Educational Rights and Privacy Act (FERPA). Education research and the data that enable it are incredibly powerful tools that help educators and policymakers understand and personalize learning; make good policy, practice, and funding decisions; and improve academic, life, and work outcomes.

Families, educators, and the public must be able to trust that student data is used ethically and protected. Well-designed FERPA improvements can help build that trust and ensure that schools, districts, and states are able to use data to improve learning and strengthen education without compromising student privacy.

With this balanced approach as our guide, we submit the following recommendations for strengthening the bipartisan Student Privacy Protection Act (H.R. 3157 – 114th Congress) before the measure is reintroduced for the 115th Congress’s consideration:

  • Enable states and districts to procure the research they need. The Every Student Succeeds Act’s evidence tiers provide new opportunities for states and districts to use data to better understand their students’ needs and improve teaching and learning. FERPA must continue to permit the research and research-practice partnerships that states and districts rely on to generate and act on this evidence. Section 5(c)(6)(C) should be amended to read “the purpose of the study is limited to improving student outcomes.” Without this change, states and districts would be severely limited in the research they can conduct.
  • Invest in state and local research and privacy capacity. States and districts need help building their educators’ capacity to protect student privacy, including by partnering effectively with researchers and other allies with legitimate educational reasons for handling student data. In many instances, new laws and regulations are not required to enhance privacy. Instead, education entities need help complying with existing privacy laws, which are often complex. FERPA should provide for technical assistance focused on privacy protection, including through the invaluable Privacy and Technical Assistance Center, to improve stakeholders’ understanding of the law’s requirements and related privacy best practices.
  • Support community data and research efforts. In order to understand whether and how programs beyond school are successful, schools and community-based organizations like tutoring and afterschool programs need to securely share information about the students they serve. Harnessing education data’s power to improve student outcomes, as envisioned by the Every Student Succeeds Act, will require improvements to FERPA that permit schools and their community partners to collaborate better, including sharing data for legitimate educational purposes such as conducting joint research.
  • Support evidence-use across the education and workforce pipeline. We recommend adding workforce programs to Section 5(c)(5)(A)(ii) and to the studies exception in Section 5(c)(6)(C). Just as leaders need to evaluate the efficacy of education programs based on workforce data, the country also needs to better understand the efficacy of workforce programs. FERPA should recognize the inherent connectivity between these areas to better meet student and worker needs.

We welcome the opportunity to speak about these issues and recommendations further.


Morgan Polikoff, Associate Professor, University of Southern California

Stephen Aguilar, Provost’s postdoctoral fellow, University of Southern California

Albert Balatico, K-12 public school teacher, Louisiana

Estela Bensimon, Professor and Director, Center for Urban Education at the University of Southern California

David Blazar, Assistant Professor, University of Maryland

Jessica Calarco, Assistant Professor, Indiana University

Edward Chi, PhD student, University of Southern California

Darnell Cole, Associate Professor, Co-Director, Center for Education, Identity & Social Justice, University of Southern California

Zoë Corwin, Associate Research Professor, University of Southern California

Danielle Dennis, Associate Professor, University of South Florida

Thurston Domina, Associate Professor, UNC Chapel Hill

Sherman Dorn, Professor, Arizona State University

Greg Garner, Educator, North Carolina

Chloe Gibbs, Assistant Professor, University of Notre Dame

Dan Goldhaber, Director, CEDR (Center for Education Data and Research), University of Washington

Nora Gordon, Professor, Georgetown University

Michael Gottfried, Associate Professor, UC Santa Barbara

Oded Gurantz, Stanford University

Scott Imberman, Associate Professor, Michigan State University

Todd Hausman, K-12 public school teacher, Washington state

Heather Hough, Executive Director, CORE-PACE Research Partnership, Policy Analysis for California Education

Derek A. Houston, Assistant Professor, University of Oklahoma

Ethan Hutt, Assistant Professor, University of Maryland

Sandra Kaplan, Professor of Clinical Education, University of Southern California

Adrianna Kezar, Professor & Co-director, Pullias Center for Higher Education, University of Southern California

Daniel Klasik, Assistant Professor, George Washington University

Sarah Winchell Lenhoff, Assistant Professor, Wayne State University

Michael Little, Doctoral Student, UNC Chapel Hill

Tattiya J. Maruco, Research Project Specialist, University of Southern California Pullias Center for Higher Education

Tod R. Massa, Director of Policy Analytics, State Council of Higher Education for Virginia

Katherine McKnight, Senior Manager, RTI International

Heather Mechler, Director of Institutional Analytics, University of New Mexico

Tatiana Melguizo, Associate Professor, University of Southern California

Sam Michalowski, Associate Provost of Institutional Research and Assessment, Fairleigh Dickinson University

Raegen T. Miller, Research Director, FutureEd at Georgetown University

Federick Ngo, Assistant Professor, University of Nevada Las Vegas

Laura Owen, Research Professor, American University

Lindsay Page, Assistant Professor, University of Pittsburgh

Elizabeth Park, PhD student, University of Southern California

John Pascarella, Associate Professor of Clinical Education, University of Southern California

Emily Penner, Assistant Professor, University of California Irvine

Julie Posselt, Assistant Professor, University of Southern California

David Quinn, Assistant Professor, University of Southern California

Jenny Grant Rankin, Lecturer, PostDoc Masterclass at University of Cambridge

Richard Rasiej, Visiting Research Scholar, University of Southern California

Macke Raymond, Director, CREDO at Stanford University

John Reyes, Director of Educational Technology, Archdiocese of Los Angeles

David M. Rochman, Program Specialist, Assessment & Evaluation, Orange County Department of Education

Andrew Saultz, Assistant Professor, Miami University

Gale Sinatra, Professor, University of Southern California

John Slaughter, Professor, University of Southern California

Julie Slayton, Professor of Clinical Education, University of Southern California

Aaron Sojourner, Associate Professor, University of Minnesota

Walker Swain, Assistant Professor, University of Georgia

William G. Tierney, Wilbur Kieffer Professor of Higher Education, University Professor & Co-director, Pullias Center for Higher Education, University of Southern California

Sean Tingle, Instructor, Arizona State University

James Ward, Dean’s Fellow in Urban Education Policy, University of Southern California

Rachel White, Postdoctoral Scholar, University of Southern California

Developing new measures of teachers’ instruction: Part 2

Cross posted from here. Co-authored with Hovanes Gasparian

One of the guiding questions for C-SAIL’s Measurement Study is, “How reliably can raters code the content of teachers’ assignments and assessments?”

We find that raters can code mathematics assignments quite reliably, but that they struggle to code English language arts (ELA) assignments. In this post, we discuss why we think this finding is important and what the implications are for our and others’ work.

Teacher surveys are the backbone of our FAST Program Study and reporting plans. In addition to teacher surveys, we planned to collect assignments and assessments in order to check the extent to which the survey reports match the actual materials on which students are evaluated. This portion of the Measurement Study is necessary for us to understand the extent to which we can consistently analyze these materials in order to judge their alignment to standards.

Our analysis follows three previous studies of the reliability of content analysis procedures using the Surveys of Enacted Curriculum. Two of the studies (first, second) examined how reliably raters could code the content of state standards and assessments (in essence asking the same question as is discussed here, only with different documents). That work found that these analyses were generally fairly reliable (about .75 on a 0 to 1 scale, with 1 being perfect reliability) if four trained raters were used. The results looked better in mathematics than in English language arts. A third study examined the reliability of content analyses of entire mathematics textbooks, finding that they were incredibly reliable—often .99 or higher on the 0 to 1 scale, even for as few as two content analysts (all things equal, more raters = higher reliability).

This study hypothesized that math textbook analyses were so much more reliable than those of tests and standards for two reasons:

  • The length—all things equal, longer documents can be analyzed more reliably, just as longer tests are more reliable than shorter ones.
  • The fact that the tasks in mathematics textbooks often measure quite discrete skills that are easier to code.

While the results of previous studies suggested raters could code both math and ELA documents reliably, we needed to update previous work for C-SAIL, both because we had modified the SEC tools (see previous post for more on this), and because teachers’ assignments and assessments are not as long as whole textbooks.

The procedures for this study were straightforward. We collected two weeks’ worth of assignments and assessments from 47 teachers—24 in English language arts (ELA) and 23 in mathematics. We had four trained content analysts analyze the set of materials for each teacher independently. Then we calculated the reliability using the same “generalizability theory” techniques we had used in the previous studies.

The results of our analyses were illuminating. In mathematics, just two weeks’ worth of assignments or assessments could be content analyzed quite reliably. The average reliability for two content analysts across the 23 teachers was .73, and that increased to .79 if three content analysts were used. Only 4 of the 23 math teachers had reliabilities below .70 when three analysts were used. In short, the results in mathematics were strong.

In ELA, the results were much weaker. The average reliability for two content analysts was .49, and it rose to only .57 with three content analysts. Of the 24 teachers, just 7 had reliabilities above .70 with three content analysts. In short, our raters struggled to achieve reliable content analyses in ELA on two weeks of assignments.
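The pattern in both subjects, where adding a third analyst buys only a modest reliability gain, is roughly what the classical Spearman-Brown prophecy formula predicts. (Our generalizability-theory estimates are computed differently, but the intuition carries over.) A quick back-of-the-envelope check against the two-analyst figures above:

```python
def spearman_brown(rel, k):
    """Projected reliability when the number of raters is multiplied by k."""
    return (k * rel) / (1 + (k - 1) * rel)

# Going from two analysts to three means k = 1.5.
math_proj = spearman_brown(0.73, 1.5)  # about .80, near the observed .79
ela_proj = spearman_brown(0.49, 1.5)   # about .59, near the observed .57
```

The formula also makes clear why simply adding analysts is an expensive fix for ELA: starting from .49, even doubling to four analysts projects a reliability of only about .66.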

What do these results mean? It appears that it is straightforward to analyze mathematics materials—we now have evidence from tests, standards, textbooks, and teacher-created assignments/assessments that we can do this quite well. This means we can give good feedback to these teachers about their instruction based on relatively few raters.

In contrast, we were surprised at how weak the results were in ELA. Clearly, more work needs to be done in ELA to achieve reliability. Four strategies we could use to improve the reliability are:

  • Collecting assignments over a longer period (such as a full month).
  • Increasing the training we provide to content analysts.
  • Increasing the number of content analysts we use.
  • Simplifying the ELA content languages to make analysis easier.

We are also interested in your ideas. How do you think we could improve the reliability of ELA content analysis? Take a look at our ELA survey and let us know what you think via email (gse-csail@gse.upenn.edu) or Twitter (@CSAILproject).

In future work, we plan to explore why some teachers’ assignments/assessments could be coded more reliably than others’. Was it something about the content of these documents that made reliable coding easier? Or was it merely that they were longer?

Finally, it is important to note that when we began planning the Measurement Study, we were expecting to include content analysis as part of the FAST Program Study. In particular, we were planning to collect some assignments and assessments from participating teachers every few weeks and to content analyze them to gauge their alignment to standards. As we further developed the FAST study, however, the study took a different direction. Thus, the work presented here is not directly connected to our ongoing intervention study, but it can inform other research on teachers’ instruction.

The Don’t Do It Depository

Cross posted from here.

We have known for quite a while that schools engage in all manner of tricks to improve their performance under accountability systems. These behaviors range from the innocuous—teaching the content in state standards—to the likely harmful—outright cheating.

A new study last week provided more evidence of the unintended consequences of another gaming behavior—reassigning teachers based on perceived effectiveness. Researchers Jason A. Grissom, Demetra Kalogrides and Susanna Loeb analyzed data from a large urban district and found that administrators moved the most effective teachers to the tested grades (3-6) and the least effective to the untested grades (K-2).

On the surface, this might seem like a strategy that would boost accountability ratings without affecting students’ overall performance. After all, if you lose 10 points in kindergarten but gain 10 in third grade, isn’t the net change zero?

In fact, the authors found that moving the least effective teachers to the earlier grades harmed students’ overall achievement, because those early grades simply matter more to students’ long-term trajectories. The schools’ gaming behaviors were having real, negative consequences for children.

This strategy should go down in the annals of what doesn’t work, a category that we simply don’t pay enough attention to. Over the past 15 years, there has been a concerted effort in education research to find out “what works” and to share these policies and practices with schools.

The best example of this is the push for rigorous evidence in education research through the Institute of Education Sciences and the What Works Clearinghouse. This may well be a productive strategy, but the WWC is chock full of programs that don’t seem to “work,” at least according to its own evidence standards, and I don’t think anyone believes the WWC has had its desired impact. (The former director of IES himself has joked that it might more properly be called the What Doesn’t Work Clearinghouse).

These two facts together led me to half-joke on Twitter that maybe states or the feds should change their approach toward evidence. Rather than (or in addition to) encouraging schools and districts to do good things, they should start discouraging them from doing things we know or believe to be harmful.

This could be called something like the “Don’t Do It Depository” or the “Bad Idea Warehouse” (marketing experts, help me out). Humor aside, I think there is some merit to this idea. Here, then, are a couple of the policies or practices that might be included in the first round of the Don’t Do It Depository.

The counterproductive practice of assigning top teachers to tested grades is certainly a good candidate. While we’re at it, we might also discourage schools from shuffling teachers across grades for other reasons, as recent research finds this common practice is quite harmful to student learning.

Another common school practice, particularly in response to accountability, is to explicitly prepare students for state tests. Of course, test preparation can range from teaching the content likely to be tested all the way to teaching explicit test-taking strategies (e.g., write longer essays because those get you more points). Obviously the latter is not going to improve students’ actual learning, but the former might. In any case, test preparation seems to be quite common, but there’s less evidence than you might think that it actually helps. For instance:

  • A study of the ACT (which is administered statewide in Illinois) found that test strategies and item practice did not improve student performance, but coursework did.
  • An earlier study in Illinois found that students exposed to more authentic intellectual work saw greater gains on the standardized tests than those not exposed to this content.
  • In the Measures of Effective Teaching Project, students were surveyed about many dimensions of the instruction they received and these were correlated with their teachers’ value-added estimates. Survey items focusing on test preparation activities were much more weakly related to student achievement gains than items focusing on instructional quality.
  • Research doesn’t even indicate that direct test preparation strategies such as those for the ACT or SAT are particularly effective, with actual student gains far lower than advertised by the test preparation companies.

In short, there’s really not great evidence that test preparation works. In light of this evidence, perhaps states or the feds could offer guidance on what kind of and how much test preparation is appropriate and discourage the rest.

Other activities or beliefs that should be discouraged include “learning styles,” the belief that individuals have preferred ways of learning such as visual vs. auditory. The American Psychological Association has put out a brief explainer debunking the existence of learning styles. Similarly, students are not digital natives, nor can they multitask, nor should they guide their own learning.

There are many great lists of bad practices that already exist; states or the feds should simply repackage them to make them shorter, clearer, and more actionable. They should also work with experts in conceptual change, given that these briefs will be directly refuting many strongly held beliefs.

Do I think this strategy would convince every school leader to stop doing counterproductive things? Certainly I do not. But this strategy, if well executed, could probably effect meaningful change in some schools, and that would be a real win for children at very little cost.

Using Research to Drive Policy and Practice

Cross posted from here.

I’m excited to be joining the Advisory Board of Evidence Based Education, and I’m looking forward to contributing what I can to their important mission. In this post, I thought I’d briefly introduce myself and my research and talk about my philosophy for using research to affect policy and practice.

My research focuses on the design, implementation and effects of standards, assessment and accountability policies. Over my last seven years as an Assistant (now Associate) Professor at the University of Southern California Rossier School of Education, I have studied a number of issues in these areas, including:

  • The alignment of state assessments of student achievement with content standards;
  • The design of states’ school accountability systems;
  • The instructional responses of teachers to state standards and assessments; and
  • The alignment and impacts of elementary mathematics textbooks.

My current work continues in this vein, studying the implementation of new “college- and career-ready” standards and the adoption, use and effects of curriculum materials in the core academic subjects.

As is clear from the above links, I have of course published my research in the typical academic journals—this kind of publication is the coin of the realm for academics at research-focused institutions. And while I also find great intrinsic value in publishing in these venues, I know that I will not be fully satisfied if my work exists solely for the eyes of other academics.

When I joined an education policy PhD program in 2006, one of the key drivers of my decision was that I wanted to do work that was relevant to policy (at the very least—impact was an even more ideal goal). Unfortunately, while my PhD programs at Vanderbilt and Penn prepared me well for the rigors of academia, they did not equip me with the tools to drive policy or practice through my research. Those skills have developed over time, through trial and error with and advice from colleagues. Here are a few lessons I have learned that may be of use to others thinking of working to ensure that their research is brought to bear on policy and practice.

First, it goes without saying that research will not be useful to policymakers or practitioners if it is not on topics that are of interest to them. This means researchers should, at a minimum, conduct research on current policies (this means timeliness is paramount). Even better would be selecting research topics (or even conducting research) together with policymakers or practitioners. If the topics come from the eventual users, they are much more likely to use the results.

Second, even the best-designed research will not affect policy or practice if it is only published in peer-reviewed journals. Early in my academic career, I attended a networking and mentoring workshop with panels of leaders from DC. I had just come off publishing an article on an extremely new and relevant federal policy in a top education journal. The paper was short (5,000 words) and accessible, I thought, so surely it would be picked up and used by congressional staff or folks at the Department of Education. The peals of laughter from the panelists when I proposed that my work might matter in its current form certainly disabused me of the idea that the research-to-policy pipeline is an easy one.

Equipped with this knowledge, I began specifically writing and publishing in outlets that I thought would be more likely to reach the eyes of those in power. These include publishing articles in practitioner-oriented journals and magazines, briefs published for state and federal audiences, and even blog posts on personal and organization websites. Out of everything I’ve written, I think the piece that might have had the greatest impact is an open letter I wrote on my personal blog about the design of accountability systems under the new federal education law. This kind of writing is very different from the peer-reviewed kind, and specific training is needed—hopefully doctoral programs will begin to offer this kind of training (and universities will begin to reward this kind of engagement).

Third, networks are absolutely essential for research to be taken up. The best research, supported by the best nonacademic writing (blogs, briefs, etc.), will not matter if no one sees it. Getting your ideas in front of people requires the building of networks, and again this is something that must be done consciously. Networks can certainly be built through social media, and they can also be built by presenting research at policy and practice conferences, through media engagement, and through work with organizations like Evidence Based Education.

These are just a few of the ideas I have accumulated over time in my goal to bring my research to bear on current issues in policy and practice. I hope that my work with Evidence Based Education will allow me to contribute to their efforts in this area as well. Through our collaboration, I think we can continue to improve the production and use of quality evidence in education.

My remarks upon winning the AERA Early Career Award

This weekend in San Antonio I was honored to receive the AERA Early Career Award. I was truly and deeply grateful to have been selected for this award, especially given the many luminaries of education research who’ve previously received it. I hope that the next phase of my career continues to meaningfully affect education research, policy, and practice. Next year I will give a lecture where I will talk about my agenda so far and my vision for the next 10 years of my research.

Of course, I couldn’t have received this award without a great deal of support from family, friends, and colleagues. Here’s what I said in my 90-second remarks:

Thank you to the committee for this award, and to my colleagues Bill Tierney and Katharine Strunk for nominating me. I’m profoundly honored.

On June 8, 2006, I packed up my bags and left Chicago to start my PhD at Vanderbilt University. I’d applied to their MPP program, but someone on their admissions committee saw something promising in my application and they convinced me to do a PhD instead.

That moment in the admissions meeting turns out to have defined my life. Six days after I moved to Nashville I had dinner with a handsome southern gentleman who would later become my husband. At the same time, I started working on a couple of research projects led by my advisor Andy Porter and his wife and co-conspirator Laura Desimone, work for which I followed them from Vandy to Penn a year later. In many ways, Andy is like a father to me, and I owe much of my academic success to him.

Everything else, I owe to my mother, who raised my brother and me mostly alone through financial and personal struggles. She taught me that common sense and honesty are just as important as smarts and hard work, and she showed me how to lead a simple, uncluttered life.

Nothing I’ve accomplished since I started studying education policy has happened without my husband, Joel, by my side. He is truly my other half.

My goal as an academic is to produce research with consequence—to bring evidence to bear on the important education policy issues of our day. I’m fortunate to be at USC Rossier, a school that truly values impact and public scholarship and supports its junior faculty to do this kind of research. In these fraught times, we as a community of scholars committed to truth must always, as we say at USC, Fight On!

Thank you.

Let’s leave the worst parts of NCLB behind

This was originally posted at the Education Gadfly.

“Those who cannot remember the past are condemned to repeat it.” It turns out this adage applies not just to global politics, but also to state education policies, and groups on both the left and the right should take heed.

No Child Left Behind (NCLB) is among the most lamented education policies in recent memory, and few of NCLB’s provisions received as much scorn as its singular focus on grade-level proficiency as the sole measure of school performance. Researchers and practitioners alike faulted the fetishizing of proficiency for things like:

  • Encouraging schools to focus their attention on students close to the proficiency cut (the “bubble kids”) as opposed to all students, including high- and low-achievers.
  • Incentivizing states to lower their definitions of “proficiency” over time.
  • Resulting in unreliable ratings of school performance that were highly sensitive to the cut scores chosen.
  • Misrepresenting both school “effectiveness” (since proficiency is so highly correlated with student characteristics) and “achievement gaps” (since the magnitude of gaps again depends tremendously on where the proficiency cut is set).
  • Throwing away vast quantities of useful information by essentially turning every child into a 1 (proficient) or a 0 (not).

(For more details on these criticisms and links to relevant research, see my previous writing on this topic.)
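The cut-score sensitivity criticized above is easy to make concrete. The following minimal sketch uses invented scale scores for two hypothetical schools; depending on where the proficiency cut is set, either school can appear to outperform the other.

```python
# Invented scale scores for ten students at each of two hypothetical schools.
school_x = [300, 310, 320, 330, 340, 350, 360, 370, 380, 390]  # wide spread
school_y = [345, 346, 347, 348, 349, 351, 352, 353, 354, 355]  # tight spread

def pct_proficient(scores, cut):
    """Share of students at or above the proficiency cut, as a percentage."""
    return 100 * sum(score >= cut for score in scores) / len(scores)

# With the cut at 340, School Y looks far better (100% vs. 60%); with the
# cut at 360, School X looks far better (40% vs. 0%). The rating is an
# artifact of the cut, not of the schools.
for cut in (340, 360):
    print(f"cut={cut}:",
          f"X={pct_proficient(school_x, cut):.0f}%",
          f"Y={pct_proficient(school_y, cut):.0f}%")
```

Note that School Y’s mean score (350) is actually higher than School X’s (345), yet the 360 cut ranks X well ahead: the binary measure rewards where students sit relative to the cut, not overall achievement.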

With some prodding from interested researchers and policy advocates, the Department of Education is allowing states to rectify this situation. Specifically, states now are permitted to use measures other than “percent proficient” for their measure of academic achievement under the Every Student Succeeds Act (ESSA). In previous posts, I recommended that the feds allow the use of performance indexes and average scale scores; performance indexes are now specifically allowed under the peer-review guidance the Department published a few weeks ago.

Despite this newfound flexibility, the Fordham Institute finds that only six of the seventeen states with draft ESSA accountability plans have moved away from percent proficient as their main measure of academic achievement. In fact, the Foundation for Excellence in Education is encouraging states to stay the course with percent proficient, arguing that it indicates whether students are on track for college or career success. While I agree that proficiency is not a useless measure for an individual student, it is an awful measure for evaluating whole schools.

Sticking with percent proficient is a terrible mistake that will doom states to many of the same issues they had under NCLB. I implore states that are still finalizing their ESSA accountability systems to learn from the past and choose better measures of school performance. Specifically, I make the following two recommendations:

  • No state should use “percent proficient” as a measure of academic achievement; all should use a performance index with a minimum of four levels for their status-based performance measures. The more levels in the index, the better it will be at accurately representing the average achievement of students in the school. States can continue reporting percent proficient on the side if compelled.
  • States should place as much emphasis as possible on measures of student growth, which will direct attention to the schools most in need of improvement. Growth measures at least attempt to estimate the actual impact of schools on students; status measures do not. From among the array of growth measures, I recommend true value-added models or student growth percentiles (though I prefer value-added models for reasons described here). These are much better choices than “growth-to-proficiency” models, which do not estimate the impact of schools and again mostly measure who is enrolled.
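As a small sketch of the first recommendation, a four-level performance index can distinguish schools that percent proficient treats as identical. The counts and level weights below are illustrative only, not any state’s actual formula.

```python
# Hypothetical student counts at four performance levels:
# [below basic, basic, proficient, advanced] (100 students per school).
school_a = [40, 10, 40, 10]   # polarized distribution
school_b = [10, 40, 40, 10]   # non-proficient students mostly near the cut

def percent_proficient(counts):
    """Binary NCLB-style measure: share at or above 'proficient'."""
    return 100 * (counts[2] + counts[3]) / sum(counts)

def performance_index(counts, weights=(0, 1, 2, 3)):
    """Four-level index scaled 0-100: partial credit for each level,
    so movement below the proficiency cut still registers."""
    raw = sum(w * c for w, c in zip(weights, counts))
    return 100 * raw / (max(weights) * sum(counts))

for name, counts in [("School A", school_a), ("School B", school_b)]:
    print(name,
          f"percent proficient = {percent_proficient(counts):.0f}",
          f"index = {performance_index(counts):.0f}")
```

Both schools are at 50 percent proficient, but the index scores them 40 and 50: School B’s students below the cut are further along, and a multi-level index credits that progress where the binary measure throws it away.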

Both EdTrust and the Foundation for Excellence in Education recommend growth-to-proficiency measures. Again, these may be acceptable for individual students, but as measures of school performance there is no question that they fail to approximate schools’ impacts.

Overall, the evidence on these issues is overwhelming. Educators and policymakers have complained about NCLB and “percent proficient” for as long as the policy has existed. With this evidence, and with the newfound flexibility under ESSA, there is no reason for any state to continue using percent proficient as a measure of school performance. Doing so in spite of our past experience all but ensures that many of NCLB’s worst problems will persist through the ESSA era.