Researcher recommendations on FERPA legislation

In partnership with the Data Quality Campaign, I have organized a Researcher Day on the Hill next week to talk to hill staffers about data privacy, FERPA, and the importance of educational research. A great group of faculty from across the country, along with state and district policy leaders, is joining me to make the case that educational research needs good data and that these data can be properly safeguarded through policy.

Below is a letter that we are planning to share with staffers on that day. If you are interested in being a signatory, please email me, tweet at me, or comment on this post. Please share widely!


Dear [],

As researchers committed to supporting and improving student learning and protecting student privacy, we applaud the bipartisan work underway to update the Family Educational Rights and Privacy Act (FERPA). Education research and the data that enable it are incredibly powerful tools that help educators and policymakers understand and personalize learning; make good policy, practice, and funding decisions; and improve academic, life, and work outcomes.

Families, educators, and the public must be able to trust that student data is used ethically and protected. Well-designed FERPA improvements can help build that trust and ensure that schools, districts, and states are able to use data to improve learning and strengthen education without compromising student privacy.

With this balanced approach as our guide, we submit the following recommendations for strengthening the bipartisan Student Privacy Protection Act (H.R. 3157 – 114th Congress) before the measure is reintroduced for the 115th Congress’s consideration:

  • Enable states and districts to procure the research they need. The Every Student Succeeds Act’s evidence tiers provide new opportunities for states and districts to use data to better understand their students’ needs and improve teaching and learning. FERPA must continue to permit the research and research-practice partnerships that states and districts rely on to generate and act on this evidence. Section 5(c)(6)(C) should be amended to read “the purpose of the study is limited to improving student outcomes.” Without this change, states and districts would be severely limited in the research they can conduct.
  • Invest in state and local research and privacy capacity. States and districts need help to build their educators’ capacities to protect student privacy, including partnering effectively with researchers and other allies with legitimate educational reasons for handling student data. In many instances, new laws and regulations are not required to enhance privacy. Instead, education entities need help complying with existing privacy laws, which are often complex. FERPA should provide technical assistance focused on privacy protection, including through the invaluable Privacy and Technical Assistance Center, to improve stakeholders’ understanding of the law’s requirements and related privacy best practices.
  • Support community data and research efforts. In order to understand whether and how programs beyond school are successful, schools and community-based organizations like tutoring and afterschool programs need to securely share information about the students they serve. Harnessing education data’s power to improve student outcomes, as envisioned by the Every Student Succeeds Act, will require improvements to FERPA that permit schools and their community partners to better collaborate, including sharing data for legitimate educational purposes such as conducting joint research.
  • Support evidence-use across the education and workforce pipeline. We recommend adding workforce programs to Section 5(c)(5)(A)(ii) and to the studies exception in Section 5(c)(6)(C). Just as leaders need to evaluate the efficacy of education programs based on workforce data, the country also needs to better understand the efficacy of workforce programs. FERPA should recognize the inherent connectivity between these areas to better meet student and worker needs.

We welcome the opportunity to speak about these issues and recommendations further.

Sincerely,

Morgan Polikoff, Associate Professor, University of Southern California

Stephen Aguilar, Provost’s postdoctoral fellow, University of Southern California

Albert Balatico, K-12 public school teacher, Louisiana

Estela Bensimon, Professor and Director, Center for Urban Education at the University of Southern California

David Blazar, Assistant Professor, University of Maryland

Jessica Calarco, Assistant Professor, Indiana University

Edward Chi, PhD student, University of Southern California

Darnell Cole, Associate Professor, Co-Director, Center for Education, Identity & Social Justice, University of Southern California

Zoë Corwin, Associate Research Professor, University of Southern California

Danielle Dennis, Associate Professor, University of South Florida

Thurston Domina, Associate Professor, UNC Chapel Hill

Sherman Dorn, Professor, Arizona State University

Greg Garner, Educator, North Carolina

Chloe Gibbs, Assistant Professor, University of Notre Dame

Dan Goldhaber, Director, CEDR (Center for Education Data and Research), University of Washington

Nora Gordon, Professor, Georgetown University

Michael Gottfried, Associate Professor, UC Santa Barbara

Oded Gurantz, Stanford University

Scott Imberman, Associate Professor, Michigan State University

Todd Hausman, K-12 public school teacher, Washington State

Heather Hough, Executive Director, CORE-PACE Research Partnership, Policy Analysis for California Education

Derek A. Houston, Assistant Professor, University of Oklahoma

Ethan Hutt, Assistant Professor, University of Maryland

Sandra Kaplan, Professor of Clinical Education, University of Southern California

Adrianna Kezar, Professor & Co-director, Pullias Center for Higher Education, University of Southern California

Daniel Klasik, Assistant Professor, George Washington University

Sarah Winchell Lenhoff, Assistant Professor, Wayne State University

Michael Little, Doctoral Student, UNC Chapel Hill

Tattiya J. Maruco, Research Project Specialist, University of Southern California Pullias Center for Higher Education

Tod R. Massa, Director of Policy Analytics, State Council of Higher Education for Virginia

Katherine McKnight, Senior Manager, RTI International

Heather Mechler, Director of Institutional Analytics, University of New Mexico

Tatiana Melguizo, Associate Professor, University of Southern California

Sam Michalowski, Associate Provost of Institutional Research and Assessment, Fairleigh Dickinson University

Raegen T. Miller, Research Director, FutureEd at Georgetown University

Federick Ngo, Assistant Professor, University of Nevada Las Vegas

Laura Owen, Research Professor, American University

Lindsay Page, Assistant Professor, University of Pittsburgh

Elizabeth Park, PhD student, University of Southern California

John Pascarella, Associate Professor of Clinical Education, University of Southern California

Emily Penner, Assistant Professor, University of California Irvine

Julie Posselt, Assistant Professor, University of Southern California

David Quinn, Assistant Professor, University of Southern California

Jenny Grant Rankin, Lecturer, PostDoc Masterclass at University of Cambridge

Richard Rasiej, Visiting Research Scholar, University of Southern California

Macke Raymond, Director, CREDO at Stanford University

John Reyes, Director of Educational Technology, Archdiocese of Los Angeles

David M. Rochman, Program Specialist, Assessment & Evaluation, Orange County Department of Education

Andrew Saultz, Assistant Professor, Miami University

Gale Sinatra, Professor, University of Southern California

John Slaughter, Professor, University of Southern California

Julie Slayton, Professor of Clinical Education, University of Southern California

Aaron Sojourner, Associate Professor, University of Minnesota

Walker Swain, Assistant Professor, University of Georgia

William G. Tierney, Wilbur Kieffer Professor of Higher Education, University Professor & Co-director, Pullias Center for Higher Education, University of Southern California

Sean Tingle, Instructor, Arizona State University

James Ward, Dean’s Fellow in Urban Education Policy, University of Southern California

Rachel White, Postdoctoral Scholar, University of Southern California


Developing new measures of teachers’ instruction: Part 2

Cross posted from here. Co-authored with Hovanes Gasparian.


One of the guiding questions for C-SAIL’s Measurement Study is, “How reliably can raters code the content of teachers’ assignments and assessments?”

We find that raters can code mathematics assignments quite reliably, but that they struggle to code English language arts (ELA) assignments. In this post, we discuss why we think this finding is important and what the implications are for our and others’ work.

Teacher surveys are the backbone of our FAST Program Study and reporting plans. In addition to teacher surveys, we planned to collect assignments and assessments in order to check the extent to which the survey reports match the actual materials on which students are evaluated. This portion of the Measurement Study is necessary for us to understand the extent to which we can consistently analyze these materials in order to judge their alignment to standards.

Our analysis follows three previous studies of the reliability of content analysis procedures using the Surveys of Enacted Curriculum. Two of the studies (first, second) examined how reliably raters could code the content of state standards and assessments (in essence asking the same question as is discussed here, only with different documents). That work found that these analyses were generally fairly reliable (about .75 on a 0 to 1 scale, with 1 being perfect reliability) if four trained raters were used. The results looked better in mathematics than in English language arts. A third study examined the reliability of content analyses of entire mathematics textbooks, finding that they were incredibly reliable—often .99 or higher on the 0 to 1 scale, even for as few as two content analysts (all things equal, more raters = higher reliability).

This study hypothesized that the reasons math textbook analyses were so much more reliable than those of tests and standards were:

  • The length—all things equal, longer documents can be analyzed more reliably just like longer tests are more reliable than shorter ones.
  • The fact that the tasks in mathematics textbooks often measure quite discrete skills that are easier to code.

While the results of previous studies suggested raters could code both math and ELA documents reliably, we needed to update previous work for C-SAIL, both because we had modified the SEC tools (see previous post for more on this), and because teachers’ assignments and assessments are not as long as whole textbooks.

The procedures for this study were straightforward. We collected two weeks’ worth of assignments and assessments from 47 teachers—24 in English language arts (ELA) and 23 in mathematics. We had four trained content analysts analyze the set of materials for each teacher independently. Then we calculated the reliability using the same “generalizability theory” techniques we had used in the previous studies.
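For readers curious about the mechanics, the generalizability coefficient for a fully crossed teachers-by-raters design can be sketched as follows. This is an illustrative computation with made-up numbers, not the project’s actual code or data:

```python
def g_coefficient(scores, n_raters=None):
    """Generalizability (reliability) coefficient for a fully crossed
    teachers-x-raters design with one observation per cell.

    scores: list of rows, one row per teacher, one column per rater.
    Variance components come from the two-way ANOVA mean squares; with
    one score per cell, the teacher-x-rater interaction is confounded
    with residual error.
    """
    n_t, n_r = len(scores), len(scores[0])
    k = n_raters or n_r  # number of raters to project reliability for
    grand = sum(sum(row) for row in scores) / (n_t * n_r)
    row_means = [sum(row) / n_r for row in scores]
    col_means = [sum(row[j] for row in scores) / n_t for j in range(n_r)]

    ss_teacher = n_r * sum((m - grand) ** 2 for m in row_means)
    ss_rater = n_t * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_teacher = ss_teacher / (n_t - 1)
    ms_resid = (ss_total - ss_teacher - ss_rater) / ((n_t - 1) * (n_r - 1))

    # Teacher "true score" variance (floored at zero), then the
    # reliability of a k-rater average: true variance over true + error/k.
    var_teacher = max((ms_teacher - ms_resid) / n_r, 0.0)
    return var_teacher / (var_teacher + ms_resid / k) if var_teacher else 0.0
```

Averaging over more raters shrinks the error term in the denominator, which is why reliability rises when a third content analyst is added.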

The results of our analyses were illuminating. In mathematics, just two weeks’ worth of assignments or assessments could be content analyzed quite reliably. The average reliability for two content analysts across the 23 teachers was .73, and that increased to .79 if three content analysts were used. Only 4 of the 23 math teachers had reliabilities below .70 when three analysts were used. In short, the results in mathematics were strong.

In ELA, the results were much weaker. The average reliability for two content analysts was .49, and it rose to only .57 with three content analysts. Of the 24 teachers, just 7 had reliabilities above .70 with three content analysts. In short, our raters struggled to achieve reliable content analyses in ELA on two weeks of assignments.

What do these results mean? It appears that it is straightforward to analyze mathematics materials—we now have evidence from tests, standards, textbooks, and teacher-created assignments/assessments that we can do this quite well. This means we can give good feedback to these teachers about their instruction based on relatively few raters.

In contrast, we were surprised at how weak the results were in ELA. Clearly, more work needs to be done in ELA to achieve reliability. Four strategies we could use to improve the reliability are:

  • Collecting assignments over a longer period (such as a full month).
  • Increasing the training we provide to content analysts.
  • Increasing the number of content analysts we use.
  • Simplifying the ELA content languages to make analysis easier.

We are also interested in your ideas. How do you think we could improve the reliability of ELA content analysis? Take a look at our ELA survey and let us know what you think via email (gse-csail@gse.upenn.edu) or Twitter (@CSAILproject).

In future work, we plan to explore why the reliability of some teachers’ coded assignments/assessments was higher than that of others. Was it something about the content of these documents that made reliable coding easier? Or was it merely that they were longer?

Finally, it is important to note that when we began planning the Measurement Study, we were expecting to include content analysis as part of the FAST Program Study. In particular, we were planning to collect some assignments and assessments from participating teachers every few weeks and to content analyze them to gauge their alignment to standards. As we further developed the FAST study, however, the study took a different direction. Thus, the work presented here is not directly connected to our ongoing intervention study, but it can inform other research on teachers’ instruction.

The Don’t Do It Depository

Cross posted from here.


We have known for quite a while that schools engage in all manner of tricks to improve their performance under accountability systems. These behaviors range from the innocuous—teaching the content in state standards—to the likely harmful—outright cheating.

A new study last week provided more evidence of the unintended consequences of another gaming behavior—reassigning teachers based on perceived effectiveness. Researchers Jason A. Grissom, Demetra Kalogrides and Susanna Loeb analyzed data from a large urban district and found that administrators moved the most effective teachers to the tested grades (3-6) and the least effective to the untested grades (K-2).

On the surface, this might seem like a strategy that would boost accountability ratings without affecting students’ overall performance. After all, if you lose 10 points in kindergarten but gain 10 in third grade, isn’t the net change zero?

In fact, the authors found that moving the least effective teachers to the earlier grades harmed students’ overall achievement, because those early grades simply matter more to students’ long-term trajectories. The schools’ gaming behaviors were having real, negative consequences for children.

This strategy should go down in the annals of what doesn’t work, a category that we simply don’t pay enough attention to. Over the past 15 years, there has been a concerted effort in education research to find out “what works” and to share these policies and practices with schools.

The best example of this is the push for rigorous evidence in education research through the Institute of Education Sciences and the What Works Clearinghouse. This may well be a productive strategy, but the WWC is chock full of programs that don’t seem to “work,” at least according to its own evidence standards, and I don’t think anyone believes the WWC has had its desired impact. (The former director of IES himself has joked that it might more properly be called the What Doesn’t Work Clearinghouse).

These two facts together led me to half-joke on Twitter that maybe states or the feds should change their approach toward evidence. Rather than (or in addition to) encouraging schools and districts to do good things, they should start discouraging them from doing things we know or believe to be harmful.

This could be called something like the “Don’t Do It Depository” or the “Bad Idea Warehouse” (marketing experts, help me out). Humor aside, I think there is some merit to this idea. Here, then, are a couple of the policies or practices that might be included in the first round of the Don’t Do It Depository.

The counterproductive practice of assigning top teachers to tested grades is certainly a good candidate. While we’re at it, we might also discourage schools from shuffling teachers across grades for other reasons, as recent research finds this common practice is quite harmful to student learning.

Another common school practice, particularly in response to accountability, is to explicitly prepare students for state tests. Of course, test preparation can range from teaching the content likely to be tested all the way to teaching explicit test-taking strategies (e.g., write longer essays because those get you more points). Obviously the latter is not going to improve students’ actual learning, but the former might. In any case, test preparation seems to be quite common, but there’s less evidence than you might think that it actually helps. For instance:

  • A study of the ACT (which is administered statewide) in Illinois found that test strategies and item practice did not improve student performance, but coursework did.
  • An earlier study in Illinois found that students exposed to more authentic intellectual work saw greater gains on the standardized tests than those not exposed to this content.
  • In the Measures of Effective Teaching Project, students were surveyed about many dimensions of the instruction they received and these were correlated with their teachers’ value-added estimates. Survey items focusing on test preparation activities were much more weakly related to student achievement gains than items focusing on instructional quality.
  • Research doesn’t even indicate that direct test preparation strategies such as those for the ACT or SAT are particularly effective, with actual student gains far lower than advertised by the test preparation companies.

In short, there’s really not great evidence that test preparation works. In light of this evidence, perhaps states or the feds could offer guidance on what kind of and how much test preparation is appropriate and discourage the rest.

Other activities or beliefs that should be discouraged include “learning styles,” the belief that individuals have preferred ways of learning such as visual vs. auditory. The American Psychological Association has put out a brief explainer debunking the existence of learning styles. Similarly, students are not digital natives, nor can they multitask, nor should they guide their own learning.

There are many great lists of bad practices that already exist; states or the feds should simply repackage them to make them shorter, clearer, and more actionable. They should also work with experts in conceptual change, given that these briefs will be directly refuting many strongly held beliefs.

Do I think this strategy would convince every school leader to stop doing counterproductive things? Certainly I do not. But this strategy, if well executed, could probably effect meaningful change in some schools, and that would be a real win for children at very little cost.

Using Research to Drive Policy and Practice

Cross posted from here.


I’m excited to be joining the Advisory Board of Evidence Based Education, and I’m looking forward to contributing what I can to their important mission. In this post, I thought I’d briefly introduce myself and my research and talk about my philosophy for using research to affect policy and practice.

My research focuses on the design, implementation and effects of standards, assessment and accountability policies. Over my last seven years as an Assistant (now Associate) Professor at the University of Southern California Rossier School of Education, I have studied a number of issues in these areas, including:

  • The alignment of state assessments of student achievement with content standards;
  • The design of states’ school accountability systems;
  • The instructional responses of teachers to state standards and assessments; and
  • The alignment and impacts of elementary mathematics textbooks.

My current work continues in this vein, studying the implementation of new “college- and career-ready” standards and the adoption, use and effects of curriculum materials in the core academic subjects.

As is clear from the above links, I have of course published my research in the typical academic journals—this kind of publication is the coin of the realm for academics at research-focused institutions. And while I also find great intrinsic value in publishing in these venues, I know that I will not be fully satisfied if my work exists solely for the eyes of other academics.

When I joined an education policy PhD program in 2006, one of the key drivers of my decision was that I wanted to do work that was relevant to policy (at the very least—impact was an even more ideal goal). Unfortunately, while my PhD programs at Vanderbilt and Penn prepared me well for the rigors of academia, they did not equip me with the tools to drive policy or practice through my research. Those skills have developed over time, through trial and error with and advice from colleagues. Here are a few lessons I have learned that may be of use to others thinking of working to ensure that their research is brought to bear on policy and practice.

First, it goes without saying that research will not be useful to policymakers or practitioners if it is not on topics that are of interest to them. This means researchers should, at a minimum, conduct research on current policies (this means timeliness is paramount). Even better would be selecting research topics (or even conducting research) together with policymakers or practitioners. If the topics come from the eventual users, they are much more likely to use the results.

Second, even the best-designed research will not affect policy or practice if it is only published in peer-reviewed journals. Early in my academic career, I attended a networking and mentoring workshop with panels of leaders from DC. I had just come off publishing an article on an extremely new and relevant federal policy in a top education journal. The paper was short (5,000 words) and accessible, I thought, so surely it would be picked up and used by congressional staff or folks at the Department of Education. The peals of laughter from the panelists when I proposed that my work might matter in its current form certainly disabused me of the idea that the research-to-policy pipeline is an easy one.

Equipped with this knowledge, I began specifically writing and publishing in outlets that I thought would be more likely to reach the eyes of those in power. These include publishing articles in practitioner-oriented journals and magazines, briefs published for state and federal audiences, and even blog posts on personal and organization websites. Out of everything I’ve written, I think the piece that might have had the greatest impact is an open letter I wrote on my personal blog about the design of accountability systems under the new federal education law. This kind of writing is very different from the peer-reviewed kind, and specific training is needed—hopefully doctoral programs will begin to offer this kind of training (and universities will begin to reward this kind of engagement).

Third, networks are absolutely essential for research to be taken up. The best research, supported by the best nonacademic writing (blogs, briefs, etc.), will not matter if no one sees it. Getting your ideas in front of people requires the building of networks, and again this is something that must be done consciously. Networks can certainly be built through social media, and they can also be built by presenting research at policy and practice conferences, through media engagement, and through work with organizations like Evidence Based Education.

These are just a few of the ideas I have accumulated over time in my goal to bring my research to bear on current issues in policy and practice. I hope that my work with Evidence Based Education will allow me to contribute to their efforts in this area as well. Through our collaboration, I think we can continue to improve the production and use of quality evidence in education.

My remarks upon winning the AERA Early Career Award

This weekend in San Antonio I was honored to receive the AERA Early Career Award. I was truly and deeply grateful to have been selected for this award, especially given the many luminaries of education research who’ve previously received it. I hope that the next phase of my career continues to meaningfully affect education research, policy, and practice. Next year I will give a lecture where I will talk about my agenda so far and my vision for the next 10 years of my research.

Of course, I couldn’t have received this award without a great deal of support from family, friends, and colleagues. Here’s what I said in my 90-second remarks:


Thank you to the committee for this award, and to my colleagues Bill Tierney and Katharine Strunk for nominating me. I’m profoundly honored.

On June 8, 2006, I packed up my bags and left Chicago to start my PhD at Vanderbilt University. I’d applied to their MPP program, but someone on their admissions committee saw something promising in my application and they convinced me to do a PhD instead.

That moment in the admissions meeting turns out to have defined my life. Six days after I moved to Nashville I had dinner with a handsome southern gentleman who would later become my husband. At the same time, I started working on a couple of research projects led by my advisor Andy Porter and his wife and co-conspirator Laura Desimone, work for which I followed them from Vandy to Penn a year later. In many ways, Andy is like a father to me, and I owe much of my academic success to him.

Everything else, I owe to my mother, who raised my brother and me mostly alone through financial and personal struggles. She taught me that common sense and honesty are just as important as smarts and hard work, and she showed me how to lead a simple, uncluttered life.

Nothing I’ve accomplished since I started studying education policy has happened without my husband, Joel, by my side. He is truly my other half.

My goal as an academic is to produce research with consequence—to bring evidence to bear on the important education policy issues of our day. I’m fortunate to be at USC Rossier, a school that truly values impact and public scholarship and supports its junior faculty to do this kind of research. In these fraught times, we as a community of scholars committed to truth must always, as we say at USC, Fight On!

Thank you.

Let’s leave the worst parts of NCLB behind

This was originally posted at the Education Gadfly.


“Those who cannot remember the past are condemned to repeat it.” It turns out this adage applies not just to global politics, but also to state education policies, and groups on both the left and the right should take heed.

No Child Left Behind (NCLB) is among the most lamented education policies in recent memory, and few of NCLB’s provisions received as much scorn as its singular focus on grade-level proficiency as the sole measure of school performance. Researchers and practitioners alike faulted the fetishizing of proficiency for things like:

  • Encouraging schools to focus their attention on students close to the proficiency cut (the “bubble kids”) as opposed to all students, including high- and low-achievers.
  • Incentivizing states to lower their definitions of “proficiency” over time.
  • Resulting in unreliable ratings of school performance that were highly sensitive to the cut scores chosen.
  • Misrepresenting both school “effectiveness” (since proficiency is so highly correlated with student characteristics) and “achievement gaps” (since the magnitude of gaps again depends tremendously on where the proficiency cut is set).
  • Throwing away vast quantities of useful information by essentially turning every child into a 1 (proficient) or a 0 (not).

(For more details on these criticisms and links to relevant research, see my previous writing on this topic.)
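The information loss described in the last bullet is easy to see with a toy example. The cut scores, level weights, and school score lists below are entirely hypothetical; the sketch just contrasts percent proficient with a four-level performance index:

```python
# Hypothetical cut scores defining four performance levels
# (below basic < 300, basic >= 300, proficient >= 350, advanced >= 400)
# and the partial credit a performance index awards each level.
CUTS = [300, 350, 400]
WEIGHTS = [0.0, 1/3, 2/3, 1.0]

def level(score):
    """Performance level 0-3 implied by the hypothetical cuts."""
    return sum(score >= c for c in CUTS)

def percent_proficient(scores):
    """NCLB-style measure: each student counts as a 1 or a 0."""
    return 100 * sum(level(s) >= 2 for s in scores) / len(scores)

def performance_index(scores):
    """Four-level index: partial credit below the proficiency cut."""
    return 100 * sum(WEIGHTS[level(s)] for s in scores) / len(scores)

# Two schools with identical proficiency rates but very different
# achievement among their non-proficient students.
school_a = [290, 295, 360, 365]  # low scorers are below basic
school_b = [340, 345, 360, 365]  # low scorers are basic, near the cut
```

Both schools post the same proficiency rate (50 percent), but the index credits School B for moving its lower-achieving students to the basic level, exactly the information that percent proficient throws away.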

With some prodding from interested researchers and policy advocates, the Department of Education is allowing states to rectify this situation. Specifically, states now are permitted to use measures other than “percent proficient” for their measure of academic achievement under the Every Student Succeeds Act (ESSA). In previous posts, I recommended that the feds allow the use of performance indexes and average scale scores; performance indexes are now specifically allowed under the peer-review guidance the Department published a few weeks ago.

Despite this newfound flexibility, of the seventeen states with draft ESSA accountability plans, the Fordham Institute finds only six have moved away from percent proficient as their main measure of academic achievement. In fact, the Foundation for Excellence in Education is encouraging states to stay the course with percent proficient, arguing that it is an indicator that students will be on track for college or career success. While I agree with them that proficiency for an individual student is not a useless measure, it is an awful measure for evaluating whole schools.

Sticking with percent proficient is a terrible mistake that will doom states to many of the same issues they had under NCLB. I implore states that are still finalizing their ESSA accountability systems to learn from the past and choose better measures of school performance. Specifically, I make the following two recommendations:

  • No state should use “percent proficient” as a measure of academic achievement; all should use a performance index with a minimum of four levels for their status-based performance measures. The more levels in the index, the better it will be at accurately representing the average achievement of students in the school. States can continue reporting percent proficient on the side if compelled.
  • States should place as much emphasis as possible on measures of student growth, drawing attention to the schools most in need of improvement. Growth measures at least attempt to estimate the actual impact of schools on students; status measures do not. From among the array of growth measures, I recommend true value-added models or student growth percentiles (though I prefer value-added models for reasons described here). These are much better choices than “growth-to-proficiency” models, which do not estimate the impact of schools and again mostly measure who is enrolled.

While both EdTrust and the Foundation for Excellence in Education recommend growth-to-proficiency measures, these again are perhaps acceptable for individual students; as measures of school performance, however, they are unquestionably not growth measures that approximate schools’ impacts.

Overall, the evidence on these issues is overwhelming. Educators and policymakers have complained about NCLB and “percent proficient” for as long as the policy has existed. With this evidence, and with the newfound flexibility under ESSA, there is no reason for any state to continue using percent proficient as a measure of school performance. Doing so in spite of our past experience all but ensures that many of NCLB’s worst problems will persist through the ESSA era.

Should California’s New Accountability Model Set the Bar for Other States?

This is a repost of something I published previously on the C-SAIL blog and at FutureEd.


California has released a pilot version of its long-awaited school and district performance dashboard under the federal Every Student Succeeds Act. The dashboard takes a dramatically different approach from prior accountability systems, signaling a sharp break with both the No Child Left Behind era and California’s past approach.

Not surprisingly, given the contentiousness of measuring school performance, it has drawn criticism (too many measures, a lack of clear goals and targets, the possibility for schools to receive high scores even with underperforming student groups) and praise (a fairer and more accurate summary of school performance, a reduced reliance on test scores).

I’m not exactly a neutral observer. Over the past year and a half, I played a role in the development of the dashboard as part of the state superintendent’s Accountability and Continuous Improvement Task Force, an advisory group that put forward many of the features in the new dashboard. In my view, both the dashboard’s supporters and its opponents are right.

The dashboard is clearly an intentional response to previous accountability systems’ perceived shortcomings in at least four ways:

  • California officials felt state accountability systems focused excessively on test scores under NCLB, to the neglect of other measures of school performance. In response, the new dashboard includes a wider array of non-test measures, such as chronic absenteeism and suspension rates.
  • There was a widespread, well-justified concern that prior accountability measures based primarily on achievement levels (proficiency rates) unfairly penalized schools serving more disadvantaged students and failed to reward schools for strong test score growth. (See a previous post for more on this.) In response, the new dashboard includes both achievement status and growth in its performance measures. And the state uses a more nuanced measure of status rather than merely the percent of students who are proficient.
  • California’s previous metric, the Academic Performance Index, boiled down a school’s performance to a single number on a 200-to-1000 scale. Officials creating the state’s new system believed this to be an unhelpful way to think about a school’s performance. In response, the new system offers dozens of scores but no summative rating.
  • There was near unanimity among the task force members (myself excluded), the State Board of Education, and the California Department of Education that NCLB-era accountability systems were excessively punitive, and that the focus should instead be on “continuous improvement,” rather than “test-and-punish.” As a result, the new California system is nearly silent on actual consequences for schools that don’t meet expectations.

For my money, the pilot dashboard has several valuable features. The most important of these is the focus on multiple measures of school performance. Test scores are important and should play a central role, but schools do much more than teach kids content, and we should start designing our measurement systems to be more in line with what we want schools to be doing. The pilot also rightly places increased emphasis on how much students learn over the course of a school year, regardless of where they start the year on the achievement spectrum. Finally, I appreciate that the state is laying out a theory of action for how California will improve its schools and aligning the various components of its policy systems with this theory.

Still, I have concerns about some of the choices made in the creation of the dashboard.

Most importantly, consequential accountability was left out of the task force conversation entirely. We were essentially silent on the important question of what to do when schools fail their students.

And while consequences for underperforming schools were a topic of discussion at the State Board of Education—and thus I am confident that the state will comply with federal and state law about identifying and intervening in the lowest-performing schools—I am skeptical that the state will truly hold persistently underperforming schools accountable in a meaningful way (e.g., through closure, staffing changes, charter conversion, or consequences other than “more support”). The new dashboard does not even allow stakeholders to directly compare the performance of schools, diminishing any potential public accountability.

It was a poor decision to roll out a pilot system that makes it essentially impossible to compare schools. Parents want these tools precisely so they can compare schools; not offering that functionality at all is a mistake. Some organizations, such as EdSource, have stepped in to fill this gap, but the state should have led this from the start.

And while the state does have a tool that allows for some comparisons within school districts, it is clunky and cumbersome compared to the information systems available today in other states and districts. The most appropriate comparison tool might be a sort of similar-schools index that compares each school to other schools with similar demographics (the state used to have this). I understand the state has plans to address this issue when it revises the system; making these comparison tools clear and usable is essential.

Also, while I understand the rationale for not offering a single summative score for a school, I think that some level of aggregation would improve on what’s available now. For example, overall scores for growth and performance level might be useful, in addition to an overall attendance/culture/climate score. The correct number of scores may not be one (though there is some research suggesting that is a more effective approach), but it is unlikely to be dozens.

Finally, the website (which, again, is a pilot) is simply missing the supporting documents needed for a parent to meaningfully engage with and make sense of the data. There is a short video and a PDF guide, but these are not adequate. There is also a lack of quality translation (Google Translate is available, but the state should invest in high-quality translations given the diversity of California’s citizens). Presumably the documentation will be improved for the final version.

These lessons focus primarily on the transparency of the systems, but this is just one of several principles that states should attend to (which I have offered previously): Accountability systems should actually measure school effectiveness, not just test scores. They should be transparent, allowing stakeholders to make sense of the metrics. And they should be fair, not penalizing schools for factors outside their control. As states, including California, work to create new accountability systems under ESSA, they should use these principles to guide their decisions.

It is critically important to give states some leeway as they make decisions about accountability under ESSA, to allow them to develop their own theory of action and to innovate within the confines of what’s allowed by law. I am pleased that California has put forward a clear theory of action and is employing a wide array of measures to gauge school effectiveness. However, when the dashboard tells us that a school is persistently failing to improve outcomes for its children, I hope the state is serious about addressing that failure. Otherwise, I am skeptical that the dashboard will meaningfully change California’s dismal track record of educational outcomes for its children.