Over at Ahead of the Heard, Chad Aldeman has written about the recent Mathematica study, which found that PARCC and MCAS were equally predictive of early college success. He essentially argues that if all tests are equally predictive, states should just choose the cheapest bargain-basement test, content and quality be damned. He offers a list of reasons, which you’re welcome to read.
As you’d guess, I disagree with this argument. I’ll offer a list of reasons of my own here.
- The most obvious point is that we have reasonable evidence that testing drives instructional responses to standards. Thus, if the tests used to measure performance and hold people and schools accountable are lousy and full of poor-quality tasks, we'll get poor-quality instruction to match. This is why many folks now argue that better tests should include tasks much closer to the kinds of things we actually want kids to be doing; in that case, "teaching to the test" becomes good teaching. That may be a pipe dream, but it's something I commonly hear.
- A second fairly obvious point is that switching to a completely unaligned test would end any notion that the tests could provide feedback to teachers about what they should be doing differently or better. Certainly we can all argue that current test results arrive too late to be useful (smart testing vendors ought to be working on that problem as hard as possible), but if the test is in no way related to what teachers are supposed to be teaching, it's useless to them as a formative measure.
- Chad's analysis seems to prioritize predictive validity (how well results from the test predict other desired outcomes) over all other types of validity evidence. It's not clear to me why we should prefer predictive validity over, say, content-related validity, especially when we already have evidence that GPAs predict college success better than most standardized tests do (the SAT/ACT adds a little on top). Don't we first and foremost want the test to be a good measure of what students were supposed to have learned in the grade? More generally, I think it makes more sense to have different tests for different purposes, rather than piling all the purposes onto a single test.
- Certainly if the tests are going to have stakes attached to them, the courts require a certain level of content validity (or what they've called instructional validity); see Debra P. v. Turlington. If a kid is going to be held accountable, they need to have had the opportunity to learn what was on the test. If the test is the SAT, that's probably not going to happen.
Anyway, take a look at the Mathematica report (you should anyway!) and Chad’s post and let me know what you think.