(Authors: Bill Haltermann and Pam Roberge, BAP Analytics)
We like the NYS ELA grades 3-8 assessments. However, we can’t help noticing the number of articles denouncing the state assessments for a whole host of reasons. The reasons vary, of course, depending on who the author is. Some say state results shouldn’t be used for district and building accountability. Others point out the unfairness of using high-stakes test scores as a significant part of the teacher evaluation process. Given that there are other ways of measuring student progress and teacher effectiveness, there is merit to those arguments.
But invariably mixed in with those arguments is the statement, in some form or fashion, that there is no useful diagnostic information associated with the test results. As education professionals who have devoted a significant amount of time over the past decade specifically to mining state tests for data to help teachers and students, we couldn’t disagree more strongly.
Let’s dissect the arguments so we can drill down on the facts. Yes, state test results are used for many different purposes. Too many. There is no one magical assessment that could effectively address accountability, teacher evaluation and diagnostics all at the same time. But, just because there are questions about two of those three issues, that does not negate the third – the diagnostic value.
One of our favorite phrases when we talk to administrators and teachers is that we want to focus on skills data rather than scores data. Many attempt to use scores to measure student, building, and district progress and teacher effectiveness. We emphasize that scores, in fact, do not have diagnostic value and can’t provide prescriptive measures to improve classroom instruction.
Skills data derived from the state tests, on the other hand, can be diagnostic and positively alter instruction. We don’t say that in the abstract; we have seen it happen in school districts. When administrators and teachers embrace the state tests’ diagnostic data to target skill deficits, we can document not only instructional modifications, but also student improvement.
So why, contrary to the evidence, do people insist there is no diagnostic value in the state tests? There are several reasons. One reason is purely psychological. If the tests are used badly for one purpose, the illogical leap is that they can’t be good for any purpose. A second, and unfortunately prevalent, reason is that many educators simply do not know how to properly get or use the diagnostic data from the state tests. If we had a nickel for every time we heard someone say they had analyzed the state test data, but had only looked at the scores and performance levels, we would be seriously rich today. Just because you can analyze one subset of data from the state tests does not mean you understand how to get the diagnostic value. Why does that happen?
Let’s pause for a second and glance back at history. When the Common Core was introduced, the Regents’ Reform Agenda had three foundational legs. The first was the new standards, the second was the teacher evaluation system and the third was what we will encapsulate in the phrase “data-driven instruction (DDI)”. Guess which one wasn’t mandated? Since there was no mandate and no uniform state-wide system, the rollout and deployment of DDI was spotty at best. Because of the lack of rigorous and consistent training, many claim to use DDI, but very few do it well or even appropriately.
In all fairness, useful DDI training has been lacking. Diagnostic analysis of state test data is complicated. It’s an art, not a science. The good news is that several foundations give us the ability to do a proper analysis. SED supplies us with several pieces of critical data. The first is the mapping of each question on the state tests to the standards. There can be no diagnostic analysis unless we know the skills being tested. BAP Analytics takes that even further and maps all the released questions to sub-skills related to the standards. This makes the analysis and skill deficit targeting more precise and useful. The more educators understand what is being tested and how, the better prepared they are to positively modify instruction. The second critical piece of data supplied by SED and the RICs is the success rate for large groups of students on each question, i.e., the percent of students who got each question correct (either state-wide or in a large regional sample). This data supplies us with benchmarks and p-values (which we can use as a proxy for question difficulty). Because of the psychometrics of test creation, questions are designed to cover a range of levels of difficulty. No proper diagnostic can be done without those measures that frame the question difficulty.
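To make the idea concrete, here is a minimal sketch in Python of how the two pieces of data described above (the question-to-standard mapping and the large-group success rate for each question) might be combined with one class’s results. It is an illustration under stated assumptions, not our actual method: the question IDs, standard codes, benchmark values, student responses and function names are all made up for the example, and it assumes every student answered every question.

```python
from collections import defaultdict

# Hypothetical mapping: question id -> (standard code, benchmark p-value).
# The benchmark is the proportion of a large state-wide or regional group
# that answered the question correctly. All values here are invented.
QUESTIONS = {
    "Q1": ("RL.4.1", 0.80),
    "Q2": ("RL.4.1", 0.60),
    "Q3": ("RI.4.2", 0.70),
    "Q4": ("RI.4.2", 0.50),
}

# Hypothetical class results: one dict per student, 1 = correct, 0 = incorrect.
# Assumes every student answered every question.
RESPONSES = [
    {"Q1": 1, "Q2": 1, "Q3": 0, "Q4": 0},
    {"Q1": 1, "Q2": 0, "Q3": 1, "Q4": 0},
    {"Q1": 1, "Q2": 1, "Q3": 0, "Q4": 0},
    {"Q1": 0, "Q2": 1, "Q3": 1, "Q4": 1},
]


def class_p_values(responses):
    """Return the proportion of the class answering each question correctly."""
    totals = defaultdict(int)
    for student in responses:
        for question_id, score in student.items():
            totals[question_id] += score
    return {qid: total / len(responses) for qid, total in totals.items()}


def gaps_by_standard(questions, responses):
    """Average (class minus benchmark) gap per standard.

    A negative value means the class is below the benchmark on that standard.
    """
    class_p = class_p_values(responses)
    gaps = defaultdict(list)
    for question_id, (standard, benchmark) in questions.items():
        gaps[standard].append(class_p[question_id] - benchmark)
    return {standard: sum(g) / len(g) for standard, g in gaps.items()}


if __name__ == "__main__":
    # List standards from the largest deficit to the smallest.
    for standard, gap in sorted(gaps_by_standard(QUESTIONS, RESPONSES).items(),
                                key=lambda item: item[1]):
        print(f"{standard}: {gap:+.3f}")
```

Comparing each question against its benchmark, rather than looking at raw percent correct, is what keeps a hard question from being mistaken for a skill deficit. In practice the numbers are only a starting point; the released passages and questions still have to be read to understand why a gap exists.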
Another critical piece of information necessary for successful DDI is the ability to access released passages and questions from the tests. As educators we need to know not only what skills are being tested, but how they are being tested. Studying the complexity of the text used in the state tests and how many different ways skills are tested is essential. Remember, it’s not the standards that define the level of instructional rigor; it’s the state tests. The state tests are the only resources we have that help ensure a common, consistent level of instruction across the state, across districts and across classrooms. We ignore state tests and the diagnostic information they supply at our peril and to the detriment of our students.
We have been fortunate to be a part of several district initiatives that have not only successfully analyzed state test diagnostic data, but also connected that analysis to classroom instruction. It is truly amazing to see the DDI cycle completed and how that can empower and motivate teachers to help their students improve.