If you have started MAP Growth testing this fall, you may have noticed that more student assessments are being invalidated this year than in prior years, and you may be wondering why.
At NWEA, we are constantly seeking ways to improve the accuracy and validity of our assessments. That is why we pioneered the use of computer adaptive tests in schools, why we are meticulous in aligning our tests to standards, and why we have criteria to invalidate a test when there is evidence that its score would be unreliable. We have always invalidated MAP Growth assessments with an abnormally high standard error of measurement, and we invalidate tests completed in under six minutes.
The six-minute standard was implemented because we confirmed that students who completed tests this quickly produced deflated scores that reflected a great deal of guessing. Over the years, educators have regularly told us that this criterion was too conservative: many tests were reported as valid even when teachers knew students had given very little effort. We took that feedback to heart, and it led us to a program of research, headed by Dr. Steven Wise, to develop better ways to evaluate the validity of assessments.
Dr. Wise’s approach to the question focused on rapid-guessing, which he defined as answering a test item in less than 10% of the average time students take to answer it, with that threshold capped at 10 seconds. He chose 10% of the average response time because, below that level, there is empirical evidence that a student cannot read the item, process it, and answer it in an informed manner. He found that students who responded that rapidly were no more accurate in their answers than someone who simply guessed at random.
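To make the rule concrete, here is a minimal sketch in Python of how such a classification could be expressed. The function name, parameter names, and defaults are illustrative, not our production implementation.

```python
def is_rapid_guess(response_time_s: float, avg_response_time_s: float,
                   threshold_fraction: float = 0.10, cap_s: float = 10.0) -> bool:
    """Classify one item response as a rapid guess.

    A response counts as rapid when it arrives in under 10% of the
    item's average response time, with that threshold capped at 10
    seconds. All names and defaults here are illustrative.
    """
    threshold = min(threshold_fraction * avg_response_time_s, cap_s)
    return response_time_s < threshold
```

The cap keeps the threshold sensible for items with long average response times: without it, 10% of a 200-second average would treat any answer under 20 seconds as a rapid guess.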
Based on his research, we implemented a new criterion this fall: a MAP Growth test is invalidated when a student rapid-guesses on 30% or more of the items. At that level of guessing, the score is untrustworthy. The reported score underestimates the student’s actual performance, and the guessing introduces a random element that reduces its reliability. Further, students rapid-guess more frequently on items with long passages and on items in the geometry domain in mathematics, which means the accuracy of goal-area scores is even more compromised than that of the overall test score, making the results unreliable as a tool to aid instruction.
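As a rough illustration of how the invalidation rule composes with the classification above, here is a self-contained sketch; the function name and the assumption that per-item times arrive as parallel lists are ours for the example, not our production logic.

```python
def test_is_invalid(response_times_s, avg_times_s,
                    max_rapid_fraction=0.30, cap_s=10.0):
    """Flag a test as invalid when 30% or more of its item responses
    are rapid guesses (under 10% of the item's average time, capped
    at 10 seconds). Assumes the two lists are parallel and nonempty;
    the layout is an assumption made for this example.
    """
    rapid = sum(
        rt < min(0.10 * avg, cap_s)
        for rt, avg in zip(response_times_s, avg_times_s)
    )
    return rapid / len(response_times_s) >= max_rapid_fraction
```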
For an assessment to be useful for teachers and students, the score must be accurate and valid. Reporting untrustworthy scores serves neither students nor educators. For students, such scores may lead to inappropriate placement in school, improper identification for Response to Intervention, or denial of gifted and talented opportunities. In the spring, a deflated score can cause students to miss their growth goals and may affect the results of teacher or principal evaluations. Most importantly, an untrustworthy score conveys inaccurate information about what the student knows and can do, making the assessment an unreliable tool to guide instruction.
We have been monitoring the impact of the new criterion in the first weeks of fall testing. We anticipated that roughly 2% to 5% of assessments would be invalidated during the first season. So far, 1.1% of mathematics assessments and 3.7% of reading assessments have been invalidated because of rapid-guessing, which is in line with our estimates.
Some educators have asked whether NWEA’s criterion for rapid-guessing is reasonable, wondering whether some students can accurately answer questions in under 10% of the average response time. We tested that criterion prior to implementation and found that students who violated it answered those questions, on average, at a chance level, meaning they did no better than a student who guessed randomly. We have also run checks on tests taken so far this fall: rapid guesses are correct only 25% of the time (the same as if the student had answered randomly), while non-rapid responses are correct 50% of the time.
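As an illustration of this kind of accuracy check, the sketch below splits responses into rapid and non-rapid groups and compares their proportion correct; the (is_rapid, is_correct) pair layout is an assumption made for the example, not how our systems store results.

```python
def accuracy_by_response_type(responses):
    """Compare the accuracy of rapid guesses with other responses.

    `responses` is an iterable of (is_rapid, is_correct) boolean pairs,
    a layout assumed for this example. On four-option items, a
    rapid-guess accuracy near 0.25 matches pure random guessing.
    """
    rapid = [correct for is_rapid, correct in responses if is_rapid]
    other = [correct for is_rapid, correct in responses if not is_rapid]
    rapid_acc = sum(rapid) / len(rapid) if rapid else float("nan")
    other_acc = sum(other) / len(other) if other else float("nan")
    return rapid_acc, other_acc
```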
Some educators have also asked us for the ability to override the invalidation criterion so that tests flagged for rapid-guessing could still be reported. We considered this carefully prior to release and decided that releasing a score we know to be an untrustworthy estimate of a student’s achievement would create the risk of that score being used inappropriately in instructional and accountability decisions. We hope you agree that reporting no score is preferable to reporting a score we know is wrong.
We understand that the new criterion has invalidated more tests and that retesting those students has a cost in instructional time and resources. While we have introduced warnings in the proctor console to help proctors identify students who are rapid-guessing during their assessments, we are seeking ways to improve this feature so that more students produce valid results without requiring retesting. One option we are considering is pausing the test when a student begins rapid-guessing, giving the proctor an opportunity to correct the problem before the student rapid-guesses on too many items, as sketched below. We are also confident that, as proctors and students become familiar with the feature, invalidation rates will decline.
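To show the kind of check we have in mind, here is a minimal sketch of proctor-side logic that could trigger a pause before a student crosses the invalidation threshold. The warning margin, names, and trigger rule are illustrative assumptions, not a description of a shipped feature.

```python
def should_pause_for_proctor(rapid_count, items_answered,
                             invalidation_fraction=0.30, warn_margin=0.10):
    """Suggest pausing while proctor intervention can still help.

    Fires once the running rapid-guess rate comes within `warn_margin`
    of the 30% invalidation threshold. All values and names here are
    illustrative assumptions, not production code.
    """
    if items_answered == 0:
        return False
    running_rate = rapid_count / items_answered
    return running_rate >= invalidation_fraction - warn_margin
```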
Dr. Steve Wise, Senior Research Fellow at NWEA, also contributed to this post.