Assessment designers strive to create assessments that show a high degree of fidelity to the following five traits:
1. Content validity
2. Reliability
3. Fairness
4. Student engagement and motivation
5. Consequential relevance
The first post in our series on quality educational assessments focused on the importance of content validity, or ensuring that an assessment measures what it intends to measure for its intended purpose. No more, and no less. In this post, we’ll discuss a second trait of high-quality assessments: reliability.
“Reliability” defined
Reliability refers to the consistency of an assessment’s results. It is the degree to which student results are the same when:
- They take the same test on different occasions
- Different scorers score the same task
- Different but equivalent tests are taken at the same time or at different times
Reliability is about making sure that different test forms in a single administration are equivalent; that retests of a given test are equivalent to the original test; and that test difficulty remains constant year to year, administration to administration.
In large-scale assessment, reliability is supported through statistical analysis, including a process called equating, which adjusts scores so that results from different forms of a test are comparable. Equating is one of the many behind-the-scenes functions performed by psychometricians, folks trained in the statistical measurement of knowledge. They help ensure an assessment is reliable.
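As a rough illustration of what equating involves, here is a minimal sketch of linear equating, one simple method among several. It places a score from one test form onto the scale of another by matching the two forms' means and standard deviations. The form names, the scores, and the assumption that comparable groups of students took each form are all hypothetical; operational equating is considerably more involved.

```python
from statistics import mean, pstdev


def linear_equate(score, form_x_scores, form_y_scores):
    """Map a Form X score onto the Form Y scale by matching mean and spread.

    Assumes the two score lists come from comparable groups of students.
    """
    slope = pstdev(form_y_scores) / pstdev(form_x_scores)
    return mean(form_y_scores) + slope * (score - mean(form_x_scores))


# Hypothetical data: Form X turned out slightly harder than Form Y,
# so students of similar ability scored about five points lower on it.
form_x = [50, 60, 70, 80, 90]
form_y = [55, 65, 75, 85, 95]

equated = linear_equate(70, form_x, form_y)
print(f"A 70 on Form X corresponds to about {equated:.1f} on Form Y")
```

In this made-up case, a student who took the harder form is credited with the score they would likely have earned on the easier one, which is the sense in which equating keeps difficulty constant across forms.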
In general, informal, classroom-based, teacher-created assessments do not engage with reliability directly, because they do not require advanced statistical analysis. They do, however, engage with it informally. When a student has to take a make-up test, for example, the test should be approximately as difficult as the original test. There are many such informal assessment examples where reliability is a desired trait. In fact, it is hard to conceive of a situation where reliability would not be a desired trait. The main difference is how it is tracked. For informal assessments, reliability rests largely on your professional judgment; for large-scale assessments, it is tracked and demonstrated statistically.
Why reliability matters
Reliability of measurement is important because it all comes down to trust. Can your teachers trust that an assessment is giving them accurate data?
Let me illustrate this point with an example: Yesterday, my 9-month-old son had a fever. I needed to be able to trust the measurement given by my thermometer to track whether his fever was getting worse or better. I needed to know if he needed medicine or if I needed to call the advice nurse for guidance on next steps. Maybe the advice nurse would have suggested a drive to urgent care. My reliable thermometer saved me a lot of worry, not to mention a trip to urgent care and an expensive copay.
In education, when we use assessment data to help us make strategic instructional decisions and track progress over time, it’s not all that different from caring for a child’s health. We care for students’ learning paths and look for ways to avoid costly detours. We need to have a high degree of confidence in the consistency, or reliability, of an assessment designed to tell us what students know. Otherwise, how can we be sure we are starting their learning journey in the right place?
Learn more
Read more about reliability and how it works in concert with validity in our guide, Not all assessment data is equal: Why validity and reliability matter. In my next post, I’ll explore the need for fairness.