
Questions About Computer Adaptive Tests

In Florida, it is testing season. For my local high school, that means state-mandated accountability assessments (End-of-Course/EOC exams and Florida’s FAST PM3 for English Language Arts/ELA) and Advanced Placement (AP) assessments are scheduled every single day for the next three weeks. While no single student is taking every scheduled assessment, someone in their class probably is, and that disrupts classroom instruction. In elementary and middle schools, all students must take the computer-based English Language Arts and Math FAST PM3s. Some students will take the State Science Assessment (5th and 8th grade), and some middle school classes require a state EOC. Unlike paper-based assessments, which can be administered to all grade levels at the same time, computer-based grade-level assessments are a challenge to schedule for the entire student body.

This year, Florida’s testing regime has added a wrinkle…

This year, for the first time, Florida’s end-of-year accountability assessment (F.A.S.T.) will be a Computer Adaptive Test. F.A.S.T. assessment questions will be aligned to the state’s B.E.S.T. standards and the difficulty of each question will depend on how the student answered the previous question. A correct response leads to a more difficult item, while an incorrect response results in the selection of a less difficult item for the student. Students will not be asked questions that are above or below grade-level and a student’s final score will depend not simply on how many questions they answered correctly but, also, on the difficulty of the questions they answered correctly.
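To make the mechanics concrete, here is a minimal sketch of the "correct answer, harder item; incorrect answer, easier item" rule described above. It is only an illustration, not Florida's actual algorithm: the 1-to-5 difficulty scale, the starting level, the function names and the scoring rule are all assumptions made up for this example, and operational CATs generally rely on more sophisticated statistical models.

```python
def run_adaptive_test(answers_correctly, num_items=5, start=3, lo=1, hi=5):
    """Administer num_items items, moving difficulty up or down after each response.

    answers_correctly: a function that takes a difficulty level and returns
    True or False (a stand-in for a real student). The 1-5 scale is hypothetical.
    Returns a list of (difficulty, correct) pairs in the order administered.
    """
    level = start
    history = []
    for _ in range(num_items):
        correct = answers_correctly(level)
        history.append((level, correct))
        # Correct -> harder next item; incorrect -> easier next item.
        # Clamping to [lo, hi] mimics keeping every item on grade level.
        level = min(hi, level + 1) if correct else max(lo, level - 1)
    return history


def difficulty_weighted_score(history):
    """Toy score (an assumption): add up the difficulty of each correct answer."""
    return sum(level for level, correct in history if correct)


# A made-up student who can handle items up to difficulty 3.
history = run_adaptive_test(lambda difficulty: difficulty <= 3)
print(history)
print(difficulty_weighted_score(history))
```

In this toy version the student bounces between levels 3 and 4, and the score reflects which items were answered correctly, not just how many.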

Testing strategy is different for a CAT. On previous state assessments, students were advised to “make a best guess” when confronted with a challenging question and come back later to reconsider their answer. With CATs, every question must be answered sequentially. If a student guesses incorrectly, it will affect the type of questions the student is presented with next and, likely, their final score.

Proponents of CATs report that they improve “measurement efficiency” and can reduce testing time, a primary goal of DeSantis’ shift to F.A.S.T. testing. Lord knows, Florida’s students need to spend less time testing.

My question is – are CATs appropriate assessments for Florida’s test-based accountability system?


First, a brief review of the use of standardized assessments in state accountability systems.

Criterion-Referenced Tests

Florida has a standards-based accountability system. Annual assessments are aligned to state standards, and students are assessed against these standards, which define what they “should” know. Such assessments are referred to as criterion-referenced tests because they evaluate student achievement against a common and consistently applied set of criteria (currently the state’s B.E.S.T. standards). For accountability purposes, each student is held to the same standard, rather than ranked against one another.

The most common example given for a criterion-referenced test is the driver’s license exam, which requires would-be drivers to achieve a minimum passing score to earn a license. With a criterion-referenced test, anyone who achieves the set standard or passing score passes the test. It is possible for all test takers to pass a criterion-referenced test. This is important for accountability assessments: if a student demonstrates they can read at grade level, it shouldn’t matter how many students are “better readers” when the goal is to ensure grade-level performance.

In contrast to criterion-referenced tests, norm-referenced tests are designed to rank test takers on a “bell curve.” To produce a bell curve each time, test questions are designed to accentuate performance differences among test takers, not to determine whether students have achieved specified learning standards, learned required material, or acquired specific skills. With a norm-referenced test, there will always be a highest-performing and a lowest-performing cohort at each end of the bell-shaped curve.

Unlike norm-referenced tests, criterion-referenced tests measure performance against a fixed set of criteria and this is why they are generally felt to be the appropriate assessment for state accountability systems. You can learn more about criterion-referenced tests here.
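The difference between the two approaches can be sketched in a few lines. The student names, scores and cut score below are invented for illustration, and the percentile formula is one simple convention among several, not the method any particular test publisher uses.

```python
def criterion_referenced(scores, cut_score):
    """Every student is compared to the same fixed cut score."""
    return {name: score >= cut_score for name, score in scores.items()}


def norm_referenced(scores):
    """Every student is compared to the other test takers.

    Returns a simple percentile rank: the percentage of test takers
    who scored strictly lower than this student.
    """
    n = len(scores)
    return {
        name: 100 * sum(other < score for other in scores.values()) / n
        for name, score in scores.items()
    }


# Made-up scores for four hypothetical students.
scores = {"Ana": 82, "Ben": 75, "Cam": 91, "Dee": 78}

print(criterion_referenced(scores, cut_score=70))  # everyone can pass
print(norm_referenced(scores))                     # someone is always at the bottom
```

With a cut score of 70, all four students pass the criterion-referenced version. The norm-referenced version still places one of them last, even though all four met the standard.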

How Florida uses student test scores.

In Florida, annual state assessment scores are used to grade schools and districts, calculate learning gains, evaluate teachers, determine school turnaround status (and possible school closure or conversion to a charter school), calculate course grades and class placement, make promotion/retention decisions and determine eligibility for high school graduation. The high stakes attached to Florida’s state assessments are seemingly endless. How can one child’s test scores determine so much?

Florida’s test-based accountability was part of Jeb Bush’s A+ Plan and has remained largely intact for 25 years. When the Florida Senate proposed deregulating public schools this session, the Florida House refused to consider any changes to the accountability system, with the House Speaker vowing to set himself on fire if such proposals passed. Just a year ago, the same Florida House passed Universal Education Savings Accounts, offering public funding for private school tuition and home schooling without any required test-based accountability at all.


Again, I have questions. I recognize that CATs are modern and may be more efficient, but are CATs appropriate assessments for the myriad of determinations associated with Florida’s test-based accountability system? Also, has anyone bothered to ask?

  • How can fairness and reliability be determined if every student sees a different set of questions?
  • If a student makes a mistake on their very first question, can they achieve the same score as a classmate who answered the first question correctly?
  • If questions are confined to grade level standards, how do they assess students working significantly above or below grade level?
  • Can such assessments accurately assess learning gains for students in the highest or lowest quartiles?
  • Can a student score a Level 3 passing score if they answer all the least difficult items correctly, indicating they have a basic understanding of all the tested standards, or will they need to answer higher level questions correctly as well?
  • Are we expected to simply trust the CAT algorithm or will independent studies be done to ensure such assessments are fair, valid and reliable, especially for our most at risk sub-populations?
  • Has anyone shown that ranking students, teachers, and schools is a valid use of CAT criterion-based assessments?
  • Do the benefits of CATs outweigh the convenience of being able to assess an entire school at once with paper-based assessments?
  • Will these assessments lead to further narrowing of the curriculum and/or more intensive testing preparation?
  • And finally, if these assessments are so great, why don’t publicly funded private school and homeschool students have to take them?
