Senators Montford and Gaetz asked some hard questions of Alpine, the testing company that did the FSA validity study. I agree that the mismatch of questions with specific standards is an issue. I agree that validity is relative; tests are not 100% valid for every purpose. Alpine states this clearly in its report.
I keep wondering why no one has brought up the fact that independent ratings of the complexity of the math and ELA test items revealed a serious mismatch with ratings done by DOE staff. The mismatches were systematic: the math items were less complex than intended, and the ELA items were more complex. The DOE should have had these items rated independently but did not have time, so it did the ratings in house. This problem can come home to roost when proficiency levels are created.
I posted about this earlier. I object to the word ‘slightly’ that was used in the report. Over one third of the items were affected. This is not a ‘slight’ problem. Here’s the data:
DOK ratings were lower than intended in math because many items intended to reflect level 2 were rated at level 1. Thirty-six percent of the math items were rated below the intended level.
DOK ratings were higher than intended in ELA because many items intended to reflect level 3 were rated at level 2. Thirty-seven percent of the ELA items were rated above the intended level.
Questions need to be asked about how the DOE will set proficiency levels this year and in following years. With a mismatch of items and standards, plus a mismatch of item complexity, one wonders what proficiency means.