Commissioner of Education Pam Stewart makes no mention of the qualifications the FSA validity study places on the validity of the FSA for different uses. She summarizes the study in three bullets. Then she states that the FSA is an accurate way to measure students' mastery of the standards. She announces that group-level scores will be used to calculate teacher evaluations and student-level cut scores.
The report says much more. The devil is in the details, and these are important details. This post compares the FSA report's findings to the Commissioner's statement. The report contains recommendations that the statement does not mention, and these could make a difference in what the scores mean.
Stewart’s statement includes three conclusions drawn from the study. Let’s look at her summary and the details that were not included.
- The report found that the policies and procedures were generally consistent with accepted practices.
This is ‘generally’ true, but the word ‘generally’ glosses over the problems the report found. The report recommended that the Utah items be phased out because they did not align well with Florida standards. A third of the items on the ELA and math portions of the tests had a complexity level that did not match the level the DOE expected. The report recommended a follow-up study on the alignment of test items and standards. The test administration was problematic. The limitations of the equating methods for grade 10 ELA and Algebra I affect how the scores can be interpreted, and these limitations need to be explained clearly on score reports.
- Information on testing consequences, score reporting, and interpretative score guides was not included in this study because the score reports with scale scores and achievement level descriptors, along with the interpretative guides, were not available…
- There are some notable exceptions to the breadth of our conclusions for this study, specifically criterion, construct, and consequential validity. (In other words, the study did not establish whether this test is a valid measure of critical thinking and problem solving as a construct.)
- For students who had test administration problems with computer-based tests, test scores should not be used as the sole determinant in decisions to withhold promotion to the next grade, to deny graduation, or to place a student in a remedial program.
- Uses of group-level scores (e.g., teacher evaluations, school grades) must consider the impact of test administration problems on particular teachers and schools.
- Independent verification found that the alignment of test items to standards was generally consistent with the DOE's intended estimates. However, average depth-of-knowledge ratings were generally lower than expected on the math tests and higher than expected on the ELA tests.
Independent panels of raters found some systematic differences, both in how test questions align with the standards and in how complex the questions are. Given these differences, the implications for setting performance levels become significant. While the average differences were within the guidelines the researchers set, the variation in ratings across tests is notable.