Arne Duncan, the Secretary of Education, believes that some schools have a culture of ‘good enough’ that has to be changed.
Grading schools and teachers drives change to help students, so the argument goes.
Good teachers do make a difference in student learning in a school that supports their efforts. How much students learn in a year, adjusted for other factors, is the value-added measurement (VAM) used to identify good teachers. This is a tricky business; even the experts do not agree on how accurate VAM scores are. Read to the end of the post: I saved the best for last.
TEST SCORES COUNT, BUT WHAT ELSE MATTERS?
VAM scores represent differences in FCAT achievement gain scores from the state average for a particular grade. The scores are adjusted for variables that affect them: attendance, previous achievement for two years, class size, number of subject-related courses, homogeneity of class scores, gifted status, ELL status, disabilities, age, and student mobility.
In order to sort out the effect of the school leadership and other school-wide factors from the impact of teachers, one-half of the difference in gain scores between schools was added to the adjustments listed above.
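To make the arithmetic of the two paragraphs above concrete, here is a minimal sketch of a VAM-style score: the average gap between students' actual gains and the gains predicted from covariates, plus one-half of the school-vs-state difference. The covariate weights, student numbers, and school means below are all hypothetical; the real AIR model estimates these statistically.

```python
# Illustrative VAM-style computation (NOT the AIR model).
# All coefficients and data are made up for this sketch.

def predicted_gain(prior_score, attendance_rate, class_size):
    """Hypothetical linear adjustment; real models estimate these weights."""
    return 10 + 0.05 * prior_score + 20 * (attendance_rate - 0.9) - 0.2 * (class_size - 25)

def vam_score(actual_gains, predicted_gains, school_mean_gain, state_mean_gain):
    # Teacher effect: average residual (actual gain minus predicted gain)...
    residuals = [a - p for a, p in zip(actual_gains, predicted_gains)]
    teacher_part = sum(residuals) / len(residuals)
    # ...plus one-half of the school-vs-state difference, as described above.
    school_part = 0.5 * (school_mean_gain - state_mean_gain)
    return teacher_part + school_part

students = [  # (prior_score, attendance, class_size, actual_gain) -- invented
    (300, 0.95, 24, 28),
    (280, 0.90, 24, 22),
    (320, 0.98, 24, 31),
]
preds = [predicted_gain(p, a, c) for p, a, c, _ in students]
gains = [g for _, _, _, g in students]
score = vam_score(gains, preds, school_mean_gain=26.0, state_mean_gain=24.0)
print(round(score, 2))
```

The point of the sketch is only that a VAM score is a residual: it credits (or blames) the teacher for whatever the adjustment model could not predict.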
Using FCAT student data adjusted for the effects of these variables, the American Institutes for Research reported a study of the value added model (Florida VAM Technical Report) for teacher evaluation in Florida. Results differed for schools, grades and teachers but in ways you might not expect.
- VAM score reliability varies by grade level. Teacher effects for reading are more certain in grade 5 than in grade 10 (standard errors ranged from 8.98 to 16.37). In math, teacher effects are more certain in grade 9 than in grade 5 (standard errors ranged from 7.9 to 24.37). For this reason, teacher scores are compared only to the scores of teachers in the same grade.
- Teacher experience counts, especially in grades 4, 7, 8, and 10 in reading and in grades 4, 5, and 8 in mathematics.
- Teachers’ academic credentials matter: teachers with master’s degrees were the most effective.
- Achievement growth was highest for non-gifted students in reading, but there was no difference in math; this varies by grade and subject. ELL students show high growth in some grades.
- Systematic school effects exist and explain differences in how students perform above and beyond that which is explained by teacher effects.
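The standard errors in the first bullet are the reason for the within-grade comparison rule: the same estimated teacher effect can be statistically meaningful in one grade and indistinguishable from noise in another. This sketch uses the report's standard-error range for reading with a standard normal-approximation check; the effect size and the z-test itself are illustrative, not the AIR procedure.

```python
# Why standard errors matter: an effect is only distinguishable from zero
# when it exceeds roughly 1.96 standard errors (95% normal approximation).

def is_distinguishable_from_zero(estimate, std_error, z=1.96):
    """Crude 95% check: |estimate| > z * SE. Not the AIR methodology."""
    return abs(estimate) > z * std_error

estimate = 20.0  # hypothetical teacher effect, in gain-score points

print(is_distinguishable_from_zero(estimate, 8.98))   # low-SE grade: True
print(is_distinguishable_from_zero(estimate, 16.37))  # high-SE grade: False
```

The identical estimate clears the bar in the low-error grade and fails it in the high-error grade, which is why cross-grade rankings would be unfair.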
WHICH STUDENTS DO TESTS MEASURE WELL?
Using tests that are either too difficult for the lowest-achieving students or too easy for the best students is always a validity issue. The reason is straightforward: there are not enough questions at the high or low ends to show much growth. Most questions on a test sit in the middle range of difficulty, where most students are likely to be. If you want to have accurate measures at either end of the achievement scale, you have to have longer tests.
This is the reason why adaptive tests have been developed. We’ll talk about those another time.
There are many practical reasons why state assessment scores are inadequate as value-added measures. What these tests measure is very limited: reading, writing, and math. How do you evaluate teachers in other areas? Fortunately, many states use both test scores and other measures. In Florida, VAM scores make up about one-half of a teacher's evaluation. Is this fair?
WHAT DO THE EXPERTS SAY?
Here are some of the ‘heavyweight’ statistical and practical arguments for and against using VAM scores based on student tests as an indicator of teacher and school quality. First of all, a Florida judge has ruled that the use of VAMs is legal, but the dispute continues. See: A Legal Argument Against the Use of VAMs in Teacher Evaluations. Teachers College Record, December 2014.
The American Statistical Association provided guidelines for the use of VAMs. ASA Issues Statement with Recommendations for Using VAMs for Educational Assessment. The ASA recommendations point to the need for caution in VAM use.
Two researchers from Harvard and one from Columbia argue that VAM scores do identify teacher effectiveness in their Discussion of American Statistical Association Statement on Using Value-Added Models for Educational Assessment. They concede, however, that using high-stakes test scores in this way could also lead to cheating and teaching to the test, and that other measures, such as principal ratings, classroom observations, and student evaluations, could have more long-term effects.
One of the strongest arguments against relying on VAM scores alone is made by Darling-Hammond in Evaluating Teacher Evaluations: What We Know about Value-Added Models and Other Methods. She cites research evidence that a teacher's VAM scores vary from class to class and year to year based on the composition of the class.
Her paper also gives a comprehensive review of alternative methods to evaluate teachers that are not only valid, but also tend to improve teaching. These approaches use indicators such as knowledge of subject, use of lessons geared to subject area standards etc.
So after all, are VAM scores much ado about nothing? Not quite, but almost.
Ed Haertel, a leading educational researcher, states that studies show only about ten percent of the variation in student scores is attributable to teacher effects! Schools explain a little more. Most of the variation is not related to teachers or schools at all.
Look at the graph: we are pressuring teachers and schools, yet they account for less than 25% of student achievement. Maybe achievement scores should make up only 25% of a teacher's rating, not 50%. If we want our teachers and schools to be more effective, we have to make it possible; most of what matters is outside their control.
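The mismatch between the evaluation weight and the research evidence can be stated as simple arithmetic. The teacher share below is Haertel's roughly ten percent; the school share is an assumed figure chosen only to keep the in-school total under the 25% the post cites, and the exact split is illustrative rather than taken from any study.

```python
# Rough variance-decomposition arithmetic from the figures quoted above.
# Only the ~10% teacher share comes from the post; the rest is assumed.
variance_share = {
    "teacher": 0.10,            # Haertel: ~10% of score variation
    "school": 0.10,             # assumed, to match "less than 25%" in-school
    "outside_school": 0.80,     # remainder: home, health, peers, mobility...
}

in_school = variance_share["teacher"] + variance_share["school"]
print(f"In-school share: {in_school:.0%}")
print(f"Florida's VAM weight in evaluations: 50%")
```

Under these assumptions the in-school share is well below the 50% weight that VAM scores carry in Florida evaluations, which is the post's closing point.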
If you have now decided to become a testing expert, you will find much more information on the Fair Test website.