Journal Issue: Excellence in the Classroom, Volume 17, Number 1, Spring 2007
Evidence on Individual-Based Performance Systems
In this section I review evidence on several individual-based incentive programs, again both in the United States and abroad.
Evidence from the United States
Studies of individual-based incentive schemes in the United States have had some success in isolating the programs’ causal effects on student outcomes. But their findings have also been quite mixed. For example, one study assessed the effect on student achievement of a merit pay scheme in Michigan that rewarded individual teachers according to student retention rates and evaluation questionnaires completed by their students.53 The scheme, which did not directly target student achievement, did improve student retention. But pass rates fell, while attendance rates and grade point averages remained unchanged. The authors concluded that “incentive systems within complex organizations such as schools may produce results that are unintended and at times misdirected.”
In contrast, another study drew on panel data from the U.S. National Education Longitudinal Survey of 1988 (NELS88) to estimate the effects of teacher incentives on student outcomes.54 The authors defined an incentive scheme as any merit raise or bonus awarded to any share of teachers in a school, although the data did not indicate whether the rewards were tied directly to student achievement. The wealth of data in the survey enabled the authors to control for many student, teacher, school, and family characteristics, making it easier to compare students taught by treatment teachers (those in an incentive scheme) with students taught by teachers in a control group. The results were positive, particularly in public and poor (low economic status) schools. Test scores were higher when awards were larger and when awards were given only to a few teachers within a school.
Finally, a third study analyzed incentive effects on students' Stanford Achievement Test (SAT) scores in the Tennessee STAR (Student Teacher Achievement Ratio) and Career Ladder Evaluation programs.55 It controlled for student and teacher characteristics as well as for class attributes that do not change over time (by including class fixed effects based on panel data). It found that SAT scores improved, with gains varying across subjects and with teacher seniority.
One reason for the mixed evidence may be that the studies combine students at all grade levels. One analysis of merit pay reforms in South Carolina in the 1980s and 1990s suggested that merit pay might be more effective in earlier grades than in later grades. More generally, the study cautioned that the effects of performance-based pay may vary across countries, schools, population groups, or time.
The first international example of an individual-based program is an experiment, begun in fifty high schools in Israel in December 2000, that offered teachers a bonus based on student achievement. The experiment included all English, Hebrew, Arabic, and mathematics teachers who taught tenth- to twelfth-grade classes in preparation for matriculation examinations in these subjects in June 2001. Each teacher was ranked separately on the basis of the mean performance of each class she taught. The ranking was based on the difference between actual class performance and performance predicted on the basis of students' socioeconomic characteristics, their level of proficiency in each subject, and a fixed school-level effect. Each teacher was ranked twice, once for the students' passing rate and once for average score.
Each school submitted student enrollment lists, itemized by grade, subject, and teacher, on the program’s starting date. All students on these lists were included in the class mean outcomes. Students who dropped out or did not take the exams, regardless of the reason, were imputed a score of zero to neutralize any incentive for teachers to keep poorly performing students out of the tests.
All teachers who performed better than predicted in both passing rate and average score were ranked from first to fourth place and awarded points according to ranking. The awards, based on total points, ranged from 6 to 25 percent of the average annual income of high school teachers. A teacher could win several awards if she prepared more than one class for a matriculation examination.56 Of the 629 teachers in the program, 302 won awards.
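The ranking and award mechanics just described can be sketched in a few lines of code. This is an illustrative reconstruction, not the program's actual implementation: the pass mark, the point schedule, and any data fed to it are hypothetical, and the predicted outcomes stand in for the study's regression-based predictions (from students' socioeconomic background, prior proficiency, and a school fixed effect), which are not reproduced here.

```python
PASS_MARK = 55          # assumed passing score (hypothetical)
POINTS = [4, 3, 2, 1]   # points for 1st-4th place (hypothetical schedule)

def class_means(students):
    """Class mean pass rate and mean score. Dropouts and no-shows are
    imputed a score of zero, so excluding weak students from the exam
    cannot raise the class mean."""
    scores = [s["score"] if s["took_exam"] else 0 for s in students]
    pass_rate = sum(sc >= PASS_MARK for sc in scores) / len(scores)
    return pass_rate, sum(scores) / len(scores)

def award_points(classes, predicted):
    """Keep only teachers who beat their predicted pass rate AND
    predicted mean score, then rank them twice (once per outcome) and
    award points by rank in each of the two tournaments."""
    actual = {t: class_means(ss) for t, ss in classes.items()}
    eligible = [t for t in classes
                if actual[t][0] > predicted[t][0]
                and actual[t][1] > predicted[t][1]]
    totals = {t: 0 for t in eligible}
    for dim in (0, 1):  # 0 = pass rate, 1 = mean score
        ranked = sorted(eligible, reverse=True,
                        key=lambda t: actual[t][dim] - predicted[t][dim])
        for rank, t in enumerate(ranked[:len(POINTS)]):
            totals[t] += POINTS[rank]
    return totals
```

Note how the zero-imputation rule works against gaming: a teacher whose weak students skip the exam still carries their zeros in the class mean, so keeping poorly performing students out of the test cannot improve the teacher's ranking.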
My analysis of the program found that it significantly improved matriculation examination participation rates as well as the passing rate and average test scores among those who took the test.57 These gains accounted for about half of the improved outcomes among all students. They appear to have resulted from changes in teaching methods, after-school teaching, and increased responsiveness to students' needs, not from artificial inflation or manipulation of test scores. The evidence that the incentive program improved teacher effort and pedagogy is important given concerns that such programs have unintended effects, such as teaching to the test, cheating, or manipulation of test scores, and fears that they do not produce real learning.
As a second example, in 1999 the United Kingdom introduced a systemwide performance-related pay policy for teachers using student progress (value added) as one key criterion.58 Using long-term teacher data and a before-and-after comparison research design, one study evaluated the policy's effect on test scores on the important GCSE (General Certificate of Secondary Education) exams, taken by students at age sixteen at the end of compulsory education.59 Because the incentive scheme was explicitly teacher- rather than school-based, the study followed teachers over two complete two-year teaching cycles before and after the policy was introduced. Students were linked to the teachers who taught them specific subjects, making it possible to compare treatment and control group teachers. This panel data structure also made it possible to control for time-invariant student and teacher characteristics (by including the respective fixed effects) and to measure the scheme's target: student progress.
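The logic of such a before-and-after comparison can be sketched as a simple difference-in-differences calculation. This is a schematic illustration with invented data, not the study's estimator; in particular, it omits the student and teacher fixed effects and the subject-level linking described above, and treats "value added" as nothing more than a teacher's mean student gain in a teaching cycle.

```python
def value_added(records):
    """Mean per-student progress (end score minus prior score) for one
    teacher in one teaching cycle; records is a list of (start, end)."""
    gains = [end - start for start, end in records]
    return sum(gains) / len(gains)

def did_estimate(teachers):
    """teachers: {name: {"treated": bool, "pre": [(start, end), ...],
                         "post": [(start, end), ...]}}
    Returns the difference-in-differences in value added between
    treatment and control teachers across the two cycles."""
    def mean(xs):
        return sum(xs) / len(xs)
    change = {name: value_added(t["post"]) - value_added(t["pre"])
              for name, t in teachers.items()}
    treated = mean([change[n] for n, t in teachers.items() if t["treated"]])
    control = mean([change[n] for n, t in teachers.items() if not t["treated"]])
    return treated - control
```

The design compares how much each teacher's value added changed between the two cycles, netting out the change observed among control teachers over the same period.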
The study reported statistically and economically significant student progress. For instance, relative to control teachers, treatment teachers increased their value added by almost half a GCSE grade per student, equal to 0.73 standard deviation. Differences between school subjects were also significant: treatment math teachers, for example, showed no effect from participating in the program. Although promising, this study, too, is not definitive. One concern is that the scheme applied only to teachers who had been in the profession for about eight years, so that treatment and control teachers differed systematically in experience. If teachers improve in their capacity to generate value added as they gain experience, but at a decreasing rate, then, all else being equal, one can expect greater improvements in progress between the two teaching cycles for the less experienced (control) group. Taking teacher experience into account in the analysis may not solve the problem if the relationship between teachers' experience and productivity is nonlinear.
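The experience concern can be illustrated with a toy calculation. Assuming a concave experience-productivity profile (logarithmic here, purely hypothetical), the same two additional years of experience yield a larger productivity gain for the less experienced group:

```python
import math

def productivity(years):
    """Assumed concave experience profile (illustrative only)."""
    return math.log(1 + years)

# Two-year improvement for a less experienced (control-like) teacher
# versus a more experienced (treatment-like) teacher.
gain_junior = productivity(6) - productivity(4)   # years 4 -> 6
gain_senior = productivity(10) - productivity(8)  # years 8 -> 10
```

Under this assumption, even with no program at all, the less experienced control group's value added would improve more between the two cycles than the treatment group's, contaminating the before-and-after comparison; a linear control for experience would not remove this gap.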
In general, the evidence suggests that well-designed individual-based incentives can significantly improve student outcomes. But the research base is small, and implementing purely individual-based programs presents many challenges.