Excellence in the Classroom, Volume 17, Number 1, Spring 2007
Evidence on School-Based Performance Systems
In this section I review evidence on several school-based incentive programs implemented in recent years both in the United States and in other countries. The programs vary in their basic structure and details, with some targeted at teams of teachers and others at individuals.43
Evidence from the United States
Although school-based performance pay theoretically has many attractive features, researchers have been able to find little causal evidence that it is effective in U.S. programs. For example, three researchers studied school-based incentive pay systems in Kentucky, North Carolina (Charlotte-Mecklenburg), and Maryland.44 They concluded that in the Charlotte-Mecklenburg and Kentucky programs, but not in the Maryland program, both teacher motivation and student outcomes improved. But because all three studies lacked a control group, they could not establish definitively that the program itself—and not some other factor—was the cause of the improvements.45
Similarly, Helen Ladd studied a school-based bonus program in Dallas.46 The program, which began in the 1991–92 school year and continued through 1995, ranked schools by how well their students’ test scores compared with state average scores, adjusting for students’ socioeconomic status. To avoid teaching to the test and other gaming behavior, the program relied on multiple measures of student outcomes, including two tests given each year. To evaluate the impact of the bonus scheme, Ladd compared gains in school-level test scores in Dallas with gains in other cities, adjusting for many school characteristics, such as racial mix and relative deprivation. She found that pass rates appeared to increase more quickly in Dallas than in other cities. Effects were most positive for Hispanics and whites and insignificant for blacks. Although the study suggests that a school-based program can be effective, it was not conclusive. It had, for example, only a limited set of student and school characteristics with which to make the participating schools comparable to other schools in the state. In addition, the test score gains in Dallas may have been part of a trend that began before the program was implemented.
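Ladd’s empirical approach can be summarized, in stylized form, as a regression of school-level gains on a Dallas indicator and controls (this is a sketch of the logic, not her exact specification; the notation is mine):

\[
\Delta P_{st} = \alpha + \beta\, D_s + X_{st}'\gamma + \varepsilon_{st},
\]

where \(\Delta P_{st}\) is the gain in school \(s\)’s pass rate in year \(t\), \(D_s\) indicates a Dallas school, and \(X_{st}\) collects school characteristics such as racial mix and relative deprivation. Written this way, the study’s limitation is plain: \(\beta\) captures any Dallas-specific improvement, whether caused by the bonus program or by a preexisting trend.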
The Dallas study also highlights some unintended consequences. In an earlier study, Charles Clotfelter and Ladd had reported that in the Dallas program, schools of low socioeconomic status rarely won awards.47 In response, the state divided schools into five groups based on socioeconomic characteristics and rewarded the top performers in each group. But some of the lower-performing schools in the upper socioeconomic bands felt that they had been treated unfairly. Dividing the schools into socioeconomic groups also invited an undesired strategic response from principals, who realized that their chances of winning an award depended on the socioeconomic category to which their school was assigned.
Finally, two studies of a South Carolina performance-based program that included both school-based and individual-based rewards found that student performance improved.48 The studies, however, may overstate the incentive effects because teachers could choose whether to apply for an award. If, as would be expected, only the most productive teachers chose to apply, then part of the student gains may be attributable not to the incentives but to the fact that participants were better teachers in the first place.
One of the stronger examples of a school-based incentive program comes from Israel. In February 1995, Israel announced a competition for a monetary bonus for secondary schools and teachers based on their students’ performance.49 The objectives were to reduce dropout rates and improve scholastic achievement. The three performance measures were average number of credits per student, share of students receiving a matriculation diploma, and school dropout rate.
Sixty-two schools were initially selected for the program, with several schools added later. In 1996, participating schools competed for about $1.5 million in awards. Schools were ranked according to their annual improvement, adjusting for the socioeconomic background of the students. Only the top third of performers won awards. The distribution of cash incentives among the award-winning schools was determined solely by their ranking in terms of relative improvement (in 1996, the highest-scoring winner received $105,000; the lowest-scoring, $13,250). Teachers received 75 percent of the award as a salary bonus (proportional to gross income); the remainder was used to improve faculty facilities, such as teachers’ common rooms. In 1996, the bonuses ranged from 1 to 3 percent of average teacher salary. The combined performance of each school’s team determined the total incentive payment, which was then split among individual teachers regardless of their individual performance.50
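To make the division of an award concrete, consider the top 1996 prize under the 75/25 split described above:

\[
0.75 \times \$105{,}000 = \$78{,}750 \text{ in salary bonuses}, \qquad 0.25 \times \$105{,}000 = \$26{,}250 \text{ for faculty facilities}.
\]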
The student outcomes rewarded included most of those that teachers can affect, thereby reducing the dilemma teachers are assumed to face in allocating their time between rewarded and nonrewarded activities. School averages of all three performance measures were computed using the size of the cohort in ninth grade rather than in twelfth grade. This procedure was adopted to discourage schools from gaming the incentive system, for example by encouraging weak students to transfer or drop out or by placing them in the nonmatriculation track. To encourage schools to direct more effort toward weak students, only the first 22 credit units taken by each student were counted in computing the school’s mean to determine its rank in the bonus program.
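These rules can be summarized in a single stylized formula for the credit measure (a sketch based on the rules above; the notation is mine, not the program’s):

\[
\bar{C}_s = \frac{1}{N_s^{9}} \sum_{i=1}^{N_s^{9}} \min(c_i,\, 22),
\]

where \(N_s^{9}\) is the size of school \(s\)’s cohort when it entered ninth grade and \(c_i\) is the credit units of student \(i\) (with \(c_i = 0\) for a student who drops out). Because dropouts still count in the denominator, pushing out weak students lowers the school mean; and because credits are capped at 22, an additional unit earned by a weak student raises the mean as much as one earned by a strong student, while units beyond the cap earn the school nothing.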
Two years after the program was implemented, I compared the program schools with a control group and found significant gains in student performance in the former.51 Average credits were 0.7 unit higher, the share of students sitting for matriculation examinations increased by 2.1 percent, and average scores and passing rates on these examinations improved as well. Of particular importance was the decline in the dropout rate in the transition from middle to high school. The program’s effects also appeared to be concentrated among weaker students.
Another analysis of a school-based teachers’ incentive program, this one in Kenya, examined effects on both teacher behavior and test scores.52 The program randomly assigned fifty Kenyan primary schools to a treatment group eligible for monetary incentives worth 21 to 43 percent of monthly salary. The winning schools were determined by their average test score performance on districtwide examinations relative to other treatment schools; all teachers in the winning schools received awards. The program penalized schools for dropouts by assigning low scores to students who did not take the examination. Data were collected on many types of teacher effort (attendance, homework assignments, pedagogical techniques, and extra test preparation sessions) and on student scores obtained after the program’s conclusion.
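Under these rules, a treatment school’s standing can be sketched as follows (the notation is illustrative, not the program’s own):

\[
S_k = \frac{1}{N_k} \sum_{i=1}^{N_k} \tilde{y}_i, \qquad
\tilde{y}_i =
\begin{cases}
y_i & \text{if student } i \text{ sat the examination},\\
y^{\mathrm{low}} & \text{otherwise},
\end{cases}
\]

where \(N_k\) counts all of school \(k\)’s enrolled students, \(y_i\) is the districtwide examination score, and \(y^{\mathrm{low}}\) is the low score imputed to no-shows. Because absentees drag down \(S_k\), teachers gain by getting marginal students to sit the examination, a feature consistent with the rise in examination participation reported below.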
During the two years the program was in place, student scores increased significantly in treatment schools (0.14 standard deviation above the control group). But the gain in scores was not attributable to the expected incentive-induced changes in teacher behavior. In fact, teacher attendance did not improve, and no changes were found in either homework assignment or pedagogy. Instead, teachers were more likely to conduct test preparation sessions outside regular class hours. Data collected the year after the program ended showed no lasting test score gains, suggesting that the teachers focused on improving short-term rather than long-term learning. Consistent with this hypothesis, the program had no effect on dropout rates even though examination participation rose (presumably because teachers wanted to avoid penalties for no-shows). The test score effect was also strongest in geography, history, and Christian religion, arguably subjects involving the most memorization.
Although group-based pay, either alone or combined with individual-based incentives, holds promise for overcoming some of the difficulties inherent in implementing individual-based systems, the limited causal evidence on its effectiveness is mixed. The strongest evidence comes from the Israeli experience; whether that experience could be replicated either in the United States or abroad is unknown.