Saturday, May 16, 2009

Teachers, Performance Pay, and Accountability (What Education Should Learn from Other Sectors)

[The following is Daniel Koretz' preface to the book

Teachers, Performance Pay, and Accountability (What

Education Should Learn from Other Sectors), by Scott J.

Adams, John S. Heywood, and Richard Rothstein (eds), ISBN:

1-932066-38-1. The book may be purchased ($14.50) from

the Economic Policy Institute at

http://mpower.mosaicprint.com/EPI/p-153-teachers-performance-pay-and-accountability-what-education-should-learn-from-other-sectors.aspx

-- moderator.]



Teachers, Performance Pay, and Accountability

What Education Should Learn From Other Sectors



05-14-09

May 2009 | An Economic Policy Institute book



By Scott J. Adams, John S. Heywood & Richard Rothstein

Preface by Daniel Koretz

Series editors Sean P. Corcoran and Joydeep Roy



by Daniel Koretz



Accountability for students' test scores has become the

cornerstone of education policy in the United States.

State policies that rewarded or punished schools and

their staffs for test scores became commonplace in the

1990s. The No Child Left Behind (NCLB) act federalized

this approach and made it in some respects more

draconian. There is now growing interest in

pay-for-performance plans that would reward or punish

individual teachers rather than entire schools. This

volume is important reading for anyone interested in

that debate.



The rationale for this approach is deceptively simple.

Teachers are supposed to increase students' knowledge

and skills. Proponents argue that if we manage schools

as if they were private firms and reward and punish

teachers on the basis of how much students learn,

teachers will do better and students will learn more.

This straightforward rationale has led to similarly

simple policies in which scores on standardized tests

of a few subjects dominate accountability systems, to

the near exclusion of all other evidence of

performance.



It has become increasingly clear that this model is

overly simplistic, and that we will need to develop

more sophisticated accountability systems. However,

much of the debate (for example, arguments about the

reauthorization of NCLB) continues as if the current

approach were at its core reasonable and the

system needed only relatively minor tinkering. To put

this debate on a sensible footing requires that we

confront three issues directly.



The first of these critically important issues,

addressed in the first section of this volume by Scott

Adams and John Heywood, is that the rationale for the

current approach misrepresents common practice in the

private sector. Pay for performance based on numerical

measures actually plays a relatively minor role in the

private sector. There are good reasons for this.

Economists working on incentives have pointed out for

some time that for many occupations (particularly,

professionals with complex roles), the available

objective measures are seriously incomplete indicators

of value to firms, and therefore, other measures,

including subjective evaluations, have to be added to

the mix.



And that points to the second issue, known as

Campbell's Law in the social sciences and Goodhart's

Law in economics. In large part because available

numerical measures are necessarily incomplete, holding

workers accountable for them, without countervailing

measures of other kinds, often leads to serious

distortions. Workers will often strive to produce what

is measured at the expense of what is not, even if what

is not measured is highly valuable to the firm. One

also often finds that employees "game" the system in

various ways that corrupt the performance measures, so

that they overstate production even with respect to the

goals that are measured. Richard Rothstein's section in

this volume shows the ubiquity of this problem and

illustrates many of the diverse and even inventive

forms it can take. Some distortions are inevitable,

even when an accountability system has net positive

effects that make it worth retaining. However, the net

effects can be negative, and the distortions are often

serious enough that they need to be addressed

regardless. To disregard this is to do a great

disservice to the nation's children.



The third essential issue is score inflation (increases

in scores larger than the improvements in learning

warrant), which is the primary form Campbell's Law takes

in test-based accountability systems. Many educators

and policy makers insist that this is not a serious

problem. They are wrong: score inflation is real,

common, and sometimes very large.



Three basic mechanisms generate score inflation. The

first is gaming that increases aggregate scores by

changing the group of students tested: for example,

removing students from testing by being lax about

truancy or assigning students to special education. The

second, which is a consequence of our ill-advised and

unnecessary focus on a single cut score (the

"proficient" standard), is what many teachers call "the

bubble kids problem." Some teachers focus undue effort

on students near the cut while reducing their focus on

other students well below or above it, because only the

ones near the cut score offer the hope of improvement

in the numbers that count.



The third mechanism is preparing students for tests in

ways that inflate individual students' scores. This

mechanism is the least well understood and most

controversial, but it can be the most important of the

three, creating very large biases in scores. One often

hears the argument: "our test is aligned with

standards, and it measures important knowledge and

skills, so what can be wrong with teaching to it?" This

argument is baseless and shows a misunderstanding of

both testing and score inflation. Score inflation does

not require that the test contain unimportant material.

It arises because tests are necessarily small samples

of very large domains of achievement. In building a

test, one has to sample not only content, but task

formats, criteria for scoring, and so on. When this

sampling is somewhat predictable, as it almost always is,

teachers can emphasize the material most likely to

recur, at the expense of other material that is less

likely to be tested but that is nonetheless important.

The result is scores that overstate mastery of the

domain. The evidence is clear that this problem can be

very large. There is no space here to discuss this

further, but if you are not persuaded, I strongly urge

you to read Measuring Up: What Educational Testing

Really Tells Us, where I explain the basic mechanisms

by which this happens and show some of the evidence of

the severity of the problem.



My experience as a public school teacher, my years as

an educational researcher, and my time as a parent of

students in public schools have all persuaded me that

we need better accountability in schools. We won't

achieve that goal, however, by hiding our heads in the

sand. This volume will make an important contribution

to sensible debate about more effective approaches.



Daniel Koretz is the Henry Lee Shattuck Professor of

Education at the Harvard Graduate School of Education,

Harvard University, and is a member of the National

Academy of Education.
