Combined Measures Better at Gauging Teacher Effectiveness, Study Finds

Study probes scores, observations, surveys
By Stephen Sawchuk | January 08, 2013 | Corrected: January 09, 2013
Corrected: This story has been updated with the current title for Jay P. Greene, a professor of education policy at the University of Arkansas.

Student feedback, test-score growth calculations, and observations of practice appear to pick up different but complementary information that, combined, can provide a balanced and accurate picture of teacher performance, according to research recently released from the Bill & Melinda Gates Foundation.

A composite measure of teacher effectiveness drawing on all three of those measures, and tested through a random-assignment experiment, closely predicted how much a high-performing group of teachers would successfully boost their students’ standardized-test scores, concludes the series of new papers, part of the massive Measures of Effective Teaching study launched more than three years ago.

“If you select the right measures, you can provide teachers with an honest assessment of where they stand in their practice that, hopefully, will serve as the launching point for their development,” said Thomas J. Kane, a professor of education and economics at the Harvard Graduate School of Education, who headed the study.

Basing more than half a teacher’s evaluation on test-score-based measures of student achievement seemed to compromise it, the researchers also found.

Another piece suggests that teachers should be observed by more than one person to ensure that scores are reliable.

The findings, released Jan. 8, are among dozens from the final work products of MET. Together, they are billed as a proof point for the three measures the foundation has spent years studying.

Multiple Yardsticks

Researchers compared a number of different schemes for weighting the three indicators in an evaluation system. In general, more weight on “value added” made the systems more predictive of achievement growth on state tests, but less reliable. Results differed by grade and subject; those depicted are for middle school teachers of English/language arts.

SOURCE: Bill & Melinda Gates Foundation

Even as they praised the project’s other insights, some scholars debated the strength of the findings from the random experiment. One glitch: Teachers and administrators didn’t always comply with the randomization, making it harder to interpret the results.

“We can only be certain that it’s a valid predictor of future test scores for those teachers who complied with the assignments,” said Jonah E. Rockoff, an associate professor of finance and economics at Columbia Business School, who has studied teacher-quality issues using economic techniques. Mr. Rockoff was not involved in the study, but reviewed early drafts of the findings.

Taken as a whole, the final MET findings provide much food for thought about how teacher evaluations might best be structured. But they are not likely to end a contentious, noisy debate about evaluation systems, and they are almost certain to be intensely scrutinized, in part because of Gates’ separate support for advocacy organizations that have already staked out positions on teacher evaluations.

(The Gates Foundation also provides support for coverage of business and innovation in Education Week.)

Weighing Measures

The $45 million study, in progress since 2009, is one of the largest and most extensive research projects ever undertaken on the question of how to identify and measure high-quality teaching. It involved some 3,000 teachers in six districts: Charlotte-Mecklenburg, N.C.; Dallas; Denver; Hillsborough County, Fla.; Memphis, Tenn.; and New York City.

Earlier studies released by the MET project had examined three potential measures of teacher quality: observations of teachers keyed to teaching frameworks, surveys of students’ perceptions of their teachers, and a value-added method, which attempts to isolate teachers’ contributions to their students’ academic achievement. Researchers examined the relationship of each measure to students’ scores on state standardized tests as well as on a more complex, project-based series of tasks; and to students’ feelings of effort and engagement in class.

Each of those measures, the earlier papers stated, had tradeoffs in terms of their reliability and their correlation to the academic and nonacademic outcomes.

One of the four new papers released by the Gates Foundation goes the next step: It examines different ways of weighting those three measures.

It found that weighting schemes that relied the most heavily on state standardized-test scores appeared to be counterproductive. Those composites tended to be volatile and were also the least predictive of how students taught by those teachers would fare on the more cognitively demanding tasks.

Yet weighting schemes that put the most emphasis on teacher observations were the least predictive of gains on the state test scores, it says.

In all, the study indicates, schemes that use a more equal mix of components, including between a third and half based on value-added, couple better correlations to the outcome measures with improved reliability.

In a way, the findings indicate that there is no one “best” way to weight the measures; instead, that decision will depend on what policymakers most value, whether state test scores or other outcomes.
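
The weighting schemes described above amount to a weighted sum of the three measures. The following is a minimal sketch in Python, assuming each measure is first standardized so that the different scales are comparable. The function names, the weights, and the sample numbers are all invented for illustration and are not taken from the MET papers.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores so measures on different scales can be mixed."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No spread in this measure: every teacher gets the same neutral score.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_scores(value_added, observation, survey, weights):
    """Return one weighted composite per teacher.

    `weights` is a dict with the hypothetical keys "value_added",
    "observation", and "survey"; the weights must sum to 1.
    """
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    va_z = standardize(value_added)
    obs_z = standardize(observation)
    sur_z = standardize(survey)
    return [
        weights["value_added"] * va
        + weights["observation"] * obs
        + weights["survey"] * sur
        for va, obs, sur in zip(va_z, obs_z, sur_z)
    ]


# Made-up scores for three hypothetical teachers (not study data).
value_added = [0.10, -0.05, 0.02]
observation = [2.8, 2.5, 3.1]
survey = [3.9, 3.4, 4.2]

# An equal mix, in the spirit of the "more equal" schemes described above.
equal_mix = composite_scores(
    value_added, observation, survey,
    {"value_added": 1 / 3, "observation": 1 / 3, "survey": 1 / 3},
)

# A scheme that leans heavily on value-added.
test_heavy = composite_scores(
    value_added, observation, survey,
    {"value_added": 0.8, "observation": 0.1, "survey": 0.1},
)
```

Shifting weight between the measures can change which teacher ranks highest, which is why the choice of weights depends on the outcome policymakers value most.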

Randomized Experiment

From the beginning, one of the foundation’s key goals was to subject promising measures to “validation” through a randomized experiment.

Though infrequently conducted in K-12 education because of logistical problems and expense, random assignment allows researchers to eliminate sources of bias, such as the sorting of students into particular classes, not accounted for using traditional statistical techniques.

The Gates project, with its reach across six districts and thousands of teachers, offered an unusual chance to test the ideas at a scale not seen previously.

For the randomization, researchers in 2009-10 generated estimates of teachers’ performance based on composite measures using data from the surveys, prior test scores, and observation scores. Within individual schools, the study randomly assigned a class of students to each of the participating teachers in particular grades and subjects. After a year, then, researchers compared those teachers’ actual performance to the estimates.

The results were examined in groups based on the teachers’ predicted performance.

In general, the groups of teachers identified as being more effective did, in fact, help the assigned classes of students learn more, producing results on par with what the measures had predicted. They also improved student performance not just on traditional standardized tests but also on the deeper, project-based tasks.

“Because of the random assignment, we can be confident that we identified a subgroup of teachers who caused achievement to happen,” regardless of student characteristics, Harvard’s Mr. Kane said. “It’s sort of a big deal to be able to say that.”

Student attrition and other factors, including the refusal of several schools to carry out the randomization despite agreeing to do so, led to relatively high rates of noncompliance. About 66 percent of students in Dallas stayed with their assigned teacher, but only 27 percent of students in Memphis did.

To account for the noncompliance, researchers used a statistical technique known as “instrumental variables” to adjust the results. The technique is widely used in the social sciences.
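
For readers unfamiliar with the technique, here is a minimal sketch of the simplest instrumental-variables estimator, written in Python. It is not the MET researchers' actual model. It assumes a single instrument (here, the predicted effectiveness of the teacher a student was randomly assigned to), a single treatment (the predicted effectiveness of the teacher the student actually had), and an outcome (the student's score). All names and numbers are synthetic and chosen only so the arithmetic is easy to check.

```python
def covariance(a, b):
    """Population covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def iv_estimate(instrument, treatment, outcome):
    """Simple IV estimate: cov(instrument, outcome) / cov(instrument, treatment)."""
    denom = covariance(instrument, treatment)
    if denom == 0:
        raise ValueError("instrument is unrelated to the treatment")
    return covariance(instrument, outcome) / denom


def ols_slope(treatment, outcome):
    """Naive regression slope of outcome on treatment, for comparison."""
    return covariance(treatment, outcome) / covariance(treatment, treatment)


# Synthetic data (not study data). The true effect of the treatment is 2.
# `confounder` stands in for nonrandom sorting: it moves both the teacher a
# student ends up with and the student's score, but is unrelated to the
# random assignment.
assigned = [-1.0, -1.0, 1.0, 1.0]      # instrument: randomly assigned teacher
confounder = [-1.0, 1.0, -1.0, 1.0]    # hypothetical unobserved sorting
actual = [0.5 * z + c for z, c in zip(assigned, confounder)]   # partial compliance
score = [2.0 * x + 3.0 * c for x, c in zip(actual, confounder)]

iv_effect = iv_estimate(assigned, actual, score)
naive_effect = ols_slope(actual, score)
```

In this toy setup the naive slope is pulled away from the true effect by the sorting, while the instrumental-variables estimate uses only the variation that comes from the random assignment.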

Scholars had different opinions about how far the findings could be extrapolated.

“These results could still be based on a very selective group of teachers,” said Jesse M. Rothstein, an assistant professor of economics at the University of California, Berkeley, who has often been critical of the MET findings. “I would love to see a lot more investigating of just who was and wasn’t complying, and why they were left out.”

Douglas N. Harris, a professor of economics at Tulane University, in New Orleans, added that the study didn’t address some other potential sources of bias. For example, it’s possible that bias in the value-added estimates for each individual teacher might have been averaged out in the group estimates. (The averaging was done in order to obtain a sufficient sample size, a limit of the random-assignment method.) But most school districts and states using value-added approaches are using individual, not group-level results, he noted.

The study’s authors also acknowledge that the experiment is limited to comparisons of teachers within, but not across, schools.

“There are a lot of ways in which there could be a nonrandom assignment of students to teachers,” Mr. Harris said. “They’re studying some elements of that, but not others.”

Teacher Observations

In yet another new finding, the researchers dug deeper into observations of teachers. They examined lessons from a subset of 67 teachers in the Hillsborough, Fla., district, investigating ways to improve the scoring of those lessons.

The researchers found that having different raters score observations of teachers’ practice may be a key component for the observation systems: Raters’ first perception of a teacher’s practice tended to influence how they scored additional lessons taught by that same teacher, the study found.

Nearly all teachers scored in the middle categories on the framework studied, the four-tiered Framework for Teaching, a popular tool created in 1996 by consultant Charlotte Danielson, rather than at the top or bottom ones. The researchers struggled to interpret that finding.

“It could be that observers are simply uncomfortable making absolute distinctions between teachers,” that paper says. “It could be that the performance-level standards need to make finer distinctions. Or it could simply be that underlying practice on the existing scales does not vary that much.”

Mixed Reception?

Nearly every work product released by the MET researchers thus far has been contested to some degree by observers, and the most recent results are likely to be no exception.

“They see this as proof that the more equally weighted, combined measure is superior, but they omit all discussion of the expense and difficulty of collecting the classroom observations and student surveys,” said Jay P. Greene, a professor of education policy at the University of Arkansas. Mr. Greene contends that earlier reports from Gates have veered too far into advocacy.

By contrast, the American Federation of Teachers, whose leader has had an on-again-off-again rapport with Mr. Gates and with the MET project, embraced the final studies.

“The MET findings reinforce the importance of evaluating teachers based on a balance of multiple measures of teaching effectiveness, in contrast to the limitations of focusing on student test scores, value-added scores, or any other single measure,” AFT President Randi Weingarten said in a statement.

A version of this article appeared in the January 16, 2013 edition of Education Week as Multiple Gauges Best for Teachers
