'No Effects' Studies Raising Eyebrows

Like a steady drip from a leaky faucet, the experimental studies being released this school year by the federal Institute of Education Sciences are mostly producing the same results: "No effects," "No effects," "No effects."

The disappointing yield is prompting researchers, product developers, and other experts to question the design of the studies, whether the methodology they use is suited to the messy real world of education, and whether the projects are worth the cost, which has run as high as $14.4 million in the case of one such study.

Purpose: To compare outcomes for children in grades 4-8 who had been randomly assigned to receive or not receive services through the department's student-mentoring grants program.

Date Issued: Feb. 25, 2009

Results: No overall, statistically significant effects were found for any of the 17 measures studied, although some positive effects appeared for certain subgroups of students.

Purpose: To compare four mathematics curricula that reflect different approaches to teaching that subject in the early grades.

Date Issued: Feb. 24, 2009

Results: Statistically significant, positive effects were found for two programs, but none for the other two.

Purpose: To evaluate the effects of 10 commercial software products used at various grade levels.

Date Issued: Feb. 17, 2009

Results: Only one model produced statistically significant test-score gains across both years of the study. Two algebra programs produced positive effects in classrooms that had used the programs two years in a row.

Purpose: To compare the achievement of elementary school children, in the same grades and the same schools, randomly assigned to teachers trained through either traditional education schools or alternative-route programs.

Date Issued: Feb. 19, 2009

Results: No statistically significant differences were found between the two groups.

Source: U.S. Department of Education

But proponents of the methodology say those critics ought to pay more attention to the message than to the messenger.

"I just think that's the way the world works," said Jon Baron, the executive director of the Coalition for Evidence-Based Policy, a Washington-based advocacy group. "The good news is that some things do work, and those are the things we should focus on and scale up."

The studies are part of a new generation of so-called "scientifically based" research that was set in motion by the institute, the main research arm of the U.S. Department of Education, when it was created in 2002.

The body of research employs a study design called "randomized controlled trials," in which subjects are randomly assigned to either an experimental group or a business-as-usual group. Although rarely used in education before the wave of studies backed by the IES, such designs are widely considered to be the "gold standard" for determining whether an intervention works.

Of the eight such studies released by the federal institute this academic year, six have produced mixed results pointing to few, or no, significant positive effects on student achievement.

They include studies on: school-based mentoring programs in elementary school; commercial software programs for teaching mathematics; various certification routes for teachers; teacher-induction programs; interventions for boosting literacy instruction for disadvantaged preschoolers and their families; and professional-development initiatives in reading.

In addition, the research agency's final evaluation of the federal Reading First program, which uses a research design that differs slightly from the randomized controlled approach, found that the $6 billion federal reading program improved young children's decoding skills, but failed to make dramatic differences in reading comprehension.

On the other hand, an ongoing study of "double dose" reading classes for struggling 9th grade readers is showing positive results. And a head-to-head comparison of four different elementary math curricula identified two philosophically different programs that gave 2nd graders an added boost in that subject over the standard curricula.

'Tin' Standard?

Still, the overall results are leading some experts to question the value of the recent spate of randomized controlled studies.

"It's not a bad idea to get people more organized and more motivated to do more experimental studies," said Linda Darling-Hammond, a Stanford University education professor and the former lead adviser on President Barack Obama's education transition team. "But we're spending a lot of money on some pretty poor designs which are not likely to give us results. It's as though in the education community we've taken the gold standard and turned it into a tin standard."

Ms. Darling-Hammond points out that at least two of the studies (one on school-based mentoring and one that compared teachers who were alternatively certified with those who had come to the classroom by more traditional routes) did not have "clear treatments." In other words, the control group and the treatment group were too similar, in her view, in important respects.

In the case of the teacher-certification study, for instance, some of the alternatively certified teachers had taken as many education courses as peers who had graduated from education schools.

In the mentoring study, which focused on school-based mentoring programs for students in grades 4-8, 35 percent of the students in the control group received mentoring services anyway. Fourteen percent of the students in the mentoring group never got matched up with a mentor.
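The overlap between the two groups can be put in rough numbers. The Python sketch below is illustrative only and is not drawn from the study itself: it uses the two rates reported above, while the function names and the sample effect size are hypothetical. It computes how far apart the groups actually were in the share of students mentored, and shows how an intent-to-treat estimate would be rescaled under standard instrumental-variable assumptions (random assignment affects outcomes only through mentoring, and no student does the opposite of his or her assignment).

```python
# Illustrative sketch: the two rates are those reported for the mentoring
# study; the function names and the sample effect size are hypothetical.


def receipt_contrast(share_mentored_treatment: float,
                     share_mentored_control: float) -> float:
    """Share of students actually mentored, treatment group minus control group."""
    return share_mentored_treatment - share_mentored_control


def effect_on_compliers(itt_effect: float, contrast: float) -> float:
    """Rescale an intent-to-treat effect by the receipt contrast.

    This is the simple ratio (Wald-style) adjustment; it is meaningful only
    under the instrumental-variable assumptions named in the text.
    """
    if contrast <= 0:
        raise ValueError("contrast must be positive")
    return itt_effect / contrast


mentored_in_treatment = 1.0 - 0.14  # 14 percent were never matched with a mentor
mentored_in_control = 0.35          # 35 percent of controls were mentored anyway

contrast = receipt_contrast(mentored_in_treatment, mentored_in_control)
print(f"Receipt contrast: {contrast:.2f}")

hypothetical_itt = 0.05  # made-up effect size, in standard deviations
adjusted = effect_on_compliers(hypothetical_itt, contrast)
print(f"Implied effect on students who complied: {adjusted:.3f}")
```

With the reported rates, the groups differed by about 51 percentage points in who was mentored, so a comparison of the groups as assigned reflects roughly half of a full mentored-versus-unmentored contrast.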

Another scholar, Sean P. Corcoran, an assistant professor of economics at New York University, worries that the studies, many of which have been set in schools with high concentrations of poor students, aren't producing findings that apply to a wide range of educational settings. "What most policymakers are looking for is: What will work in my school?" he said.

The teacher-certification study is a case in point, Mr. Corcoran said.

"The schools sampled were those that routinely hired alternatively certified teachers, and those tend to be hard-to-staff schools to begin with," he said.

In such hard-to-staff schools, where research has long shown that teacher quality is comparatively weak, it's no surprise that the alternatively trained teachers were just as effective as those who had taken more traditional routes into the classroom, he added.

Yet study readers may come away with the impression that the findings offer a broader indictment of traditional education school training. "Interpretation is the biggest problem," Mr. Corcoran said. "It's not that these are poorly designed studies."

'Dosage' at Issue

Michael Milone, a Placitas, N.M.-based assessment specialist who helped develop some of the programs tested in the educational software study, faults that study for paying too little attention to whether teachers were using the programs or not. ("Reading, Math Software Found to Have Little Effect on Scores," March 18, 2009.)

"In looking at these complex evaluation studies, it's almost like no one looks at things like how many of the kids show up," he said.

Purpose: To measure and compare the impact of two programs that aim to improve struggling 9th graders' literacy achievement by providing an extra reading class during the school day.

Date Issued: November 2008

Results: Both programs were shown to have a statistically significant positive effect on student achievement.

Purpose: To evaluate the impacts of programs used in 17 districts to provide support for beginning teachers in elementary schools.

Date Issued: October 2008

Results: No statistically significant differences were found between the treatment and control groups in terms of student achievement, teachers' practices, or retention rates for teachers.

Purpose: To find out whether federal Even Start programs with a heavier emphasis on literacy instruction will lead to better outcomes for children and families.

Date Issued: September 2008

Results: For all seven measures of literacy and language, there were no statistically significant differences between children getting more literacy-rich instruction and those in regular Even Start programs. The program did lead to improvements in parenting skills, though, as well as in children's social skills.

Purpose: To weigh the impact of two professional-development programs, one with added support from school-based coaches and one without, both aimed at improving teachers' knowledge of "scientifically based" practices for teaching reading.

Date Issued: September 2008

Results: Although teachers' knowledge grew, there were no differences in test scores after one year between 2nd graders whose teachers took part in the programs and their peers whose teachers did not. Having reading coaches available for teachers produced a small positive effect, but it was not statistically significant.

Source: U.S. Department of Education

The educational technology study tracked the number of hours teachers used the software, for example. "But you also want to know how many kids work on the program," Mr. Milone said. "Is the dosage intensity and duration appropriate for that student?"

The analogy in medicine, he said, might be to evaluate a drug that patients don't take as prescribed. "If it doesn't work," he added, "what does that say about the medication?"

Limits to Uses

Though he was involved briefly in early partnerships with the federal research agency to make greater use of randomized studies, Harris M. Cooper agrees that randomized controlled trials, like any research design, have limitations. One is that they are better at picking up short-term effects than they are at measuring long-term results.

The studies are also better suited to detecting the effects of highly specific interventions than to evaluating broader education improvement efforts farther removed from the classroom, experts say.

"RCTs can be oversold, but at the same time, they are a critical part of our research arsenal, and the best approach to getting our arms around a problem, especially if they are involved with multiple, complementary [research] methods," added Mr. Cooper, a professor of education, psychology, and neuroscience at Duke University in Durham, N.C.

Indeed, various panels of the National Academies, a key source of advice to Congress on scientific matters, have concluded that, when it comes to determining cause and effect, randomized controlled trials are the most effective research design to use.

What they cannot do, though, is reveal what's happening inside the "black boxes" of classrooms.

While randomized studies were underutilized in education for a long time, Mr. Cooper said, "I think I would also like to see proponents be appropriately humble about what these studies can tell us."

Lessons Learned

For their part, federal education officials say the randomized studies carried out so far have often focused on disadvantaged, inner-city schools because that is where the need for reliable solutions to education problems is greatest. And, in the case of the teacher-certification study, that is where the alternatively certified teachers are.

If some of the experiments were less concerned with fidelity to the intervention, officials add, that reflected an intentional decision to study how educational practices are used, or not used, in the real world, rather than in environments controlled by program developers.

"Lots of social programs are less effective than people think," said Grover J. "Russ" Whitehurst, who headed the Institute of Education Sciences from its start until last November. "I think it's in the nature of evaluation science to find more inconclusive findings than positive findings, and that's informative. If you're spending a lot of money on something that's believed to be effective, and now you have questions about its effectiveness, then I think it's a positive thing."

Finding positive effects is also more challenging in education, because typically students in both the treatment and control groups are making academic progress.

"It's not a question of whether a particular new intervention is efficacious at all," said Richard J. Murnane, a professor of education and society at the Harvard Graduate School of Education. "It's a question of whether it's better than what we would've been doing otherwise."

Randomized studies will also be more useful as they become part of an ongoing program of research, Mr. Murnane said. For example, when a large randomized study found that students who participated in after-school programs received no special boost in test scores, compared with those not participating, the IES underwrote a second study to see what would happen if those after-school programs had a stronger academic component.

The federal research agency receives no special allocation, though, that would enable it to build a thoughtful, long-term plan of study, according to Mr. Whitehurst, who now directs the Brown Center on Education Policy at the Washington-based Brookings Institution.

"So IES ends up doing a bunch of one-off evaluations that are either desired by Congress or desired by some other program office in education that has money available," he said. "That makes us like MDRC or Mathematica: a research organization that does mostly studies that somebody else is able and willing to fund."

Phoebe H. Cottingham, the commissioner of the institute's National Center for Education Evaluation and Regional Assistance, which oversees many of the large-scale studies, said policymakers have learned some lessons about choosing educational models to be tested with randomized studies.

"Some of them were based on what are fairly weakly supported ideas," she said. "It doesn't mean you're going to get an effect just because something worked in one efficacy trial.

"We think we're going to have more luck with the next cohort of studies," she added.

Debra Viadero

Assistant Managing Editor, Education Week
