Education Week

Opinion

‘Standardized,’ You Say?

By Todd Farley, November 17, 2008

I’m always amazed by the certainty with which staunch advocates of standardized testing view the results of those large-scale assessments. This past September, for example, one such advocate declared that “according to the Nation’s Report Card, since 2000, more kids are learning reading and math.” She made the claim as if it were indisputable fact.


I, meanwhile, have spent the last 14 years scoring student responses to open-ended questions on standardized tests (including nearly yearly work on the tests called “the nation’s report card”), so I view any such results with considerably more skepticism. In fact, I’m not certain the industry that has employed me for the last decade and a half has ever produced the results for a test; as far as I can tell, it has only produced results.

There’s not enough column space in this newspaper to list the myriad discrepancies I’ve seen in the scoring of short-answer and essay questions on “standardized” tests, but in my opinion, test scoring is akin to a scientific experiment in which everything is a variable. Everything. In my experience, the score given to every open-ended response, and ultimately the final results given to each student, depended as much on the vagaries of the testing industry as they did on the quality of student answers.

To start, those student scores would depend on the scoring center where a test was read, whether one in Iowa populated with liberal whites, one in Arizona filled with conservative senior citizens, or one in Virginia peopled more with African-Americans and military personnel.

Those student scores also would depend on what point in a project a test was assessed (either before some rule got changed, or after), what time of day it was read (hopefully not until after the morning coffee had kicked in, but before the fog of daily boredom had crashed down), and what cubicle it was sent to (one whose trainer was more stringent in interpreting scoring rules, or one whose trainer had a more tolerant perspective).

Ultimately, those scores would depend on which temporarily employed “professional scorer” assessed each student response: one of those workers who actually understood the rules and doled out the points accordingly or, more likely, one of the dingbats and dilettantes I worked with over the years who pretty much had no idea what they were supposed to be doing. Seriously, who else does anyone think is doing that short-term, high-stress, low-paying job?

During my career, I did work with plenty of temporary scorers who were intelligent and accomplished people, including those working part time as they went to law school or medical school, teachers working night shifts after a day in the classroom, one guy whose debut short-story collection was already on the shelves of Barnes & Noble, and another who ran the 400 meters in the Atlanta Olympics. Mostly, however, I worked with people who were not particularly smart or accomplished. I worked with scorers who, for example, were too daft to recall the scoring rules: from 1994, when one friend could never remember that “riding in a single file” was an acceptable bike-safety rule, to 2007, when an avuncular co-worker was forever stumped that he could credit “no hope for the future” but not “no hope for the past.”

I worked with scorers whose knowledge of the English language was so suspect I doubted their ability to comprehend any student response, let alone to recognize the subtleties of either proper English or its American vernacular (a Japanese woman not knowing that “irksome” has a negative connotation, a Middle Eastern man failing to understand what “grossed out,” “bummed out,” or “feeling it” meant). I worked with scorers who continued to score tests after completely failing to understand the training, whether because of physical ailments (a guy who had 25 percent hearing loss, another with limited short-term memory), a lack of common sense (people crediting “dirt,” “mud,” or “Styrofoam” as a student’s favorite food), or perhaps the onset of senility (one grandmother giving grades of 4 to student responses that were obviously 1s, and 1s to responses that were obviously 4s).

I worked with a scorer who told me his “real job” was as an “ultimate fighter” (one of those bruisers who crawl into an octagonal ring to engage in bare-knuckled brawling for the enjoyment of the American viewing public). And while he was a nice guy, his mind worked about as quickly as you’d expect from someone who’d gotten punched in the head a lot. After three weeks of scoring student responses to a state reading test, he waved me over to his computer to ask me exactly what he was being tested for. Was it psychological, he wondered? I had to explain to him, after he’d been working for 15 days and had scored thousands of student responses, that he wasn’t being tested; the students were.

“I’m deciding if the kids did good?” he asked. “I’m deciding if they’re smart or not?”

“Basically, yes,” I said with a smile. “Yes, you are.”

“Wow,” he said, shaking his head in disbelief. “Me? Wow.” Wow indeed.

I don’t want to be too smug about my own superiority or too unkind about people who have been my fellow scorers. Still, it’s important to remember, when talking about the almighty standardized test, that many of the people who will actually read and assess student responses might have ended up at a scoring center because they had trouble getting a job elsewhere. They had college degrees and time on their hands, so they found work scoring standardized tests.

And lest anyone think I’m being hyperbolic, doubting that such a motley crew of ne’er-do-wells could ever do the sort of professional work the assessment industry surely guarantees, remember that the statistics that prove the “standardization” of the test-scoring process are often controlled by temporary supervisors whose lives are made easier by producing just such results. In other words, if a supervisor and his scorers can’t go home on a given day until the team’s “reliability percentage” (the rate of agreement between scorers) hits 80 percent, trust me, that threshold will be reached. It’s not that hard. As a friend of mine once quipped, “I can make statistics dance.”

Dancing statistics, however, are a story for another day. Let’s conclude here simply by agreeing that the only certainty there should be regarding standardized-test scores is that they’re not indisputable.


A version of this article appeared in the November 19, 2008, edition of Education Week as ‘Standardized,’ You Say?
