Beyond the Classroom: The history and challenges of standardized testing

This is the first of a two-part series on standardized testing. The second part will run next week.

By definition, a standardized test is a test that is administered and scored in a consistent manner.

There are two major kinds of standardized tests. A standardized aptitude test (SAT, GRE, ACT) is used to predict how well a student is likely to perform in a future educational setting such as college or graduate school. A standardized achievement test is scored to evaluate a student’s progress and, indirectly, the effectiveness of the educator and educational system.

Standardized achievement tests (FCAT, Florida State Assessment) permit an inference to be made about the knowledge or skills that a student has in a particular content area. The results can be compared to those of a national sample of students at the same grade level or age.

Although a standardized test is perceived as fairer than a nonstandardized test and permits data comparisons across all test takers, the actual test makers and test takers are by no means standardized.

First, some history.

The earliest recorded standardized test was in China (Han Dynasty, 100-220 A.D.). Known as the Imperial examinations, the tests covered the Six Arts: music, archery, horsemanship, arithmetic, writing, and knowledge of rituals and ceremonies.

Standardized testing was introduced into Europe in the early 19th century. The West, skeptical about standardized testing, continued to favor the traditional open-ended discussion and debate inherited from Ancient Greece and evaluated its students by their essays. British India, meanwhile, faced with expanding commerce, adopted standardized testing as a way to efficiently hire and promote employees and to prevent corruption and favoritism. Standardized testing spread from Britain to Europe and then to the U.S., where it was used during large immigration waves to assess social roles.

Following the Industrial Revolution, open-ended assessments began their decline. The lack of standardization produced significant measurement errors and ambiguous interpretations stemming from favoritism, disagreements over the merits of answers, or cultural and language differences.

The publication of “A Nation at Risk” in 1983 drastically changed the educational landscape by holding teachers and students accountable to higher standards. By the 1990s, tests were created to measure progress toward these standards. Although the tests varied, it was the first time teachers felt the inextricable link between policy and testing.

High-stakes standardized testing deepens the chasm between educators and policy makers and distracts us from attending to the true crises in education. And since money speaks volumes, testing will likely influence education for years to come.

Here are some of the key issues:

▪ Cost and implications: According to a 2012 report, standardized-testing regimens cost states some $1.7 billion a year overall (0.25 percent of total U.S. K-12 spending).

▪ The test: It is cheaper and faster to create a computer-graded multiple-choice test than a subjective essay or performance assessment graded by trained personnel. Critics say that multiple-choice tests reward quick answers to superficial questions and do not measure depth or creativity.

▪ The test makers: Private testing companies that control standardized testing operate behind closed doors with little to no public accountability. These companies write and grade the tests, and publish the books used to prepare for the tests. This has great implications for the school districts that can’t afford to buy those books.

▪ Government funding and incentives: Governing bodies desire meaningful comparisons within the public education system. Following the 1965 law requiring annual standardized assessments of public school students, the No Child Left Behind Act (2002) tied federal funding and autonomy to the outcomes of school-based exams. The Obama administration, with its Race to the Top incentive, dangled an additional $4.3 billion in front of school administrators, offering millions of dollars to states that provided test results to help identify effective teachers.

▪ Accountability: Policy makers use standardized testing results to raise teacher accountability and improve student achievement. W. James Popham, in his article “Why Standardized Tests Don’t Measure Educational Quality,” states that relying solely on standardized test data to judge both an educator’s and a school’s proficiency is misguided. Many feel that the “high scores mean effective, low scores mean ineffective” perception is flawed.

▪ Use of data: Inferences taken from standardized tests are used to evaluate a student’s mastery of knowledge or skills in a particular subject. When interpreted correctly and in a timely fashion, this data is crucial for remediation, but it can also cause unintended consequences for students, teachers and the schools themselves.

▪ Effect on instruction: Do gains on a standardized test actually reflect improved knowledge, or do they simply reflect test preparation? Critics contend that many academic gains are superficial and short term, and that test preparation curtails best practices in teaching. The prevailing opinion about the effect of testing on classroom instruction is that an inordinate amount of time is spent on tested subjects (reading and math) at the expense of hands-on science, the arts, history and life-skills curriculum.

▪ The entirety of a child’s knowledge and skill set is vast and diverse. Can a standardized test fairly evaluate it? The objective part of a standardized test remains the computer-generated score. Decisions about what items to include, how questions are worded, how the test is administered and how exam results are used are all made by subjective human beings.

▪ Comparative interpretations vs. judging educational quality: Societal inequality, systemic selection bias, overwhelming data noise and issues with the tests themselves call into question the ability of standardized tests to judge the effectiveness of schools or teachers.

International academic leaders such as Finland use teacher observation, documentation of student work and performance-based assessment instead of large-scale standardized testing.

Popham likens the current use of standardized achievement tests to determine educational quality to using a tablespoon to measure temperature. He shares three reasons why standardized tests should not be used to judge educational quality.

1. Testing vs. teaching mismatches: Standardized test-making companies generate revenue for their shareholders. If you possess their teaching materials, you are more likely to teach what is on the test.

2. Elimination of important test items: Most of the items on standardized achievement tests are “middle difficulty” items. The better the job a teacher does in teaching an important tested concept or skill, or the better students perform on a tested item, the less likely that item is to appear on future tests.

3. Misguided focus: Performance on standardized tests is known to be influenced by three main factors, only one of which is related to quality of instruction: what’s taught in school, a student’s native intellectual ability and a student’s out-of-school learning.

As Albert Einstein said, “The only thing that interferes with my learning is my education.”

Laurie Futterman, ARNP, is a former heart transplant coordinator at Jackson Memorial Medical Center. She now chairs the science department and teaches gifted middle school science at David Lawrence Jr. K-8 Center. She has three children and lives in North Miami.

