Robert Krampf’s first e-mail to Florida’s Department of Education was cordial, even as he raised troubling allegations that poorly written FCAT Science exam questions could result in students being marked wrong even when they chose correct answers.
In a gesture of cooperation, Krampf asked if there was “anything that I can do to help,” and the well-known science educator (nicknamed “The Happy Scientist”) signed off with his usual cheery closing of “have a wonder-filled day.”
Things got less cordial from there.
Internal government e-mails show the state dismissed Krampf as “only a blogger,” even though his popular educational science videos are regularly used by teachers and school districts nationwide and across Florida. A frustrated Krampf complained on his teaching website — prompting multiple media reports that were unflattering to the state and the FCAT.
Finally, after almost a year of Krampf feuding via e-mail with state education leaders, Florida belatedly acknowledged its errors, and the state has replaced the scientific definitions and FCAT sample questions that were either unclear or just plain wrong.
Krampf’s one-man crusade to fix the FCAT only further fueled longstanding criticisms of a test-driven educational system that affects millions of students and thousands of teachers and schools throughout Florida.
Students who fail the Reading FCAT are at risk of being held back or not graduating from high school. Teacher evaluations are, in part, based on FCAT scores. And schools receive coveted “A” grades based on cumulative FCAT scores; schools that get repeated Ds or Fs can be forced to close. This year’s letter grades will be released over the summer.
With so much riding on FCAT scores, the accuracy of the test is hugely important. But Krampf’s findings call that accuracy into question.
Krampf focused on the FCAT test-writing guidelines, as opposed to the actual exam, which the state doesn’t release. The guidelines, which are produced by the state, are worth scrutinizing because they’re used as a reference tool when private companies create the actual test.
Among the problems Krampf found:
- The term “germination” was defined, in part, as “the process by which plants begin to grow from seed to spore.” That simply never happens — there are no plants that go from seed to spore.
- A fifth-grade sample question asked how flowers would respond to light coming in through a window. The correct answer was “the flowers would lean toward the window,” but Krampf said it’s usually leaves (and not flowers) that lean toward sunlight. Also, Krampf found that one of the “incorrect” answers (that the flowers would begin to wilt in the sun) is, in fact, correct.
- One fifth-grade sample FCAT question featured an illustration of a giant panda paw, which has a sixth finger (other types of bears have only five fingers). The question asks students what this thumb-like sixth finger would be most useful for, with the correct answer being to “hold the bamboo stalks it feeds on.”
Incorrect answers have the panda climbing with the thumb, digging with the thumb, or crushing the bamboo with it. Krampf argued that people do, in fact, use their thumbs for climbing, digging and crushing items — making the question confusing.
Initially, state education administrators defended some mistakes, and resisted changing them. One justification cited by the state: although some of the items were scientifically incorrect, accuracy sometimes had to be sacrificed in order to use “grade-appropriate” terms for 5th or 8th graders.
Krampf wrote back: “If the Science FCAT tests have used the same flawed philosophy of allowing questions to have multiple correct answers because 5th graders are not expected to be that smart, then there are almost certainly students who ‘failed’ the FCAT by giving answers that were correct. That means that there are probably quite a few schools that have been labeled as F schools that actually earned a higher mark.”
Then, in December of last year — 10 months after Krampf’s first e-mail — the state backed down, and altered its guidelines to fix the disputed definitions and sample questions.
Still, Florida insists those errors never made it onto an actual exam, in part because the real test questions go through a highly intensive screening process.
“The system we have has worked very well, really,” said Cheryl Etters, a spokeswoman for the Florida Department of Education. She said the multiple layers of test question review (done by Florida educators, science professors, and outside independent auditors) “should be a reassurance to the public.”
“We’ve done those kinds of things to make sure that the test is valid,” Etters said.
Florida’s screening process for the FCAT guidelines, however, has historically been less rigorous. But after Krampf blasted the guidelines for their numerous problems, the state now subjects them to the same fact-checking as actual test questions.
And if there are any test questions that don’t match up with Florida’s new-and-improved FCAT guidelines, they are automatically removed from the exam, the state says.
Earlier this month, Florida released its annual FCAT Science results, with student science scores lagging slightly behind scores in reading and math.
Statewide, roughly 53 percent of this year’s fifth-graders scored at least “proficient” on the science test, with roughly 47 percent of eighth-graders doing so. Miami-Dade and Broward counties performed a bit worse: in fifth grade, 51 percent of Miami-Dade students passed, and 49 percent passed in Broward. In eighth grade, the passing rate was 42 percent in Miami-Dade and 46 percent in Broward.
But Krampf says he doesn’t believe those FCAT numbers, despite the state’s assurances. Because Florida, like most states, doesn’t release its standardized tests, there is no way for Krampf or anyone else to double-check whether confusing questions or sloppy definitions were on the actual test.
Krampf says it’s difficult to accept — without any proof — that the tests are completely error-free when the guidelines that served as their blueprint were so flawed. He also cites the heavy bureaucratic resistance he encountered in trying to fix the guidelines — further evidence, he says, of state leaders’ inability to police themselves.
“I don’t know why they’re so afraid to admit that the stuff is wrong,” Krampf said. “But that makes me suspect that the same paranoia, and the same denial, is taking place with the actual FCAT.”
Bob Schaeffer of FairTest — an organization that advocates against the “misuse” of standardized tests — says the time has come to release FCAT exams, in the name of transparency.
“The politicians and the testing companies are saying ‘Trust us,’” Schaeffer said. “There’s an awful lot of reasons not to.”
In the realm of college admissions tests, there is less secrecy — all SAT exams are eventually made public through practice tests, and about half of the SAT exams administered in a given year are released after students complete them. And in New York State, a bipartisan bill was recently filed in the Legislature that would require K-12 tests to be released as well. It’s too soon to predict whether that legislation will pass.
Etters, the Florida education spokeswoman, said releasing FCAT exams would add considerable time and expense to producing the tests — forcing test writers to come up with an entirely new batch of questions each year, instead of the existing practice of using some questions repeatedly.
The current system, with its various stages of review, strikes “a good balance,” Etters said.
Florida already spends considerable sums on testing — the state has a five-year $254 million contract with London-based Pearson to manage its FCAT exams.
Only one FCAT Science test, an 8th-grade exam from 2007, has ever been publicly released. Krampf says that test is far from perfect.
One question on the test, which Krampf questioned state leaders about, involved a hypothetical experiment using a glass of tea. Students were told to use tea bags, sugar, water, and stirring rods, and were asked what variables must stay the same to maintain the integrity of the results. The state’s “correct” answers were the amount of tea, sugar and water.
Krampf asked: what about the amount of time spent stirring the tea? Surely, that must stay the same. Would a student get credit for writing that?
Florida’s Department of Education initially replied that it was “highly likely” that students who answered “stirring” would have been marked correct. But when Krampf asked for the guidelines given to test graders, to verify that stirring was accepted, the state effectively ended the conversation.
“Unfortunately, I do not have easy access to the documents you are requesting,” wrote Sharon Koon, director of Florida’s office of assessment. “To do so, I would need to pull staff off of critical work tasks to provide the answers to your questions.”
Responses like that have left Krampf convinced that the state’s top priority is protecting the FCAT, rather than making sure the test accurately evaluates what students have learned.
“I don’t know that they’re really seriously evaluating anything,” he said.