In the political world, the promise of data — whether it’s Nate Silver’s spot-on election predictions or President Obama’s clearinghouse of government information, Data.gov — is that we no longer have to take so much on faith. “What do the data show?” is the new “What do you think?,” the new “Is this a good idea?”
But belief in the clarifying power of data is its own kind of faith, and it is one Obama embraced even before winning the presidency. And now, with the revelation that the National Security Agency is processing huge caches of telephone records and Internet data, the American public is being asked to take on faith how data, and how much data, is being gathered and used in Washington.
The “big data” presidency transcends intelligence-gathering and surveillance, encompassing the White House’s approach on matters from healthcare to reelection. A big-data fact sheet put out by the White House in March 2012 — upon the launch of its $200 million Big Data Research and Development Initiative — listed more than 85 examples of such efforts across a number of agencies. They include the CyberInfrastructure for Billions of Electronic Records (CI-BER), led in part by the National Archives and the National Science Foundation, and NASA’s Global Earth Observation System of Systems (GEOSS), which the fact sheet described as a “collaborative, international effort to share and integrate Earth observation data.” And the Defense Department is putting about $250 million a year into the research and development of such projects — “a big bet on big data,” as the White House called it.
“In the same way that past federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet,” said a statement from John P. Holdren, director of the White House Office of Science and Technology Policy, “the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.”
This constant emphasis on data-driven decision-making is, in some respects, a deliberate break from the George W. Bush years, the revenge of the “reality-based community” that a Bush aide famously disdained, describing its members as people who believe that “solutions emerge from your judicious study of discernible reality.” The White House’s embrace of big data is meant to suggest that ideology is less important than inarguable facts. In some ways, this faith in data over ideology defines what it means to be part of Team Obama.
Faith in data’s power is, no doubt, part of Obama’s political genealogy. Both his 2008 campaign and last year’s reelection bid made extensive use of organized and analyzed information. (His team’s data-mining and microtargeting became one of the big stories of those campaigns.) Obama campaign folks dismiss the idea that they were using data to sell the president like soda pop by burrowing into our brains with targeted appeals. In campaign politics, they say, the power of data is in making the most of resources, whether ad dollars or volunteer enthusiasm.
Generally, though, the conversation went something like this: Our mastery of data is (1) world-changingly powerful and (2) not something the public should worry about too much.
Obama didn’t first learn about the power of data on the campaign trail. One of his legislative accomplishments during his brief time in the Senate was a bill co-authored with Sen. Tom Coburn, R-Okla., that called for the creation of an online federal spending database. At the time, they called it “a significant tool that will make it much easier to hold elected officials accountable for the way taxpayer money is spent.” (The result: USAspending.gov.) The transparency bill helped establish Obama’s bona fides as bipartisan as well as tech-savvy.
And this reputation carried over into the White House. Cass Sunstein, Obama’s first-term regulatory czar and now a law professor at Harvard, often said that regulations should be “evidence-based and data-driven.” Meanwhile, U.S. Chief Technology Officer Todd Park, an assistant to the president, declared that “we are witnessing the emergence of a data-powered revolution in healthcare” in the lead-up to the latest Health Datapalooza, an annual conference showcasing innovations in the use of data by companies, academics and government agencies. Data in the hands of both patients and medical practitioners, Park argues, has the power to lower costs and improve healthcare.
That work is of a piece with the Obama administration’s release last month of a massive price list showing what more than 3,000 U.S. hospitals charge to treat 100 different conditions — a move inspired by Steve Brill’s Time magazine cover story this year on healthcare costs. The government collects that data in the course of administering Medicare and chose to release it to bolster public support for Obama’s healthcare overhaul.
Sometimes, too, data has been for Obama a way of routing around awkward confrontations. His Federal Communications Commission (FCC) has been hammered by some advocates for, as they see it, failing to put the public’s interest in cheaper, faster and more widespread Internet access ahead of the demands of telecom companies such as Verizon and AT&T. The FCC has focused its energies on techniques such as a “broadband speed test” that asks people to gauge how well their Internet connection works, data that providers aren’t eager to release. A neat hack, maybe, but some advocates would rather the FCC focus on getting tougher with industry.
In his talks on new approaches to government regulations, Sunstein is fond of quoting the late legal scholar (and, like Sunstein, former University of Chicago law professor) Karl Llewellyn: “Technique without morals is a menace,” Llewellyn supposedly remarked, “but morals without technique is a mess.”
The enduring challenge of those words has reappeared in the NSA controversy. Are the agency’s techniques a menace? Obama says he is happy to have the conversation. “I think it’s a sign of maturity,” the president said recently in California. “Because probably five years ago, six years ago, we might not have been having this debate.” But he hasn’t been entirely helpful in getting the debate going. “Nobody is listening to your telephone calls,” he said with a slight smile that quickly turned downward. “That’s not what this program is about.”
Unfortunately, “data-driven” has become a conversation-ender, rather than a conversation-framer. There are scores of substantive policy discussions to be had, about big issues — like PRISM — and small, like how the Obama administration chooses to order the healthcare plans detailed for the public on HealthCare.gov, which can nudge Americans toward one provider or another. But there are gaping holes in the understanding of big data among the private sector, elected officials and policy specialists — not to mention the public at large.
The PRISM and Verizon episodes have made plain that we lack even a common vocabulary for talking about big data. Sen. Ron Wyden, D-Ore., has for years tried to dance between his responsibilities as a member of the Senate Intelligence Committee and his desire to galvanize public attention on the NSA’s operations. In a hearing last year, he asked Director of National Intelligence James Clapper for a yes-or-no answer to a seemingly straightforward question: “Does the NSA collect any type of data at all on millions or hundreds of millions of Americans?”
“No, sir,” came Clapper’s response. “Not wittingly. There are cases where they could inadvertently, perhaps, collect, but not wittingly.”
In light of the PRISM and Verizon revelations, critics have seized on these remarks as evidence of the agency’s duplicity. But as the Electronic Frontier Foundation has pointed out, the intelligence community uses a different definition of “collect” than other humans do, holding that it refers to the act of actively processing materials rather than, you know, collecting them.
“This job cannot be done responsibly if senators aren’t getting straight answers to direct questions,” a frustrated Wyden said this past week. (Nor is it always possible in these things to simply pretend it’s Opposite Day. Under federal law, “content” includes details such as email subject lines. But the NSA has held that it isn’t parsing “content” when it evaluates the subject lines of emails it has collected. Or is that “collected”?)
In the high-tech and business worlds, big data is all upside and potential. Data on a giant scale exposes truths hidden in smaller sets. But in the policy realm, when big data is discussed at all, the conversation tends to focus on angst over personal privacy, and on whether big data is a major threat.
That’s a limited view of what’s at stake. The government would have a strong self-interest in knowing, for instance, whether some small slice of the 1.1 billion Facebook users was discussing a coup, even if it couldn’t pinpoint the planners. That would be particularly true if, at the same time, it saw an upswing in people pulling up Google Maps images of the Ellipse across from the White House. Patterns are powerful.
Obama has sought to dismiss “the hype that we’ve been hearing,” as he put it, about the NSA’s data-crunching efforts by arguing that they’re complex answers to national security challenges — and ones that Congress has been fully briefed on. But there’s much to discuss about the nature of PRISM and similar programs before we get into the security nitty-gritty. And if it’s a complicated discussion, all the evidence suggests that it isn’t one that Congress is well equipped to have on its own.
What else should that discussion cover? There’s what the intelligence world calls the “mosaic effect” — when a nugget of data that is insignificant on its own takes on new meaning when combined with other bits of information. The White House warned of the risks of this effect in a new set of open-data rules it unveiled last month. There’s what big data means for the relationship between the government and large tech firms; beyond PRISM, for instance, the White House relies on data held by Google and Facebook to line up participants in its frequent online “hangouts” and chats. And then there’s what it means to be truly informed about what rights we’re giving away to the government — the end-user terms of service, in other words, for big data programs.
Of course, old frameworks take us only so far. “The constitutional text provides us with the general principle that we aren’t subject to unreasonable searches by the government,” wrote yet another former University of Chicago law professor. “It can’t tell us the Founders’ specific views on the reasonableness of an NSA computer data-mining operation.” That was Sen. Barack Obama in The Audacity of Hope, not long before entering the White House.
In his “military-industrial complex” speech in 1961, President Dwight Eisenhower warned the American people that “in holding scientific research and discovery in respect, as we should, we must also be alert to the equal and opposite danger that public policy could itself become the captive of a scientific-technological elite.” He said, “It is the task of statesmanship to mold, to balance and to integrate these and other forces, new and old, within the principles of our democratic system, ever aiming toward the supreme goals of our free society.”
More than 50 years later, the task of the modern statesman and stateswoman is to engage the public in the work of integrating the old and the new.
And if they don’t, well, see you on the Internet.
Nancy Scola is a journalist covering technology and politics. From 2001 to 2005, she served on the staff of the House government oversight committee.