‘All of our federal data assets are currently at risk’ — here’s how people are trying to protect them
Improving the country’s transportation infrastructure may be a Trump administration focus, but a group of activists, including a few current government employees and contractors, is trying to shore up something more intangible than America’s bridges and highways.
The fragile foundation that needs immediate support, they say, is the one that keeps the federal government’s data available for use by all.
A group of coders, librarians, scientists, storytellers and others passionate about data came together at Georgetown University in Washington, D.C., this weekend to preserve federal data that some worry could disappear under different Trump administration priorities. The goal of the DataRescueDC event: store federal climate and environmental data that is “vulnerable under an administration that denies the fact of ongoing climate change.”
Fear of losing federal scientific data during the Trump administration has galvanized work across the country to preserve reputable copies of key data. But experts involved in the project said Saturday that the effort also highlights the need for an official infrastructure for safeguarding federal data.
“We talk a lot in this country about our failing infrastructure, and it’s really obvious when drinking water supplies are dangerous to the people who drink them. And it’s really obvious when a bridge collapses over the Mississippi River. But what was not really obvious, I think, until this juncture that we are now at is how incredibly vulnerable our infrastructure for federal data is. Like, there isn’t one really. It’s totally just absent in many, in very powerful ways,” said Bethany Wiggin, founding director of the University of Pennsylvania Program in Environmental Humanities, which is facilitating Data Refuge.
Known as DataRescueDC, the Georgetown event was one of five happening around the country that weekend to rescue federal data using a specific methodology — a workflow from the Data Refuge project — for collection and storage.
Some of the URLs flagged at events are crawled for storage by the Internet Archive’s End of Term project. Data that are difficult to capture with crawlers go to the Data Refuge.
Two Obama administration-era staffers spoke at the events on Saturday about the importance of government data.
Denice Ross, public interest technology fellow at New America, was a senior adviser in the Obama White House. As a Presidential Innovation Fellow in 2014, she co-founded the Police Data Initiative.
“All of our federal data assets are currently at risk. And they always have been,” she said Saturday, later adding, “We need to be alert and not take any of our federal data for granted.”
Data is always dependent on the continued flow of resources and talent, Ross told FedScoop on Saturday.
There are many places for potential breaks in the process of collecting and publishing data, and as Ross noted, “even when you have the best intentions to keep things open, sometimes they break.”
Registration for the weekend was optionally anonymous in part because government employees planned to attend. Attendees, including reporters, were encouraged by signage to ask permission before taking photos of participants. At least one federal employee attended, as well as two federal contractors.
The federal data bucket brigade
On Saturday, participants listened to talks at a “teach-in” on the importance of collecting and preserving data and were trained on how best to lead the mass of helpers that would descend on the Reiss Science Building at Georgetown on Sunday to “rescue” federal data. On Sunday, participants worked for hours to find, review, harvest and bag the data in a kind of “bucket brigade.”
The team’s “seeders and sorters” began by looking for important URLs on the EPA’s Office of Environmental Information site. Others worked on jobs such as harvesting previously flagged data sets, checking to make sure data sets were complete and adding descriptions to explain them.
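For readers curious what checking that a harvested data set is complete might look like in practice, here is a minimal Python sketch. It assumes the check amounts to recording a SHA-256 checksum for every file at harvest time and comparing against that record later, in the spirit of checksum-manifest packaging formats such as BagIt. The function names and directory layout are hypothetical illustrations, not the Data Refuge project's actual tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def find_problems(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose contents no longer match."""
    problems = []
    for rel_path, expected in manifest.items():
        target = data_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {rel_path}")
    return problems
```

A harvester could call `build_manifest` once when the files are first downloaded and store the result alongside the data. Anyone holding a copy later can run `find_problems` against the stored manifest, and an empty list suggests the copy is complete and unaltered.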
2 days. Over 250 people. 20 GB of data harvested. 4776 URLs seeded. 15 datasets bagged. 40 datasets described into @DataRefuge #datarescueDC pic.twitter.com/je4D7tAkAC
— Annalisa Dias (@ajdm) February 19, 2017
Reluctant to attend the post-inauguration Women’s March in Washington due to the large crowds, an attendee who goes by the hacker name Carabobcalcat said she felt more effective contributing from “behind a keyboard” — and in a measurable way — by saving data.
An EPA employee who attended and asked to remain anonymous told FedScoop: “This represents decades of work that people have put in, when you consider the creation of a system, the maintaining of a system and the data quality. And the idea of that being washed away for any reason, political or non-political, is terrifying.”
“This is something that should be done no matter what the administration is. And the fact that it wasn’t done before this administration came into place is in itself a problem,” the attendee said.
The EPA employee’s role in the event was to help rescuers understand what they were getting, and “point people to some of the data sets that are key and critical for preservation purposes, because there are certain places you can go where you can collect a lot of information at once and save a lot of time, effort and heartache.”
“We haven’t reached a point yet where I’ve seen anything that actually gives me reason to fear that data would vanish,” the employee said. “But you can’t complain about the effectiveness of an agency and say that it’s OK for their data to go away for any reason at the same time, because we’re only as good as our data. If the data is bad, if the data is gone, doing a good job to the taxpaying public is impossible.”
Choosing vulnerable data
“This is a unique presidential transition because of the vast amount of information that’s been made available online over the past eight years, especially after the Open Government Directive and the Open Government Plan and all of the significant efforts that have been made in federal agencies,” said Michael Halpern, deputy director at the Center for Science and Democracy for the Union of Concerned Scientists, in an interview with FedScoop.
The Union of Concerned Scientists is one of Data Refuge’s partners, and when the project was just getting started the group asked its network of scientists to take a survey to see what data sets they cared about, Halpern said.
The group has also “been connecting the projects with recently retired senior scientists from agencies who understand, you know, how the data is set up and what the sort of whole ecosystem looks like and how different data sets fit together so that they can provide that kind of advice,” he said.
“I think it’s important to back up all government data,” he said. “Just by conducting this exercise the information becomes more resilient because the incentive to take it down becomes less because it exists elsewhere in a usable format.”