Meet the U.S. Data Federation: A new hub for standardized, coordinated open data
The General Services Administration is working to create a place where data providers can go to see if their data fits into a set of standards others might already be using.
This new effort, the recently announced U.S. Data Federation, is a step forward for the open data movement: it aims not just to publish data on Data.gov but also to coordinate that data around specific topics so it is interoperable and standardized, experts say.
Philip Ashlock, chief architect of Data.gov, talked to FedScoop about that goal and the overall vision for the federation, which launched in late September as a place where data publishers can look for examples of successful, standardized multi-agency data initiatives.
Future plans for the federation include tools to help publishers coordinate their efforts and use a preferred data standard, and a maturity model to monitor the progress of some of these initiatives, Ashlock said.
“Part of the challenges within government is just knowing that these initiatives exist, knowing what the technical details are as far as data specifications or standards around them,” Ashlock told FedScoop. “It’s increasingly more of a coordination challenge when you’re not just talking about federal agencies but potentially working with state and local governments as well.”
Data.gov is not only a catalogue of federal agency data — it has recently been getting more data from the state and local levels as well.
“The main concept is it’s a catalogue to explore open data resources from across government,” Ashlock said of Data.gov. “The U.S. Data Federation, on the other hand, is actually to identify and highlight initiatives that are focused on specific problems using data from multiple sources.”
The U.S. Data Federation was launched in conjunction with the White House’s first Open Data Innovation Summit, and Ashlock said it will help officials “contextualize [their] data publishing into these broader initiatives but also to show that there’s a particular way to do it that’s kind of considered a best practice, or that there may even be some requirements around.”
A long-term goal, Ashlock said, is to develop a “maturity model” to show where initiatives are in achieving their goals and what “the next phase … should be for those involved.”
“The concept … of data federation is basically how do you coordinate among multiple data publishers so that you can pull all the data together in one place so that it’s sort of one cohesive whole?” Ashlock said. “So this gets around sort of the concept of data standardization, or just the basic coordination of how information is published.”
One of the biggest lingering obstacles in the open data movement is siloed data, Socrata CEO Kevin Merritt said.
In an interview with FedScoop, Merritt explained that the problem stems in part from the way government program funding works: it often doesn’t include money for interoperability efforts.
“There’s an enormous data silo problem and it’s real and it exists in every government,” Merritt said. “And there’s no silver bullet for getting the data out of those systems; it takes work, it takes effort, it takes people to go in there and connect to those underlying systems and build conduits to get the data from those data silos into an environment where the data can be shared externally.”
When a new program gets funded in government, often new systems are created to support it, Merritt said.
“Those systems were never designed to talk to each other,” he said. “And if you want to do some analysis that has data from three or four different programs, it’s actually really hard to do unless you have got some sort of way to kind of stage the data and pull it together.”
Hudson Hollister, founder and executive director of the Data Coalition, said it is exciting to see the GSA put an emphasis on standardization.
“Data.gov up until now has been about publishing data sets, but not about standardizing them. Without data standards, published data might not be useful because it’s got to be extensively translated and transformed to be comparable across different agencies,” Hollister said. “It looks as though with the announcement of the U.S. Data Federation, GSA is recognizing that.”
When publishing data sets, agencies should look to see if a standard structure or format exists that they can use, Hollister said, “because that makes the data sets more likely to be comparable with things other agencies or other offices have published.”
Data.gov’s focus, Hollister said, was “let’s get as much stuff published as we can, and at least start people thinking about data publication and using the data, or trying to.”
“Moving towards standardization is really the next stage,” he said. “By highlighting standardization projects in specific verticals, the U.S. Data Federation encourages agencies that are publishing open data sets to pay attention to those standardization efforts and maybe put in place a preference for following whatever data standardization effort is in their vertical.”
The federation’s focus on initiatives whose standards reach across more than one agency is “the right framing,” Hollister said, particularly for making data more useful to the agencies themselves.
Hollister said government data is most valuable to its own internal users, who can use it to make better decisions. So the next phase of the open data movement will be one in which agencies use open, standardized data from across government to make decisions.
As for the federation’s audience, Alex Howard, senior analyst for the Sunlight Foundation, told FedScoop it is important that the federation be aimed at third parties who might reuse the data.
“The third-party reuse is where the greatest amplification and opportunity to inform people comes from,” Howard said. “That’s why it matters to get the people who build products, who do data science… the ones who know the data that they need and can put it to use.”
“Orienting the federation at those people is critical, just like orienting any data portal at those people is critical because you want to make sure the reuse happens so that there’s an exposure to what the data can provide at the point of decision,” Howard said.
Howard said having something like the federation that is provider-focused and emphasizes standardization is “critical.”
The goal for the federation, Ashlock said, is that government information feeds into a “national strategy that allows tools and applications to be developed that work nationally, as opposed to just for that one agency, or just for that one local government.”
“I think it’s kind of a point of maturity in the open data space where we’re not just talking about publishing data, but being a little bit more coordinated and thoughtful about how we do that at a national scale,” Ashlock said.