Analytics key to agencies in big data explosion
The federal government has seen an explosion of data at its disposal and has needed powerful analytics tools to put it to use, federal IT officials and industry executives said.
A single statistic drove the bulk of the conversation at Thursday’s Hitachi Data Systems Social Innovation Summit, produced by FedScoop: By 2020, analysts predict there will be more than 30 billion network-connected digital devices globally, all producing unprecedented volumes of data, a phenomenon known as the Internet of Things.
“Those devices, whether it be the phones we use, the cars we drive in, the medical devices used to keep us healthy, the buildings we work in, the ships and airplanes that protect our country, they’re all generating data, and it’s a question of how do we take that data and really put it to use?” said Mike Tanner, president and CEO of Hitachi Data Systems Federal.
“When used correctly, it can make societal changes,” Tanner said. “We can improve the lives of our citizens.”
Federal Communications Commission CIO David Bray said the growth in available IP addresses in the transition from IPv4 to IPv6 — a technical move that allows far more devices to connect to the Internet — would be like comparing the size of a beach ball to that of the sun.
“That’s not linear change — that’s exponential change,” Bray said. “There were 7 billion network devices on the face of the Earth in 2013, 14 billion in 2015, and in just four years there will be 50 billion,” he said, citing a higher estimate. There are about 10 billion terabytes of data out there now, Bray said, and in five years that figure will multiply by 20.
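To put Bray's beach-ball-to-sun comparison in rough numbers, IPv4 allows 2^32 addresses while IPv6 allows 2^128. The short calculation below is an illustrative sketch of that arithmetic, not anything presented at the summit.

```python
# Rough arithmetic behind the IPv4-to-IPv6 comparison (illustrative only).
ipv4_addresses = 2 ** 32    # roughly 4.3 billion possible addresses
ipv6_addresses = 2 ** 128   # roughly 3.4 x 10^38 possible addresses

print(f"IPv4 address space: {ipv4_addresses:,}")
print(f"IPv6 address space: {ipv6_addresses:.3e}")
print(f"IPv6 is roughly {ipv6_addresses / ipv4_addresses:.1e} times larger")
```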
While that data brings with it endless opportunities, it also complicates things, particularly because humans alone are unable to do much with such massive data sets.
Dave Bennett, director of the Defense Information Systems Agency’s Implementation and Sustainment Center, said that without complex data analytics tools, there would be no way for defense agencies to react to issues in their systems in real time.
“The machine does the work for you in the speed of electrons to understand what’s going on so you have the ability to react,” Bennett said in his closing keynote. He added, “There’s no way … we can respond to whatever the issues are in the ecosystem if you’re trying to do it manually.” He doesn’t have the manpower to spare for that, he said.
Bennett hopes, though, that someday technology will reach a point where the analytics identify an issue and react automatically.
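A minimal sketch of what "identify an issue and react automatically" can mean in practice appears below; the metric, threshold logic and response here are hypothetical illustrations, not DISA's actual systems.

```python
# Minimal detect-then-react loop (hypothetical metric names and thresholds).
from statistics import mean, stdev

def is_anomalous(history, latest, sigma=3.0):
    """Flag the latest reading if it sits more than `sigma` standard deviations from the historical mean."""
    if len(history) < 2:
        return False
    return abs(latest - mean(history)) > sigma * stdev(history)

def auto_respond(metric_name):
    """Placeholder for an automated reaction, such as throttling traffic or isolating a host."""
    print(f"Automated response triggered for {metric_name}")

cpu_history = [41, 39, 44, 40, 42, 43, 38, 41]   # prior readings, percent
latest_cpu = 97

if is_anomalous(cpu_history, latest_cpu):
    auto_respond("cpu_utilization")
```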
“We are seeing real-world scenarios right now, both in performance and cyber, that are kind of scary,” Bennett said. “We too often, without analytics, shoot in the dark.”
Similar kinds of analytics are what the Department of Homeland Security depends on to secure the nation and its systems, said Margie Graves, DHS’ principal deputy CIO. Without them, DHS agents would never be able to keep up with threats and filter out the noise in cyberspace, which she said has become increasingly difficult with the advent of social media.
Graves said big data empowers DHS with situational awareness — “to predict in order to prevent or protect incidents from happening, and then if incidents do happen, to be able to respond and recover.”
But that same data can also complicate DHS’ mission during the most critical of times, like the response to the Boston Marathon bombings.
“All of the data that was collected that day that had to be sifted through in order to develop products and intelligence and derive conclusions that would eventually allow us to either capture the perpetrators,” she said. “Those are the kinds of things that need to happen within DHS, and they need to happen effectively and rapidly.”
Even storing all that data is getting difficult, particularly at the National Archives and Records Administration, which is tasked with preserving federal records and data indefinitely.
Leslie Johnston, director of digital preservation for NARA, said the scale of the data her agency works with — agencies file not only personal email correspondence, but also entire hard drives of data as records — makes archiving much harder than it may seem. And because NARA deals with hundreds of agencies and government entities, formatting issues also arise.
“I am preserving every file format that has ever existed on the Web or that any of you have ever used in your work on a daily basis,” Johnston said. She added, “For us, the scale issue is an issue absolutely.”
Again, she pointed to analytics and machine learning as a key to easing that burden. Johnston said her agency has a very “bespoke workflow” to weed out the noise in its record keeping, but machine intelligence could help get past that.
“If we get a transfer of a million emails, how many of them are actually records, and how many of them are lunch orders?” she said.
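Johnston's example of sorting records from lunch orders in a transfer of a million emails is, at its core, a classification problem. A crude first pass might look like the keyword filter below, a hypothetical sketch rather than NARA's actual workflow, with machine learning taking over where simple rules fall short.

```python
# Crude triage of incoming email: likely records vs. likely noise (hypothetical rules).
NOISE_HINTS = {"lunch", "out of office", "happy hour", "unsubscribe"}

def looks_like_record(subject: str, body: str) -> bool:
    """Very rough rule-of-thumb filter; a real workflow would rely on trained classifiers."""
    text = f"{subject} {body}".lower()
    return not any(hint in text for hint in NOISE_HINTS)

emails = [
    {"subject": "FY2015 budget decision memo", "body": "Final approved figures attached."},
    {"subject": "Lunch order for Thursday", "body": "Reply with your sandwich choice."},
]

records = [e for e in emails if looks_like_record(e["subject"], e["body"])]
print(f"{len(records)} of {len(emails)} messages flagged as likely records")
```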
Jeremy Snow contributed to this report.
Contact the reporter on this story via email at Billy.Mitchell@FedScoop.com or follow him on Twitter @BillyMitchell89.