With AI, agencies have secondary responsibility of providing data for industry
While many federal agencies primarily think of artificial intelligence as an emerging technology to support their own missions, they also have a secondary role to play in fueling America’s research, development and testing of AI by sharing their data, federal tech leaders said Wednesday.
The development of innovative artificial intelligence applications relies on powerful underlying data, which many federal agencies hold via the services they provide to Americans. But both U.S. CIO Suzette Kent and Lynne Parker, assistant director of AI in the White House’s Office of Science and Technology Policy, identified agencies’ hesitancy to share their data with private and academic partners, as well as other agencies, as a leading challenge limiting the nation’s development of meaningful AI solutions.
Kent said one of her biggest concerns around AI is figuring out “how we make available the powerful data that are strategic assets of the federal government on behalf of its citizens.” Agencies are responsible for handling the data properly, but much of it belongs to the public.
“The agencies have a responsibility for the external components — many of the things … around making data available, responding to request from industry, supporting research and development, whether that is in direct grants or specific topic areas or making data or facilities available to support those sets of activities,” Kent said at a Bipartisan Policy Center event.
Kent’s and Parker’s comments come at a time when agencies have more mandates for sharing their data than ever before.
Data.gov, the federal government’s central repository, has been around for a decade, and the Federal Open Data Policy of 2013 required “newly-generated government data … to be made available in open, machine-readable formats, while continuing to ensure privacy and security.”
More recently, President Trump signed the OPEN Government Data Act into law at the beginning of 2019 as part of the Foundations for Evidence-Based Policymaking Act, requiring that all non-sensitive government data be made available in machine-readable formats by default. And Kent’s office issued a Federal Data Strategy this summer that promotes the opening of priority data sets at agencies by August 2020.
Parker’s White House team has been responsible for leading the administration’s national AI initiatives, anchored by the February release of the American AI Initiative executive order. In July, the White House issued a request for information seeking feedback on which government data sets could be released, opened up or otherwise improved to help support the development of artificial intelligence.
The raw material of innovation
Advancing AI, Parker said at the event, “is not only about the cool ideas industry and academia have but it’s also about the data.” It’s a challenge making “more data and models available for AI research and development and testing,” she said, adding that the February executive order takes action to find “what are those federal data sets that can further AI R&D and testing.”
While industry and academia gave feedback via the White House’s RFI, it’s still “a challenge to actually make that data available,” Parker said. “There are privacy concerns, security concerns, the data may just be in a messy format so it’s not discoverable.”
Kent was enlightened by the results of the RFI, pointing out that “about 30 percent of that was from the medical industry. So there’s a lot more that we could do, unleashing that, unharnessing, and doing it in a way that aligns with our values and builds trust of citizens.”
But it’s not as simple as just telling an agency to open up its data. It’s going to take gaining the trust of those agencies as well, as they account for things like privacy and security, Kent said.
“If I had a magic wand, we could translate what trust means into the mechanisms for harnessing the data and making it available to solve those really tough, complex questions that, while protecting civil liberties, they are about improving quality [of] life, national security and economic prosperity across all of our industries.”
Geospatial data is one area in particular where federal agencies have been open to sharing, and the result is “many, many public-private partnerships that have been successful, hundreds of thousands of jobs created, a trillion-dollar industry where federal information is provided and industry consumes that information to create exciting new products, services and apps,” Kent said. But that’s because “the trust equation is much more simple. It’s not about personal behavior, personal information or those types of things. And we’ve been really successful. So that’s an area where we can look at successes and determine what are some of those practices we can replicate as we tackle the challenges with some of the other types of data.”
Kent also mentioned the Department of Health and Human Services as an agency she gets excited to work with because of its open stance on sharing data, allowing scientists to “look at data in one of our lab environments and actually improve early diagnosis.”
“I get excited when agencies agree to share data,” she said, “and as any of you who work with agencies know, that is not a very easy thing.”