Seven priorities for innovating with data in government
The recurring debate on how the federal government spends, or should spend, nearly $90 billion annually on information technology invariably revolves around a fundamental question: What must agencies do to manage and protect data more effectively?
More than a dozen senior government and private sector IT leaders gathered recently at FedScoop’s headquarters to explore that question. The not-for-attribution discussion touched on a variety of issues, from the demands of data stewardship and how data supports an agency’s mission, to opportunities for innovating with data, to the challenges of keeping up with the explosion of data streaming into, and being produced by, government agencies.
The executives, who included federal agency CIOs, chief technology officers and IT industry executives, were then tasked with developing a top priorities list for federal agencies to keep in mind as they consider their IT investment strategies.
Here are the seven priorities they recommended:
1. Insist on connecting the value of data to the mission.
The rapid evolution of technology and the rise of all things digital have unleashed a stunning amount of data flooding through federal agencies on a daily basis.
A good example: The National Oceanic and Atmospheric Administration collects 20 terabytes of satellite and sensor data every day but, until recently, was unable to share more than a tenth of that data with the public. That’s changing, thanks to a creative agreement NOAA struck with cloud computing providers.
But that kind of creativity — and the prescription for deciding how best to handle the overflow of data — demands a disciplined assessment of how new and existing streams of data could better support the fundamental mission of each agency.
The key driver with data, the executives agreed, comes down to asking: “What’s the problem we’re trying to solve and what’s the value we’re trying to create with data?”
2. Prototype data projects quickly to establish value and validity.
The influx of data from many new sources opens up all kinds of possibilities for agencies to innovate. But most agencies remain hard-pressed to keep up with existing data-gathering and production demands, let alone find the resources to uncover new forms of value in the mounting stacks of data.
Making datasets available to interested users and the public at large — through APIs, software development kits, and challenge programs — has proven to be a viable and economical alternative for agencies trying to harvest hidden value in all of that new data.
“We don’t need a large upfront budget or planning to get started,” said one government executive. Instead, he suggested borrowing a page from the “Lean Startup” playbook, where agencies might test data products, establish their value and validity using short-term cloud sprints, and then develop them in iterative cycles.
But agencies “need to start the process by defining the problem to be solved,” he said, and make sure “the solution solves problems for multiple groups,” and that “the end product can be leveraged in the future.”
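To make that prototyping approach concrete, here is a minimal sketch, not drawn from any agency system, of the kind of lean, read-only data API a team could stand up during a short cloud sprint to test whether a dataset has an audience. The dataset, endpoint and fields are hypothetical, and Flask is used only as a convenient example framework.

```python
# Minimal read-only data API prototype; dataset and endpoint are hypothetical.
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Stand-in for an agency dataset that would normally come from a database or file store.
DATASETS = {
    "station-readings": [
        {"station_id": "A1", "timestamp": "2015-06-01T00:00:00Z", "temp_c": 18.4},
        {"station_id": "A2", "timestamp": "2015-06-01T00:00:00Z", "temp_c": 21.1},
    ]
}

@app.route("/datasets/<name>")
def get_dataset(name):
    """Return the requested dataset as JSON, or a 404 if it does not exist."""
    if name not in DATASETS:
        abort(404)
    return jsonify(DATASETS[name])

if __name__ == "__main__":
    app.run(port=8000)  # e.g., GET http://localhost:8000/datasets/station-readings
```

If a throwaway endpoint like this one validates demand, later iterations could add authentication, pagination and a published software development kit.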
3. Develop an ecosystem of talent to build value from data.
Even if agencies had the resources to hire more talent, the demand for data scientists and resident analysts continues to outpace supply.
The solution offered by one roundtable executive: Embrace “Joy’s law.” The principle, attributed to Sun Microsystems co-founder Bill Joy and often cited by former U.S. CTO Todd Park, holds that no matter who you are, most of the smartest people work for someone else. The trick is figuring out how to tap into them.
The takeaway, the group agreed, is for agencies to re-engineer their approach to recruiting talent and to take greater advantage of professional networks and tools that surface outside expertise.
One agency that has been doing that successfully for many years is NASA, which has tapped a global community of scientists, engineers and space enthusiasts to tackle all kinds of big data projects through a variety of virtual private and public networks.
4. Stick to common data standards, but pick the right technology for the problem you’re trying to solve.
When it comes to data standards, it is important to examine what has become standard for data consumption, including output formats such as JSON and XML and exchange mechanisms such as APIs. In the same vein, the executives noted, it is also important to standardize the data an agency houses and to make it easily available to others.
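As a simple illustration of those consumption standards, the sketch below serializes one hypothetical record to both JSON and XML using only Python’s standard library; the field names are invented for the example.

```python
# Serialize the same record to two common output formats: JSON and XML.
import json
import xml.etree.ElementTree as ET

record = {"facility_id": "F-1029", "state": "MD", "inspections": 4}  # hypothetical fields

# JSON: a direct mapping from the dictionary.
json_output = json.dumps(record, indent=2)

# XML: build an element tree with one child element per field.
root = ET.Element("facility")
for key, value in record.items():
    child = ET.SubElement(root, key)
    child.text = str(value)
xml_output = ET.tostring(root, encoding="unicode")

print(json_output)
print(xml_output)
```

Publishing records consistently in well-known formats like these is what makes them easy for other agencies and the public to consume.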
While agency leaders are the first ones to chart a path for data initiatives, “part of defining this path requires having folks who are knowledgeable about these topics and technologies and who understand standards around data and data initiatives,” noted one executive.
It’s also important to choose the right technology for the problem you are trying to solve.
As one executive put it: “In a world where technology is constantly changing before our eyes, it is easy to get stuck in the buzzword game,” and mistakenly “choose the hot technology versus the right technology.”
That can “lead to long and costly efforts that don’t necessarily solve the problem they set out to solve in the most efficient manner,” he said. Choosing the right standards and technology can also make your data more available to others and more usable, he added.
5. Press to have a single source of truth and support data integrity.
One of the biggest data challenges in government today is arriving at a single source of truth for the data in question, several executives agreed.
“Without confidence in and clarity around the ‘truth’ of the data, data-based decisions can be questioned, contested, ignored, or incorrect,” said one executive. “Organizations that tap into that single source of truth for their data bring remarkable agility, confidence, and pace to their business.”
How can a government organization become the trusted source for data? It’s not simple, but it is achievable with a data governance model, systems to ensure data integrity and a commitment to data cleansing practices.
For data governance, “it’s imperative to identify the key data elements, types, relationships, naming conventions, access controls, encryption requirements, and finally where the data will live,” said one executive. For data consistency, agencies can also make better use of available data cleansing tools, such as Salesforce Data.com or Data Ladder, which can bring greater consistency to searches and sorts and help identify duplicate data.
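Specific products aside, the basic cleansing steps described above, normalizing values and flagging duplicates, can be sketched with a general-purpose library such as pandas; the records and column names below are hypothetical.

```python
# Basic data-cleansing sketch: normalize key fields, then flag duplicate records.
import pandas as pd

# Hypothetical contact records; in practice these would come from an agency system.
df = pd.DataFrame([
    {"name": "Jane Smith ", "email": "JANE.SMITH@EXAMPLE.GOV", "agency": "DOI"},
    {"name": "jane smith",  "email": "jane.smith@example.gov", "agency": "DOI"},
    {"name": "Raj Patel",   "email": "raj.patel@example.gov",  "agency": "EPA"},
])

# Apply consistent naming conventions so equivalent values compare equal.
df["name"] = df["name"].str.strip().str.title()
df["email"] = df["email"].str.strip().str.lower()

# Flag duplicates on the normalized key fields rather than silently dropping them.
df["is_duplicate"] = df.duplicated(subset=["name", "email"], keep="first")
print(df)
```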
6. Push for agile acquisition models for managing and analyzing data.
While it may sound ambitious, agencies should “consider skipping a generation of data processing and storage technologies for newer data analytics systems and approaches,” said another executive.
High-volume, distributed data technologies such as Hadoop and Spark, which can sift through so-called data lakes, are already beginning to replace traditional enterprise data warehouses. They have the advantage of running on inexpensive hardware and are well suited to analyzing complex, unstructured data sets relatively quickly without having to move data in and out of a warehouse.
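As a rough sketch of that pattern, the snippet below uses PySpark to query semi-structured JSON files where they sit in a data lake, with no load into a warehouse first; the storage path and column names are hypothetical.

```python
# Query semi-structured files in a data lake directly with Spark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("data-lake-sketch").getOrCreate()

# Read raw JSON event files in place (hypothetical bucket and layout).
events = spark.read.json("s3a://agency-data-lake/raw/events/2015/*.json")

# Simple aggregation: count events and average response time by program office.
summary = (
    events.groupBy("program_office")
          .agg(F.count("*").alias("event_count"),
               F.avg("response_time_ms").alias("avg_response_ms"))
)

summary.show()
spark.stop()
```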
Existing government acquisition rules, however, continue to hobble agencies from taking advantage of these capabilities. Agencies need more agile and flexible blanket purchase agreements, the executives agreed.
One encouraging development to watch is the outcome of a White House directive to two dozen top federal agencies to establish IT-focused “innovative acquisition labs.” The labs are modeled after initiatives started at the Department of Health and Human Services and the Department of Homeland Security, and on efforts begun last year by the U.S. Digital Service and the EPA.
7. Seek the right balance between security, accessibility and privacy.
Securing data is a top priority for every government IT professional. The steady stream of news about data breaches stands as evidence of the value of government data and of the battle agencies are in to secure it.
The temptation, several executives acknowledged, is for agencies to categorize all data as “highly sensitive.” At the same time, there is an ever-growing, countervailing demand from the public, employees and citizens for self-service and mobile access to that data.
The first step toward resisting that temptation and making more data accessible is a better understanding of the data and how to categorize it properly. “The government’s FISMA data classification methodology is a tried and true approach when applied with rigor,” said one executive.
Agency personnel should “think long and hard” before considering data for FISMA “high” classifications, which typically come with additional cost, complexity and controls, while reducing access and limiting self-service.
Alternatively, data classified in the FISMA moderate range opens up the potential to use a host of cloud services that can bring increased operational agility, speed and economy.
“Organizations should explore bringing more fidelity to the data classification,” suggested one executive. “Is it the entire data set, or is it possible that one or two elements are sensitive? Can these elements be encrypted, masked, or [provisionally] accessed? Many times, large blocks of data are FISMA moderate with only a few data elements raising the classification to higher levels.”
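One way to picture that field-level approach is the simplified sketch below, which protects only the designated sensitive elements of a record and leaves the rest untouched. The field names are hypothetical, and a one-way hash stands in for whatever masking or encryption method an agency’s security requirements actually dictate.

```python
# Field-level protection sketch: hash only the sensitive elements of a record.
import hashlib

# Hypothetical list of elements that would otherwise raise the classification of the whole set.
SENSITIVE_FIELDS = {"ssn", "date_of_birth"}

def protect_record(record: dict) -> dict:
    """Return a copy of the record with sensitive fields hashed; other fields pass through."""
    protected = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            protected[key] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        else:
            protected[key] = value
    return protected

record = {"case_id": "C-2291", "ssn": "123-45-6789", "date_of_birth": "1980-02-14", "status": "open"}
print(protect_record(record))
```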
As organizations share, mine and combine more and more information, government IT professionals also need to keep up with increased public concern about information privacy, the executives universally agreed, even as government retains its obligation to make data available to the public.
Collectively, all these priorities reflect the deepening conviction that data — and the ability to manage it more effectively — is the fuel for innovation. But to put those priorities into action, agencies must also rethink how they’re allocating their IT resources for systems, services, training and support.
Contact the writer at wyatt.kash@fedscoop and on Twitter @wyattkash.