NIH makes its coronavirus genomic data publicly accessible in the cloud

Researchers can now quickly access the data for free, so long as they have an NIH award.

August 18, 2020

The National Institutes of Health is making genomic data about the coronavirus publicly accessible to researchers in the cloud for the first time.

Created by the National Center for Biotechnology Information, the Coronavirus Genome Sequence Dataset consists of researcher-submitted data, including files in normalized Sequence Read Archive (SRA) formats. The SRA is a bioinformatics repository of DNA sequences.

Researchers with active NIH awards can now quickly access the dataset at no cost via the Registry of Open Data on Amazon Web Services, and the agency plans to make it available on more public data cloud platforms.

“Containing COVID-19 outbreaks and preparing for future pandemics will require a deep understanding of the SARS-CoV-2 genome in the context of other COVID-19 patients and the broader Coronaviridae family,” said Ryan Layer, assistant professor at the University of Colorado Boulder’s BioFrontiers Institute, in a statement. “The NCBI Coronavirus Genome Sequence Dataset makes over a decade of viral genome data publicly accessible for researchers, empowering anyone in the research community to participate in the pandemic response.”

The dataset contains more than 13,000 SRA runs, NIH says. The project is part of the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative. STRIDES is a collaboration between NIH and AWS to use the cloud to assist researchers with active NIH awards.

The data being made available will help researchers understand not only COVID-19 but also other pandemic diseases. Differences in genetic sequences among infected patients help researchers determine how quickly the virus is evolving, and genetics are thought to play a role in how patients react to infection. The data can also be used to fine-tune diagnostic testing.

The dataset itself consists of two buckets: one containing raw and normalized files categorized by SRA accession code and another containing accession metadata that will soon be queryable within the Amazon Athena interactive query service.
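For readers who want a sense of what browsing a bucket organized by accession code might look like, here is a minimal sketch in Python. It only builds a listing URL in the style of the Amazon S3 ListObjectsV2 REST request (`list-type=2` with a `prefix` parameter) and makes no network call. The bucket name, the `run/` prefix, and the accession `SRR0000001` are hypothetical placeholders, not values given in this article; the real names should be taken from the dataset's entry in the Registry of Open Data on AWS.

```python
from urllib.parse import urlencode

# Assumption: placeholder bucket name, not the real dataset bucket.
EXAMPLE_BUCKET = "example-coronavirus-sra-bucket"


def listing_url(accession: str,
                bucket: str = EXAMPLE_BUCKET,
                prefix_root: str = "run") -> str:
    """Build an S3 ListObjectsV2-style URL for keys under one accession.

    Assumes keys are laid out as "<prefix_root>/<accession>/...", which is
    a hypothetical layout used here only for illustration.
    """
    if not accession or not accession.isalnum():
        raise ValueError("accession must be a non-empty alphanumeric code")
    # urlencode percent-encodes the slashes in the prefix value.
    query = urlencode({
        "list-type": "2",
        "prefix": f"{prefix_root}/{accession}/",
    })
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


if __name__ == "__main__":
    # Hypothetical accession code, for illustration only.
    print(listing_url("SRR0000001"))
```

Fetching the resulting URL from a publicly readable bucket would return an XML listing of matching keys; the sketch stops short of that so it can be run offline.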
