Expanding and Enhancing Access to the Sequence Read Archive (SRA) Through a Complementary New Web-based Mirror

Public institutions such as the National Center for Biotechnology Information (NCBI) have made tremendous investments in generating and archiving a wide array of valuable genomic data for use by the research community. Expanding access to these public data and streamlining their integration into data management tools and analyses will further expedite their use and increase their value in medical research, discovery, and applications. Teaming up with Google, DNAnexus has developed a complementary hosted mirror of the NCBI's Sequence Read Archive (SRA) that gives researchers an additional way to access these important datasets. This freely accessible resource provides a new web-based user interface built using the latest “cloud” technologies and genomic data standards. As the most comprehensive archive of publicly available next-generation sequencing data, the SRA is an important resource for researchers around the world. It remains the single best source of sequence data from research initiatives such as the 1000 Genomes Project and from institutions such as the Broad Institute, Washington University, and the Wellcome Trust Sanger Institute. Here we discuss our work with the NCBI and Google to create a complementary mirror of the SRA, available at sra.dnanexus.com. Through a typical user scenario, we will discuss the underlying data processing pipeline; key features of the new web-based interface, which enables researchers to quickly identify and browse datasets of interest; link-outs to PubMed references; and the integration of those data into an analysis workflow for hypothesis generation.