Petabyte-scale innovations at the European Nucleotide Archive

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories feed these data sets into the worldwide computational infrastructure, and they are challenged by the resulting data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl) comprises the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive. It has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. Here we present our new repository for next-generation sequence data and a brief summary of the contents of the ENA. We also describe major developments in submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.