New high-throughput DNA sequencing technologies have revolutionized how scientists study the organisms around us. In particular, microbiology, the study of the smallest, unseen organisms that pervade our lives, has embraced these new techniques to characterize and analyze cellular constituents and to use this information to develop novel tools, techniques, and therapeutics. So-called next-generation DNA sequencing platforms have greatly increased the amount of raw data that can be rapidly generated. Argonne National Laboratory developed MG-RAST, the premier platform for the analysis of these new data, which is used by microbiologists worldwide. This paper uses the accounting records from the computational analysis of more than 10,000,000,000 bp of DNA sequence data, describes an analysis of the advanced computational requirements, and suggests the level of analysis that will be essential as microbiologists move to understand how these tiny organisms affect our everyday lives. The results indicate that data analysis is a linear problem, but that most analyses are held up in queues. With sufficient resources, computations for a typical dataset could be completed in a few hours. These data also suggest execution times that delimit timely completion of computational analyses, and provide bounds for problematic processes.