Genome research in the cloud.

High-throughput genome research has long been associated with bioinformatics, which supports genome sequencing and annotation projects. Along with databases to store, manage, and retrieve biological data, a large number of computational tools have been developed to decode the biological information in these data. However, with the advent of next-generation sequencing (NGS) technology, sequence data are being generated at an unprecedented pace. Consequently, researchers face a potential shortage of both storage space and tools to analyze the data. Moreover, uploading and downloading such large data sets increases network traffic and consumes much of the available bandwidth. These obstacles have led to cloud computing as a solution.

[1]  Geoffrey C. Fox,et al.  Hybrid cloud and cluster computing paradigms for life science applications , 2010, BMC Bioinformatics.

[2]  Yongchao Liu,et al.  MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities , 2010, Bioinform..

[3]  Peter J. Tonellato,et al.  Cloud computing for comparative genomics , 2010, BMC Bioinformatics.

[4]  Catherine Shaffer Next-generation sequencing outpaces expectations , 2007, Nature Biotechnology.

[5]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[6]  Michael C. Schatz,et al.  CloudBurst: highly sensitive read mapping with MapReduce , 2009, Bioinform..

[7]  R. Fleischmann,et al.  The Minimal Gene Complement of Mycoplasma genitalium , 1995, Science.

[8]  J. Mullikin,et al.  SSAHA: a fast search method for large DNA databases. , 2001, Genome research.

[9]  D. Lipman,et al.  Improved tools for biological sequence comparison. , 1988, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Stephen M. Mount,et al.  The genome sequence of Drosophila melanogaster. , 2000, Science.

[11]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[12]  Francisco Azuaje,et al.  Gene set analysis in the cloud , 2012 .

[13]  Mladen A. Vouk,et al.  Cloud Computing – Issues, Research and Implementations , 2008, CIT 2008.

[14]  M. Metzker Sequencing technologies — the next generation , 2010, Nature Reviews Genetics.

[15]  Huanming Yang,et al.  SNP detection for massively parallel whole-genome resequencing. , 2009, Genome research.

[16]  Jin Soo Lee,et al.  FX: an RNA-Seq analysis tool on the cloud , 2012, Bioinform..

[17]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[18]  K. Katoh,et al.  MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. , 2002, Nucleic acids research.

[19]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[20]  D. Higgins,et al.  T-Coffee: A novel method for fast and accurate multiple sequence alignment. , 2000, Journal of molecular biology.

[21]  G. Nolan,et al.  Computational solutions to large-scale data management and analysis , 2010, Nature Reviews Genetics.

[22]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[23]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[24]  Timothy B. Stockwell,et al.  The Sequence of the Human Genome , 2001, Science.

[25]  B. Langmead,et al.  Cloud-scale RNA-sequencing differential expression analysis with Myrna , 2010, Genome Biology.

[26]  R. Fleischmann,et al.  Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. , 1995, Science.

[27]  J. Lupski,et al.  The complete genome of an individual by massively parallel DNA sequencing , 2008, Nature.

[28]  Alexander A. Morgan,et al.  Translational bioinformatics in the cloud: an affordable alternative , 2010, Genome Medicine.

[29]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[30]  Fernando Guirado,et al.  Cloud-Coffee: implementation of a parallel consistency-based multiple alignment algorithm in the T-Coffee package and its benchmarking on the Amazon Elastic-Cloud , 2010, Bioinform..

[31]  Alex Bateman,et al.  Cloud computing , 2009, Bioinform..

[32]  Chris Rose,et al.  A Break in the Clouds: Towards a Cloud Definition , 2011 .

[33]  D. P. Wall,et al.  Detecting putative orthologs , 2003, Bioinform..

[34]  M. Schatz,et al.  Searching for SNPs with cloud computing , 2009, Genome Biology.

[35]  Joel T Dudley,et al.  In silico research in the era of cloud computing , 2010, Nature Biotechnology.

[36]  Michael C. Schatz,et al.  Cloud Computing and the DNA Data Race , 2010, Nature Biotechnology.

[37]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[38]  B. Barrell,et al.  Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence , 1998, Nature.

[39]  José A. B. Fortes,et al.  CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications , 2008, 2008 IEEE Fourth International Conference on eScience.

[40]  The Arabidopsis Genome Initiative Analysis of the genome sequence of the flowering plant Arabidopsis thaliana , 2000, Nature.

[41]  Paul Hofmann,et al.  Cloud Computing: The Limits of Public Clouds for Business Applications , 2010, IEEE Internet Computing.

[42]  B. Barrell,et al.  Life with 6000 Genes , 1996, Science.

[43]  Armando Fox,et al.  Cloud Computing—What's in It for Me as a Scientist? , 2011, Science.

[44]  L. Stein The case for cloud computing in genome informatics , 2010, Genome Biology.

[45]  Caspar Zialor DNA sequencing with chain terminating inhibitors , 2014 .

[46]  Ronald C. Taylor An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics , 2010, BMC Bioinformatics.