Big Data: Astronomical or Genomical?

Genomics is a Big Data science and is going to get much bigger very soon, but it is not known whether the needs of genomics will exceed those of other Big Data domains. Projecting to the year 2025, we compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Our estimates show that genomics is a “four-headed beast”: it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and analysis. We discuss aspects of new technologies that will need to be developed to meet the computational challenges that genomics poses for the near future. Now is the time for concerted, community-wide planning for the “genomical” challenges of the next decade.
