Genome Modeling System: A Knowledge Management Platform for Genomics

In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole-genome, exome, and transcriptome sequencing data from a breast cancer cell line (HCC1395) and a matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials, and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.

Obi L. Griffith | Ken Chen | Vincent J. Magrini | Malachi Griffith | Joshua F. McMichael | Benjamin J. Ainscough | Zachary L. Skidmore | Avinash Ramu | James M. Eldred | David E. Larson | Richard K. Wilson | Elaine R. Mardis | Christopher A. Miller | Jasreet Hundal | Eric Clark | Li Ding | Todd Wylie | Charles Lu | Michael D. McLellan | Xian Fan | Daniel C. Koboldt | Cyriac Kandoth | Robert L. Long | Xiaoqi Shi | Nathan D. Dees | William S. Schierding | Christopher A. Maher | Travis E. Abbott | Indraniel Das | Gary Stiehr | David J. Dooling | Scott M. Smith | Adam C. Coffman | David L. Morton | Lynn K. Carmichael | Christopher C. Harris | Jason R. Walker | Craig S. Pohl | Todd G. Hepler | Benjamin J. Oberkfell | Ian T. Ferguson | Matthew B. Callaway | Anthony M. Brummett | Michael J. Kiwala | Allison A. Regier | Gabriel E. Sanderson | Thomas P. Mooney | Nathaniel G. Nutter | Edward A. Belter | Feiyu Du | Mark M. Burnett | James V. Weible | Joshua B. Peck | Adam Dukes | Justin T. Lolofie | Brian R. Derickson | Kyung H. Kim | Nicole Maher | Benjamin S. Abbott | Amy E. Hawkins | Shawn M. Leonard | William E. Schroeder | Matthew R. Weil | Richard W. Wohlstadter
