SeqGene: a comprehensive software solution for mining exome- and transcriptome- sequencing data

BackgroundThe popularity of massively parallel exome and transcriptome sequencing projects demands new data mining tools with a comprehensive set of features to support a wide range of analysis tasks.ResultsSeqGene, a new data mining tool, supports mutation detection and annotation, dbSNP and 1000 Genome data integration, RNA-Seq expression quantification, mutation and coverage visualization, allele specific expression (ASE), differentially expressed genes (DEGs) identification, copy number variation (CNV) analysis, and gene expression quantitative trait loci (eQTLs) detection. We also developed novel methods for testing the association between SNP and expression and identifying genotype-controlled DEGs. We showed that the results generated from SeqGene compares favourably to other existing methods in our case studies.ConclusionSeqGene is designed as a general-purpose software package. It supports both paired-end reads and single reads generated on most sequencing platforms; it runs on all major types of computers; it supports arbitrary genome assemblies for arbitrary organisms; and it scales well to support both large and small scale sequencing projects. The software homepage is http://seqgene.sourceforge.net.

[1]  L. Liang,et al.  Mapping complex disease traits with global gene expression , 2009, Nature Reviews Genetics.

[2]  L. Liang,et al.  A genome-wide association study of global gene expression , 2007, Nature Genetics.

[3]  Ken Chen,et al.  VarScan: variant detection in massively parallel sequencing of individual and pooled samples , 2009, Bioinform..

[4]  Helga Thorvaldsdóttir,et al.  Integrative Genomics Viewer , 2011, Nature Biotechnology.

[5]  Kevin P. Murphy,et al.  SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors , 2010, Bioinform..

[6]  D. Altshuler,et al.  A map of human genome variation from population-scale sequencing , 2010, Nature.

[7]  I. Goodhead,et al.  Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution , 2008, Nature.

[8]  E. S. Venkatraman,et al.  A faster circular binary segmentation algorithm for the analysis of array CGH data , 2007, Bioinform..

[9]  Dawei Li,et al.  The diploid genome sequence of an Asian individual , 2008, Nature.

[10]  Ken Chen,et al.  Recurring mutations found by sequencing an acute myeloid leukemia genome. , 2009, The New England journal of medicine.

[11]  Andrew M. Jenkinson,et al.  Ensembl 2009 , 2008, Nucleic Acids Res..

[12]  S. Ranade,et al.  Stem cell transcriptome profiling via massive-scale mRNA sequencing , 2008, Nature Methods.

[13]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[14]  David Haussler,et al.  The UCSC genome browser database: update 2007 , 2006, Nucleic Acids Res..

[15]  P. Shannon,et al.  Exome sequencing identifies the cause of a Mendelian disorder , 2009, Nature Genetics.

[16]  Susumu Goto,et al.  KEGG for representation and analysis of molecular networks involving diseases and drugs , 2009, Nucleic Acids Res..

[17]  S. Hunt,et al.  Genome-Wide Associations of Gene Expression Variation in Humans , 2005, PLoS genetics.

[18]  Elizabeth M. Smigielski,et al.  dbSNP: the NCBI database of genetic variation , 2001, Nucleic Acids Res..

[19]  Donald B. Johnson,et al.  Efficient Algorithms for Shortest Paths in Sparse Networks , 1977, J. ACM.

[20]  Xuegong Zhang,et al.  DEGseq: an R package for identifying differentially expressed genes from RNA-seq data , 2010, Bioinform..

[21]  Bing Ren,et al.  Systematic detection of putative tumor suppressor genes through the combined use of exome and transcriptome sequencing , 2010, Genome Biology.

[22]  Yong-mei Zhu,et al.  Exome sequencing identifies somatic mutations of DNA methyltransferase gene DNMT3A in acute monocytic leukemia , 2011, Nature Genetics.

[23]  R. Guigó,et al.  Transcriptome genetics using second generation sequencing in a Caucasian population , 2010, Nature.

[24]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[25]  P. Bork,et al.  Human non-synonymous SNPs: server and survey. , 2002, Nucleic acids research.

[26]  Gordon K Smyth,et al.  Statistical Applications in Genetics and Molecular Biology Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments , 2011 .

[27]  Asan,et al.  Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude , 2010, Science.

[28]  Zhaohui S. Qin,et al.  A second generation human haplotype map of over 3.1 million SNPs , 2007, Nature.

[29]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[30]  Huanming Yang,et al.  SNP detection for massively parallel whole-genome resequencing. , 2009, Genome research.

[31]  Gonçalo R. Abecasis,et al.  Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma , 2007, Nature.

[32]  M. Stephens,et al.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. , 2008, Genome research.

[33]  Mary Goldman,et al.  The UCSC Genome Browser database: update 2011 , 2010, Nucleic Acids Res..

[34]  D. Clayton,et al.  Genome-wide analysis of allelic expression imbalance in human primary cells by high-throughput transcriptome resequencing , 2009, Human molecular genetics.

[35]  R. Lister,et al.  Highly Integrated Single-Base Resolution Maps of the Epigenome in Arabidopsis , 2008, Cell.

[36]  Joseph K. Pickrell,et al.  Understanding mechanisms underlying human gene expression variation with RNA sequencing , 2010, Nature.

[37]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[38]  S. Jhanwar,et al.  Chromosome 19 abnormalities are commonly seen in AML, M7. , 2002, Blood.

[39]  Thomas D. Wu,et al.  A highly annotated whole-genome sequence of a Korean individual , 2009, Nature.

[40]  Stefano Volinia,et al.  GAMES identifies and annotates mutations in next-generation sequencing projects , 2011, Bioinform..

[41]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.