ASEQ: fast allele-specific studies from next-generation sequencing data

Background: Single-base-level information from next-generation sequencing (NGS) allows quantitative assessment of biological phenomena such as mosaicism or allele-specific features in healthy and diseased cells. Such studies are often computationally demanding, which hinders genome-wide investigations across the large datasets now becoming available through the 1,000 Genomes Project and The Cancer Genome Atlas (TCGA) initiatives.

Results: We present ASEQ, a tool to perform gene-level allele-specific expression (ASE) analysis from paired genomic and transcriptomic NGS data without requiring paternal and maternal genome data. ASEQ offers an easy-to-use set of modes that, transparently to the user, take full advantage of a built-in fast computational engine. We report its performance on a set of 20 individuals from the 1,000 Genomes Project and show its detection power on imprinted genes. Next, we demonstrate a high level of concordance in ASE calls when comparing it to the AlleleSeq and MBASED tools. Finally, using a prostate cancer dataset, we report a higher fraction of ASE genes than in healthy individuals and show allele-specific events nominated by ASEQ in genes that are implicated in the disease.

Conclusions: ASEQ can be used to rapidly and reliably screen large NGS datasets for the identification of allele-specific features. It can be integrated into any NGS pipeline and runs on computer systems with multiple CPUs, CPUs with multiple cores, or across clusters of machines.
