IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing

Abstract Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated by the gold standard GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and the non-uniformity of gene isoform ASE are widespread, including in tumorigenesis-relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution for understanding ASE using RNA sequencing data only.
