AStrap: identification of alternative splicing from transcript sequences without a reference genome

Summary: Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without a reference genome. AStrap identifies AS events by extensive pairwise alignment of transcript sequences and predicts AS types with a machine-learning model that integrates more than 500 assembled features. We evaluated AStrap using AS events collected from the reference genomes of rice and human, as well as single-molecule real-time sequencing data from Amborella trichopoda. The results show that AStrap identifies many more AS events than the competing method, with comparable or higher accuracy. AStrap is also unique in predicting AS types, achieving an overall accuracy of ∼0.87 across species. Extensive evaluation with different parameters, sample sizes and machine-learning models on different species further demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in nonmodel organisms with limited genetic resources.

Availability: AStrap is available for download at https://github.com/BMILAB/AStrap.

Supplementary information: Supplementary data are available at Bioinformatics online.
