Identification of alternatively spliced gene isoforms and novel noncoding RNAs by single-molecule long-read sequencing in Camellia

ABSTRACT Direct single-molecule sequencing of full-length transcripts allows efficient identification of gene isoforms, which makes it well suited to analyses of alternative splicing (AS), polyadenylation, and long non-coding RNAs. However, identifying gene isoforms and long non-coding RNAs with novel regulatory functions remains challenging, especially for species without a reference genome. Here, we present a comprehensive analysis combining long-read and short-read transcriptome sequencing in Camellia japonica. Using a novel bioinformatic pipeline that reverse-traces the split-sites, we uncovered 257,692 AS sites from 61,838 transcripts, and 13,068 AS isoforms were validated by aligning the short reads. We identified tissue-specific AS isoforms along with 6,373 AS events that were found in all tissues. Furthermore, we analysed the polyadenylation (polyA) patterns of transcripts and found that the preference for polyA signals differed between AS and non-AS transcripts. Moreover, we predicted phased small interfering RNA (phasiRNA) loci through integrative analyses of transcriptome and small RNA sequencing. We show that a newly evolved phasiRNA locus derived from lipoxygenases generated 12 consecutive 21 bp secondary RNAs, which were responsive to cold and heat stress in Camellia. Our studies of the isoform transcriptome provide insights into gene splicing and functions that may facilitate the mechanistic understanding of plants.
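To make the idea of "consecutive" phased secondary RNAs concrete, the following minimal Python sketch cuts a transcript into back-to-back 21-nucleotide windows starting from a chosen position. It is an illustration only, not the pipeline used in the study: the function name `phased_windows`, the toy sequence, and the start coordinate are all hypothetical, and real phasiRNA prediction also relies on small RNA read abundance and a phasing score, which are not modelled here.

```python
def phased_windows(sequence, start, phase_len=21, count=12):
    """Return `count` consecutive, non-overlapping windows of `phase_len` nt.

    `start` is a 0-based offset into `sequence` (hypothetical here; in
    practice it would correspond to the first phased register).
    """
    end = start + phase_len * count
    if start < 0 or end > len(sequence):
        raise ValueError("locus too short for the requested number of phases")
    # Step through the locus one register at a time.
    return [sequence[i:i + phase_len] for i in range(start, end, phase_len)]


# Toy 260-nt transcript (made-up sequence, for illustration only).
toy = "ACGU" * 65
windows = phased_windows(toy, start=4)
print(len(windows), {len(w) for w in windows})
```

Each returned window abuts the next, so joining them reproduces a single uninterrupted 252-nt stretch of the transcript, which is what a run of 12 phased 21-nucleotide registers implies.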
