Linc2function: A Comprehensive Pipeline and Webserver for Long Non-Coding RNA (lncRNA) Identification and Functional Predictions Using Deep Learning Approaches

Long non-coding RNAs (lncRNAs), which make up a significant portion of the human transcriptome, serve as vital regulators of cellular processes and as potential disease biomarkers. However, the function of most lncRNAs remains unknown, and existing approaches have focused on gene-level investigation. Our work emphasizes the importance of transcript-level annotation to uncover the roles of specific transcript isoforms. We propose that understanding the mechanisms of lncRNAs in pathological processes requires solving their structural motifs and interactomes. A complete lncRNA annotation first involves discriminating lncRNAs from their coding counterparts and then predicting their functional motifs and target biomolecules. Current in silico methods mainly perform primary-sequence-based discrimination using a reference model, which limits their comprehensiveness and generalizability. We demonstrate that integrating secondary structure and interactome information with the transcript sequence enables a comprehensive functional annotation. Annotating lncRNAs for newly sequenced species is challenging because of inconsistencies in functional annotations, the need for specialized computational techniques, limited access to source code, and the shortcomings of reference-based methods for cross-species predictions. To address these challenges, we developed a pipeline for identifying and annotating transcript sequences at the isoform level. We demonstrate the effectiveness of the pipeline by comprehensively annotating the lncRNAs associated with two specific disease groups. The source code of our pipeline is available under the MIT license, so researchers can run it locally to make new predictions with the pre-trained models or to re-train the models on new sequence datasets. Non-technical users can access the pipeline through a web server.
