A ConvNet-based multi-label microRNA subcellular location predictor incorporating k-mer positional encoding

MicroRNAs (miRNAs) are small non-coding RNA sequences of about 22 nucleotides that are capable of regulating almost 60% of the highly complex mammalian transcriptome. Presently, very few approaches are capable of visualizing miRNA locations inside the cell to reveal the hidden pathways and mechanisms behind miRNA functionality, transport, and biogenesis. The state-of-the-art miRNA subcellular location predictor, MirLocator, uses a sequence-to-sequence model along with pre-trained k-mer embeddings. Existing methods for generating pre-trained k-mer embeddings focus on extracting the semantics of k-mers. In RNA sequences, however, the positional information of nucleotides is more important than semantics, because the distinct positions of the four basic nucleotides define the functionality of RNA molecules. Considering the dynamicity and importance of nucleotide positions, instead of learning representations from k-mer semantics, we propose kmerRP2vec, a novel feature representation approach that fuses the positional information of k-mers into randomly initialized neural k-mer embeddings. The effectiveness of the proposed representation is evaluated with two deep learning methodologies, a convolutional neural network (CNN) and a recurrent neural network (RNN), using 8 evaluation measures. Experimental results on the public benchmark miRNAsubloc dataset show that the proposed kmerRP2vec approach, combined with a simple CNN model, outperforms the state-of-the-art MirLocator approach by significant margins of 18% in precision and 19% in recall.
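The abstract does not spell out how kmerRP2vec fuses positional information into the k-mer embeddings. The Python sketch here is one plausible reading and not the authors' implementation. It makes several assumptions: overlapping 3-mers, a randomly initialized embedding table, Transformer-style sinusoidal positional encodings combined with the embeddings by element-wise addition, and a 0.5 threshold on per-label sigmoid scores for multi-label decoding. It also computes example-based precision and recall, a common pair of multi-label measures; whether these match the paper's precision and recall definitions is itself an assumption. All function names and the location labels are hypothetical.

```python
import math
import random
from itertools import product

# Hypothetical sketch, NOT the published kmerRP2vec code:
# k-mer tokenization + random embeddings fused (by addition, an assumption)
# with sinusoidal positional encodings, plus multi-label decoding/metrics.


def kmers(seq, k=3):
    """Split an RNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k, alphabet="ACGU"):
    """Map every possible k-mer over the alphabet to an integer index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def random_embeddings(vocab_size, dim, seed=0):
    """Randomly initialized k-mer embedding table (a network would train it)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]


def sinusoidal_encoding(pos, dim):
    """Transformer-style sinusoidal encoding of one position (assumed scheme)."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def positional_kmer_features(seq, k, table, vocab):
    """Return one (embedding + positional encoding) vector per k-mer of seq."""
    dim = len(table[0])
    features = []
    for pos, kmer in enumerate(kmers(seq, k)):
        idx = vocab.get(kmer)
        # k-mers containing ambiguous bases (e.g. 'N') get a zero content vector
        emb = table[idx] if idx is not None else [0.0] * dim
        pe = sinusoidal_encoding(pos, dim)
        features.append([e + p for e, p in zip(emb, pe)])
    return features


def decode(probs, labels, threshold=0.5):
    """Turn per-label sigmoid scores (e.g. from a CNN) into a predicted label set."""
    return {lab for lab, p in zip(labels, probs) if p >= threshold}


def example_based_precision_recall(y_true, y_pred):
    """Average per-sample |T & P| / |P| and |T & P| / |T| over all samples."""
    prec = [len(t & p) / len(p) if p else 0.0 for t, p in zip(y_true, y_pred)]
    rec = [len(t & p) / len(t) if t else 0.0 for t, p in zip(y_true, y_pred)]
    return sum(prec) / len(prec), sum(rec) / len(rec)


if __name__ == "__main__":
    vocab = build_vocab(3)                         # 64 possible 3-mers
    table = random_embeddings(len(vocab), dim=8)
    feats = positional_kmer_features("UGAGGUAGUAGGUUGUAUAGUU", 3, table, vocab)
    print(len(feats), len(feats[0]))               # 20 8

    labels = ["cytoplasm", "nucleus", "exosome"]   # hypothetical label names
    y_pred = [decode([0.9, 0.2, 0.1], labels), decode([0.1, 0.7, 0.6], labels)]
    y_true = [{"cytoplasm", "nucleus"}, {"exosome"}]
    print(example_based_precision_recall(y_true, y_pred))  # (0.75, 0.75)
```

Addition keeps the feature width equal to the embedding size, so the resulting per-k-mer matrix can be fed directly to a 1D CNN or an RNN. Concatenating the positional vector instead would be an equally plausible fusion choice under these assumptions.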
