Prediction of MicroRNA Subcellular Localization by Using a Sequence-to-Sequence Model

The subcellular localization of microRNAs (miRNAs) is closely related to their biological functions. Recent studies have found that miRNAs can be targeted to various cellular compartments and show rich localization patterns in cells. However, to the best of our knowledge, no computational tool for predicting miRNA subcellular locations has been available to date. The major reason is that the lack of informative data sources severely limits the prediction performance of traditional statistical learning approaches. In this study, we treat this prediction task as a sequence-to-sequence learning process and propose an attention-based encoder-decoder model, miRLocator, to identify the subcellular locations of human miRNAs. miRLocator uses a bidirectional long short-term memory (BiLSTM) module to encode the input sequences into context vectors, and an LSTM module to decode these context vectors into location sets. In particular, a new encoding method for RNAs, RNA2Vec, and an entropy-based method are incorporated in the model to determine the input and output representations, respectively. The experimental results show that miRLocator achieves promising prediction accuracy with limited input information, and outperforms both models using hand-designed features and conventional RNN models.
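
To make the attention step of an encoder-decoder model concrete, the following minimal sketch computes attention weights and a context vector from a set of encoder states. It is an illustration under stated assumptions, not the miRLocator implementation: the dot-product scoring function, the helper names `softmax` and `attend`, and the toy vectors are all hypothetical, and the abstract does not specify which attention variant the model uses.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (an assumed scoring function, chosen for simplicity).

    Scores each encoder state by its dot product with the current decoder
    state, normalizes the scores with softmax, and returns the weights
    together with the weighted sum of encoder states (the context vector).
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy encoder outputs, one vector per sequence position (made-up values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Toy decoder hidden state at the current output step (made-up values).
decoder_state = [1.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
```

In a full model, the encoder states would come from the BiLSTM over the embedded input sequence, and the context vector would be fed to the LSTM decoder at each step when it emits the next location label.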