MPPIF-Net: Identification of Plasmodium Falciparum Parasite Mitochondrial Proteins Using Deep Features with Multilayer Bi-directional LSTM

Mitochondrial proteins of Plasmodium falciparum (MPPF) are an important target for anti-malarial drugs. However, identifying them through manual experimentation is costly, so the production of related drugs by pharmaceutical institutions takes a long time. Pharmaceutical companies therefore need a reliable, automated computational approach that identifies these proteins precisely and so enables timely drug production. Several computationally intelligent techniques have been developed for this purpose. They extract local features from biological sequences and then apply various machine learning classifiers to discriminate the nature of proteins. Unfortunately, these techniques perform poorly at capturing contextual features from sequence patterns, which yields non-representative classifiers. In this paper, we propose a sequence-based framework that extracts deep, representative features that are trustworthy for identifying Plasmodium mitochondrial proteins. The backbone of the framework is the MPPF identification network (MPPIF-Net), which combines a convolutional neural network (CNN) with a multilayer bi-directional long short-term memory (MBD-LSTM) network. MPPIF-Net takes protein sequences as input and passes them through a series of convolution and pooling layers to extract learned features. These features are then fed to the sequence learning component, MBD-LSTM, which is trained to classify them into their respective classes. The proposed model is evaluated experimentally with the holdout method on a newly prepared dataset, PF2095, and on two existing benchmark datasets, PF175 and MPD. It achieves testing accuracies of 97.6%, 97.1%, and 99.5% on PF2095, PF175, and MPD, respectively, outperforming state-of-the-art approaches.
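The CNN-to-MBD-LSTM pipeline described above can be sketched as a forward pass in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not report these details, so the following are all hypothetical choices:

- one-hot encoding over the 20 standard amino acids
- a fixed input length of 100 residues
- a single convolution/pooling stage
- two stacked bidirectional LSTM layers and the chosen layer sizes
- a binary output (mitochondrial vs. non-mitochondrial)
- every function name in the block

The weights are random and training is omitted.

```python
import numpy as np

# Hypothetical input encoding: one-hot over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq, max_len=100):
    """Encode a sequence as (max_len, 20); truncate or zero-pad (assumed length).
    Non-standard residues (e.g. 'X') become all-zero rows."""
    x = np.zeros((max_len, len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq[:max_len]):
        if aa in AA_INDEX:
            x[pos, AA_INDEX[aa]] = 1.0
    return x


def conv1d_relu(x, w, b):
    """Valid 1D convolution along the sequence. x: (L, C_in), w: (K, C_in, C_out)."""
    k = w.shape[0]
    out = np.stack([np.tensordot(x[i:i + k], w, axes=([0, 1], [0, 1]))
                    for i in range(x.shape[0] - k + 1)])
    return np.maximum(out + b, 0.0)


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the sequence axis."""
    n = x.shape[0] // size
    return x[:n * size].reshape(n, size, -1).max(axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm(x, W, U, b):
    """One LSTM direction. x: (T, D), W: (D, 4H), U: (H, 4H). Returns (T, H)."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x_t in x:
        z = x_t @ W + h @ U + b  # gate pre-activations stacked as [i, f, o, g]
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)


def bilstm(x, fwd, bwd):
    """Bidirectional layer: forward pass plus a pass over the reversed sequence,
    re-aligned in time and concatenated per step -> (T, 2H)."""
    h_f = lstm(x, *fwd)
    h_b = lstm(x[::-1], *bwd)[::-1]
    return np.concatenate([h_f, h_b], axis=1)


def lstm_params(rng, d_in, hidden):
    """Random (W, U, b) for one LSTM direction."""
    return (rng.normal(0, 0.1, (d_in, 4 * hidden)),
            rng.normal(0, 0.1, (hidden, 4 * hidden)),
            np.zeros(4 * hidden))


def init_params(rng, n_filters=32, kernel=5, hidden=16, n_layers=2, n_classes=2):
    """Hypothetical hyperparameters; the abstract does not report them."""
    params = {"conv_w": rng.normal(0, 0.1, (kernel, len(AMINO_ACIDS), n_filters)),
              "conv_b": np.zeros(n_filters), "lstm": []}
    d_in = n_filters
    for _ in range(n_layers):
        params["lstm"].append((lstm_params(rng, d_in, hidden),
                               lstm_params(rng, d_in, hidden)))
        d_in = 2 * hidden  # next layer consumes concatenated fwd/bwd states
    params["out_w"] = rng.normal(0, 0.1, (2 * hidden, n_classes))
    params["out_b"] = np.zeros(n_classes)
    return params


def forward(seq, params):
    """Sequence -> CNN features -> stacked BiLSTM -> softmax class probabilities."""
    x = max_pool1d(conv1d_relu(one_hot(seq), params["conv_w"], params["conv_b"]))
    for fwd, bwd in params["lstm"]:
        x = bilstm(x, fwd, bwd)
    H = params["out_w"].shape[0] // 2
    # Summary vector: last forward state + first-position backward state
    # (each direction's final state after reading the whole sequence).
    feat = np.concatenate([x[-1, :H], x[0, H:]])
    logits = feat @ params["out_w"] + params["out_b"]
    e = np.exp(logits - logits.max())
    return e / e.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    print(forward("MKRLLSAVAL" * 12, params))  # hypothetical input sequence
```

A practical implementation would use a deep learning framework with automatic differentiation so that the weights can be trained. The NumPy version only makes the data flow and tensor shapes explicit. The input is (100, 20). After convolution and pooling it becomes (48, 32), and it stays (48, 32) after each bidirectional layer. A 2-way softmax produces the final class probabilities.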
