A novel deep learning-assisted hybrid network for Plasmodium falciparum parasite mitochondrial protein classification

Plasmodium falciparum is a parasitic protozoan that causes malaria, a deadly disease. Accurate identification of the parasite's mitochondrial proteins is therefore essential for understanding their functions and for identifying novel drug targets. Several adaptive statistical techniques have been devised for classifying protein sequences. Despite significant gains, the prediction performance of current systems is still constrained by the lack of appropriate feature descriptors and learning strategies. Moreover, Artificial Intelligence (AI)-based models depend on good ground-truth data, which is scarce in the literature. In this work, we therefore propose a novel hybrid network that combines a 1D Convolutional Neural Network (CNN) with a Bidirectional Gated Recurrent Unit (BGRU) to classify malaria parasite mitochondrial proteins. Furthermore, we curate sequence data collected from the National Center for Biotechnology Information (NCBI) and UniProtKB/Swiss-Prot protein databases to prepare a dataset that the research community can use to evaluate AI-based algorithms. After preprocessing the collected data, we obtain 4204 cases and denote this set of proteins as PF4204. Finally, we conduct an ablation study on several conventional and deep models using PF4204 and the benchmark PF2095 dataset. The proposed model, CNN-BGRU, obtains accuracies of 0.9096 and 0.9857 on the PF4204 and PF2095 datasets, respectively. In addition, CNN-BGRU is compared with state-of-the-art methods, and the results indicate that it extracts robust features and identifies proteins accurately.
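
To make the CNN-BGRU idea concrete, the following is a minimal NumPy sketch of a forward pass through a 1D convolution followed by a bidirectional GRU and a sigmoid output. It is an illustration under stated assumptions, not the authors' implementation: the one-hot encoding, the binary (mitochondrial vs. non-mitochondrial) output, the layer sizes, and all function names (`one_hot`, `conv1d_relu`, `gru_last_state`, `init_model`, `predict_proba`) are hypothetical, and the weights are random and untrained.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq, max_len=64):
    """Encode a sequence as a (max_len, 20) matrix; truncate or zero-pad.
    Assumption: one-hot input; the abstract does not specify the encoding."""
    x = np.zeros((max_len, len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq[:max_len]):
        if aa in AA_INDEX:  # unknown residues (e.g. 'X') stay all-zero
            x[pos, AA_INDEX[aa]] = 1.0
    return x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution along the sequence axis, then ReLU.
    x: (T, C), kernels: (K, C, F), bias: (F,) -> output (T - K + 1, F)."""
    k = kernels.shape[0]
    out = np.stack([
        np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
        for t in range(x.shape[0] - k + 1)
    ])
    return np.maximum(out, 0.0)

def gru_last_state(x, p):
    """Run a GRU over x (T, D) and return the final hidden state (H,)."""
    h = np.zeros(p["Uz"].shape[0])
    for x_t in x:
        z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h + p["bz"])   # update gate
        r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h + p["br"])   # reset gate
        h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h) + p["bh"])
        h = (1.0 - z) * h + z * h_cand
    return h

def init_gru(rng, d, h):
    """Random GRU parameters for input size d and hidden size h."""
    p = {}
    for g in "zrh":
        p["W" + g] = rng.normal(0, 0.1, (h, d))
        p["U" + g] = rng.normal(0, 0.1, (h, h))
        p["b" + g] = np.zeros(h)
    return p

def init_model(seed=0, n_filters=16, kernel_size=5, hidden=8):
    """Untrained model; all sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    return {
        "kernels": rng.normal(0, 0.1, (kernel_size, 20, n_filters)),
        "conv_bias": np.zeros(n_filters),
        "gru_fwd": init_gru(rng, n_filters, hidden),
        "gru_bwd": init_gru(rng, n_filters, hidden),
        "w_out": rng.normal(0, 0.1, 2 * hidden),
        "b_out": 0.0,
    }

def predict_proba(seq, model, max_len=64):
    """CNN features -> forward and backward GRU -> sigmoid score in (0, 1)."""
    feats = conv1d_relu(one_hot(seq, max_len), model["kernels"], model["conv_bias"])
    h_fwd = gru_last_state(feats, model["gru_fwd"])        # left-to-right pass
    h_bwd = gru_last_state(feats[::-1], model["gru_bwd"])  # right-to-left pass
    h = np.concatenate([h_fwd, h_bwd])
    return float(sigmoid(model["w_out"] @ h + model["b_out"]))

if __name__ == "__main__":
    model = init_model(seed=0)
    print(predict_proba("MKTAYIAKQRQISFVKSHFSRQ", model))  # arbitrary example sequence
```

In this sketch the convolution extracts local motif features from short residue windows, and the two GRU passes summarize them in both reading directions before classification. A trained model would learn these weights from labeled data such as PF4204, typically with a deep learning framework rather than hand-written NumPy.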
