A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite

The number of human deaths caused by malaria continues to rise. The mitochondrial proteins of the malaria parasite play vital roles in the organism, so accurately identifying them is necessary for developing effective drugs and vaccines against infection. Although biochemical experiments can provide precise details about mitochondrial proteins, they are expensive and time-consuming. In this review, we summarize the machine learning-based methods for identifying mitochondrial proteins in the malaria parasite and compare the construction strategies of these computational methods. Finally, we discuss future directions for the computational recognition of mitochondrial proteins.
