Identification of Multi-Functional Enzyme with Multi-Label Classifier

Enzymes are important and effective biological catalyst proteins participating in almost all active cell processes. Identification of multi-functional enzymes is essential in understanding the function of enzymes. Machine learning methods perform better in protein structure and function prediction than traditional biological wet experiments. Thus, in this study, we explore an efficient and effective machine learning method to categorize enzymes according to their function. Multi-functional enzymes are predicted with a special machine learning strategy, namely, multi-label classifier. Sequence features are extracted from a position-specific scoring matrix with autocross-covariance transformation. Experiment results show that the proposed method obtains an accuracy rate of 94.1% in classifying six main functional classes through five cross-validation tests and outperforms state-of-the-art methods. In addition, 91.25% accuracy is achieved in multi-functional enzyme prediction, which is often ignored in other enzyme function prediction studies. The online prediction server and datasets can be accessed from the link http://server.malab.cn/MEC/.

[1]  C Y Wang,et al.  imDC: an ensemble learning method for imbalanced classification with miRNA data. , 2015, Genetics and molecular research : GMR.

[2]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[3]  Xiangxiang Zeng,et al.  Computing with viruses , 2016, Theor. Comput. Sci..

[4]  Xing Gao,et al.  An Improved Protein Structural Classes Prediction Method by Incorporating Both Sequence and Structure Information , 2015, IEEE Transactions on NanoBioscience.

[5]  R. Ji,et al.  Improved and Promising Identification of Human MicroRNAs by Incorporating a High-Quality Negative Set , 2014, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[6]  Junjie Chen,et al.  iMiRNA-SSF: Improving the Identification of MicroRNA Precursors by Combining Negative Sets with Different Distributions , 2016, Scientific Reports.

[7]  Sridhar Krishnan,et al.  Knee Joint Vibration Signal Analysis with Matching Pursuit Decomposition and Dynamic Weighted Classifier Fusion , 2013, Comput. Math. Methods Medicine.

[8]  Linqiang Pan,et al.  Spiking Neural P Systems With Rules on Synapses Working in Maximum Spikes Consumption Strategy , 2015, IEEE Transactions on NanoBioscience.

[9]  Grigorios Tsoumakas,et al.  An Empirical Study of Lazy Multilabel Classification Algorithms , 2008, SETN.

[10]  Eyke Hüllermeier,et al.  Combining instance-based learning and logistic regression for multilabel classification , 2009, Machine Learning.

[11]  Qinghua Hu,et al.  HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy , 2015, Bioinform..

[12]  Hua Tang,et al.  Identification of immunoglobulins using Chou's pseudo amino acid composition with feature selection technique. , 2016, Molecular bioSystems.

[13]  Wei Chen,et al.  iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition , 2013, Nucleic acids research.

[14]  K. Chou,et al.  iRNA-Methyl: Identifying N(6)-methyladenosine sites using pseudo nucleotide composition. , 2015, Analytical biochemistry.

[15]  Hui Ding,et al.  The prediction of protein structural class using averaged chemical shifts , 2012, Journal of biomolecular structure & dynamics.

[16]  Wei Chen,et al.  iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. , 2014, Analytical biochemistry.

[17]  Hao Lin,et al.  Predicting subcellular localization of mycobacterial proteins by using Chou's pseudo amino acid composition. , 2008, Protein and peptide letters.

[18]  Hui Ding,et al.  Prediction of the types of ion channel-targeted conotoxins based on radial basis function network. , 2013, Toxicology in vitro : an international journal published in association with BIBRA.

[19]  Wei Chen,et al.  Predicting the Types of J-Proteins Using Clustered Amino Acids , 2014, BioMed research international.

[20]  Min-Ling Zhang,et al.  A Review on Multi-Label Learning Algorithms , 2014, IEEE Transactions on Knowledge and Data Engineering.

[21]  Eyke Hüllermeier,et al.  Combining Instance-Based Learning and Logistic Regression for Multilabel Classification , 2009, ECML/PKDD.

[22]  Xiaolong Wang,et al.  iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach , 2016, Journal of biomolecular structure & dynamics.

[23]  Wei Chen,et al.  Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. , 2015, Molecular bioSystems.

[24]  K. Chou,et al.  iCTX-Type: A Sequence-Based Predictor for Identifying the Types of Conotoxins in Targeting Ion Channels , 2014, BioMed research international.

[25]  Pengmian Feng,et al.  Prediction of DNase I Hypersensitive Sites by Using Pseudo Nucleotide Compositions , 2014, TheScientificWorldJournal.

[26]  Q. Zou,et al.  Hierarchical Classification of Protein Folds Using a Novel Ensemble Classifier , 2013, PloS one.

[27]  J. Skolnick,et al.  EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference. , 2004, Nucleic acids research.

[28]  Q. Zou,et al.  Similarity computation strategies in the microRNA-disease network: a survey. , 2015, Briefings in functional genomics.

[29]  Xing Gao,et al.  Enhanced Protein Fold Prediction Method Through a Novel Feature Extraction Technique , 2015, IEEE Transactions on NanoBioscience.

[30]  Jeffrey Skolnick,et al.  EFICAz2.5: application of a high-precision enzyme function predictor to 396 proteomes , 2012, Bioinform..

[31]  Hao Lin,et al.  Prediction of cell wall lytic enzymes using Chou's amphiphilic pseudo amino acid composition. , 2009, Protein and peptide letters.

[32]  Jian Pei,et al.  Editorial [State of the Transactions] , 2014, IEEE Trans. Knowl. Data Eng..

[33]  Hui Ding,et al.  Identify Golgi protein types with modified Mahalanobis discriminant algorithm and pseudo amino acid composition. , 2011, Protein and peptide letters.

[34]  Shuigeng Zhou,et al.  A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation , 2009, Bioinform..

[35]  K. Chou,et al.  Prediction of protein structural classes. , 1995, Critical reviews in biochemistry and molecular biology.

[36]  Junjie Chen,et al.  Application of learning to rank to protein remote homology detection , 2015, Bioinform..

[37]  Jano I. van Hemert,et al.  EnzML: multi-label prediction of enzyme classes using InterPro signatures , 2012, BMC Bioinformatics.

[38]  B. Liu,et al.  iDNA-Prot|dis: Identifying DNA-Binding Proteins by Incorporating Amino Acid Distance-Pairs and Reduced Alphabet Profile into the General Pseudo Amino Acid Composition , 2014, PloS one.

[39]  Liujuan Cao,et al.  A novel features ranking metric with application to scalable visual and bioinformatics data classification , 2016, Neurocomputing.

[40]  Yong Huang,et al.  Identifying Multi-Functional Enzyme by Hierarchical Multi-Label Classifier , 2013 .

[41]  Ke Chen,et al.  Survey of MapReduce frame operation in bioinformatics , 2013, Briefings Bioinform..

[42]  Grigorios Tsoumakas,et al.  Random k -Labelsets: An Ensemble Method for Multilabel Classification , 2007, ECML.

[43]  B. Liu,et al.  DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation , 2015, Scientific Reports.

[44]  B. Liu,et al.  An Approach for Identifying Cytokines Based on a Novel Ensemble Classifier , 2013, BioMed research international.

[45]  Ying Huang,et al.  EFICAz2: enzyme function inference by a combined approach enhanced by machine learning , 2009, BMC Bioinformatics.

[46]  K. Chou,et al.  iSS-PseDNC: Identifying Splicing Sites Using Pseudo Dinucleotide Composition , 2014, BioMed research international.

[47]  Leyi Wei,et al.  An Improved Protein Structural Classes Prediction Method by Incorporating Both Sequence and Structure Information. , 2015, IEEE transactions on nanobioscience.

[48]  Yair Neuman The Definition of Life and the Life of a Definition , 2012, Journal of biomolecular structure & dynamics.

[49]  Xiangxiang Zeng,et al.  Prediction and Validation of Disease Genes Using HeteSim Scores , 2017, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[50]  K. Chou,et al.  EzyPred: a top-down approach for predicting enzyme functional classes and subclasses. , 2007, Biochemical and biophysical research communications.

[51]  Hao Lin,et al.  Identifying Antioxidant Proteins by Using Optimal Dipeptide Compositions , 2016, Interdisciplinary Sciences: Computational Life Sciences.

[52]  Xiangxiang Zeng,et al.  On Some Classes of Sequential Spiking Neural P Systems , 2014, Neural Computation.

[53]  Xiangxiang Zeng,et al.  Spiking Neural P Systems with Thresholds , 2014, Neural Computation.

[54]  Xiaolong Wang,et al.  Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection , 2013, Bioinform..

[55]  Ye Tian,et al.  An Efficient Approach to Non-dominated Sorting for Evolutionary Multi-objective Optimization , 2014 .

[56]  Xiangxiang Zeng,et al.  Small universal simple spiking neural P systems with weights , 2013, Science China Information Sciences.

[57]  Wei Chen,et al.  iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition , 2014, Nucleic acids research.

[58]  Zhi-Hua Zhou,et al.  ML-KNN: A lazy learning approach to multi-label learning , 2007, Pattern Recognit..

[59]  Zhengwei Zhu,et al.  CD-HIT: accelerated for clustering the next-generation sequencing data , 2012, Bioinform..

[60]  Andrei Paun,et al.  On the Universality of Axon P Systems , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[61]  K. Chou,et al.  iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. , 2013, Analytical biochemistry.

[62]  B. Liu,et al.  PseDNA‐Pro: DNA‐Binding Protein Identification by Combining Chou’s PseAAC and Physicochemical Distance Transformation , 2015, Molecular informatics.

[63]  Linqiang Pan,et al.  On the Universality and Non-Universality of Spiking Neural P Systems With Rules on Synapses , 2015, IEEE Transactions on NanoBioscience.

[64]  Xiangxiang Zeng,et al.  Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks , 2016, Briefings Bioinform..

[65]  K. Chou Some remarks on protein attribute prediction and pseudo amino acid composition , 2010, Journal of Theoretical Biology.

[66]  Hao Lin The modified Mahalanobis Discriminant for predicting outer membrane proteins by using Chou's pseudo amino acid composition. , 2008, Journal of theoretical biology.

[67]  Shiow-Fen Hwang,et al.  Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method , 2007, Biosyst..

[68]  Ying Ju,et al.  Finding the Best Classification Threshold in Imbalanced Classification , 2016, Big Data Res..

[69]  Yanjun Liu,et al.  Computational power of tissue P systems for generating control languages , 2014, Inf. Sci..

[70]  Ye Tian,et al.  A Knee Point-Driven Evolutionary Algorithm for Many-Objective Optimization , 2015, IEEE Transactions on Evolutionary Computation.

[71]  Kuo-Chen Chou,et al.  Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..

[72]  Junjie Chen,et al.  Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences , 2015, Nucleic Acids Res..

[73]  Xiangxiang Zeng,et al.  nDNA-prot: identification of DNA-binding proteins based on unbalanced classification , 2014, BMC Bioinformatics.

[74]  Linqiang Pan,et al.  Spiking neural P systems with request rules , 2016, Neurocomputing.

[75]  Xiangxiang Zeng,et al.  Identification of cytokine via an improved genetic algorithm , 2014, Frontiers of Computer Science.

[76]  Wei Chen,et al.  Prediction of CpG island methylation status by integrating DNA physicochemical properties. , 2014, Genomics.

[77]  Wei Chen,et al.  iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition , 2014, Bioinform..

[78]  Xiaolong Wang,et al.  repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects , 2015, Bioinform..

[79]  Ren Long,et al.  iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition , 2016, Bioinform..

[80]  Chen Lin,et al.  LibD3C: Ensemble classifiers with a clustering and dynamic selection strategy , 2014, Neurocomputing.

[81]  B. Liu,et al.  Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection , 2012, PloS one.

[82]  Ye Tian,et al.  An Efficient Approach to Nondominated Sorting for Evolutionary Multiobjective Optimization , 2015, IEEE Transactions on Evolutionary Computation.

[83]  Q. Zou,et al.  A Global Characterization and Identification of Multifunctional Enzymes , 2012, PloS one.