An improved efficient rotation forest algorithm to predict the interactions among proteins

Protein–protein interactions (PPIs) underlie the biological mechanisms of life activity and play vital roles in the execution of various cellular processes. Advances in computing have opened a new route to the effective prediction of PPIs and have attracted considerable interest. The challenge is that PPI data are typically high-dimensional and likely to contain noise, which can greatly degrade classifier performance. In this paper, we propose a novel feature-weighted rotation forest algorithm (FWRF) to address this problem. We calculate the weight of each feature with the $\chi^{2}$ statistic and remove low-weight features according to a selection rate. With FWRF, the proposed method eliminates the interference of uninformative features and makes full use of the useful ones to predict interactions among proteins. In cross-validation experiments on the H. pylori data set, our method achieved average accuracy, precision, sensitivity, MCC and AUC of 91.91%, 92.51%, 91.22%, 83.84% and 91.60%, respectively. We compared our method with other existing methods and with well-known classifiers, such as SVM and the original rotation forest, on the H. pylori data set. In addition, to further demonstrate the ability of the FWRF algorithm, we also evaluated it on the Yeast data set. The experimental results show that our method is more effective and robust in predicting PPIs. A web server, the source code, and the H. pylori and Yeast data sets used in this article are freely available at http://202.119.201.126:8888/FWRF/.
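
The feature-weighting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `chi2_weights` and `select_features` are hypothetical, the features are assumed to be non-negative, and the selection rate is assumed to mean the fraction of features kept. The rotation forest classifier itself is not shown.

```python
import numpy as np

def chi2_weights(X, y):
    """Chi-square score of each non-negative feature against the class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    # One-hot class indicator matrix, shape (n_samples, n_classes).
    Y = (y[:, None] == classes[None, :]).astype(float)
    observed = Y.T @ X                    # per-class feature totals
    feature_total = X.sum(axis=0)         # total of each feature over all samples
    class_prob = Y.mean(axis=0)           # empirical class frequencies
    expected = np.outer(class_prob, feature_total)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = (observed - expected) ** 2 / expected
    # Features that are zero everywhere carry no information: score 0.
    terms = np.where(expected > 0, terms, 0.0)
    return terms.sum(axis=0)

def select_features(X, y, selection_rate):
    """Keep the highest-weighted fraction of features (assumed meaning of the rate)."""
    X = np.asarray(X, dtype=float)
    weights = chi2_weights(X, y)
    n_keep = max(1, int(round(selection_rate * X.shape[1])))
    # Indices of the n_keep largest weights, returned in ascending column order.
    keep = np.sort(np.argsort(weights)[::-1][:n_keep])
    return keep, weights

# Toy example: columns 0 and 2 track the label, column 1 is constant.
X = np.array([[1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 1, 1]])
y = np.array([0, 0, 1, 1])
keep, weights = select_features(X, y, selection_rate=0.67)
X_reduced = X[:, keep]
```

In this toy example the constant column receives a weight of zero and is dropped, while the two label-dependent columns are retained and would be passed on to the classifier.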
