Predicting protein-protein interactions using high-quality non-interacting pairs

BackgroundIdentifying protein-protein interactions (PPIs) is of paramount importance for understanding cellular processes. Machine learning-based approaches have been developed to predict PPIs, but the effectiveness of these approaches is unsatisfactory. One major reason is that they randomly choose non-interacting protein pairs (negative samples) or heuristically select non-interacting pairs with low quality.ResultsTo boost the effectiveness of predicting PPIs, we propose two novel approaches (NIP-SS and NIP-RW) to generate high quality non-interacting pairs based on sequence similarity and random walk, respectively. Specifically, the known PPIs collected from public databases are used to generate the positive samples. NIP-SS then selects the top-m dissimilar protein pairs as negative examples and controls the degree distribution of selected proteins to construct the negative dataset. NIP-RW performs random walk on the PPI network to update the adjacency matrix of the network, and then selects protein pairs not connected in the updated network as negative samples. Next, we use auto covariance (AC) descriptor to encode the feature information of amino acid sequences. After that, we employ deep neural networks (DNNs) to predict PPIs based on extracted features, positive and negative examples. Extensive experiments show that NIP-SS and NIP-RW can generate negative samples with higher quality than existing strategies and thus enable more accurate prediction.ConclusionsThe experimental results prove that negative datasets constructed by NIP-SS and NIP-RW can reduce the bias and have good generalization ability. NIP-SS and NIP-RW can be used as a plugin to boost the effectiveness of PPIs prediction. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NIP.

[1]  Zhu-Hong You,et al.  Predicting Protein-Protein Interactions from Primary Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest , 2015, PloS one.

[2]  Kim-Kwang Raymond Choo,et al.  Spectral–spatial multi-feature-based deep learning for hyperspectral remote sensing image classification , 2016, Soft Computing.

[3]  Meng Zhang,et al.  Neural Network Methods for Natural Language Processing , 2017, Computational Linguistics.

[4]  Chengbang Huang,et al.  Predicting Protein-Protein Interactions from Protein Domains Using a Set Cover Approach , 2007, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[5]  Shmuel Sattath,et al.  How reliable are experimental protein-protein interaction data? , 2003, Journal of molecular biology.

[6]  Zhu-Hong You,et al.  Detecting Protein-Protein Interactions with a Novel Matrix-Based Protein Sequence Representation and Support Vector Machines , 2015, BioMed research international.

[7]  Purvesh Khatri,et al.  Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments , 2004, Nucleic Acids Res..

[8]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[9]  Zyad Shaaban,et al.  Data Mining: A Preprocessing Engine , 2006 .

[10]  Jean-Loup Faulon,et al.  Predicting protein-protein interactions using signature products , 2005, Bioinform..

[11]  William Stafford Noble,et al.  Choosing negative examples for the prediction of protein-protein interactions , 2006, BMC Bioinformatics.

[12]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[13]  M. Gerstein,et al.  A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data , 2003, Science.

[14]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[15]  A. Komoriya,et al.  Local interactions as a structure determinant for protein molecules: II. , 1979, Biochimica et biophysica acta.

[16]  C. Tanford Contribution of Hydrophobic Interactions to the Stability of the Globular Conformation of Proteins , 1962 .

[17]  Sophie Alvarez,et al.  Data on the identification of protein interactors with the Evening Complex and PCH1 in Arabidopsis using tandem affinity purification and mass spectrometry (TAP–MS) , 2016, Data in brief.

[18]  K. R. Woods,et al.  Prediction of protein antigenic determinants from amino acid sequences. , 1981, Proceedings of the National Academy of Sciences of the United States of America.

[19]  A. Valencia,et al.  Similarity of phylogenetic trees as indicator of protein-protein interaction. , 2001, Protein engineering.

[20]  Yu Yao,et al.  DeepPPI: Boosting Prediction of Protein-Protein Interactions with Deep Neural Networks , 2017, J. Chem. Inf. Model..

[21]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[22]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[23]  Long Zhang,et al.  Protein-Protein Interactions Prediction Using a Novel Local Conjoint Triad Descriptor of Amino Acid Sequences , 2017, International journal of molecular sciences.

[24]  María Martín,et al.  Activities at the Universal Protein Resource (UniProt) , 2013, Nucleic Acids Res..

[25]  David R. Kelley,et al.  Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks , 2015, bioRxiv.

[26]  Kyungsook Han,et al.  Sequence-based prediction of protein-protein interactions by means of rotation forest and autocorrelation descriptor. , 2010, Protein and peptide letters.

[27]  Dmitrij Frishman,et al.  Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis , 2013, Nucleic Acids Res..

[28]  S. Wold,et al.  DNA and peptide sequences and chemical processes multivariately modelled by principal component analysis and partial least-squares projections to latent structures , 1993 .

[29]  Stephen Kwek,et al.  Applying Support Vector Machines to Imbalanced Datasets , 2004, ECML.

[30]  Limsoon Wong,et al.  Exploiting Indirect Neighbours and Topological Weight to Predict Protein Function from Protein-Protein Interactions , 2006, BioDM.

[31]  A. Valencia,et al.  In silico two‐hybrid system for the selection of physically interacting protein pairs , 2002, Proteins.

[32]  Xiaobo Zhou,et al.  A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network , 2010, BMC Bioinformatics.

[33]  B. Honig,et al.  Structure-based prediction of protein-protein interactions on a genome-wide scale , 2012, Nature.

[34]  R. Grantham Amino Acid Difference Formula to Help Explain Protein Evolution , 1974, Science.

[35]  Zhen Ji,et al.  Prediction of protein-protein interactions from amino acid sequences using a novel multi-scale continuous and discontinuous feature set , 2014, BMC Bioinformatics.

[36]  The UniProt Consortium,et al.  Update on activities at the Universal Protein Resource (UniProt) in 2013 , 2012, Nucleic Acids Res..

[37]  G. Rose,et al.  Hydrophobicity of amino acid residues in globular proteins. , 1985, Science.

[38]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[39]  Cai Lu,et al.  Predicting protein-protein interaction based on protein secondary structure information using Bayesian classifier , 2010 .

[40]  Adam J. Smith,et al.  The Database of Interacting Proteins: 2004 update , 2004, Nucleic Acids Res..

[41]  O. Troyanskaya,et al.  Predicting effects of noncoding variants with deep learning–based sequence model , 2015, Nature Methods.

[42]  Pichao Wang,et al.  Depth Pooling Based Large-Scale 3-D Action Recognition With Convolutional Neural Networks , 2018, IEEE Transactions on Multimedia.

[43]  Paul A. Bates,et al.  Cluster analysis of networks generated through homology: automatic identification of important protein communities involved in cancer metastasis , 2006, BMC Bioinformatics.

[44]  A. Komoriya,et al.  Local interactions as a structure determinant for protein molecules: III. , 1979, Biochimica et biophysica acta.

[45]  Magdalena Foltman,et al.  Studying Protein-Protein Interactions in Budding Yeast Using Co-immunoprecipitation. , 2016, Methods in molecular biology.

[46]  Ioannis Xenarios,et al.  DIP: The Database of Interacting Proteins: 2001 update , 2001, Nucleic Acids Res..

[47]  Yanzhi Guo,et al.  Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences , 2008, Nucleic acids research.

[48]  David A. Gough,et al.  Predicting protein-protein interactions from primary structure , 2001, Bioinform..

[49]  Peter Uetz,et al.  Mapping protein-protein interactions using yeast two-hybrid assays. , 2015, Cold Spring Harbor protocols.

[50]  Xingming Zhao,et al.  Predicting protein–protein interactions from protein sequences using meta predictor , 2010, Amino Acids.

[51]  O. Stegle,et al.  Deep learning for computational biology , 2016, Molecular systems biology.

[52]  Jing Zhang,et al.  Action Recognition From Depth Maps Using Deep Convolutional Neural Networks , 2016, IEEE Transactions on Human-Machine Systems.

[53]  Wu,et al.  Genetic algorithm-base virtual screening of combinative mode for peptide/protein , 2006 .

[54]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[55]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[56]  Byunghan Lee,et al.  Deep learning in bioinformatics , 2016, Briefings Bioinform..

[57]  Zhu-Hong You,et al.  Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data , 2010, Bioinform..

[58]  Peng Chen,et al.  Predicting protein interaction sites from residue spatial sequence profile and evolution rate , 2006, FEBS Letters.

[59]  angesichts der Corona-Pandemie,et al.  UPDATE , 1973, The Lancet.

[60]  Jie Gui,et al.  Prediction of protein-protein interactions from protein sequence using local descriptors. , 2010, Protein and peptide letters.

[61]  M. Charton,et al.  The structural dependence of amino acid hydrophobicity parameters. , 1982, Journal of theoretical biology.

[62]  Adam Godzik,et al.  Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences , 2006, Bioinform..

[63]  Yun Gao,et al.  Prediction of Protein-Protein Interactions Using Local Description of Amino Acid Sequence , 2011 .

[64]  Wei Wei,et al.  Comparing genetic pathways variation of immunoinhibitory receptor LAIR-1 in murine vs human internal organs , 2014, BMC Bioinformatics.

[65]  P. Y. Chou,et al.  Prediction of protein conformation. , 1974, Biochemistry.

[66]  Juwen Shen,et al.  Predicting protein–protein interactions based only on sequences information , 2007, Proceedings of the National Academy of Sciences.

[67]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.