A novel entropy-based mapping method for determining the protein-protein interactions in viral genomes by using coevolution analysis

Abstract

Protein-protein interactions play a vital role in DNA transcription, the immune system, and signal transmission between cells. Determining the interactions between proteins can provide information about the functional structure of a cell and the functions of target organisms. Protein-protein interactions are usually determined by experimental approaches, yet a large gap remains in identifying all possible protein interactions in an organism. Furthermore, because these approaches rely on cloning, labeling, and affinity mass spectrometry, the analysis is time-consuming and expensive. Computational approaches based on coevolution theory avoid these limitations: under the coevolution model, interacting proteins accumulate coevolutionary mutations and therefore form similar phylogenetic trees. However, current coevolution methods rely on multiple sequence alignment and produce many false positive interactions. It is therefore important to develop improved computational coevolution analyses. Coevolution-based prediction of protein-protein interactions has been used in conjunction with experimental approaches to discover new interactions. To predict protein interactions through computational coevolution analysis, however, protein sequences must first be mapped to numerical signals. The literature describes various categories of protein mapping methods, and these methods are frequently used in protein interaction prediction studies. In this study, as an alternative to these methods, we propose a novel entropy-based protein mapping method and use coevolution analysis to predict protein-protein interactions in viral genomes. The study consists of five stages. In the first stage, the protein sequences of viral genomes were mapped using both the proposed numerical mapping method and state-of-the-art protein mapping methods.
In the second stage, the Fourier transform was applied to each mapped protein sequence. In the third stage, a distance matrix was generated by computing the distances between the proteins belonging to the same virus genome. In the fourth stage, Pearson correlation coefficients between the distance matrices were calculated to perform the coevolution analysis. In the last stage, the proposed mapping method was compared with state-of-the-art protein mapping methods and the MirrorTree approach. Coevolution analysis was performed on two virus genomes: Ebola virus and Influenza A virus. With the proposed method, high correlations were obtained between the proteins of the Ebola virus: the lowest correlation (0.75) was observed for the NP-VP35 protein pair, and the highest (0.99) for the NP-VP24 and NP-VP40 pairs. For Influenza A virus, the proposed method gave the lowest correlation (0.09) for the M1-PA(X) pair and the highest (0.98) for the M1-M2 pair. The proposed method confirmed experimentally proven interactions between protein pairs with high correlation values. These results indicate that the proposed method can be effective in predicting protein interactions.
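The first two stages (numerical mapping followed by a Fourier transform) can be sketched in Python as below. The abstract does not give the formula of the proposed entropy-based mapping, so `entropy_map` is a hypothetical stand-in, not the authors' method: each residue is replaced by its self-information -log2(p), where p is the residue's relative frequency in that sequence, so the mean of the mapped values equals the Shannon entropy of the sequence's amino acid composition. Zero-padding all homologous sequences to a common length, so that their magnitude spectra have the same number of bins, is likewise an assumption; the authors may normalize spectrum length differently.

```python
import cmath
import math
from collections import Counter


def entropy_map(seq):
    """Replace each residue by its self-information -log2(p).

    Hypothetical stand-in for the paper's entropy-based mapping: p is the
    residue's relative frequency within `seq` itself.
    """
    counts = Counter(seq)
    n = len(seq)
    return [-math.log2(counts[aa] / n) for aa in seq]


def magnitude_spectrum(signal, length):
    """Zero-pad `signal` to `length` and return |X[k]| for k = 0..length-1.

    A direct O(n^2) DFT keeps the sketch dependency-free; numpy.fft.fft
    computes the same transform much faster.
    """
    if length < len(signal):
        raise ValueError("length must be at least len(signal)")
    padded = list(signal) + [0.0] * (length - len(signal))
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / length)
                for t, x in enumerate(padded)))
        for k in range(length)
    ]


# Toy homologous sequences (not real viral proteins), one per isolate.
isolates = ["MKTAYIAK", "MKTAYLAK", "MRTAYIAQ"]
common_len = max(len(s) for s in isolates)
spectra = [magnitude_spectrum(entropy_map(s), common_len) for s in isolates]
```

Only the magnitude of each frequency bin is kept, so the resulting vectors can be compared directly with an ordinary distance measure in the next stage.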

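The third and fourth stages can be sketched as a MirrorTree-style comparison. This sketch assumes that each protein (for example NP or VP24) is represented by one equal-length spectrum per virus isolate, listed in the same isolate order for both proteins, and that distances are Euclidean; the abstract specifies neither the indexing of the distance matrix nor the distance measure. The Pearson coefficient is computed over the entries above the diagonal, since the matrices are symmetric with a zero diagonal.

```python
import math
from itertools import combinations


def distance_matrix(spectra):
    """Symmetric matrix of Euclidean distances between equal-length spectra.

    spectra[i] is one protein's spectrum from isolate i (assumed layout).
    """
    n = len(spectra)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(spectra[i], spectra[j])
    return d


def upper_triangle(matrix):
    """Entries above the diagonal, in a fixed (i, j) order."""
    return [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if a vector is constant


def coevolution_score(spectra_a, spectra_b):
    """MirrorTree-style score: r between two proteins' distance matrices."""
    if len(spectra_a) != len(spectra_b):
        raise ValueError("both proteins need one spectrum per shared isolate")
    return pearson(upper_triangle(distance_matrix(spectra_a)),
                   upper_triangle(distance_matrix(spectra_b)))
```

With spectra for at least three shared isolates, `coevolution_score(np_spectra, vp24_spectra)` (hypothetical variable names) returns r in [-1, 1]; under the coevolution model, values close to 1 suggest that the two proteins interact.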