Towards Elucidating the Structural Principles of Host-Pathogen Protein-Protein Interaction Networks: A Bioinformatics Survey

The ultimate goal of systems biology is to accurately predict the behavior of biological systems by constructing computational models from the relevant molecular-level data, especially when structural information about the system is available. By combining the three-dimensional (3D) structures of the macromolecules that underpin a biological system, researchers have an unprecedented opportunity to understand fully how these molecules interact with one another, particularly within interaction networks such as protein-protein interaction (PPI) networks. However, only a limited number of studies have focused on reconstructing and modelling structural interaction networks (SINs) for host-pathogen PPI networks. In this paper, we survey SINs of protein-protein interactions, focusing on the interactions between pathogen and host species (PHPPI). PHPPI is one of the most important components of inter-species PPI research, and in-depth study of PHPPI at atomic resolution would reveal novel insights into the principles underlying the organization and complexity of host-pathogen PPI networks. We discuss several related subareas, together with the typical Big Data methods applied to them, including machine learning methodologies and statistical models. This paper contributes to a new yet challenging research area: applying data analytics and machine learning technologies in bioinformatics.
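To make the idea of a structural interaction network more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of this survey or of any particular tool. The sketch assumes that each PPI edge has already been annotated with the interface residues used on each side (for example, residues taken from a solved complex). It then asks one kind of question that such structural annotation makes possible: does a pathogen protein bind the same surface of a host protein as one of that protein's endogenous host partners, which would suggest the two interactions compete? The class `StructuralInteractionNetwork`, the function `competing_pairs`, the `min_shared` threshold, and all protein names and residue numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StructuralInteractionNetwork:
    """Hypothetical sketch of a SIN: a PPI network whose edges record the
    interface (set of residue indices) each protein uses in the interaction."""
    edges: dict = field(default_factory=dict)

    def add_interaction(self, a, a_residues, b, b_residues):
        if a == b:
            # Homodimers would need two copies of one protein; out of scope here.
            raise ValueError("self-interactions are not modelled in this sketch")
        # Unordered pair -> {protein: residues that protein uses in this edge}
        self.edges[frozenset((a, b))] = {a: frozenset(a_residues),
                                         b: frozenset(b_residues)}

    def interfaces_of(self, protein):
        """Return {partner: residues used on `protein`} for every partner."""
        result = {}
        for pair, sides in self.edges.items():
            if protein in pair:
                (partner,) = pair - {protein}
                result[partner] = sides[protein]
        return result


def competing_pairs(sin, host_protein, pathogen_proteins, min_shared=1):
    """List (pathogen, host_partner, n_shared) tuples whose binding sites on
    `host_protein` share at least `min_shared` residues (a hypothetical
    criterion for a pathogen protein competing with a host interaction)."""
    interfaces = sin.interfaces_of(host_protein)
    hits = []
    for pathogen in sorted(pathogen_proteins & set(interfaces)):
        for partner in sorted(interfaces):
            if partner in pathogen_proteins:
                continue  # only compare against host-host interactions
            shared = interfaces[pathogen] & interfaces[partner]
            if len(shared) >= min_shared:
                hits.append((pathogen, partner, len(shared)))
    return hits


# Toy network with made-up proteins and residue numbers.
example = StructuralInteractionNetwork()
example.add_interaction("HOST_A", {10, 11, 12, 13}, "HOST_B", {5, 6, 7})
example.add_interaction("HOST_A", {50, 51}, "HOST_C", {20, 21})
example.add_interaction("HOST_A", {12, 13, 14}, "VIR_1", {1, 2, 3})
example.add_interaction("HOST_C", {80, 81}, "VIR_2", {9, 10})

print(competing_pairs(example, "HOST_A", {"VIR_1", "VIR_2"}))
# prints [('VIR_1', 'HOST_B', 2)]
```

In this toy data, `VIR_1` and `HOST_B` share residues 12 and 13 on `HOST_A`, so they are flagged as a candidate competing pair. `HOST_C` binds a separate surface, so it is not flagged. A plain PPI network without interface annotation cannot make this distinction. A real analysis would need a principled way of defining interfaces and of setting the overlap threshold. The simple shared-residue count used here is only a placeholder.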
