Network approach integrates 3D structural and sequence data to improve protein classification

Motivation: Early approaches for protein (structural) classification were sequence-based. Because amino acids that are distant in the sequence can be close in the 3-dimensional (3D) structure, 3D contact approaches can complement sequence approaches. Traditional 3D contact approaches study 3D structures directly. Alternatively, 3D structures can first be modeled as protein structure networks (PSNs), which network approaches can then classify. Network approaches may improve upon traditional 3D contact approaches, but existing PSN approaches cannot be used to test this, because they: 1) rely on naive measures of network topology that cannot capture the complexity of PSNs, 2) are not robust to PSN size, 3) cannot integrate multiple PSN measures, and 4) cannot integrate PSN data with sequence data, although doing so could help because the different data types capture complementary biological knowledge.

Results: We address these limitations by: 1) exploiting well-established graphlet measures via a new network approach for protein classification, 2) introducing novel normalized graphlet measures to remove the bias of PSN size, 3) allowing for the integration of multiple PSN measures, and 4) using ordered graphlets to combine the complementary ideas of PSN data and sequence data. We classify both synthetic networks and real-world PSNs more accurately and faster than existing network, 3D contact, or sequence approaches. Our approach also finds PSN patterns that may be biochemically interesting.
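To make the idea of size-normalized graphlet measures concrete, the following is a minimal sketch and not the method of this work. It assumes that a PSN links two residues whose representative atoms lie within a distance cutoff (the 7.5 Å default is a hypothetical choice), it counts only the two 3-node graphlets (induced paths and triangles), and it normalizes by the total count as one simple, assumed way to reduce the dependence on PSN size. The function names `build_psn`, `count_3node_graphlets`, and `normalize` are illustrative.

```python
from itertools import combinations
from math import dist


def build_psn(coords, cutoff=7.5):
    """Build a toy PSN as adjacency sets.

    Assumption: residues i and j are linked when their representative
    atoms are within `cutoff` (hypothetical value, in angstroms).
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_3node_graphlets(adj):
    """Count induced 3-node paths and triangles in the network."""
    closed = 0      # neighbor pairs of a node that are themselves linked
    triples = 0     # all neighbor pairs of a node (connected triples)
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                closed += 1
    triangles = closed // 3            # each triangle is seen from 3 nodes
    paths = triples - 3 * triangles    # remaining triples are induced paths
    return {"path": paths, "triangle": triangles}


def normalize(counts):
    """Assumed normalization: divide each count by the total count."""
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Toy coordinates (not real protein data).
coords = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (5, 5, 0)]
psn = build_psn(coords)
counts = count_3node_graphlets(psn)
features = normalize(counts)
```

The resulting `features` dictionary could serve as a feature vector for a classifier. Real graphlet measures typically use graphlets on more nodes, and ordered graphlets additionally take the sequence positions of the residues into account, which this sketch omits.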
