Local Pre-processing for Node Classification in Networks - Application in Protein-Protein Interaction

Network modelling provides an increasingly popular conceptualisation in a wide range of domains, including the analysis of protein structure. Typical analyses model parameter values at individual nodes within the network. The spherical locality around a node provides a microenvironment that can be used to characterise an area of a network rather than a single point within it. Microenvironments centred on the nodes in a protein chain can be used to quantify parameters that are related to protein functionality. Particular patterns of such parameters in node-centred microenvironments can also be used to locate sites of particular interest. This paper evaluates an approach to index generation that is intended to construct microenvironment data rapidly. The results show that index generation performs best when the radius of the microenvironments matches the granularity of the index. Results are also presented to show that such microenvironments improve the utility of protein chain parameters in classifying the structural characteristics of nodes using both support vector machines and neural networks.
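As an illustration of the idea of a node-centred microenvironment, the following is a minimal sketch and not the method evaluated in the paper. It assumes that nodes are 3D coordinates, that the index is a uniform grid of cubic cells, and that a per-node parameter is summarised by its mean over the sphere. The function names (`build_grid_index`, `microenvironment`, `mean_over_microenvironment`) and the sample coordinates are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import ceil, dist, floor


def build_grid_index(points, cell_size):
    """Map each integer grid cell to the indices of the points inside it.

    Assumption: a uniform grid stands in for the index; the paper's own
    index structure is not described in the abstract.
    """
    index = defaultdict(list)
    for i, p in enumerate(points):
        cell = tuple(floor(c / cell_size) for c in p)
        index[cell].append(i)
    return index


def microenvironment(points, index, cell_size, centre, radius):
    """Return sorted indices of points within `radius` of points[centre].

    The centre node itself is included (distance 0).
    """
    p = points[centre]
    home = tuple(floor(c / cell_size) for c in p)
    # Number of cells to scan on each side of the home cell.
    reach = ceil(radius / cell_size)
    members = []
    for offset in product(range(-reach, reach + 1), repeat=3):
        cell = tuple(h + o for h, o in zip(home, offset))
        for j in index.get(cell, ()):
            if dist(p, points[j]) <= radius:
                members.append(j)
    return sorted(members)


def mean_over_microenvironment(values, members):
    """Summarise a per-node parameter by its mean over a microenvironment."""
    return sum(values[j] for j in members) / len(members)


# Hypothetical coordinates and parameter values, for illustration only.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
values = [1.0, 2.0, 3.0, 10.0]
radius = 2.0
cell_size = radius  # grid granularity chosen to equal the radius

index = build_grid_index(points, cell_size)
members = microenvironment(points, index, cell_size, centre=0, radius=radius)
print(members, mean_over_microenvironment(values, members))
```

In this sketch, setting `cell_size` equal to `radius` means each query scans only the 27 cells surrounding the centre node. That is one plausible reading of matching the radius to the granularity of the index, offered here as an assumption rather than as the paper's result.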
