Combining Protein-Protein Interaction (PPI) Network and Sequence Attributes for Predicting Hypertension Related Proteins

Cardiovascular disease is set to become the leading cause of death worldwide, and hypertension is one of its major risk factors. It is therefore important to understand the etiologic mechanisms of hypertension in order to identify new routes to improved treatment. Human hypertension arises from a combination of genetic factors and lifestyle influences. Here we study hypertension-related proteins from the perspective of protein-protein interaction (PPI) networks, pathways, Gene Ontology (GO) categories and sequence properties. We find that hypertension-related proteins are not generally associated with network hubs and do not exhibit high clustering coefficients. Despite this, they tend to be closer to, and better connected with, other hypertension-related proteins on the interaction network than expected, with 23% interacting directly. The molecular function category ‘oxidoreductase’ and the biological process categories ‘response to stimulus’ and ‘electron transport’ are overrepresented. We also find that functional similarity does not correlate strongly with the PPI distance separating pairs of hypertension-related proteins, and that known hypertension-related proteins are spread across 36 KEGG pathways. Finally, we used weighted bagged PART classifiers to build predictive models that combine amino acid sequence attributes with PPI network and GO properties.
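The network statistics summarized above can be illustrated on a small toy graph. These statistics are hub degree, local clustering coefficient, shortest-path closeness among hypertension-related proteins, and the share that interact directly. This is a minimal sketch under stated assumptions, not the authors' pipeline. The toy edges, protein labels and function names are hypothetical. The "directly interacting" fraction is only one possible reading of the 23% figure. The permutation test is an assumed way of judging "closer than expected". Distances are unweighted breadth-first search path lengths.

```python
"""Toy sketch of PPI-network statistics for a disease protein set.

Hypothetical example: the graph, labels and permutation test are illustrative
assumptions, not the study's data or exact procedure.
"""
import random
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) interaction pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that also interact with each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source; BFS suffices for unit weights."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_pairwise_distance(adj, proteins):
    """Mean shortest-path length over all connected pairs in the set."""
    proteins = sorted(proteins)
    total = count = 0
    for i, p in enumerate(proteins):
        dist = bfs_distances(adj, p)
        for q in proteins[i + 1:]:
            if q in dist:
                total += dist[q]
                count += 1
    return total / count if count else float("inf")


def fraction_directly_interacting(adj, proteins):
    """Fraction of set members with at least one direct partner inside the set."""
    members = set(proteins)
    hits = sum(1 for p in members if adj[p] & (members - {p}))
    return hits / len(members)


def permutation_p_value(adj, proteins, n_perm=1000, seed=0):
    """Empirical p-value: share of random same-size sets at least as close as observed."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(adj, proteins)
    nodes = sorted(adj)
    as_close = sum(
        1
        for _ in range(n_perm)
        if mean_pairwise_distance(adj, rng.sample(nodes, len(proteins))) <= observed
    )
    return (as_close + 1) / (n_perm + 1)


if __name__ == "__main__":
    edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H")]
    adj = build_adjacency(edges)
    hypertension = ["A", "B", "D"]  # hypothetical "disease" proteins
    print(len(adj["C"]), round(clustering_coefficient(adj, "C"), 3))  # 3 0.333
    print(round(mean_pairwise_distance(adj, hypertension), 3))        # 1.667
    print(round(fraction_directly_interacting(adj, hypertension), 3)) # 0.667
    print(permutation_p_value(adj, hypertension, n_perm=200))
```

A hub check would compare `len(adj[p])` against the degree distribution of the whole network. Breadth-first search is used here rather than Dijkstra's algorithm only because the toy edges carry no weights.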
