HAPPI: an online database of comprehensive human annotated and predicted protein interactions

Background

Human protein-protein interaction (PPI) data are the foundation for understanding molecular signalling networks and the functional roles of biomolecules. Several human PPI databases have become available; however, comparisons of these datasets have suggested limited data coverage and poor data quality. Ongoing collection and integration of human PPIs from different sources, both experimental and computational, can enable disease-specific network biology modelling in translational bioinformatics studies.

Results

We developed a new web-based resource, the Human Annotated and Predicted Protein Interaction (HAPPI) database, located at http://bio.informatics.iupui.edu/HAPPI/. The HAPPI database was created by extracting and integrating publicly available protein interaction databases, including HPRD, BIND, MINT, STRING, and OPHID, using database integration techniques. We designed a unified entity-relationship data model to resolve semantic-level differences among the diverse concepts involved in PPI data integration. We applied a unified scoring model to give each PPI a measure of its reliability, which places each PPI at one of five star-rank levels, from 1 to 5. We assessed the quality of the PPIs contained in the new HAPPI database using evolutionarily conserved co-expression pairs, called "MetaGene" pairs, to measure the extent of overlap between MetaGene pairs and PPI pairs. While the overall quality of the HAPPI database across all star ranks is comparable to the overall quality of HPRD or IntNetDB, the subset of the HAPPI database with star ranks between 3 and 5 has a much higher average quality than all other human PPI databases. As of summer 2008, the database contains 142,956 non-redundant, medium to high-confidence level human protein interaction pairs among 10,592 human proteins. The HAPPI database web application also provides hyperlinked information on genes, pathways, protein domains, protein structure displays, and sequence feature maps for interactive exploration of PPI data in the database.

Conclusion

HAPPI is by far the most comprehensive public compilation of human protein interaction information. It enables its users to fully explore PPI data with quality measures and annotated information necessary for emerging network biology studies.
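The star ranking and the MetaGene overlap assessment described in the Results can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the abstract does not give the scoring formula or the score-to-star cut-offs, so `STAR_THRESHOLDS`, the [0, 1] score range, and the function names `star_rank`, `normalize`, and `overlap_fraction` are hypothetical and are not taken from the HAPPI implementation.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical lower bounds for star ranks 2..5; the abstract does not
# state how reliability scores are mapped to stars.
STAR_THRESHOLDS = (0.25, 0.45, 0.65, 0.85)


def star_rank(score: float) -> int:
    """Map a reliability score in [0, 1] (assumed range) to a rank from 1 to 5."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # Each threshold met or exceeded adds one star to the baseline of 1.
    return 1 + sum(score >= t for t in STAR_THRESHOLDS)


def normalize(pairs: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Treat pairs as unordered; drop duplicates and self-pairs."""
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def overlap_fraction(
    ppi_pairs: Iterable[Tuple[str, str]],
    metagene_pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of non-redundant PPI pairs that are also MetaGene pairs."""
    ppis = normalize(ppi_pairs)
    if not ppis:
        return 0.0
    return len(ppis & normalize(metagene_pairs)) / len(ppis)


if __name__ == "__main__":
    # Toy data with made-up identifiers and scores.
    scored = {("A", "B"): 0.90, ("C", "D"): 0.50, ("E", "F"): 0.10}
    high_conf = [pair for pair, s in scored.items() if star_rank(s) >= 3]
    metagene = [("B", "A"), ("E", "F")]
    print("pairs ranked 3 stars or above:", high_conf)
    print("overlap, all pairs:", overlap_fraction(scored, metagene))
    print("overlap, 3 to 5 stars:", overlap_fraction(high_conf, metagene))
```

Comparing the overlap fraction for all pairs against the fraction for the 3-to-5-star subset mirrors the kind of comparison the abstract reports, although the measure actually used in the paper may be defined differently.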