EcID. A database for the inference of functional interactions in E. coli

The EcID database (Escherichia coli Interaction Database) provides a framework for integrating information on functional interactions extracted from the following sources: EcoCyc (metabolic pathways, protein complexes and regulatory information), KEGG (metabolic pathways), and MINT and IntAct (protein interactions). It also includes information on protein complexes from the two E. coli high-throughput pull-down experiments and potential interactions extracted from the literature using the web services associated with the iHOP text-mining system. Additionally, EcID incorporates the results of various prediction methods, including two protein interaction prediction methods based on genomic information (Phylogenetic Profiles and Gene Neighbourhoods) and three methods based on the analysis of co-evolution (Mirror Tree, In Silico 2 Hybrid and Context Mirror). EcID assigns each prediction a specifically developed confidence score. Two main features distinguish EcID from other systems: the combination of co-evolution-based predictions with experimental data, and the inclusion of E. coli-specific information, such as gene regulation information from EcoCyc. The possibilities offered by combining the information in EcID are illustrated with a prediction of potential functions for a group of poorly characterized genes related to yeaG. EcID is available online at http://ecid.bioinfo.cnio.es.
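To make the co-evolution idea concrete, the following is a minimal sketch of a Mirror Tree-style comparison: two protein families are scored by the linear correlation between their inter-species distance matrices. It is an illustration only, not EcID's implementation or its confidence score; the function names, the species labels and the toy distances are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def flatten_distances(dist, species):
    """Return distances for every unordered species pair, in a fixed order."""
    return [dist[(a, b)] for a, b in combinations(species, 2)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mirror_tree_score(dist_a, dist_b, species):
    """Correlate the distance matrices of two protein families over shared species."""
    return pearson(flatten_distances(dist_a, species),
                   flatten_distances(dist_b, species))


# Toy data (hypothetical): pairwise evolutionary distances between orthologs
# of protein family A in four species, keyed by species pair.
species = ["sp1", "sp2", "sp3", "sp4"]
dist_a = {
    ("sp1", "sp2"): 0.1, ("sp1", "sp3"): 0.4, ("sp1", "sp4"): 0.5,
    ("sp2", "sp3"): 0.3, ("sp2", "sp4"): 0.6, ("sp3", "sp4"): 0.2,
}
# Family B evolves twice as fast but with the same tree shape.
dist_b = {pair: 2 * d for pair, d in dist_a.items()}
# Family C shows no variation between species pairs.
dist_c = {pair: 1.0 for pair in dist_a}

score_ab = mirror_tree_score(dist_a, dist_b, species)
score_ac = mirror_tree_score(dist_a, dist_c, species)
print(score_ab, score_ac)
```

A high correlation suggests that the two families have similar evolutionary histories, which is the signal co-evolution methods interpret as a possible interaction. In practice the distances would come from multiple sequence alignments or phylogenetic trees, and the raw score would still need calibration before it could serve as a confidence value.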
