Protein-centric data integration for functional analysis of comparative proteomics data.

High-throughput proteomic, microarray, protein interaction, and other experimental methods all generate long lists of proteins and/or genes that have been identified or have varied in accumulation under the experimental conditions studied. These lists can be difficult for biologists to sort through and interpret. Here we describe a next step in data analysis: a bottom-up approach to data integration. It starts with protein sequence identifications and maps them to a common representation of the protein. It then brings in a wide variety of structural, functional, genetic, and disease information related to proteins, derived from annotated knowledge bases, and uses this information to categorize the lists using Gene Ontology (GO) terms and mappings to biological pathway databases. We illustrate with examples how this can aid in identifying important processes from large, complex lists.
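The workflow described above (map identifiers to a common protein representation, attach annotations, then categorize the list) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the lookup tables, identifiers (`gene_1`, `PROT_A`, and so on), category labels, and function names are all hypothetical toy values standing in for real knowledge-base mappings.

```python
from collections import Counter

# Hypothetical toy table: source identifiers -> common protein accession.
# In practice this would come from an annotated knowledge base.
ID_TO_PROTEIN = {
    "gene_1": "PROT_A",
    "ipi_7": "PROT_A",  # two source IDs that resolve to the same protein
    "gene_2": "PROT_B",
    "gene_3": "PROT_C",
}

# Hypothetical toy annotations keyed by the common accession.
PROTEIN_TO_CATEGORIES = {
    "PROT_A": ["transport"],
    "PROT_B": ["transport", "response to stress"],
    "PROT_C": ["metabolic process"],
}


def map_to_proteins(identifiers):
    """Collapse source identifiers onto common accessions.

    Returns the set of mapped accessions and the list of identifiers
    that could not be mapped.
    """
    mapped = set()
    unmapped = []
    for ident in identifiers:
        accession = ID_TO_PROTEIN.get(ident)
        if accession is None:
            unmapped.append(ident)
        else:
            mapped.add(accession)
    return mapped, unmapped


def categorize(proteins):
    """Count how many proteins in the list fall under each category."""
    counts = Counter()
    for accession in proteins:
        counts.update(PROTEIN_TO_CATEGORIES.get(accession, ["unannotated"]))
    return counts


if __name__ == "__main__":
    proteins, unmapped = map_to_proteins(["gene_1", "ipi_7", "gene_2", "gene_x"])
    print(sorted(proteins), unmapped)
    print(categorize(proteins).most_common())
```

Mapping to a common accession before counting keeps redundant identifiers for the same protein from inflating a category, which is the reason the mapping step comes first in this sketch.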
