Protein Information and Knowledge Extractor: Discovering biological information from proteomics data

One of the main goals in proteomics is to answer biological and molecular questions about a set of identified proteins. To achieve this goal, one has to extract and collect the existing biological data from public repositories for every protein, and then analyze and organize the collected data. Because of the complexity of this task and the huge amount of data available, gathering this information by hand is not feasible, which makes automatic methods of data collection necessary. Within a proteomic context, we have developed the Protein Information and Knowledge Extractor (PIKE), which solves this problem by automatically accessing several public information systems and databases across the Internet. The PIKE bioinformatics tool starts with a set of identified proteins, listed as accession codes from the most common protein databases, and retrieves all pertinent, up-to-date information from the most relevant databases. Once the search is complete, PIKE summarizes the information for every single protein in several file formats, so that it can be shared and exchanged with other software tools. In our opinion, PIKE represents a great step forward for information procurement and drastically reduces manual database validation for large proteomic studies. It is available at http://proteo.cnb.csic.es/pike.
