mspecLINE: bridging knowledge of human disease with the proteome

Background
Public proteomics databases such as PeptideAtlas contain peptides and proteins identified in mass spectrometry experiments. However, these databases lack the information about human disease that researchers studying disease-related proteins need. We have developed mspecLINE, a tool that combines knowledge about human disease in MEDLINE with empirical data about the detectable human proteome in PeptideAtlas. mspecLINE associates diseases with proteins by calculating the semantic distance between annotated terms from a controlled biomedical vocabulary. We used an established semantic distance measure based on the co-occurrence of disease and protein terms in the MEDLINE bibliographic database.

Results
The mspecLINE web application allows researchers to explore relationships between human diseases and the parts of the proteome that are detectable by mass spectrometry. Given a disease, the tool displays proteins and peptides from PeptideAtlas that may be associated with it, together with relevant literature from MEDLINE. mspecLINE also helps researchers select proteotypic peptides for specific protein targets in a mass spectrometry assay.

Conclusions
Although mspecLINE applies an information retrieval technique to the MEDLINE database, it differs from previous MEDLINE query tools in that it combines the knowledge expressed in the scientific literature with empirical proteomics data. The tool provides valuable information about candidate protein targets to researchers studying human disease and is freely available on a public web server.
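The abstract does not name the co-occurrence distance it uses. One well-known measure of this kind is the normalized Google distance (NGD) of Cilibrasi and Vitányi, which here would be applied to counts of MEDLINE citations annotated with each term: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) counts citations annotated with term x, f(x, y) counts citations annotated with both terms, and N is the total number of citations. The sketch below assumes that form and then ranks proteins for one disease. The function names, the input layout, and the `detectable` set (a hypothetical stand-in for proteins observed in PeptideAtlas) are illustrative assumptions, not mspecLINE's actual implementation, and the counts are made up.

```python
import math
from typing import Dict, List, Set, Tuple


def normalized_distance(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized co-occurrence distance (NGD form, assumed) from document counts.

    fx, fy -- number of citations annotated with term x / term y
    fxy    -- number of citations annotated with both terms
    n      -- total number of citations in the corpus
    Smaller values mean the two terms are co-cited more strongly.
    """
    if not (0 < fx < n and 0 < fy < n and 0 <= fxy <= min(fx, fy)):
        raise ValueError("inconsistent document counts")
    if fxy == 0:
        return math.inf  # never co-cited: no literature evidence of a link
    lx, ly, lxy, ln = map(math.log, (fx, fy, fxy, n))
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def rank_proteins(
    disease_count: int,
    protein_counts: Dict[str, Tuple[int, int]],
    detectable: Set[str],
    n: int,
) -> List[Tuple[str, float]]:
    """Rank detectable proteins by their distance to one disease term.

    protein_counts maps a protein term to (citations for the protein,
    citations shared with the disease). `detectable` is a hypothetical
    stand-in for the set of proteins observed in PeptideAtlas.
    """
    scored = []
    for protein, (fp, fdp) in protein_counts.items():
        if protein not in detectable:
            continue  # skip proteins with no mass-spectrometry evidence
        d = normalized_distance(disease_count, fp, fdp, n)
        if math.isfinite(d):  # drop pairs that are never co-cited
            scored.append((protein, d))
    return sorted(scored, key=lambda item: item[1])


# Made-up counts for illustration only; these are not real MEDLINE statistics.
N = 1_000_000
counts = {
    "protein_A": (500, 200),
    "protein_B": (500, 20),
    "protein_C": (500, 0),    # never co-cited with the disease
    "protein_D": (500, 300),  # co-cited, but not in the detectable set
}
ranked = rank_proteins(1000, counts, {"protein_A", "protein_B", "protein_C"}, N)
for protein, dist in ranked:
    print(f"{protein}\t{dist:.3f}")
# protein_A	0.212
# protein_B	0.515
```

Filtering on the detectable set before scoring mirrors the idea in the abstract of restricting disease associations to the part of the proteome a mass spectrometer can actually observe; ties and thresholds are left to the caller in this sketch.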
