Mining online full-text literature for novel protein interaction discovery

Compared with the conventional manual review process, mining published articles in biology and medicine is a favored means of identifying potential biomarkers. This approach has been made possible by the development of public literature databases and data mining algorithms. In this article, we present a method to extract novel protein interactions from online full-text articles for biomarker discovery. Explicit and implicit protein interactions are extracted from a corpus of articles by evaluating their support and confidence metrics. With properly chosen minimum support and confidence thresholds, our method maximizes the identification of known interactions while minimizing the number of novel interactions reported. Hence, our method yields a manageable number of candidate novel interactions for biological validation.
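The abstract does not spell out how support and confidence are computed. The sketch below assumes the standard association-rule definitions, with each article treated as a "transaction" holding the set of protein names it mentions. Under that assumption, support(A, B) is the fraction of articles mentioning both A and B, and confidence(A → B) = support(A, B) / support(A). Protein name recognition is assumed to have been done already. The function names `mine_interactions` and `split_known_novel`, the toy protein names, and the reference set of known interactions are all hypothetical. The sketch models only direct co-occurrence and does not attempt the paper's implicit interactions.

```python
from collections import Counter
from itertools import combinations


def mine_interactions(docs: list[set[str]], min_support: float, min_confidence: float):
    """Return {(x, y): (support, confidence)} for directed rules x -> y.

    Hypothetical helper: each element of `docs` is the set of protein
    names recognized in one article (one transaction).
    """
    n = len(docs)
    single = Counter()  # number of articles mentioning each protein
    pair = Counter()    # number of articles mentioning each unordered pair
    for proteins in docs:
        single.update(proteins)
        pair.update(combinations(sorted(proteins), 2))

    rules = {}
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue  # pair co-occurs too rarely in the corpus
        # Confidence is directional, so test both x -> y and y -> x.
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules[(x, y)] = (support, confidence)
    return rules


def split_known_novel(rules, known_pairs: set[frozenset]):
    """Partition mined rules into known and novel unordered protein pairs."""
    known, novel = set(), set()
    for x, y in rules:
        key = frozenset((x, y))
        (known if key in known_pairs else novel).add(key)
    return known, novel


if __name__ == "__main__":
    # Toy corpus of four articles (hypothetical protein names).
    corpus = [
        {"ProtA", "ProtB"},
        {"ProtA", "ProtB", "ProtC"},
        {"ProtC", "ProtD"},
        {"ProtA", "ProtC"},
    ]
    rules = mine_interactions(corpus, min_support=0.4, min_confidence=0.6)
    known, novel = split_known_novel(rules, {frozenset({"ProtA", "ProtB"})})
    print("known:", sorted(sorted(p) for p in known))
    print("novel:", sorted(sorted(p) for p in novel))
```

Raising `min_support` or `min_confidence` shrinks the set of surviving pairs. This is the trade-off the abstract describes: the thresholds are tuned so that known interactions are still recovered while the novel candidate list stays small enough for biological validation.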
