Quality-Based Knowledge Discovery from Medical Text on the Web

The MEDLINE database (Medical Literature Analysis and Retrieval System Online) contains a rapidly growing volume of biomedical articles. Consequently, there is a need for techniques that enable the quality-based discovery, extraction, integration and use of the knowledge hidden in those articles. Text mining helps researchers interpret these large volumes of data. Co-occurrence analysis is one text-mining technique: statistical models are used to evaluate the significance of the relationship between entities such as disease names, drug names and keywords in titles, abstracts or even entire publications. In this paper, we present a selection of quality-oriented Web-based tools for analyzing biomedical literature, and specifically discuss PolySearch, FACTA and Kleio. Finally, we discuss Pointwise Mutual Information (PMI), a measure of the strength of the association between two terms. PMI indicates how much more often the query and a concept co-occur than would be expected by chance. The results reveal hidden knowledge in MEDLINE-indexed articles on rheumatic diseases, exposing relationships that can give medical experts and researchers important additional information for medical decision-making and quality enhancement.
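The PMI of a query term q and a concept c is commonly defined as PMI(q, c) = log2( P(q, c) / (P(q) P(c)) ), where the probabilities are estimated from co-occurrence counts. A positive value means the pair co-occurs more often than chance, zero means exactly as often as chance, and a negative value means less often. The sketch below estimates these probabilities at the document level. It is an illustration under stated assumptions, not the implementation used by PolySearch, FACTA or Kleio. The assumptions are that each document (for example, an abstract) has already been reduced to a set of normalized terms and that the logarithm is base 2. The function name `pmi` and the toy documents are hypothetical, and the toy documents are not data from the study.

```python
import math
from typing import List, Set


def pmi(docs: List[Set[str]], query: str, concept: str) -> float:
    """Base-2 pointwise mutual information between two terms,
    estimated from document-level co-occurrence counts.

    Assumption (for illustration only): each document is a set of
    already-normalized terms, e.g. disease and drug names found in an abstract.
    """
    n = len(docs)
    n_q = sum(1 for d in docs if query in d)      # documents mentioning the query
    n_c = sum(1 for d in docs if concept in d)    # documents mentioning the concept
    n_qc = sum(1 for d in docs if query in d and concept in d)  # both together
    if n_qc == 0:
        # The pair never co-occurs, so log(0) gives negative infinity.
        return float("-inf")
    p_q, p_c, p_qc = n_q / n, n_c / n, n_qc / n
    # Ratio of the observed joint probability to the one expected under independence.
    return math.log2(p_qc / (p_q * p_c))


# Hypothetical toy collection of four "abstracts".
docs = [
    {"rheumatoid arthritis", "methotrexate"},
    {"rheumatoid arthritis", "methotrexate", "vasculitis"},
    {"lupus", "hydroxychloroquine"},
    {"vasculitis", "prednisone"},
]

print(pmi(docs, "rheumatoid arthritis", "methotrexate"))  # 1.0: twice as often as chance
print(pmi(docs, "rheumatoid arthritis", "vasculitis"))    # 0.0: exactly as often as chance
print(pmi(docs, "lupus", "methotrexate"))                 # -inf: never co-occur
```

Because PMI is unbounded and favors rare terms, real systems often add smoothing or a minimum-frequency cutoff. Those refinements are omitted here to keep the estimate transparent.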
