Exploring text mining from MEDLINE

We present a text mining application that exploits the MeSH heading-subheading combinations present in MEDLINE records. The process begins with a user-specified pair of subheadings. Concepts that co-occur in a record, each qualified by one of these subheadings, are regarded as conceptually related, and the resulting concept pairs are extracted. A parallel process uses SemRep, a linguistic tool, to extract conceptually related concept pairs from the titles of MEDLINE records. The pairs extracted via MeSH and those extracted via SemRep are compared to yield a high-confidence subset. These pairs are then combined to project a summary view associated with the selected subheading pair. For each concept, the "diversity" of its set of related concepts is assessed. We suggest that this summary and the diversity indicators will be useful to health care practitioners and researchers. We illustrate this application with the subheading pair "drug therapy" and "therapeutic use", which approximates the treatment relationship between drugs and diseases.
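The pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation, and every function name in it is hypothetical. Each MEDLINE record is assumed to be pre-parsed into (MeSH descriptor, set of subheadings) pairs. SemRep output is assumed to be already reduced to (subject, predicate, object) triples whose concept names match the MeSH descriptors, with the predicate `TREATS` standing in for the relationship approximated by the subheading pair. Because the abstract does not define how "diversity" is measured, Shannon entropy over a hypothetical concept-to-category map is used as a stand-in.

```python
import math
from collections import defaultdict
from itertools import product


def mesh_pairs(records, sh_a, sh_b):
    """Return (A, B) pairs where, within one record, descriptor A carries
    subheading sh_a and descriptor B carries subheading sh_b.
    Co-occurrence in the same record is treated as conceptual relatedness."""
    pairs = set()
    for record in records:
        left = {d for d, subs in record if sh_a in subs}
        right = {d for d, subs in record if sh_b in subs}
        pairs.update((a, b) for a, b in product(left, right) if a != b)
    return pairs


def semrep_pairs(predications, predicate="TREATS"):
    """Assumption: title predications are already parsed into
    (subject, predicate, object) triples; keep the matching (subject, object) pairs."""
    return {(s, o) for s, p, o in predications if p == predicate}


def high_confidence(mesh, semrep):
    """Pairs found by both the MeSH and the SemRep process."""
    return mesh & semrep


def summary(pairs):
    """Group pairs into a concept -> set-of-related-concepts view."""
    view = defaultdict(set)
    for a, b in pairs:
        view[a].add(b)
    return dict(view)


def diversity(related, category_of):
    """Stand-in diversity measure (an assumption, not the paper's definition):
    Shannon entropy, in bits, of the categories of the related concepts.
    Returns 0.0 when all related concepts share one category or there are none."""
    counts = defaultdict(int)
    for concept in related:
        counts[category_of[concept]] += 1
    n = sum(counts.values())
    return sum(((k / n) * math.log2(n / k) for k in counts.values()), 0.0)


# Toy, made-up data for illustration only.
records = [
    [("Aspirin", {"therapeutic use"}), ("Myocardial Infarction", {"drug therapy"})],
    [("Aspirin", {"therapeutic use"}), ("Headache", {"drug therapy"}), ("Humans", set())],
    [("Ibuprofen", {"therapeutic use"}), ("Headache", {"drug therapy"})],
]
predications = [
    ("Aspirin", "TREATS", "Myocardial Infarction"),
    ("Aspirin", "TREATS", "Headache"),
]
# Hypothetical grouping of diseases into broad categories.
category_of = {
    "Myocardial Infarction": "Cardiovascular Diseases",
    "Headache": "Nervous System Diseases",
}

# "therapeutic use" qualifies the drug and "drug therapy" the disease,
# so the pairs come out as (drug, disease).
mesh = mesh_pairs(records, "therapeutic use", "drug therapy")
confident = high_confidence(mesh, semrep_pairs(predications))
view = summary(confident)
print(view)
print(diversity(view["Aspirin"], category_of))  # 1.0
```

In this toy run the MeSH process proposes three drug-disease pairs, but only the two Aspirin pairs are also asserted in the SemRep output, so Ibuprofen drops out of the high-confidence view. Aspirin's two related diseases fall into two equally sized categories, which gives an entropy of 1 bit.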
