Shallow Information Extraction from Medical Forum Data

We study a novel shallow information extraction problem: extracting sentences that belong to a given set of topic categories from medical forum data. Given a corpus of medical forum documents, our goal is to extract two related types of sentences that together describe a biomedical case: medical problem descriptions and medical treatment descriptions. Such an extraction task directly generates medical case descriptions that can be useful in many applications. We solve the problem using two popular machine learning methods, Support Vector Machines (SVM) and Conditional Random Fields (CRF), and propose novel features to improve extraction accuracy. Experimental results show that we can obtain an accuracy of up to 75%.
