A transfer approach to detecting disease reporting events in blog social media

Event-Based Epidemic Intelligence (e-EI) is a body of work that relies on different forms of pattern recognition to detect disease reporting events in unstructured text on the Web. Current supervised approaches to e-EI suffer from both high initial and high maintenance costs, because examples must be manually labelled to train and update a classifier for detecting disease reporting events in dynamic information sources such as blogs. In this paper, we propose a new method for the supervised detection of disease reporting events. We tackle the burden of manually labelling data and address the problems associated with building a supervised learner to classify frequently evolving and variable blog content. We automatically classify outbreak reports to train a supervised learner, and the knowledge acquired from the learning process is then transferred to the task of classifying blogs. Our experiments show that, with the automatic classification of training data and the transfer approach, we achieve an overall precision of 92% and an accuracy of 78.20%.
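
The pipeline summarised above (automatically label outbreak reports, train a supervised learner on them, then apply that learner to blog text) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the cue-word lists `DISEASE_TERMS` and `EVENT_TERMS`, the keyword rule in `auto_label`, and the choice of a small Naive Bayes classifier are all hypothetical stand-ins for whatever labelling procedure and learner the authors actually use.

```python
import math
from collections import Counter

# Hypothetical cue lists used only for this sketch; not from the paper.
DISEASE_TERMS = {"cholera", "measles", "influenza", "ebola"}
EVENT_TERMS = {"outbreak", "cases", "confirmed", "reported", "died"}
PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def auto_label(sentence):
    """Assumed heuristic: 1 if a disease term and an event term co-occur."""
    tokens = set(tokenize(sentence))
    return int(bool(tokens & DISEASE_TERMS) and bool(tokens & EVENT_TERMS))


def train_naive_bayes(sentences, labels):
    """Collect per-class token counts from the automatically labelled source data."""
    counts = {0: Counter(), 1: Counter()}
    class_totals = Counter(labels)
    for sentence, label in zip(sentences, labels):
        counts[label].update(tokenize(sentence))
    vocab = set(counts[0]) | set(counts[1])
    return counts, class_totals, vocab


def predict(model, sentence):
    """Score a target-domain sentence with Laplace-smoothed log-probabilities."""
    counts, class_totals, vocab = model
    n_examples = sum(class_totals.values())
    best_label, best_score = 0, -math.inf
    for label in (0, 1):
        score = math.log((class_totals[label] + 1) / (n_examples + 2))
        total = sum(counts[label].values())
        for token in tokenize(sentence):
            if token in vocab:  # tokens unseen in the source data are ignored
                score += math.log((counts[label][token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Source domain: outbreak-report-style sentences, labelled without manual effort.
reports = [
    "Health officials confirmed 12 cases of cholera in the district.",
    "A measles outbreak was reported in two schools.",
    "The ministry published its annual budget on Monday.",
    "Researchers discussed vaccine funding at the conference.",
]
model = train_naive_bayes(reports, [auto_label(r) for r in reports])

# Target domain: informal blog text scored by the transferred classifier.
blog_label = predict(model, "so apparently there is a measles outbreak at my kids school")
```

The sketch transfers knowledge in the simplest possible way, by reusing the source-trained model unchanged on blog sentences; a fuller treatment would also have to handle the vocabulary mismatch between formal reports and blogs, which this toy example ignores.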
