Assessment of Utility in Web Mining for the Domain of Public Health

This paper presents ongoing work on the application of Information Extraction (IE) technology to the domain of Public Health in a real-world scenario. A central issue in IE is the quality of the results. We make two contributions. First, we distinguish two kinds of quality criteria: objective criteria, which measure the correctness of the system's analysis in traditional terms (precision, recall and F-measure), and subjective criteria, which measure the utility of the results to the end-user. Second, to obtain measures of utility, we build an environment that allows users to interact with the system by rating the analyzed content. We then build and compare several classifiers that learn from the users' responses to predict relevance scores for new events. We report experiments on learning to predict relevance, and discuss the results and their implications for text mining in the domain of Public Health.
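As an illustration of the objective criteria named above, the following minimal sketch computes precision, recall and the balanced F-measure over sets of extracted items. The function name and the event identifiers are hypothetical and are not taken from the system described here; the sketch only shows the standard definitions.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for predicted items against a gold standard.

    Both arguments are iterables of hashable items (hypothetical event IDs here).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Precision: fraction of the system's output that is correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of the gold-standard items that the system found.
    recall = true_positives / len(gold) if gold else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical example: four extracted events, three gold events, two in common.
p, r, f = precision_recall_f1({"e1", "e2", "e3", "e4"}, {"e1", "e2", "e5"})
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

These scores measure only the correctness of the extraction. They say nothing about whether a correctly extracted event is useful to the end-user, which is the subjective criterion that the user ratings are meant to capture.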
