HealthMap: Global Infectious Disease Monitoring through Automated Classification and Visualization of Internet Media Reports

Abstract

Objective: Unstructured electronic information sources, such as news reports, are proving to be valuable inputs for public health surveillance. However, staying abreast of current disease outbreaks requires scouring a continually growing number of disparate news sources and alert services, resulting in information overload. Our objective is to address this challenge through the HealthMap.org Web application, an automated system for querying, filtering, integrating, and visualizing unstructured reports on disease outbreaks.

Design: This report describes the design principles, software architecture, and implementation of HealthMap, and discusses key challenges and future plans.

Measurements: We describe the process by which HealthMap collects and integrates outbreak data from a variety of sources, including news media (e.g., Google News), expert-curated accounts (e.g., ProMED Mail), and validated official alerts. Using text-processing algorithms, the system classifies alerts by location and disease and then overlays them on an interactive geographic map. We measure the accuracy of the classification algorithms by the level of human curation needed to correct misclassifications, and we examine geographic coverage.

Results: As part of the evaluation of the system, we analyzed 778 reports with HealthMap, representing 87 disease categories and 89 countries. The automated classifier performed with 84% accuracy, demonstrating significant usefulness in managing the large volume of information processed by the system. Accuracy was 91% for ProMED alerts, compared with 81% for Google News reports, as ProMED messages follow a more regular structure.

Conclusion: HealthMap is a useful, free, and open resource that employs text-processing algorithms to identify important disease outbreak information and presents it through a user-friendly interface.
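The abstract does not specify which text-processing algorithms are used, so the following Python sketch is only an illustration of how an alert headline might be classified by disease and location, and how accuracy might be computed as the share of automated labels that a human curator left unchanged. The keyword tables, the function names (first_match, classify, accuracy), and the sample headline are all hypothetical and do not come from HealthMap itself.

```python
import re

# Hypothetical lookup tables (assumptions for illustration only):
# each maps a lowercase phrase to a canonical category label.
DISEASES = {
    "avian influenza": "Avian Influenza",
    "bird flu": "Avian Influenza",
    "cholera": "Cholera",
    "dengue": "Dengue",
}
LOCATIONS = {
    "indonesia": "Indonesia",
    "angola": "Angola",
    "brazil": "Brazil",
}


def first_match(text, table):
    """Return the label of the table phrase that appears earliest in text."""
    lowered = text.lower()
    best = None  # (position, label) of the earliest whole-word match so far
    for phrase, label in table.items():
        match = re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), label)
    return best[1] if best else None


def classify(headline):
    """Assign a disease and a location label to one alert headline."""
    return {
        "disease": first_match(headline, DISEASES),
        "location": first_match(headline, LOCATIONS),
    }


def accuracy(predicted, curated):
    """Fraction of automated labels that match the human-curated labels."""
    if not predicted or len(predicted) != len(curated):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for p, c in zip(predicted, curated) if p == c)
    return correct / len(predicted)


if __name__ == "__main__":
    print(classify("Bird flu kills two in Indonesia"))
```

A headline that matches no table entry yields None for that field, which in a curated workflow would be the kind of case a human reviewer corrects by hand.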
