Overview of the NTCIR-13: MedWeb Task

The amount of medical and clinical information on the Web is increasing. Web-based data are a particularly valuable source, and Twitter-based medical research has attracted much attention. The NTCIR-13 MedWeb (Medical Natural Language Processing for Web Document) task provides pseudo-Twitter messages in a cross-language, multi-label corpus covering three languages (Japanese, English, and Chinese) and annotated with eight labels, such as cold, fever, and flu. The MedWeb task is to classify each tweet into one of two categories: those that contain a patient's symptom and those that do not. Because the task setting can be formalized as the factualization of text, its results can be applied directly to practical clinical applications. In all, eight groups (19 systems) participated in the Japanese subtask, four groups (12 systems) participated in the English subtask, and two groups (six systems) participated in the Chinese subtask. This paper presents the results of these systems, along with relevant discussion, to clarify the issues that need to be resolved in medical natural language processing.
