Automatic classification of doctor-patient questions for a virtual patient record query task

We present work in progress on automating the classification of doctor-patient questions in the context of a simulated consultation with a virtual patient. We classify questions according to the computational strategy (rule-based or other) needed to look up data in the clinical record. We compare ‘traditional’ machine learning methods (Gaussian and Multinomial Naive Bayes, and Support Vector Machines) with a neural network classifier (FastText). We obtained the best results with the SVM using semantic annotations, but the neural classifier achieved promising results without them.
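To make the classification task concrete, the following is a minimal sketch of one of the compared model families, a Multinomial Naive Bayes classifier over bag-of-words features, written from scratch in plain Python. It is not the authors' implementation. The toy questions, the label names `rule_based` and `other`, and the helper names `tokenize`, `train_multinomial_nb`, and `predict` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: questions and labels are illustrative only,
# not drawn from the corpus described in the abstract.
TRAIN = [
    ("do you take any medication", "rule_based"),
    ("what medication do you take", "rule_based"),
    ("do you have any allergies", "rule_based"),
    ("how old are you", "rule_based"),
    ("how are you feeling today", "other"),
    ("tell me about your day", "other"),
    ("how is your family", "other"),
    ("are you feeling worried", "other"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_multinomial_nb(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text, alpha=1.0):
    """Return the class with the highest log-posterior (add-alpha smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_multinomial_nb(TRAIN)
print(predict(model, "what allergies do you have"))
print(predict(model, "are you feeling worried today"))
```

In a setup like the one described, the word features could be supplemented with semantic annotations (for example, by appending annotation labels as extra tokens), which is the kind of input the SVM reportedly benefited from; how the authors actually encoded those annotations is not stated here.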
