Complex question answering based on a semantic domain model of clinical medicine

Much research in recent years has focused on question answering. Due to significant advances in answering simple fact-seeking questions, research is moving towards resolving complex questions. An approach adopted by many researchers is to decompose a complex question into a series of fact-seeking questions and reuse techniques developed for answering simple questions. This thesis presents a novel alternative approach to domain-specific complex question answering, based on consistently applying a semantic domain model to question and document understanding as well as to answer extraction and generation.

This study uses a semantic domain model of clinical medicine to encode (a) a clinician's information need, expressed as a question, and (b) the meaning of scientific publications, yielding a common representation for both. It is hypothesized that this approach will work well for (1) finding documents that contain answers to clinical questions and (2) extracting these answers from the documents. The domain of clinical question answering was selected primarily because of its unparalleled resources, which permit a proof by construction of this hypothesis. In addition, a working prototype of a clinical question answering system will support research in informed clinical decision making.

The proposed methodology is based on the semantic domain model developed within the paradigm of Evidence-Based Medicine. Three basic components of this model are identified and discussed in detail: the clinical task, a framework for capturing a synopsis of the clinical scenario that generated the question, and the strength of evidence presented in an answer. Algorithms and methods were developed that combine knowledge-based and statistical techniques to extract these components from abstracts of biomedical articles. These algorithms serve as the foundation for a prototype end-to-end clinical question answering system that was built and evaluated to test the hypothesis. Evaluation of the system on test collections developed in the course of this work and based on real-life clinical questions demonstrates the feasibility of complex question answering and high-accuracy information retrieval using a semantic domain model.
