Navigating through Very Large Sets of Medical Records: An Information Retrieval Evaluation Architecture for Non-standardized Text

Despite the prevalence of informatics and advanced information systems, large amounts of data still exist only as unstructured text. This is especially true in medicine and health care, where free text is an indispensable part of information representation. This paper describes the motivation behind developing information retrieval systems in medicine and health care and gives an overview of information retrieval evaluation. It then describes the architecture and development of an extendible information retrieval evaluation framework, which allows different information retrieval tools to be compared against a gold standard in order to test their effectiveness. The paper also reviews available gold standards that can be used for research on information retrieval from medical free text.
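
As a rough illustration of what comparing retrieval tools against a gold standard can look like, the following minimal sketch scores two tools with set-based precision, recall, and F1. It is not the framework described in the paper: the function name `evaluate`, the tool names, the query ID, and the document IDs are all hypothetical, and the choice of these three measures is an assumption, since the abstract does not say which measures the framework reports.

```python
from typing import Dict, Iterable, Set


def evaluate(retrieved: Iterable[str], relevant: Set[str]) -> Dict[str, float]:
    """Score one result list against the gold-standard relevant set.

    Hypothetical helper for illustration; uses unranked, set-based measures.
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # relevant documents that were found
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Gold standard: query ID -> set of document IDs judged relevant (made-up data).
gold = {"q1": {"d1", "d2", "d3"}}

# Output of two hypothetical retrieval tools for the same query.
runs = {
    "tool_a": {"q1": ["d1", "d2", "d9"]},
    "tool_b": {"q1": ["d1"]},
}

# Score every tool on every query it answered.
scores = {
    tool: {qid: evaluate(docs, gold[qid]) for qid, docs in results.items()}
    for tool, results in runs.items()
}

for tool, per_query in scores.items():
    for qid, s in per_query.items():
        print(f"{tool} {qid}: P={s['precision']:.2f} R={s['recall']:.2f} F1={s['f1']:.2f}")
```

Because every tool is scored against the same judgments, the resulting numbers are directly comparable; a ranked measure such as average precision could be substituted inside `evaluate` without changing the surrounding loop.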
