Fast tagging of medical terms in legal text

Medical terms occur across a wide variety of legal, medical, and news corpora. Documents containing these terms are of particular interest to legal professionals working in fields such as medical malpractice, personal injury, and product liability. This paper describes a novel method of tagging medical terms in legal, medical, and news text that is very fast and achieves high recall and precision. To date, most research in medical term spotting has been confined to medical text and has approached the problem by extracting noun phrases from sentences and mapping them to a list of medical concepts via a fuzzy lookup. The medical term tagging described in this paper instead relies on a fast finite state machine that finds, within each sentence, the longest contiguous sets of words associated with medical terms in a medical term authority file, converts these word sets into medical term hash keys, and looks up the medical concept ids associated with those hash keys. Additionally, our system relies on a probabilistic term classifier that uses local context to distinguish terms used in a medical sense from terms used in a non-medical sense. Our method is two orders of magnitude faster than an approach based on noun phrase extraction and has better precision and recall for terms pertaining to injuries, diseases, drugs, medical procedures, and medical devices. The methods presented here have been implemented and are the core engines of a Thomson West product called the Medical Litigator. Thus far, the Medical Litigator has processed over 100 million documents and generated over 165 million tags representing approximately 164,000 unique medical concepts. Depending on the document type, the resulting system achieves recall between 0.79 and 0.93 and precision between 0.94 and 0.97.
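The two engines described above can be illustrated with a small Python sketch. It is not the Medical Litigator implementation, and everything in it is an assumption made for illustration: the four-entry authority file and its concept ids (`C001` to `C004`) are hypothetical, the hash key is assumed to be the sorted, space-joined words (so word order is ignored, in keeping with the "word sets" wording), the finite state machine is approximated by a two-state outside/inside scan that yields maximal runs of authority-file words followed by a leftmost-longest lookup within each run, and the unspecified probabilistic classifier is replaced by a plain Naive Bayes model over a hypothetical window of three words on each side of the term.

```python
import math
import re
from collections import Counter

# Hypothetical miniature authority file: term string -> concept id.
AUTHORITY = {
    "broken arm": "C001",
    "arm": "C002",
    "myocardial infarction": "C003",
    "aspirin": "C004",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_key(words):
    # Assumed key scheme: sorted, space-joined words, so word order is ignored.
    return " ".join(sorted(words))


KEY_TO_CONCEPT = {hash_key(tokenize(t)): cid for t, cid in AUTHORITY.items()}
VOCAB = {w for t in AUTHORITY for w in tokenize(t)}
MAX_LEN = max(len(tokenize(t)) for t in AUTHORITY)


def vocab_runs(tokens):
    """Two-state machine (outside/inside) yielding maximal runs of term words."""
    start = None
    for i, tok in enumerate(tokens + [None]):  # None sentinel closes the last run
        if tok in VOCAB:
            if start is None:
                start = i
        elif start is not None:
            yield start, i
            start = None


def tag(text):
    """Leftmost-longest hash-key lookup inside each run; returns (start, end, text, id)."""
    tokens = tokenize(text)
    tags = []
    for run_start, run_end in vocab_runs(tokens):
        i = run_start
        while i < run_end:
            for j in range(min(run_end, i + MAX_LEN), i, -1):  # longest span first
                cid = KEY_TO_CONCEPT.get(hash_key(tokens[i:j]))
                if cid:
                    tags.append((i, j, " ".join(tokens[i:j]), cid))
                    i = j
                    break
            else:  # no term starts at position i
                i += 1
    return tags


class ContextClassifier:
    """Naive Bayes over the words in a +/- window around a tagged span."""

    def __init__(self, window=3):
        self.window = window
        self.counts = {"medical": Counter(), "other": Counter()}
        self.docs = Counter()

    def context(self, tokens, start, end):
        return tokens[max(0, start - self.window):start] + tokens[end:end + self.window]

    def train(self, tokens, start, end, label):
        self.docs[label] += 1
        self.counts[label].update(self.context(tokens, start, end))

    def prob_medical(self, tokens, start, end):
        ctx = self.context(tokens, start, end)
        vocab_size = len(set(self.counts["medical"]) | set(self.counts["other"]))
        total_docs = sum(self.docs.values())
        scores = {}
        for label, counter in self.counts.items():
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            denom = sum(counter.values()) + vocab_size + 1
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            score += sum(math.log((counter[w] + 1) / denom) for w in ctx)
            scores[label] = score
        # Two-class softmax: P(medical) = 1 / (1 + exp(other - medical)).
        return 1.0 / (1.0 + math.exp(scores["other"] - scores["medical"]))


if __name__ == "__main__":
    print(tag("The plaintiff suffered a broken arm after taking aspirin."))
    # [(4, 6, 'broken arm', 'C001'), (8, 9, 'aspirin', 'C004')]
    clf = ContextClassifier()
    clf.train(tokenize("the doctor examined her injured arm in the hospital"), 5, 6, "medical")
    clf.train(tokenize("the arm of the company sold the subsidiary"), 1, 2, "other")
    sent = tokenize("a division is an arm of the company")
    for start, end, term, cid in tag(" ".join(sent)):
        print(term, cid, round(clf.prob_medical(sent, start, end), 3))
```

Because the lookup is a dictionary probe per candidate span rather than a parse, the cost per sentence stays roughly linear in its length times the longest term length. The demo's two training sentences are toy data chosen only to show how a corporate "arm" receives a low medical probability and could be filtered out; a real classifier would need a substantial labeled corpus.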
