Automated identification of medical concepts and assertions in medical text

This paper describes a machine-learning approach to text processing that extracts key medical information from unstructured text in Electronic Medical Records. The approach uses a novel text representation that keeps the simplicity of the widely used bag-of-words representation while also capturing some semantic information in the text. The high dimensionality of the resulting learning models is controlled with ℓ1 regularization, which favors parsimonious (sparse) models. Experimental results demonstrate the accuracy of the approach in extracting medical assertions that can be associated with polarity and relevance detection.
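The abstract does not specify the novel representation or the exact learner, so the following is only a minimal sketch of the two generic ingredients it names: a plain bag-of-words encoding and an ℓ1 penalty that drives uninformative weights to exactly zero. It assumes logistic regression trained by proximal gradient descent (ISTA), which is one standard way to fit ℓ1 models and not necessarily the authors' method. The toy sentences, the polarity labels, and every function name (`bag_of_words`, `train_l1_logreg`, and so on) are hypothetical.

```python
import math


def bag_of_words(docs):
    """Map each document to a binary bag-of-words vector over a shared vocabulary.

    Plain BoW only; this does NOT reproduce the paper's richer representation.
    """
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok in doc.lower().split():
            vec[index[tok]] = 1.0  # presence/absence; word order is discarded
        vectors.append(vec)
    return vocab, vectors


def soft_threshold(value, t):
    """Proximal operator of t*|w|: shrinks toward 0 and zeroes small weights exactly."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0


def train_l1_logreg(X, y, lam=0.05, lr=0.5, epochs=500):
    """Minimize mean log-loss + lam * ||w||_1 with ISTA (bias b is unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # sigmoid(z) - label
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth log-loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 (asserted) if the linear score is positive, else 0 (negated)."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0


# Hypothetical toy data: 1 = finding asserted, 0 = finding negated.
docs = [
    "patient denies chest pain",
    "no evidence of pneumonia",
    "denies fever",
    "patient reports chest pain",
    "evidence of pneumonia noted",
    "reports fever",
]
labels = [0, 0, 0, 1, 1, 1]

vocab, X = bag_of_words(docs)
w, b = train_l1_logreg(X, labels)
# Keep only the features the L1 penalty left nonzero.
selected = {tok: round(wt, 3) for tok, wt in zip(vocab, w) if wt != 0.0}
print(selected)
```

On this toy set, words that appear equally in both classes (such as "patient" or "pneumonia") end with weight exactly zero, while the cue words "denies", "no", "reports", and "noted" survive; raising `lam` shrinks more weights to zero. This is the parsimony effect that keeps a large bag-of-words feature space manageable.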
