Converting Semi-structured Clinical Medical Records into Information and Knowledge

Clinical medical records contain a wealth of information, largely in free-text form, so extracting structured information from these records has become an important research problem. In this paper, we propose and implement an information extraction system that extracts three types of information from semi-structured patient records: numeric values, medical terms, and categorical values. We propose a separate approach for each of the three types, and each achieves very good performance in terms of precision and recall. A novel link-grammar-based approach associates each feature with its number in a sentence and achieves extremely high accuracy. A simple but efficient approach, using POS-based patterns and a domain ontology, extracts the medical terms of interest. Finally, an NLP-based feature extraction method coupled with an ID3-based decision tree classifies and extracts categorical cases. This preliminary approach to categorical fields has so far proven quite effective.
