Knowledge Discovery Using the Electronic Medical Record

Abstract Knowledge discovery and data mining is one of the most promising areas of current informatics research. However, real successes in clinical data mining have mainly been limited to algorithms research, to specific prospectively created datasets, or to administrative databases requiring manual extraction of data. Natural language processing (NLP), which extracts clinical information from text reports, increases the data available for knowledge discovery and allows greater use of the clinical data already stored in existing clinical databases. We validated a dataset, created by using NLP and rules to extract clinical findings, against a prediction rule that had been validated on manually abstracted data. The outcome variables for each study were similar, indicating the potential of using NLP-extracted findings to create datasets for clinical research. The study also indicated the potential for using external data sources to determine clinical outcomes.