Data Preparation Framework for Preprocessing Clinical Data in Data Mining

Electronic health records are designed to provide online transactional data recording and reporting services that support the health care process. The characteristics of clinical data as it originates during clinical documentation, including issues of data availability and complex representation models, can make data mining applications challenging. Data preprocessing and transformation are therefore required before data mining can be applied to clinical data. This article describes an approach to data preparation that uses information from the data, metadata, and sources of medical knowledge. Heuristic rules and policies are defined for each of these three types of supporting information. Compared with an entirely manual process for data preparation, this approach can potentially reduce manual work by achieving a degree of automation in rule creation and execution. A pilot experiment demonstrates that data sets created through this approach lead to better model learning results than a fully manual process.
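
To make the idea of rules drawn from three kinds of supporting information more concrete, the following is a minimal illustrative sketch, not the framework described in this article. All field names, the metadata table, the plausible ranges, and the missing-value threshold are hypothetical assumptions chosen only for illustration: one rule is driven by metadata (declared field types), one by medical knowledge (plausible value ranges), and one by the data itself (observed missing-value rates).

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical metadata: the declared type of each field.
METADATA = {"age": "numeric", "sex": "categorical", "heart_rate": "numeric"}

# Hypothetical medical knowledge: plausible value ranges per field.
PLAUSIBLE_RANGE = {"age": (0.0, 120.0), "heart_rate": (20.0, 300.0)}


def metadata_rule(record: Record) -> Record:
    """Cast each value to its declared type; unparseable numbers become None."""
    out = dict(record)
    for field, kind in METADATA.items():
        value = out.get(field)
        if value is None:
            continue
        if kind == "numeric":
            try:
                out[field] = float(value)
            except (TypeError, ValueError):
                out[field] = None
        else:
            # Normalize categorical codes to trimmed lowercase strings.
            out[field] = str(value).strip().lower()
    return out


def knowledge_rule(record: Record) -> Record:
    """Treat numeric values outside the plausible range as missing."""
    out = dict(record)
    for field, (low, high) in PLAUSIBLE_RANGE.items():
        value = out.get(field)
        if isinstance(value, float) and not low <= value <= high:
            out[field] = None
    return out


def data_rule(records: List[Record], max_missing: float = 0.5) -> List[Record]:
    """Drop fields whose observed fraction of missing values exceeds max_missing."""
    if not records:
        return []
    fields = sorted({name for r in records for name in r})
    keep = [
        f for f in fields
        if sum(r.get(f) is None for r in records) / len(records) <= max_missing
    ]
    return [{f: r.get(f) for f in keep} for r in records]


def prepare(records: List[Record]) -> List[Record]:
    """Apply the per-record rules first, then the data-driven field selection."""
    cleaned = [knowledge_rule(metadata_rule(r)) for r in records]
    return data_rule(cleaned)


if __name__ == "__main__":
    raw = [
        {"age": "54", "sex": " F ", "heart_rate": "72"},
        {"age": "abc", "sex": "M", "heart_rate": "999"},
        {"age": "61", "sex": "m", "heart_rate": None},
    ]
    for row in prepare(raw):
        print(row)
```

In this sketch the per-record rules run before the data-driven rule so that values rejected by the metadata or knowledge rules count as missing when deciding which fields to keep. A real system would likely derive such rules from its own metadata and knowledge sources rather than from hard-coded tables.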