Knowledge-Based Patient Data Generation

The development and investigation of medical applications require patient data from Electronic Health Records (EHRs) or Clinical Records (CRs). In practice, however, patient data are, and should be, protected and monitored to prevent unauthorized access or disclosure, for reasons that include privacy, security, ethics, and confidentiality. As a result, many researchers and developers have difficulty obtaining the patient data they need, or making patient data available to others, for example to demonstrate the reproducibility of their results. In this paper, we propose a knowledge-based approach to synthesizing large-scale patient data. Our main goal is to make the generated patient data as realistic as possible by using domain knowledge to control the data generation process. Such domain knowledge can be collected from biomedical publications (e.g. those indexed in PubMed), medical textbooks, or web resources (e.g. Wikipedia and medical websites). The collected knowledge is formalized in the Patient Data Definition Language (PDDL) and used to drive patient data generation. We have implemented the proposed approach in our Advanced Patient Data Generator (APDG) and used it to generate large-scale data for breast cancer patients in experiments with SemanticCT, a semantically-enabled system for clinical trials. The results show that the generated patient data are useful for various tests of the system.
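The idea of letting domain knowledge control generation can be illustrated with a minimal Python sketch. This is not PDDL syntax and not the APDG implementation; the rule table `KNOWLEDGE`, the field names, the age range, the menopause threshold, and every probability below are hypothetical placeholders chosen for illustration, not clinical statistics.

```python
import random

# Hypothetical "domain knowledge": each field maps to a distribution over
# its possible values. All numbers are illustrative placeholders.
KNOWLEDGE = {
    "gender": {"female": 0.99, "male": 0.01},
    "stage": {"I": 0.4, "II": 0.3, "III": 0.2, "IV": 0.1},
}


def weighted_choice(rng, distribution):
    """Draw one value according to the weights in `distribution`."""
    values = list(distribution)
    weights = [distribution[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def menopausal_status(gender, age):
    """Conditional rule: the value depends on previously generated fields.

    The threshold of 50 years is an assumed placeholder.
    """
    if gender == "male":
        return "not_applicable"
    return "post" if age >= 50 else "pre"


def generate_patient(rng, patient_id):
    """Generate one synthetic patient record from the rules above."""
    age = rng.randint(30, 85)  # assumed age range
    gender = weighted_choice(rng, KNOWLEDGE["gender"])
    return {
        "id": patient_id,
        "age": age,
        "gender": gender,
        "stage": weighted_choice(rng, KNOWLEDGE["stage"]),
        "menopausal_status": menopausal_status(gender, age),
    }


def generate_patients(n, seed=0):
    """Generate `n` records; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [generate_patient(rng, i) for i in range(n)]
```

Calling `generate_patients(1000, seed=42)` would return a list of 1000 dictionaries. The conditional rule shows why knowledge matters: a purely independent random draw could pair a male patient with a menopausal status, whereas a rule that depends on earlier fields keeps the record internally consistent.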
