GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare

A wide array of biomedical data is generated and made available to healthcare experts. However, because these data come from diverse sources and in diverse forms, it is difficult to predict outcomes from them directly. It is therefore necessary to combine the diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) that provides a global unified data structure for all data sources, and a "data modeler" tool that generates a unified dataset from them. The tool implements a user-centric, priority-based approach that resolves the problems of unified data modeling and of overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The data sources used to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset, and physical activity data collected using different sensors. To demonstrate the significance of the unified dataset, we adopted a well-known rule-creation process based on rough set theory to create rules from the unified dataset. An evaluation of the tool on six different sets of locally created diverse datasets shows that, on average, the tool reduces the time and effort that experts and knowledge engineers spend creating unified datasets by 94.1%.
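The priority-based handling of overlapping attributes can be illustrated with a minimal sketch. This is not the data modeler tool itself: the function name `unify_records`, the source names, and the attribute names are all hypothetical. The sketch assumes that the user supplies an ordered list of sources and that, when an attribute appears in more than one source, the value from the highest-priority source that actually has one is kept.

```python
from typing import Dict, List

Record = Dict[str, object]


def unify_records(sources: Dict[str, Record], priority: List[str]) -> Record:
    """Merge one subject's records from several sources into a unified record.

    `priority` lists source names from highest to lowest priority (an assumed,
    user-supplied ordering). For an attribute present in several sources, the
    value from the highest-priority source wins; None values are treated as
    missing and never override an existing value.
    """
    unified: Record = {}
    # Walk from lowest to highest priority so that later writes override.
    for name in reversed(priority):
        for attr, value in sources.get(name, {}).items():
            if value is not None:
                unified[attr] = value
    return unified


# Hypothetical example: three sources that overlap on "weight_kg".
sources = {
    "clinical": {"patient_id": 1, "weight_kg": 82.0, "hba1c": 7.4},
    "sensor": {"patient_id": 1, "weight_kg": 80.5, "steps": 5200},
    "social": {"patient_id": 1, "weight_kg": None, "mood": "tired"},
}
unified = unify_records(sources, priority=["clinical", "sensor", "social"])
print(unified)
```

Here the clinical value of `weight_kg` is kept because `clinical` is listed first; reordering `priority` changes which source wins without touching the merge logic.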
