Particularities of data mining in medicine: lessons learned from patient medical time series data analysis

Nowadays, large amounts of data are generated in the medical domain. Physiological signals from different organs can be recorded to extract information about patients’ health. Analysing these signals is a hard task that requires specific approaches such as the Knowledge Discovery in Databases (KDD) process. Applying such a process in medicine brings a series of implications and difficulties, especially when data mining techniques are applied to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors in applying data mining techniques to real patient data that include time series. In this research, we carried out an exhaustive case study on data from two medical fields: stabilometry (15 professional basketball players and 18 elite ice skaters) and electroencephalography (100 healthy patients and 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification purposes and obtained good results in terms of classification accuracy (greater than 99% in both fields). These results are the groundwork for the lessons learned and the recommendations made in this position paper, which is intended as a guide for experts facing similar medical data mining projects.
