On integrating clustering and statistical analysis for supporting cardiovascular disease diagnosis

Statistical analysis of medical data plays a significant role in the development of medical diagnostics. However, in many cases statistical methods alone are not effective enough. In this paper, we consider combining statistical inference with clustering in the preprocessing phase of data analysis. The proposed methodology was evaluated on cardiovascular data and used to develop methods for the early diagnosis of hypertension in children. Experiments conducted on real data demonstrated that the proposed hybrid approach made it possible to discover relationships that had not been identified using statistical methods alone. We observed an approximately 30% increase in the number of correlations between diagnosed attributes. Moreover, all of the statistically significant dependencies obtained were stronger within clusters than in the whole datasets.
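
The abstract does not name the clustering algorithm or the statistical test, so the following Python sketch only illustrates the general idea under stated assumptions. k-means (Lloyd's algorithm with a deterministic farthest-first initialisation) stands in for the clustering step. Pearson's r with an approximate Fisher z-test at significance level 0.05 stands in for the statistical inference. All function names are hypothetical. The sketch reports which attribute pairs are significantly correlated in the pooled data and inside each cluster. It uses only the standard library (Python 3.8+ for `statistics.NormalDist` and `statistics.fmean`) and, for brevity, applies no feature scaling and no multiple-comparison correction.

```python
import math
from itertools import combinations
from statistics import NormalDist, fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant attribute has no defined correlation; treat as none
    return sxy / math.sqrt(sxx * syy)


def corr_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z, normal approximation)."""
    if n < 4:
        return 1.0  # too few observations for the approximation
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def sq_dist(u, v):
    """Squared Euclidean distance between two rows."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(rows, k, max_iter=100):
    """Lloyd's k-means with farthest-first initialisation; returns one label per row."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # next center: the row farthest from every center chosen so far
        far = max(rows, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(row, centers[j])) for row in rows]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = [fmean(col) for col in zip(*members)]
    return labels


def significant_pairs(rows, alpha=0.05):
    """Map each attribute pair (i, j) to r when its correlation is significant at alpha."""
    cols = list(zip(*rows))
    n = len(rows)
    found = {}
    for i, j in combinations(range(len(cols)), 2):
        r = pearson(cols[i], cols[j])
        if corr_p_value(r, n) < alpha:
            found[(i, j)] = r
    return found


def whole_vs_clusters(rows, k, alpha=0.05):
    """Significant correlations in the whole dataset and inside each k-means cluster."""
    labels = kmeans(rows, k)
    whole = significant_pairs(rows, alpha)
    per_cluster = {}
    for c in range(k):
        members = [row for row, lab in zip(rows, labels) if lab == c]
        per_cluster[c] = significant_pairs(members, alpha)
    return whole, per_cluster


if __name__ == "__main__":
    # Hypothetical toy data: attribute 2 separates two groups; within each group
    # attributes 0 and 1 are strongly related, but with opposite signs.
    group_a = [[x, 2 * x + x % 3, 0.0] for x in range(20)]
    group_b = [[x, 40 - 2 * x + x % 3, 1000.0] for x in range(20)]
    whole, per_cluster = whole_vs_clusters(group_a + group_b, k=2)
    print("whole dataset:", whole)
    print("per cluster:", per_cluster)
```

In this deliberately extreme toy example, the opposite trends cancel in the pooled data, so no pair is significant there, while the (0, 1) dependency is significant inside each cluster (positive in one, negative in the other). Which attributes feed the clustering step matters: here attribute 2 alone separates the groups, and the abstract does not specify that choice for the real data.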
