K-Means and K-Medoids: Cluster Analysis on Birth Data Collected in City Muzaffarabad, Kashmir

In medicine, every analysis is decisive because the study concerns the life of the subject under observation. One of the most vital areas of medicine is the healthcare of expectant women in low-income countries. In the region, a high mortality rate due to the increased number of caesarean sections is evident, owing to poor medical infrastructure, misunderstood religious teachings, low education and a lack of proper decision-making at the right time. Root cause analysis of the situations that demand a caesarean section is difficult. However, when historical data are available, one may extract useful information that supports a medical decision by predicting the outcome. Regional disparities have a large impact on the residents of a region, so a study performed on one region cannot be applied in full to the residents of another, distant region. This motivated a local study of data collected from expectant women in the city of Muzaffarabad, Kashmir. The findings of this study are expected to be significant for women who share broadly similar physical, social and maternal traits.

With this in mind, the study compares two clustering techniques to identify an algorithm that robustly groups the data into relevant clusters. First, we analyzed the ability of the K-means and K-medoids algorithms to cluster the data using different distance metrics. Second, we applied three data transformation techniques: scale, range and Yeo-Johnson. Finally, we clustered the transformed data with K-means and K-medoids and measured the resulting cluster accuracy. The results obtained from transformed data are better than those obtained from raw data. The Yeo-Johnson transformation gives the best results for K-means (Hartigan & Wong), K-medoids (SEV distance function) and Rank K-medoids (SEV distance function), with mean accuracies of 67.58%, 69.58% and 72.64%, respectively.
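The three transformations named above are not defined in this summary. The sketch below assumes that "scale" means z-score standardisation (subtract the mean, divide by the sample standard deviation), that "range" means min-max rescaling to [0, 1], and that the Yeo-Johnson parameter λ is chosen per variable by maximising the normal profile log-likelihood over a grid. These are assumptions about the preprocessing, not the authors' code; the function names are hypothetical and plain Python is used so the example has no dependencies. Unlike Box-Cox, Yeo-Johnson is defined for zero and negative values, which matters for clinical variables that can be zero.

```python
import math
from statistics import mean, pvariance, stdev


def scale(col):
    """Assumed 'scale': z-score standardisation with the sample standard deviation."""
    mu, sd = mean(col), stdev(col)
    return [(v - mu) / sd for v in col]


def range01(col):
    """Assumed 'range': min-max rescaling to [0, 1]."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]


def yeo_johnson(col, lam):
    """Yeo-Johnson power transform with parameter lam (valid for negative values too)."""
    out = []
    for y in col:
        if y >= 0:
            out.append(math.log1p(y) if abs(lam) < 1e-12 else ((y + 1) ** lam - 1) / lam)
        else:
            out.append(-math.log1p(-y) if abs(lam - 2) < 1e-12
                       else -((1 - y) ** (2 - lam) - 1) / (2 - lam))
    return out


def yj_loglik(col, lam):
    """Profile log-likelihood of lam, assuming the transformed column is normal."""
    n = len(col)
    var = pvariance(yeo_johnson(col, lam))  # maximum-likelihood variance estimate
    # Log-Jacobian of the transform: sign(y) * log(|y| + 1), summed over the column.
    jacobian = sum(math.copysign(1.0, y) * math.log1p(abs(y)) for y in col)
    return -n / 2 * math.log(var) + (lam - 1) * jacobian


def fit_yeo_johnson(col, grid=None):
    """Hypothetical helper: pick lam by grid search over [-3, 3] in steps of 0.01."""
    grid = grid or [i / 100 for i in range(-300, 301)]
    return max(grid, key=lambda lam: yj_loglik(col, lam))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 100]  # toy right-skewed column, not study data
    lam = fit_yeo_johnson(x)
    print(lam, scale(yeo_johnson(x, lam)))
```

Each function works on a single numeric column, so a feature matrix would be transformed column by column. A constant column has zero variance and should be dropped before fitting λ, since its log-likelihood is undefined. Library implementations such as `scipy.stats.yeojohnson` or scikit-learn's `PowerTransformer(method="yeo-johnson")` estimate λ by numerical optimisation rather than a grid.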

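To illustrate the K-medoids side of the comparison, here is a minimal sketch of the simple alternating ("Voronoi iteration") variant of K-medoids with a pluggable distance function, together with one common definition of cluster accuracy: the fraction of points labelled correctly under the best one-to-one matching of clusters to known classes. It is not PAM and not Rank K-medoids, and it does not implement the SEV distance function, whose definition is not given here. The farthest-first initialisation, the `euclidean` and `manhattan` helpers and the accuracy definition are assumptions chosen for illustration.

```python
import math
from itertools import permutations


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def _assign(D, medoids):
    """Index of the nearest medoid for every point (ties go to the lower index)."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]]) for i in range(len(D))]


def k_medoids(points, k, dist=euclidean, max_iter=100):
    """Alternating K-medoids; returns (medoid indices, cluster label per point)."""
    n = len(points)
    D = [[dist(p, q) for q in points] for p in points]  # precomputed distance matrix
    # Farthest-first initialisation: start from the most central point, then
    # repeatedly add the point farthest from the medoids chosen so far.
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(D[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = _assign(D, medoids)
        new = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep its old medoid
                new.append(medoids[c])
                continue
            # New medoid = member with the smallest total distance to its cluster.
            new.append(min(members, key=lambda i: sum(D[i][j] for j in members)))
        if new == medoids:  # converged: no medoid moved
            break
        medoids = new
    return medoids, _assign(D, medoids)


def cluster_accuracy(labels, truth):
    """Fraction of points correct under the best one-to-one cluster-to-class mapping."""
    clusters = sorted(set(labels))
    classes = sorted(set(truth))
    classes += [None] * (len(clusters) - len(classes))  # surplus clusters map to no class
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[lab] == t for lab, t in zip(labels, truth)))
    return best / len(truth)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # toy data
    truth = [0, 0, 0, 1, 1, 1]
    medoids, labels = k_medoids(pts, 2, dist=manhattan)
    print(medoids, labels, cluster_accuracy(labels, truth))
```

Swapping `dist` is how different distance metrics would be compared under this sketch. Trying every permutation costs k! matchings, which is cheap for small k; for many clusters the Hungarian algorithm is the usual replacement.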