K-means clustering of overweight and obese population using quantile-transformed metabolic data

Objective: To cluster an overweight and obese population by metabolic profile using K-means clustering, a big data technique. Methods: K-means clustering was applied to quantile-transformed attribute values to reduce the impact of the considerable variation in obesity attributes, including outliers and skewed distributions. Results: Overall, 447 subjects were grouped into six clusters spanning metabolically normal, mild, and severe categories. The metabolically normal Cluster 1 and the severe Cluster 2 were clearly separated, while the intermediate Clusters 3, 4, and 5 had profiles with fewer abnormal attributes. Cluster 3 was characterized by hypertension alone. Clusters 4 and 5 exhibited contrasting HDL-C and LDL-C levels despite similarly elevated total cholesterol. Cluster 6, with slightly elevated triglycerides, was closest to the normal group. Four- and 10-quantile transformations yielded consistent clustering results. Compared with the original data, the quantile-transformed data produced more regular, spherical clusters with more evenly distributed numbers of objects. Conclusions: This big data analysis strategy uses quantile transformation to overcome outliers and irregular distributions, and it is applicable to the analysis of other non-communicable diseases.
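The quantile transformation described in the Methods can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the study: the exact form of the transform (each value is replaced by its quantile bin index, 1 to q), the toy attribute values, the choice of two clusters, and the helper names `quantile_transform`, `transform_table`, and `kmeans`. The sketch only shows how binning bounds the influence of an outlier before a plain Lloyd-style K-means is run.

```python
import bisect
import random
import statistics


def quantile_transform(column, n_quantiles=4):
    """Replace each value with its quantile bin index (1..n_quantiles).

    Assumption: the abstract does not specify the transform; a bin-index
    mapping is one plausible reading.
    """
    cuts = statistics.quantiles(column, n=n_quantiles)  # n_quantiles - 1 cut points
    return [bisect.bisect_left(cuts, v) + 1 for v in column]


def transform_table(rows, n_quantiles=4):
    """Apply the quantile transform to each attribute (column) separately."""
    columns = list(zip(*rows))
    binned = [quantile_transform(list(c), n_quantiles) for c in columns]
    return [list(r) for r in zip(*binned)]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd-style K-means with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical data: [triglyceride, systolic blood pressure] per subject.
# The fourth row contains an extreme triglyceride value (9.8).
rows = [
    [1.0, 118], [1.2, 121], [1.1, 135], [9.8, 160],
    [1.9, 150], [2.4, 142], [1.4, 125], [2.1, 148],
]

binned = transform_table(rows, n_quantiles=4)
labels, centers = kmeans(binned, k=2)

for original, transformed, label in zip(rows, binned, labels):
    print(original, "->", transformed, "cluster", label)
```

After binning, every attribute lies on the same bounded 1 to 4 scale, so the extreme triglyceride value falls into the top bin alongside merely elevated values instead of pulling a cluster center toward itself. Changing `n_quantiles` to 10 gives the finer-grained variant mentioned in the Results.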
