Exploring statistical approaches to diminish subjectivity of cluster analysis to derive dietary patterns: The Tomorrow Project.

Dietary patterns derived by cluster analysis are commonly reported with little information describing how decisions are made at each step of the analytical process. Using food frequency questionnaire data obtained in 2001-2007 from Albertan men (n = 6,445) and women (n = 10,299) aged 35-69 years, the authors explored the use of statistical approaches to diminish the subjectivity inherent in cluster analysis. The reproducibility of cluster solutions, defined as agreement between 2 cluster assignments, was evaluated for 3 clustering methods (Ward's minimum variance, flexible beta, K means). Ratios of between- to within-cluster variance were examined, and health-related variables were described across clusters in the final solution. K means produced the cluster solutions with the highest reproducibility. For men, 4 clusters were chosen on the basis of the ratios of between- to within-cluster variance; for women, 3 clusters were chosen on the basis of the interpretability of cluster labels and descriptive statistics. Compared with those in other clusters, greater proportions of men and women in the "healthy" clusters reported normal body mass index, smaller waist circumference, and lower energy intake. The authors' approach appeared helpful for choosing the clustering method for both sexes and the optimal number of clusters for men, but additional analyses are required to understand why it performed differently for women.
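
The two quantities named in the abstract, agreement between 2 cluster assignments and the ratio of between- to within-cluster variance, can be illustrated with a minimal sketch. The abstract does not state which agreement statistic or which variance-ratio formula the authors used, so the choices here are assumptions: agreement is computed as the adjusted Rand index, and the variance ratio is computed in pseudo-F form (between-cluster sum of squares per degree of freedom over within-cluster sum of squares per degree of freedom). The function names `adjusted_rand_index` and `variance_ratio` are hypothetical and use only the Python standard library.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items.

    Assumption: the adjusted Rand index stands in for the paper's
    "agreement between 2 cluster assignments"; the abstract does not name
    the statistic. A value of 1.0 means identical partitions.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both, in the first, in the second.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial and identical
    return (together_both - expected) / (maximum - expected)


def variance_ratio(points, labels):
    """Between- to within-cluster variance ratio in pseudo-F form (assumed)."""
    n = len(points)
    dims = len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")
    grand_mean = [sum(p[d] for p in points) / n for d in range(dims)]
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        # Squared distance of the centroid from the grand mean, weighted by size.
        between += len(members) * sum((c - g) ** 2 for c, g in zip(centroid, grand_mean))
        # Squared distances of members from their own centroid.
        within += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    # Toy one-dimensional data, not taken from the study.
    toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    print(variance_ratio(toy_points, [0, 0, 1, 1]))
```

Under these assumptions, a reproducibility check would compare the labels obtained from two runs (for example, on split or perturbed samples) with `adjusted_rand_index`, and candidate numbers of clusters would be compared by computing `variance_ratio` for each solution, with larger values indicating better-separated clusters.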
