Revisiting French tomato data: Cluster analysis with incomplete data

Abstract: The analysis of French tomato data from a 2004 Sensometrics workshop is revisited. The workshop posed two questions: (1) are there consumer segments in the hedonic data, and (2) if there are segments, can they be characterized using consumer and tomato attributes? The main challenge is that a large share of the hedonic data is missing. “Probabilistic” solutions to this missing-data problem, based on multiple imputation, are explored. In addition to more traditional methods, polynomial models are used to answer the second question.
