Evaluation of a new k-means approach for exploratory clustering of items

Exploratory factor analysis (EFA) is commonly used for the exploratory analysis of survey data. However, EFA is known to exhibit some problems. The major mathematical issue is factor indeterminacy. Further problems include its weak performance with small samples (n ≤ 150) and with high cross-loadings (e.g., Guadagnoli & Velicer, 1988; Sass, 2010; Velicer & Fava, 1998), as well as the general issue of the underlying measurement model, which includes uncorrelated residual variances that may be difficult to justify (Cudeck & Henly, 1991; MacCallum & Tucker, 1991; Tryon, 1959). The authors suggest two new k-means approaches as an alternative: k-means scaled distance measure (sdm), where items are represented in a coordinate system such that their distance is based on one minus their correlation; and k-means cor, where the item inter-correlations are taken directly as the coordinates of the items. These approaches were tested in a resampling study with two real data sets and in a traditional Monte Carlo simulation, as well as in a cross-validation using confirmatory factor analysis (CFA). For dimensionality assessment, the cluster validity coefficient Silhouette was used. In each analysis, these approaches were compared with existing cluster analysis approaches and EFA. The authors conclude that the main advantages of the new approaches are (a) that cluster scores are determinate and (b) that, for item assignment, k-means sdm obtains better results than EFA and other cluster analysis approaches. The authors therefore suggest using a combination of EFA methods for dimensionality assessment and k-means for item assignment.
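The two item-clustering ideas described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the coordinate system for k-means sdm is built, so the sketch assumes classical (Torgerson) multidimensional scaling of the distance 1 − r. The simulated data, the function names (`simulate_items`, `classical_mds`, `kmeans`, `mean_silhouette`), and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_items(n=300, seed=0):
    """Hypothetical toy survey: 6 items, two blocks of 3 driven by separate latent variables."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, 2))
    loadings = np.zeros((2, 6))
    loadings[0, :3] = 0.8
    loadings[1, 3:] = 0.8
    return latent @ loadings + 0.6 * rng.normal(size=(n, 6))

def classical_mds(dist, n_dims):
    """Embed a distance matrix in n_dims coordinates (assumed stand-in for the sdm step)."""
    m = dist.shape[0]
    centering = np.eye(m) - np.ones((m, m)) / m
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

def kmeans(points, k, n_init=20, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random restarts; returns the lowest-inertia labeling."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new_centers = np.array([
                points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        inertia = d2.min(axis=1).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def mean_silhouette(dist, labels):
    """Average silhouette width from a precomputed item-by-item distance matrix."""
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = dist[i, same].mean()
        b = min(dist[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

data = simulate_items()
corr = np.corrcoef(data, rowvar=False)   # item inter-correlations
dist = 1.0 - corr                        # distance based on one minus the correlation

coords_sdm = classical_mds(dist, n_dims=2)
labels_sdm = kmeans(coords_sdm, k=2)     # "k-means sdm" style: cluster embedded items
labels_cor = kmeans(corr, k=2)           # "k-means cor" style: correlation rows as coordinates
sil = mean_silhouette(dist, labels_sdm)

print("sdm labels:", labels_sdm)
print("cor labels:", labels_cor)
print("mean silhouette (sdm):", round(sil, 3))
```

In this sketch, each item (not each respondent) is a point to be clustered, so the cluster assignment plays the role that the loading pattern plays in EFA. Repeating the last step for several values of `k` and comparing the mean silhouette is one way the Silhouette coefficient could be used for dimensionality assessment.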

[1]  R. Peterson A Meta-Analysis of Variance Accounted for and Factor Loadings in Exploratory Factor Analysis , 2000 .

[2]  Douglas B. Kell,et al.  Computational cluster validation in post-genomic data analysis , 2005, Bioinform..

[3]  Sanjay Mishra,et al.  Efficient theory development and factor retention criteria: Abandon the ‘eigenvalue greater than one’ criterion , 2008 .

[4]  B. Muthén,et al.  Exploratory Structural Equation Modeling , 2009 .

[5]  M. Chavent,et al.  ClustOfVar: An R Package for the Clustering of Variables , 2011, 1112.0295.

[6]  J. Hunter Methods of Reordering the Correlation Matrix to Facilitate Visual Inspection and Preliminary Cluster Analysis. , 1973 .

[7]  Anja Vogler,et al.  An Introduction to Multivariate Statistical Analysis , 2004 .

[8]  E. Vigneau,et al.  Clustering of Variables Around Latent Components , 2003 .

[9]  William M. Rand,et al.  Objective Criteria for the Evaluation of Clustering Methods , 1971 .

[10]  Wayne F. Velicer,et al.  The Relation Between Factor Score Estimates, Image Scores, and Principal Component Scores , 1976 .

[11]  J. Horn A rationale and test for the number of factors in factor analysis , 1965, Psychometrika.

[12]  Daniel A Sass,et al.  A Comparative Investigation of Rotation Criteria within Exploratory Factor Analysis , 2022, Multivariate Behavioral Research.

[13]  W. Velicer,et al.  Comparison of five rules for determining the number of components to retain. , 1986 .

[14]  Patrick J. F. Groenen,et al.  Modern Multidimensional Scaling: Theory and Applications , 2003 .

[15]  Marie Chavent,et al.  ClustOfVar : an R package for dimension reduction via clustering of variables. Application in supervised classification and variable selection in gene expressions data , 2013 .

[16]  S J Henly,et al.  Model selection in covariance structures analysis and the "problem" of sample size: a clarification. , 1991, Psychological bulletin.

[17]  Duane T. Wegener,et al.  Evaluating the use of exploratory factor analysis in psychological research. , 1999 .

[18]  W. Arrindell,et al.  An Empirical Test of the Utility of the Observations-To-Variables Ratio in Factor and Components Analysis , 1985 .

[19]  Daniel A. Sass,et al.  Factor Loading Estimation Error and Stability Using Exploratory Factor Analysis , 2010 .

[20]  W. Velicer,et al.  Relation of sample size to the stability of component patterns. , 1988, Psychological bulletin.

[21]  W. Velicer,et al.  Affects of variable and subject sampling on factor pattern recovery. , 1998 .

[22]  D. Linden,et al.  Overlap between General Factors of Personality in the Big Five, Giant Three, and trait emotional intelligence , 2012 .

[23]  R. Tryon,et al.  A theory of psychological components: an alternative to "mathematical factors." , 1935 .

[24]  R. Tryon Domain sampling formulation of cluster and factor analysis , 1959 .

[25]  D. Bacon An Evaluation of Cluster Analytic Approaches to Initial Model Specification , 2001 .

[26]  André Hardy,et al.  An examination of procedures for determining the number of clusters in a data set , 1994 .

[27]  W Revelle,et al.  Hierarchical Cluster Analysis And The Internal Structure Of Tests. , 1979, Multivariate behavioral research.

[28]  Guy N. Brock,et al.  clValid , an R package for cluster validation , 2008 .

[29]  P. Sopp Cluster analysis. , 1996, Veterinary immunology and immunopathology.

[30]  Daniel J. Mundfrom,et al.  Minimum Sample Size Recommendations for Conducting Factor Analyses , 2005 .

[31]  T. Schmitt Current Methodological Considerations in Exploratory and Confirmatory Factor Analysis , 2011 .

[32]  Wayne F. Velicer,et al.  Construct Explication through Factor or Component Analysis: A Review and Evaluation of Alternative Procedures for Determining the Number of Factors or Components , 2000 .

[33]  Stijn van Dongen,et al.  Metric distances derived from cosine similarity and Pearson and Spearman correlations , 2012, ArXiv.

[34]  K. Schweizer Classifying Variables on the Basis of Disaggregate Correlations. , 1991, Multivariate behavioral research.

[35]  Goldine C. Gleser,et al.  Maximizing the discriminating power of a multiple-score test , 1953 .

[36]  W. Revelle psych: Procedures for Personality and Psychological Research , 2017 .

[37]  Robert C. MacCallum,et al.  Representing sources of error in the common-factor model: Implications for theory and practice. , 1991 .

[38]  James H. Steiger,et al.  On the validity of indeterminate factor scores , 1978 .