Exploratory Item Classification Via Spectral Graph Clustering

Large-scale assessments are supported by large item pools. An important task in test development is to assign items to scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical clustering methods, such as hierarchical clustering, the K-means method, and latent class analysis, often incur high computational overhead and have difficulty handling missing data, especially when responses are high-dimensional. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms when the dimensionality is high. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items that characterizes the similarity structure among them. It then extracts item clusters from the graph structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire.
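
The two steps described above (build an item similarity graph, then extract clusters from it) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the clustering details, so everything here is an assumption for illustration: item similarity is taken to be the absolute Pearson correlation computed on pairwise-complete responses, the embedding uses the normalized graph Laplacian, and a small hand-written K-means groups the embedded items. The function names `pairwise_similarity`, `kmeans`, and `spectral_item_clusters` are hypothetical and are not the authors' implementation.

```python
import numpy as np


def pairwise_similarity(responses):
    """Item-by-item similarity graph (assumed: |Pearson r| on pairwise-complete data).

    `responses` is a respondents-by-items array with NaN marking missing answers.
    """
    n_items = responses.shape[1]
    sim = np.zeros((n_items, n_items))  # diagonal stays 0 (no self-loops)
    for i in range(n_items):
        for j in range(i + 1, n_items):
            # Use only respondents who answered both items.
            both = ~np.isnan(responses[:, i]) & ~np.isnan(responses[:, j])
            x, y = responses[both, i], responses[both, j]
            if both.sum() > 2 and x.std() > 0 and y.std() > 0:
                sim[i, j] = sim[j, i] = abs(np.corrcoef(x, y)[0, 1])
    return sim


def kmeans(points, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random restarts; returns the lowest-cost labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(n_iter):
            dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Keep the old center if a cluster becomes empty.
            new_centers = np.array([
                points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Recompute assignments and cost against the final centers.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        cost = dists[np.arange(len(points)), labels].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def spectral_item_clusters(responses, n_clusters, seed=0):
    """Cluster items via the normalized-Laplacian embedding of the similarity graph."""
    W = pairwise_similarity(responses)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # M = D^{-1/2} W D^{-1/2}. Its largest eigenvalues correspond to the
    # smallest eigenvalues of the normalized Laplacian I - M.
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    U = eigvecs[:, -n_clusters:]            # top n_clusters eigenvectors
    # Normalize each item's embedding to unit length before K-means.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, n_clusters, seed=seed)
```

Calling `spectral_item_clusters(responses, n_clusters=2)` on a respondents-by-items array returns one integer label per item. Because each similarity is computed only from respondents who answered both items, missing responses need no imputation in this sketch; a different similarity measure could be substituted in `pairwise_similarity` without changing the rest.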
