Orthogonal rotations in latent semantic analysis: An empirical study

Abstract

The Latent Semantic Analysis (LSA) literature has recently started to address the interpretability of the extracted dimensions. On the software side, recent versions of SAS Text Miner® have started to incorporate Varimax rotations. Open-source software such as R gives users many more rotation procedures to choose from, yet little work provides guidance on selecting an appropriate one. In this paper we extend previous research on LSA rotations by introducing two well-known orthogonal rotations, Quartimax and Equamax, and comparing them with Varimax. We present a study that empirically tests how the choice of orthogonal rotation influences the extraction and interpretation of LSA factors. Our results indicate that, in most cases, Varimax and Equamax produce factors with similar interpretations, while Quartimax tends to produce a single dominant factor. We conclude with recommendations on how these rotation procedures should be used and with suggestions for future research. We also note that orthogonal rotations can improve the interpretability of other SVD-based models, such as COALS.
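Quartimax, Varimax and Equamax all belong to the orthomax family of orthogonal rotations, which differ only in a weight γ: γ = 0 gives Quartimax, γ = 1 gives Varimax and γ = k/2 gives Equamax, where k is the number of retained factors. The following is a minimal Python/NumPy sketch of how such rotations could be applied to LSA output, not the authors' implementation. It assumes that term loadings are taken as U_k Σ_k from a truncated SVD of a term-by-document matrix, and it uses the standard SVD-based iterative orthomax algorithm, similar to the iteration in R's `stats::varimax`. The function names `lsa_term_loadings`, `orthomax` and `orthomax_criterion`, the `kaiser` flag and the toy matrix are hypothetical illustrations.

```python
import numpy as np


def lsa_term_loadings(X, k):
    """Truncated SVD of a term-by-document matrix X (terms x documents).

    Assumption: term loadings are taken as U_k * S_k.
    """
    U, s, _ = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    return U[:, :k] * s[:k]  # scale each of the first k columns by its singular value


def orthomax_criterion(L, gamma):
    """sum_j [ sum_i l_ij^4 - (gamma / p) * (sum_i l_ij^2)^2 ] for a p x k matrix L."""
    p = L.shape[0]
    L2 = L ** 2
    return np.sum(L2 ** 2) - (gamma / p) * np.sum(np.sum(L2, axis=0) ** 2)


def orthomax(loadings, gamma=1.0, max_iter=1000, tol=1e-10, kaiser=False):
    """Orthogonally rotate a p x k loading matrix with the orthomax family.

    gamma = 0 -> Quartimax, gamma = 1 -> Varimax, gamma = k / 2 -> Equamax.
    Returns (rotated_loadings, R) where rotated_loadings = loadings @ R.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    # Optional Kaiser normalization: rotate row-normalized loadings, rescale after.
    h = np.sqrt(np.sum(L ** 2, axis=1, keepdims=True))
    h = np.where(h == 0, 1.0, h)  # guard against all-zero rows
    A = L / h if kaiser else L
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = A @ R
        # Gradient of the orthomax criterion with respect to R, divided by 4.
        col_ss = np.sum(Lr ** 2, axis=0)
        G = A.T @ (Lr ** 3 - (gamma / p) * Lr * col_ss)
        # Orthogonal matrix maximizing trace(G.T @ R): the polar factor of G, via SVD.
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        d_old, d = d, np.sum(s)
        if d_old != 0 and d < d_old * (1 + tol):
            break  # the criterion has stopped improving
    rotated = A @ R
    if kaiser:
        rotated = rotated * h  # undo the row normalization
    return rotated, R


# Hypothetical toy term-by-document counts (6 terms x 5 documents).
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 1],
    [0, 0, 3, 1, 0],
    [0, 1, 1, 2, 0],
    [0, 0, 0, 1, 2],
    [1, 0, 0, 0, 3],
], dtype=float)
k = 3
T = lsa_term_loadings(X, k)
for name, gamma in [("Quartimax", 0.0), ("Varimax", 1.0), ("Equamax", k / 2)]:
    rotated, R = orthomax(T, gamma=gamma)
    print(name, np.round(rotated, 2), sep="\n")
```

Because the three rotations share one code path and differ only in γ, the sketch isolates what is being compared. With γ = 0 (Quartimax) only the row-wise simplicity of the loadings is rewarded, which is consistent with its tendency to concentrate variance in one dominant factor; γ > 0 also penalizes unequal column sums of squared loadings and so spreads variance across factors. Software that applies Kaiser normalization by default (R's `stats::varimax` does, through `normalize = TRUE`) may return different loadings from the unnormalized default used here.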
