A Guide to Text Analysis with Latent Semantic Analysis in R with Annotated Code: Studying Online Reviews and the Stack Exchange Community

In this guide, we introduce researchers in the behavioral sciences in general, and in MIS in particular, to text analysis with latent semantic analysis (LSA). The guide contains hands-on annotated code samples in R that walk the reader through a typical process: acquiring relevant texts, creating a semantic space from them, and then projecting words, phrases, or documents onto that semantic space to calculate their lexical similarities. R is a popular open-source programming language with extensive statistical libraries. We introduce LSA as a concept, discuss the process of preparing the data, and note the method's potential and limitations. We demonstrate the process through a sequence of annotated code examples. The first is a study of online reviews that extracts lexical insight about trust; its R code applies singular value decomposition (SVD). The second is a realistically large analysis of data from Stack Exchange, a popular Q&A site for programmers; its R code applies an alternative sparse SVD method. All the code and data are available on github.com.
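The process outlined above (build a term-by-document matrix, create a semantic space with SVD, project new text onto it, and compare vectors) can be sketched in a few lines of base R. This is a minimal illustration and not the guide's own code: the four-review corpus, the stopword list, the choice of k = 2 dimensions, and the helper names `count_terms` and `cosine` are all assumptions made up for the example, and it uses only base R's `svd()` rather than any dedicated LSA package.

```r
# Toy corpus (hypothetical reviews, invented for illustration only)
docs <- c(
  d1 = "i trust this seller and the service was reliable",
  d2 = "reliable seller and honest service",
  d3 = "the battery died and the screen cracked",
  d4 = "screen cracked after a week and the battery is weak"
)

# Data preparation: tokenize on whitespace and drop a small,
# hand-picked stopword list (an assumption for this sketch)
stopwords <- c("i", "this", "and", "the", "was", "after", "a", "is")
tokens <- lapply(strsplit(docs, "\\s+"),
                 function(tok) tok[!tok %in% stopwords])
vocab <- sort(unique(unlist(tokens)))

# Hypothetical helper: raw term counts over the fixed vocabulary;
# words outside the vocabulary are ignored
count_terms <- function(tok) {
  idx <- match(tok, vocab)
  tabulate(idx[!is.na(idx)], nbins = length(vocab))
}

# Term-by-document matrix (rows = terms, columns = documents)
tdm <- sapply(tokens, count_terms)
rownames(tdm) <- vocab

# Semantic space: truncated SVD keeping the k largest singular values
k  <- 2
s  <- svd(tdm)
Uk <- s$u[, 1:k, drop = FALSE]   # term loadings
Sk <- s$d[1:k]                   # singular values
Vk <- s$v[, 1:k, drop = FALSE]   # document loadings

# Document coordinates in the reduced space (one row per document)
doc_vecs <- Vk %*% diag(Sk, nrow = k)
rownames(doc_vecs) <- colnames(tdm)

# Cosine similarity between two vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Project ("fold in") a new phrase onto the existing space:
# its coordinates are the term counts multiplied by the term loadings
q <- count_terms(c("honest", "reliable", "seller"))
query_vec <- as.vector(crossprod(q, Uk))

# Compare documents with each other and with the new phrase
sim_d1_d2 <- cosine(doc_vecs["d1", ], doc_vecs["d2", ])
sim_d1_d3 <- cosine(doc_vecs["d1", ], doc_vecs["d3", ])
sim_q_d2  <- cosine(query_vec, doc_vecs["d2", ])
sim_q_d4  <- cosine(query_vec, doc_vecs["d4", ])
print(c(d1_d2 = sim_d1_d2, d1_d3 = sim_d1_d3, q_d2 = sim_q_d2, q_d4 = sim_q_d4))
```

In this toy setting the two seller-related reviews should score closer to each other, and to the new phrase, than to the reviews about hardware faults. A real analysis would typically add term weighting and a much larger corpus, and would need a sparse SVD routine once the matrix no longer fits comfortably in memory.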
