Improving Text Analysis Using Sentence Conjunctions and Punctuation

User-generated content in the form of customer reviews, blogs, or tweets is an emerging and rich source of data for marketers. Topic models have been successfully applied to such data, demonstrating that empirical text analysis benefits greatly from a latent variable approach that summarizes high-level interactions among words. We propose a new topic model that allows for serial dependency of topics in text. That is, topics may carry over from word to word in a document, violating the bag-of-words assumption of traditional topic models. In our model, topic carry-over is informed by sentence conjunctions and punctuation. Typically, such observed information is eliminated before the text is analyzed (i.e., during “pre-processing”) because words such as “and” and “but” do not differentiate topics. We find that these elements of grammar contain information relevant to topic changes. We examine the performance of our model using multiple data sets and establish boundary conditions for when our model leads to improved inference about customer evaluations. Implications and opportunities for future research are discussed.
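To make the idea of topic carry-over concrete, the following is a minimal toy sketch, not the model proposed in the paper. It assumes that each word either keeps the previous word's topic or draws a fresh topic from the document's topic distribution, and that the probability of keeping the topic depends on the preceding token. The names `CARRY_OVER`, `carry_over_probability`, and `sample_topic_path`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical carry-over probabilities (illustrative values, not estimates
# from the paper): how likely the topic persists after a given token.
CARRY_OVER = {"and": 0.9, "but": 0.3, ",": 0.8, ".": 0.4}
DEFAULT_CARRY_OVER = 0.7  # assumed value for all other tokens


def carry_over_probability(prev_token, table=CARRY_OVER, default=DEFAULT_CARRY_OVER):
    """Probability that the current word keeps the previous word's topic."""
    return table.get(prev_token.lower(), default)


def sample_topic_path(tokens, doc_topic_probs, rng,
                      table=CARRY_OVER, default=DEFAULT_CARRY_OVER):
    """Draw one topic index per token with serial dependency between words."""
    topics = []
    topic_ids = range(len(doc_topic_probs))
    for i in range(len(tokens)):
        keep = i > 0 and rng.random() < carry_over_probability(
            tokens[i - 1], table, default)
        if keep:
            # Carry the previous word's topic over to this word.
            topics.append(topics[-1])
        else:
            # Redraw from the document-level topic distribution, as a
            # bag-of-words topic model would do for every word.
            topics.append(rng.choices(topic_ids, weights=doc_topic_probs)[0])
    return topics


tokens = ["great", "screen", "but", "battery", "dies", "fast", "."]
path = sample_topic_path(tokens, [0.5, 0.5], random.Random(0))
print(list(zip(tokens, path)))
```

Setting every carry-over probability to zero makes each word's topic an independent draw, which corresponds to the bag-of-words case; keeping conjunctions and punctuation in the token stream is what lets the carry-over probability vary from position to position in this sketch.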
