Causal Relationship Detection in Archival Collections of Product Reviews for Understanding Technology Evolution

Technological progress is one of the key drivers of today's rapid changes in lifestyle. Knowing how products and objects evolve not only helps in understanding evolutionary patterns in society but can also provide clues for effective product design and support predictions about the future. We propose a general framework for analyzing technology's impact on our lives by detecting cause-effect relationships, where causes are changes in technology and effects are changes in social life, such as new activities or new ways of using products. We address the challenge of viewing technology evolution through the “social impact lens” by mining causal relationships from long-term collections of product reviews. In particular, we first divide the vocabulary into two groups: terms describing product features (called physical terms) and terms representing product usage (called conceptual terms). We then search for two kinds of changes related to the appearance of terms: frequency-based and context-based changes. The former indicate periods when a word was used significantly more frequently, whereas the latter indicate periods of high change in the word's context. Based on the detected changes, we search for causal term pairs in which a change in the physical term triggers a change in the conceptual term. We then extend our approach to finding causal relationships between word groups, such as a group of words that represent the same technology and cause a given conceptual change, or a group of words that represent two different technologies and simultaneously “co-cause” a conceptual change. We conduct experiments on different product types using the Amazon Product Review Dataset, which spans 1995 to 2013, and demonstrate that our approaches outperform state-of-the-art baselines.
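
The pairing idea described above can be illustrated with a minimal sketch. It is not the method evaluated in the paper: it assumes a simple z-score rule for frequency-based changes, ignores context-based changes entirely, and treats a physical term as a candidate cause when its change period shortly precedes that of a conceptual term. The function names, the threshold, the lag window, and the toy counts are all hypothetical.

```python
from statistics import mean, stdev

def burst_periods(series, z_threshold=1.5):
    """Indices of periods whose count is unusually high for this term.

    Assumption: a frequency-based change is a period whose z-score,
    computed against the term's own history, exceeds z_threshold.
    """
    if len(series) < 2:
        return []
    mu, sigma = mean(series), stdev(series)
    if sigma == 0:
        return []  # a flat series has no change periods
    return [i for i, v in enumerate(series) if (v - mu) / sigma > z_threshold]

def candidate_causal_pairs(physical, conceptual, max_lag=2, z_threshold=1.5):
    """Pair each physical term with conceptual terms whose change follows it.

    Assumption: temporal precedence within max_lag periods is used as a
    rough proxy for "triggers"; it does not establish causality.
    """
    pairs = []
    for p_term, p_series in physical.items():
        p_bursts = burst_periods(p_series, z_threshold)
        for c_term, c_series in conceptual.items():
            c_bursts = burst_periods(c_series, z_threshold)
            if any(0 < c - p <= max_lag for p in p_bursts for c in c_bursts):
                pairs.append((p_term, c_term))
    return pairs

# Toy per-period counts (hypothetical, not taken from the dataset).
physical = {"touchscreen": [1, 1, 1, 9, 2, 2, 2, 2]}
conceptual = {
    "swipe": [0, 0, 0, 0, 8, 1, 1, 1],  # rises one period after "touchscreen"
    "gift": [7, 1, 1, 1, 1, 1, 1, 1],   # rises before it, so not paired
}

print(candidate_causal_pairs(physical, conceptual))
# prints [('touchscreen', 'swipe')]
```

In this toy setting only the pair whose conceptual change follows the physical change within the lag window is kept; a fuller treatment would also compare word contexts across periods and handle groups of terms.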
