Topics may Evolve: Using Complaint Data for Analysis

User complaint data are valuable because they can reveal a company's deficiencies. Analyzing these short texts helps companies discover which topics users are complaining about. Locating and responding to complaints in a timely manner is critical for improving user satisfaction and loyalty. Because the data volume is large, topic models can help discover key complaint topics quickly. However, complaint data arrive as streams of short texts, so traditional topic models such as LDA and BTM are not suitable in this scenario: LDA is designed for long texts, and BTM cannot handle streams. This paper first proposes an improved short-text topic model, PMITI-BTM, to generate topics from user complaint data statically, and then extends the algorithm into a dynamic version that suits the streaming nature of the data. To analyze topic evolution further, we finally propose a clustering algorithm, TDWAP, to capture how these topics evolve across time slices. For each algorithm, we conduct several experiments to demonstrate its efficiency. Results show that our methods not only improve the performance of short-text topic discovery but also discover the evolution of topics.
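
As a rough illustration of why biterm-based models suit short texts, the sketch below extracts biterms (unordered word pairs that co-occur in one short text, the unit BTM models) and counts them per time slice. It is not the PMITI-BTM or TDWAP algorithm; the function names, the `(slice_id, tokens)` input format, and the sample complaints are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return the unordered word pairs (biterms) that co-occur in one short text."""
    # Sort each pair so ("slow", "refund") and ("refund", "slow") count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_counts_by_slice(stream):
    """Count biterms separately for each time slice.

    `stream` is an iterable of (slice_id, tokens) pairs; this format is a
    hypothetical stand-in for a tokenized complaint stream.
    """
    counts = {}
    for slice_id, tokens in stream:
        counts.setdefault(slice_id, Counter()).update(extract_biterms(tokens))
    return counts


# Hypothetical, already-tokenized complaints tagged with a time-slice index.
complaints = [
    (0, ["refund", "slow"]),
    (0, ["app", "crash", "login"]),
    (1, ["refund", "slow", "support"]),
]
counts = biterm_counts_by_slice(complaints)
```

Keeping a separate counter for each slice lets co-occurrence statistics be compared across slices, which is the kind of input a dynamic or evolution-tracking method would need.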
