Topic mining for call centers based on LDA

Latent Dirichlet Allocation (LDA), an unsupervised learning method, can be used for topic detection, automatic text categorization, keyword extraction and so on. It focuses only on the text itself and does not consider external correlation properties. External correlation properties are structured attributes that correspond to the text data; for example, a paper usually has properties such as authors and publication time, and a telephone call usually has properties such as caller number and call time. To address this limitation, we propose A-LDA, an improved model based on LDA. We use data sets from telephone call centers (a kind of data center in rapid growth) to experiment on topic detection. The results show that, compared with traditional LDA, A-LDA, which introduces external correlation properties, achieves lower perplexity and better generalization performance. At the same time, we can obtain the topics associated with the external attributes.
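
Perplexity is the evaluation measure named in the abstract, so a small illustration may help. The sketch below assumes the definition commonly used for LDA, the exponential of the negative average per-token log-likelihood on held-out text; the abstract does not state which formula the authors used. The function name and the `docs`, `theta` and `phi` arguments are illustrative and not taken from the paper, and the code does not implement A-LDA itself.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of a fitted topic model on a set of documents.

    docs  : list of documents, each a list of integer word ids
    theta : theta[d][k] = p(topic k | document d)
    phi   : phi[k][w]   = p(word w | topic k)
    (All three names are assumptions made for this sketch.)
    """
    log_likelihood = 0.0
    n_tokens = 0
    n_topics = len(phi)
    for d, words in enumerate(docs):
        for w in words:
            # Marginal word probability: mix the topic-word
            # distributions by the document's topic proportions.
            p_w = sum(theta[d][k] * phi[k][w] for k in range(n_topics))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    # Lower values mean the model predicts the text better.
    return math.exp(-log_likelihood / n_tokens)


# Toy comparison on a 4-word vocabulary (made-up numbers).
docs = [[0, 1], [2, 3]]
uniform = perplexity(docs, [[0.5, 0.5], [0.5, 0.5]],
                     [[0.25] * 4, [0.25] * 4])
focused = perplexity(docs, [[1.0, 0.0], [0.0, 1.0]],
                     [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
```

In this toy case the model whose topics match the documents scores a lower perplexity than the uniform one, which is the sense in which a lower value indicates better generalization.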
