Data association for topic intensity tracking

We present a unified model of what have traditionally been viewed as two separate tasks: data association and intensity tracking of multiple topics over time. Data association assigns a topic (a class) to each data point, while intensity tracking models the bursts and changes in topic intensities over time. Our approach combines an extension of Factorial Hidden Markov Models for topic intensity tracking with exponential order statistics for implicit data association. Experiments on text and email datasets show that coupling classification with topic intensity tracking improves the accuracy of both. Even a small amount of noise in topic assignments can mislead traditional algorithms, whereas our approach detects the correct topic intensities even with 30% topic noise.
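The abstract does not spell out how exponential order statistics yield an implicit association, so the following is only a minimal sketch of the standard property that such an approach could rely on, not the authors' model. It assumes each topic emits data points with independent exponential waiting times at a fixed rate. The function names, the example rates, and the fixed-rate assumption are all illustrative. Under that assumption, the waiting time to the next data point from any topic is exponential with the summed rate, and the next point belongs to topic k with probability rate_k divided by the sum of rates.

```python
import random


def superposition_summary(rates):
    """Closed form for independent exponential waiting times.

    Returns (combined_rate, assignment_probs): the rate of the minimum
    waiting time and, per topic, the probability that this topic is the
    one that produces the next data point.
    """
    total = sum(rates)
    return total, [r / total for r in rates]


def simulate_first_arrival(rates, trials=20000, seed=0):
    """Monte Carlo check of the closed form above (illustrative only).

    Returns (mean_wait, frequencies): the average waiting time until the
    first arrival and how often each topic arrived first.
    """
    rng = random.Random(seed)
    counts = [0] * len(rates)
    total_wait = 0.0
    for _ in range(trials):
        # Draw one waiting time per topic; the smallest one "wins".
        waits = [rng.expovariate(r) for r in rates]
        first = min(range(len(rates)), key=waits.__getitem__)
        counts[first] += 1
        total_wait += waits[first]
    return total_wait / trials, [c / trials for c in counts]


if __name__ == "__main__":
    rates = [1.0, 3.0]  # hypothetical intensities for two topics
    print(superposition_summary(rates))
    print(simulate_first_arrival(rates))
```

In this sketch a topic with a higher current intensity is more likely to own the next data point, which is one way intensity estimates and topic assignments can inform each other; the rates here are held fixed rather than tracked over time.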
