Incorporating Metadata into Dynamic Topic Analysis

Every day, millions of blog and micro-blog posts are published on the Internet. These posts usually come with useful metadata, such as tags, authors, and locations. Much of this metadata is highly specific or personalized. Tracking how it evolves helps us discover trending topics and users' interests, which are key factors in recommendation and advertisement placement systems. In this paper, we use topic models to analyze topic evolution in social media corpora with the help of metadata. Specifically, we propose a flexible dynamic topic model that can easily incorporate various types of metadata. Since our model adds negligible computational cost on top of Latent Dirichlet Allocation, it can be implemented very efficiently. We test our model on both Twitter data and a collection of NIPS papers. The results show that our approach achieves better held-out likelihood while retaining good interpretability.
