Cooperative Hierarchical Dirichlet Processes: Superposition vs. Maximization

The cooperative hierarchical structure is a common and significant data structure observed in, or adopted by, many research areas, such as text mining (author-paper-word) and multi-label classification (label-instance-feature). Well-known Bayesian approaches to modeling cooperative hierarchical structures are mostly based on topic models. However, these approaches require the number of hidden topics/factors to be fixed in advance, and an inappropriate choice may lead to overfitting or underfitting. Bayesian nonparametric learning is an elegant way to resolve this issue, but existing work in this area cannot yet be applied to cooperative hierarchical structure modeling. In this paper, we propose a cooperative hierarchical Dirichlet process (CHDP) to fill this gap. Each node in a cooperative hierarchical structure is assigned a Dirichlet process that models its weights on the infinite hidden factors/topics. Together with the measure inheritance of the hierarchical Dirichlet process, two kinds of measure cooperation, i.e., superposition and maximization, are defined to capture the many-to-many relationships in the cooperative hierarchical structure. Furthermore, two constructive representations of CHDP, i.e., a stick-breaking construction and an international restaurant process, are designed to facilitate model inference. Experiments on synthetic and real-world data with cooperative hierarchical structures demonstrate the properties of CHDP, its ability to model cooperative hierarchical structures, and its potential for practical application scenarios.
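
As a rough illustration of the ideas named in the abstract, the following is a minimal sketch and not the paper's construction. It assumes a truncated stick-breaking representation over a shared, finite set of factors, and it assumes that superposition can be read as a normalized sum of parent weights and maximization as a normalized element-wise maximum. The function names (`stick_breaking`, `superpose`, `maximize`) and the truncation level are hypothetical choices made for this example only.

```python
import numpy as np


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    betas = rng.beta(1.0, alpha, size=truncation)
    betas[-1] = 1.0  # close the stick so the truncated weights sum to 1
    # Length of stick remaining before each break: prod_{j<k} (1 - beta_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def superpose(weights_list):
    """Assumed reading of superposition: sum the parents' weights, renormalize."""
    total = np.sum(weights_list, axis=0)
    return total / total.sum()


def maximize(weights_list):
    """Assumed reading of maximization: element-wise max, renormalize."""
    peak = np.max(weights_list, axis=0)
    return peak / peak.sum()


rng = np.random.default_rng(0)
# Two parent nodes (e.g. two authors of one paper) over the same 10 factors.
author_a = stick_breaking(alpha=1.0, truncation=10, rng=rng)
author_b = stick_breaking(alpha=1.0, truncation=10, rng=rng)

paper_sum = superpose([author_a, author_b])
paper_max = maximize([author_a, author_b])
```

Under these assumptions, `paper_sum` blends the two parents evenly, while `paper_max` emphasizes whichever parent places more weight on each factor; in a full model the combined weights would serve as the base measure for the child node's own Dirichlet process.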
