Experiments with learning graphical models on text

A rich variety of models is now in use for unsupervised modelling of text documents, including many graphical models with and without latent variables. To date, their comparative performance is poorly understood, partly because the models differ subtly and partly because they have been proposed and evaluated in different contexts. This paper reports on our experiments with a representative set of state-of-the-art models: chordal graphs, matrix factorisation, and hierarchical latent tree models. For the chordal graphs, we use different scoring functions. For the matrix factorisation models, we use different hierarchical priors and asymmetric priors on components; we use Boolean matrix factorisation rather than topic models so that the evaluations are comparable across model families. The experiments run several evaluations: held-out probability for each document, omni-directional prediction, which predicts each variable in turn from the others, and anomaly detection. We find that matrix factorisation performed well at anomaly detection but poorly on the prediction task. Chordal graph learning generally performed best and, probably due to its lower bias, often out-performed hierarchical latent trees.
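To illustrate how such different model families can be compared on the same footing, below is a minimal sketch in Python of an omni-directional prediction evaluation over binary document-term data. The `IndependentBernoulli` model, its `logprob` interface, and the accuracy metric are illustrative assumptions of ours, not the paper's implementation; any of the joint models above could be plugged in behind the same interface.

```python
import numpy as np

class IndependentBernoulli:
    """Toy stand-in for any joint model over binary word-occurrence
    vectors that exposes a log-probability. The chordal-graph,
    Boolean-MF, or latent-tree models would plug in here instead."""

    def fit(self, X):
        # Laplace-smoothed marginal occurrence probability per variable.
        self.p = (X.sum(axis=0) + 1.0) / (X.shape[0] + 2.0)
        return self

    def logprob(self, x):
        # Log-probability of one binary document vector under the model.
        return float(np.sum(x * np.log(self.p) + (1 - x) * np.log(1 - self.p)))

def omni_directional_accuracy(model, X):
    """For each held-out document and each variable j, predict x_j from
    the remaining variables: p(x_j | x_-j) is proportional to the joint,
    so we compare the joint log-probability with x_j set to 1 vs 0."""
    n, d = X.shape
    correct = 0
    for x in X:
        for j in range(d):
            x1, x0 = x.copy(), x.copy()
            x1[j], x0[j] = 1, 0
            pred = 1 if model.logprob(x1) > model.logprob(x0) else 0
            correct += int(pred == x[j])
    return correct / (n * d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = (rng.random((200, 30)) < 0.2).astype(int)  # toy binary doc-term matrix
    model = IndependentBernoulli().fit(X[:150])
    print("omni-directional accuracy:", omni_directional_accuracy(model, X[150:]))
```

The same `logprob` interface supports the other two evaluations as well: held-out document probability is the score itself, and anomaly detection can flag the lowest-scoring documents.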
