Multi-objective Topic Modeling

Topic Modeling (TM) is a rapidly growing area at the interface of text mining, artificial intelligence and statistical modeling, and it is increasingly deployed to address the ‘information overload’ associated with extensive text repositories. The goal in TM is typically to infer a rich yet intuitive summary model of a large document collection: a set of topics that characterizes the collection, each topic being a probability distribution over words, along with the degree to which each individual document is concerned with each topic. The model then supports segmentation, clustering, profiling, browsing, and many other tasks. Current approaches to TM, dominated by Latent Dirichlet Allocation (LDA), assume a topic-driven document generation process and find a model that maximizes the likelihood of the data with respect to this process. Such an approach is sensitive to any mismatch between the ‘true’ generating process and the statistical model, and likelihood alone does not capture the multi-faceted nature of topic model quality: individual topics should be intuitively meaningful, sensibly distinct, and free of noise. Here we investigate multi-objective approaches to TM, which attempt to infer coherent topic models by navigating the trade-offs between objectives oriented towards coherence and objectives oriented towards coverage of the corpus at hand. Comparisons with LDA show that multi-objective evolutionary algorithm (MOEA) approaches yield significantly more coherent topics, consequently enhancing the usefulness and interpretability of these models in a range of applications, without significant degradation in generalization ability.
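The trade-off between coherence and coverage can be illustrated with a minimal Python sketch of Pareto dominance. It is not the algorithm evaluated here: the score pairs are invented for illustration, and the names `dominates` and `pareto_front` are hypothetical. The sketch only assumes that each candidate topic model has been scored on two objectives, both to be maximized.

```python
from typing import List, Tuple

# Each candidate topic model is summarized by two scores to maximize:
# (coherence, coverage). The numbers below are made up for illustration.
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """Return True if a is no worse than b on every objective
    and strictly better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Scores]) -> List[Scores]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates = [
    (0.30, 0.90),
    (0.45, 0.80),
    (0.40, 0.70),  # dominated by (0.45, 0.80)
    (0.60, 0.50),
    (0.20, 0.95),
]
front = pareto_front(candidates)
print(front)
```

A multi-objective search returns a set of such non-dominated models instead of a single likelihood-maximizing one, so that a model can be chosen according to how much coverage one is willing to give up for coherence.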
