A review of topic modeling methods

Abstract: Topic modeling is a popular analytical tool for evaluating data. Numerous topic modeling methods have been developed that account for many kinds of relationships and restrictions within datasets; however, these methods are not frequently employed. Instead, many researchers gravitate to Latent Dirichlet Allocation (LDA), which, although flexible and adaptive, is not always suited to modeling more complex data relationships. We present topic modeling approaches capable of handling correlation between topics, changes in topics over time, and short or sparse texts such as those encountered in social media. We also briefly review the algorithms used to optimize and infer parameters in topic modeling, a step that is essential to producing meaningful results regardless of method. We believe this review will encourage more diversity in topic modeling practice and help users determine which method best suits their needs.
