Local Topic Discovery via Boosted Ensemble of Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) has become increasingly popular for topic modeling of large-scale document collections. However, the resulting topics often capture only general, and therefore redundant, information about the data rather than minor but potentially meaningful information for users. To tackle this problem, we propose a novel ensemble model of nonnegative matrix factorization for discovering high-quality local topics. Our method leverages the idea of an ensemble model to successively perform NMF on a residual matrix obtained from the previous stages, generating a sequence of topic sets. The novelty of our method lies in two aspects: it uses a residual matrix inspired by a state-of-the-art gradient boosting model, and it applies a sophisticated local weighting scheme to that matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users.
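The stage-wise idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain multiplicative-update NMF at each stage, a residual formed by subtracting the stage's reconstruction and clipping negatives to zero, and it omits the local weighting scheme entirely, since the abstract does not specify it. The names `nmf`, `residual_ensemble_nmf`, `k_per_stage`, and `n_stages` are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Plain multiplicative-update NMF: X (m x n) ~= W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def residual_ensemble_nmf(X, k_per_stage, n_stages, n_iter=200):
    """Run NMF stage by stage, each time on the residual of the previous stages.

    Assumption: the residual is clipped at zero so it stays a valid NMF input.
    The local weighting step mentioned in the abstract is NOT modeled here.
    """
    R = np.asarray(X, dtype=float).copy()
    stages = []
    for s in range(n_stages):
        W, H = nmf(R, k_per_stage, n_iter=n_iter, seed=s)
        stages.append((W, H))  # one topic set per stage
        # Remove what this stage explained; keep the residual nonnegative.
        R = np.maximum(R - W @ H, 0.0)
    return stages, R


if __name__ == "__main__":
    # Toy term-document matrix (terms x documents) with random nonnegative entries.
    X = np.random.default_rng(42).random((30, 20))
    stages, R = residual_ensemble_nmf(X, k_per_stage=3, n_stages=4)
    print(len(stages), "stages; remaining residual norm:", np.linalg.norm(R))
```

In this sketch each stage contributes `k_per_stage` topics (the columns of `W`), so later stages can only explain what earlier stages left behind, which is the intuition behind pushing later topics toward less dominant content.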
