An overview of topic modeling methods and tools

Topic modeling is a powerful technique for analyzing large collections of documents and discovering the hidden structure within them. A topic is viewed as a recurring pattern of co-occurring words, that is, a group of words that often occur together. Topic modeling can link words that share a context and distinguish between uses of a word with different meanings. In this paper, we discuss topic modeling methods, including the Vector Space Model (VSM), Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA), along with their features and limitations. We then discuss tools available for topic modeling, such as Gensim, the Stanford Topic Modeling Toolbox, MALLET, and BigARTM. Finally, we cover some applications of topic modeling. Topic models have a wide range of applications in text mining and information retrieval, such as tag recommendation, text categorization, keyword extraction, information filtering, and similarity search.
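
As a rough illustration of the simplest method listed above, the Vector Space Model, the following sketch represents each document as a bag-of-words term-frequency vector and ranks documents against a query by cosine similarity. The toy documents, the whitespace tokenizer, and the helper names `tf_vector` and `cosine` are assumptions made for this example and are not taken from the paper; a practical system would typically add stop-word removal, stemming, and TF-IDF weighting.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a sparse term-frequency vector (term -> count) for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus, used only for illustration.
docs = [
    "topic models discover hidden topics in documents",
    "latent topics link words that occur together",
    "the river bank flooded after heavy rain",
]
vectors = [tf_vector(d) for d in docs]

# Rank document indices by similarity to the query, most similar first.
query = tf_vector("hidden topics in text documents")
ranked = sorted(range(len(docs)),
                key=lambda i: cosine(query, vectors[i]),
                reverse=True)

for i in ranked:
    print(f"{cosine(query, vectors[i]):.3f}  {docs[i]}")
```

Because this representation matches only exact terms, the third document scores zero even though "bank" could relate to other senses elsewhere; limitations of this kind are what the latent methods (LSI, PLSA, LDA) aim to address.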
