Comparison of Topic Modeling Methods for Type Detection of Turkish News

Today, with the growth of Internet-based documents, there is a large amount of data that needs to be processed and evaluated. Media, news and advertising are some of the areas where these data are evaluated. In the media sector, classifying news by hand is a significant problem in terms of the time it takes. In this paper, the aim is to determine which type each news title belongs to. The dataset consists of 4200 Turkish news titles belonging to 7 class labels. To determine the types, the classical Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA) and Non-Negative Matrix Factorization (NMF) algorithms were used for topic modeling. In addition, the LDA-based n-LDA method was used. The accuracy of each method was measured and compared. NMF was the most successful method for three classes, while LSA was the most successful for five and seven classes.
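The abstract does not say how the output of an unsupervised topic model was scored against the class labels. One common approach, assumed here purely for illustration, is to map each topic to the class label that occurs most often among the titles assigned to it, and then compute accuracy under that mapping. The sketch below shows this with the Python standard library; the function name `topic_accuracy`, the placeholder method names and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def topic_accuracy(topic_ids, true_labels):
    """Score topic assignments against class labels via majority-vote mapping.

    Assumption: each document has exactly one dominant topic and one true label.
    Returns (accuracy, topic_to_label).
    """
    if len(topic_ids) != len(true_labels):
        raise ValueError("topic_ids and true_labels must have the same length")
    if not topic_ids:
        raise ValueError("at least one document is required")

    # Count how often each true label occurs inside each topic.
    labels_per_topic = defaultdict(Counter)
    for topic, label in zip(topic_ids, true_labels):
        labels_per_topic[topic][label] += 1

    # Map every topic to the label it contains most frequently.
    topic_to_label = {
        topic: counts.most_common(1)[0][0]
        for topic, counts in labels_per_topic.items()
    }

    # A document counts as correct when its topic's mapped label matches its true label.
    correct = sum(
        topic_to_label[topic] == label
        for topic, label in zip(topic_ids, true_labels)
    )
    return correct / len(true_labels), topic_to_label


# Toy data invented for illustration (not the paper's dataset or results).
labels = ["sports", "sports", "sports", "economy", "economy", "economy"]
method_topics = {
    "method_a": [0, 0, 1, 1, 1, 1],
    "method_b": [0, 0, 0, 1, 1, 1],
    "method_c": [0, 1, 0, 1, 1, 0],
}

# Compare the methods by their accuracy under the majority-vote mapping.
scores = {name: topic_accuracy(topics, labels)[0] for name, topics in method_topics.items()}
best = max(scores, key=scores.get)
```

In a real comparison, the topic ids would come from each model's dominant topic per title, and the number of topics would be set to the number of classes being evaluated (three, five or seven). Note that majority-vote mapping can assign several topics to the same label, so it tends to be optimistic; a one-to-one assignment between topics and labels is a stricter alternative.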
