A survey on textual semantic classification algorithms

This paper provides a broad overview of three popular textual semantic classification algorithms used in both industry and the scientific community: TF-IDF, Latent Semantic Analysis, and Latent Dirichlet Allocation. We selected these three algorithms because they form the foundation of semantic classification and remain widely used in the field. First, the paper presents the inner workings of each algorithm, covering both the original authors' intuition and the mathematical model employed. Next, we discuss the advantages of each algorithm, drawing on recent and credible research papers and articles, and critically examine their limitations. Finally, we offer a general argument on the way forward for improving these algorithms. The paper aims to give a general understanding of these algorithms, which we hope will spur further research in semantic classification.
