Latent Support Measure Machines for Bag-of-Words Data Classification

In many classification problems, the input is represented as a set of features, e.g., the bag-of-words (BoW) representation of documents. Support vector machines (SVMs) are widely used tools for such classification problems. The performance of the SVMs is generally determined by whether kernel values between data points can be defined properly. However, SVMs for BoW representations have a major weakness in that the co-occurrence of different but semantically similar words cannot be reflected in the kernel calculation. To overcome the weakness, we propose a kernel-based discriminative classifier for BoW data, which we call the latent support measure machine (latent SMM). With the latent SMM, a latent vector is associated with each vocabulary term, and each document is represented as a distribution of the latent vectors for words appearing in the document. To represent the distributions efficiently, we use the kernel embeddings of distributions that hold high order moment information about distributions. Then the latent SMM finds a separating hyperplane that maximizes the margins between distributions of different classes while estimating latent vectors for words to improve the classification performance. In the experiments, we show that the latent SMM achieves state-of-the-art accuracy for BoW text classification, is robust with respect to its own hyper-parameters, and is useful to visualize words.

[1]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[2]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[3]  Bernhard Schölkopf,et al.  Support Vector Method for Novelty Detection , 1999, NIPS.

[4]  David M. Blei,et al.  Supervised Topic Models , 2007, NIPS.

[5]  Yuji Matsumoto,et al.  Chunking with Support Vector Machines , 2001, NAACL.

[6]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[7]  Dell Zhang,et al.  Question classification using support vector machines , 2003, SIGIR.

[8]  Chih-Jen Lin,et al.  A comparison of methods for multiclass support vector machines , 2002, IEEE Trans. Neural Networks.

[9]  Hsin-Hsi Chen,et al.  Emotion Classification Using Web Blog Corpora , 2007, IEEE/WIC/ACM International Conference on Web Intelligence (WI'07).

[10]  Timothy W. Finin,et al.  SVMs for the Blogosphere: Blog Identification and Splog Detection , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[11]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[12]  Bernhard Schölkopf,et al.  Hilbert Space Embeddings and Metrics on Probability Measures , 2009, J. Mach. Learn. Res..

[13]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[14]  Bernhard Schölkopf,et al.  Learning from Distributions via Support Measure Machines , 2012, NIPS.

[15]  Naonori Ueda,et al.  Probabilistic latent semantic visualization: topic model for visualizing documents , 2008, KDD.

[16]  Ana Margarida de Jesus,et al.  Improving Methods for Single-label Text Categorization , 2007 .

[17]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[18]  Yunqian Ma,et al.  Practical selection of SVM parameters and noise estimation for SVM regression , 2004, Neural Networks.

[19]  Eric P. Xing,et al.  MedLDA: maximum margin supervised topic models for regression and classification , 2009, ICML '09.

[20]  Gunnar Rätsch,et al.  Large Scale Multiple Kernel Learning , 2006, J. Mach. Learn. Res..

[21]  Le Song,et al.  A Hilbert Space Embedding for Distributions , 2007, Discovery Science.

[22]  Chih-Jen Lin,et al.  A Comparison of Methods for Multi-class Support Vector Machines , 2015 .