Text classification based on the word subspace representation

In this paper, we propose a novel framework for text classification based on subspace methods. Recent studies have shown the advantages of modeling texts as linear subspaces in a high-dimensional word vector space, which we refer to as word subspaces. Building on this idea, we solve topic classification and sentiment analysis using word subspaces together with different subspace-based methods, exploring the geometry of word embeddings to decide which method is more suitable for each task. We empirically demonstrate that a word subspace generated from sets of texts is a unique representation of a semantic topic and can be spanned by basis vectors derived from different texts; texts can therefore be classified by comparing their word subspaces with the topic class subspaces. We realize this framework with the mutual subspace method, which effectively handles multiple subspaces for classification. For sentiment analysis, since word embeddings do not necessarily encode sentiment information (words with opposite sentiments may have similar word vectors), we introduce the orthogonal mutual subspace method to push words of opposite sentiments apart. Furthermore, as the sentiment class subspaces may overlap due to shared topics, we propose modeling a sentiment class by a set of multiple word subspaces, one generated from each text belonging to the class. We then model the sentiment classes on a Grassmann manifold using the Grassmann subspace method and its discriminative extension, the Grassmann orthogonal subspace method. We show the validity of each framework through experiments on four widely used datasets.
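
As a minimal sketch of the core machinery described above (not the authors' released implementation), the Python snippet below builds a word subspace from a set of word vectors via SVD and scores two subspaces by the cosines of their canonical angles, the similarity underlying the mutual subspace method. The function names, the 300-dimensional random placeholder vectors, and the subspace dimensions k are illustrative assumptions.

```python
import numpy as np

def word_subspace(word_vectors, k):
    """Return an orthonormal basis (d x k) spanning the word subspace.

    word_vectors: (n_words, d) matrix of word embeddings for one text
    or one class; k: subspace dimension, a hyperparameter.
    """
    # The left singular vectors of the (d x n) data matrix give the
    # principal directions of the word-vector set (PCA without mean
    # subtraction, as is usual in subspace methods).
    U, _, _ = np.linalg.svd(word_vectors.T, full_matrices=False)
    return U[:, :k]

def subspace_similarity(A, B):
    """Mutual-subspace-method style similarity between two subspaces.

    A, B: orthonormal bases (d x kA, d x kB). The singular values of
    A^T B are the cosines of the canonical angles between the two
    subspaces; their squared mean is a common similarity score.
    """
    cosines = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Toy usage with random vectors standing in for real embeddings
# (e.g., word2vec or GloVe vectors of the words in each text/class).
rng = np.random.default_rng(0)
text_vectors  = rng.standard_normal((40, 300))   # 40 words, 300-dim
class_vectors = rng.standard_normal((500, 300))  # all words of a class

S_text  = word_subspace(text_vectors, k=5)
S_class = word_subspace(class_vectors, k=10)
print(subspace_similarity(S_text, S_class))  # higher = more similar
```

A text would be assigned to the class whose subspace yields the highest similarity; the orthogonal variants described in the abstract would additionally transform the embedding space so that the class subspaces become mutually orthogonal before this comparison.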
