Classifying Extremely Short Texts by Exploiting Semantic Centroids in Word Mover's Distance Space

Automatically classifying extremely short texts, such as social media posts and web page titles, plays an important role in a wide range of content analysis applications. However, traditional classifiers based on bag-of-words (BoW) representations often fail at this task. The underlying reason is that document similarity cannot be accurately measured under BoW representations due to the extreme sparseness of short texts, which makes it difficult to capture the generality of short texts. To address this problem, we adopt a regularized word mover's distance (RWMD), which measures distances among short texts at the semantic level. We then propose an RWMD-based centroid classifier for short texts, named RWMD-CC. Basically, RWMD-CC computes a representative semantic centroid for each category under the RWMD measure and predicts test documents by finding the closest semantic centroid. Testing is much more efficient than the prior art of WMD-based K-nearest-neighbor classification. Experimental results indicate that RWMD-CC achieves very competitive classification performance on extremely short texts.
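
To make the idea concrete, below is a minimal Python sketch of a nearest-centroid classifier operating in a regularized WMD space. It is not the paper's RWMD-CC: the entropic (Sinkhorn) optimal-transport cost stands in for the paper's regularized WMD, and each class centroid is approximated by a medoid (the training document with the smallest total distance to the rest of its class) rather than the learned semantic centroid described above. The names `nbow`, `sinkhorn_wmd`, `WMDCentroidClassifier`, and the `embeddings` dictionary (token to dense vector) are hypothetical.

```python
# Minimal sketch, assuming: documents are token lists, `embeddings` maps tokens
# to dense vectors, the Sinkhorn-regularized OT cost replaces the paper's RWMD,
# and a class medoid replaces the learned semantic centroid.
import numpy as np
from collections import Counter
from scipy.spatial.distance import cdist


def nbow(doc, embeddings):
    """Normalized bag-of-words: unique word vectors plus relative frequencies."""
    counts = Counter(w for w in doc if w in embeddings)
    words = list(counts)
    X = np.array([embeddings[w] for w in words])           # (n, d) word vectors
    a = np.array([counts[w] for w in words], dtype=float)
    return X, a / a.sum()


def sinkhorn_wmd(X1, a1, X2, a2, reg=0.1, n_iter=200):
    """Entropy-regularized word mover's distance between two nBOW documents."""
    M = cdist(X1, X2)                                       # ground cost: Euclidean
    K = np.exp(-M / reg)
    u = np.ones_like(a1)
    for _ in range(n_iter):                                 # Sinkhorn fixed-point updates
        v = a2 / (K.T @ u)
        u = a1 / (K @ v)
    T = u[:, None] * K * v[None, :]                         # transport plan
    return float((T * M).sum())


class WMDCentroidClassifier:
    def fit(self, docs, labels, embeddings, reg=0.1):
        self.embeddings, self.reg = embeddings, reg
        self.centroids = {}
        reps = [nbow(d, embeddings) for d in docs]
        for c in set(labels):
            idx = [i for i, y in enumerate(labels) if y == c]
            # Medoid stand-in for the semantic centroid of class c.
            costs = [sum(sinkhorn_wmd(*reps[i], *reps[j], reg) for j in idx if j != i)
                     for i in idx]
            self.centroids[c] = reps[idx[int(np.argmin(costs))]]
        return self

    def predict(self, docs):
        preds = []
        for d in docs:
            X, a = nbow(d, self.embeddings)
            preds.append(min(self.centroids,
                             key=lambda c: sinkhorn_wmd(X, a, *self.centroids[c], self.reg)))
        return preds
```

Prediction requires only one distance computation per class, which illustrates why a centroid classifier is much cheaper at test time than a WMD-based K-nearest-neighbor search over the entire training set.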
