Learning Class-Informed Semantic Similarity

The exponential kernel, which models semantic similarity by means of a diffusion process on a graph defined by lexicon and co-occurrence information, has been successfully applied to the task of text categorization. However, the diffusion is an unsupervised process and therefore fails to exploit the class information available in a supervised classification scenario. To address this limitation, we present a class-informed exponential kernel that makes use of the class knowledge of the training documents in addition to the co-occurrence knowledge. The basic idea is to construct an augmented term-document matrix by encoding class information as additional terms and appending them to the training documents. Diffusion is then performed on the augmented term-document matrix. In this way, words belonging to the same class are indirectly drawn closer to each other, and the class-specific word correlations are strengthened. The proposed approach was demonstrated on several variants of the popular 20Newsgroup data set.
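The augmentation step can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the term-by-document matrix `D` has one row per term, the semantic matrix is taken to be exp(λ D Dᵀ), each class is encoded as one extra pseudo-term row with a fixed `weight`, and only the original-term block of the diffused matrix is kept afterwards. The function names, the toy data, and the values of `lam` and `weight` are hypothetical.

```python
import numpy as np

def augment_with_class_terms(D, labels, n_classes, weight=1.0):
    """Append one pseudo-term row per class to a term-by-document matrix.

    Assumption: a class is encoded as a single extra term that occurs
    (with the given weight) in every training document of that class.
    """
    n_docs = D.shape[1]
    C = np.zeros((n_classes, n_docs))
    C[labels, np.arange(n_docs)] = weight
    return np.vstack([D, C])

def exponential_semantic_matrix(D, lam):
    """Return exp(lam * D D^T), computed by eigendecomposition.

    D D^T is symmetric, so exp(lam * G) = V diag(exp(lam * w)) V^T.
    """
    G = D @ D.T                      # term-by-term co-occurrence matrix
    w, V = np.linalg.eigh(G)
    return (V * np.exp(lam * w)) @ V.T

# Toy corpus: 4 terms, 4 documents, each term occurs in exactly one document,
# so no two terms co-occur. Documents 0-1 are class 0, documents 2-3 class 1.
D = np.eye(4)
labels = np.array([0, 0, 1, 1])
n_terms = D.shape[0]
lam = 0.5                            # hypothetical diffusion parameter

# Unsupervised diffusion: terms that never co-occur stay unrelated.
S_plain = exponential_semantic_matrix(D, lam)

# Class-informed diffusion: diffuse on the augmented matrix, then keep
# only the block that relates the original terms to each other.
D_aug = augment_with_class_terms(D, labels, n_classes=2)
S_aug = exponential_semantic_matrix(D_aug, lam)[:n_terms, :n_terms]

# Document-by-document kernel induced by the class-informed term similarities.
K = D.T @ S_aug @ D

print("plain similarity (term 0, term 1):", S_plain[0, 1])
print("class-informed   (term 0, term 1):", S_aug[0, 1])
print("class-informed   (term 0, term 2):", S_aug[0, 2])
```

In this toy setting, terms 0 and 1 never share a document, so plain diffusion leaves them unrelated. After augmentation they are linked through the shared class pseudo-term and receive a positive similarity, while terms from different classes remain unrelated.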
