A corpus-based semantic kernel for text classification by using meaning values of terms

Text categorization plays a crucial role on both academic and commercial platforms because of the growing demand for automatic organization of documents. Kernel-based classification algorithms such as Support Vector Machines (SVM) have become highly popular in text mining. This is mainly due to their relatively high classification accuracy in several application domains and their ability to handle high dimensionality and sparsity, the prohibitive characteristics of textual data representation. Recently, there has been increased interest in exploiting background knowledge, such as ontologies and corpus-based statistical knowledge, in text categorization. It has been shown that replacing standard kernel functions such as the linear kernel with customized kernel functions that take advantage of this background knowledge can increase the performance of SVM in the text classification domain. Based on this, we propose a novel semantic smoothing kernel for SVM. The suggested approach is based on a meaning measure, which calculates the meaningfulness of terms in the context of classes. The document vectors are smoothed using these class-specific meaning values. Because the smoothing process makes use of class information, the kernel can be considered a supervised smoothing kernel. The meaning measure is based on the Helmholtz principle from Gestalt theory and has previously been applied to several text mining applications such as document summarization and feature extraction. However, to the best of our knowledge, ours is the first study to use the meaning measure in a supervised setting to build a semantic kernel for SVM. We evaluated the proposed approach by conducting a large number of experiments on well-known textual datasets and present results with respect to different experimental conditions. We compare our results with traditional kernels used in SVM, such as the linear kernel, as well as with several corpus-based semantic kernels. Our results show that the proposed approach outperforms the other kernels in classification performance.
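
As a rough illustration of what smoothing document vectors with class-based term weights can look like, the sketch below builds a Gram matrix of the form K = (D S)(D S)^T, where D is a document-by-term matrix and S is a term-by-class weight matrix. This is only a minimal sketch under stated assumptions: the abstract does not give the formula for the meaning values or for the kernel, so the values in S are made-up placeholders standing in for the meaning values, and the function name `smoothing_kernel` is hypothetical.

```python
import numpy as np


def smoothing_kernel(docs, term_class_weights):
    """Return the Gram matrix K = (D S)(D S)^T.

    docs               : (n_docs, n_terms) document-term matrix D
    term_class_weights : (n_terms, n_classes) matrix S; here a placeholder
                         for the class-based meaning values of terms
                         (assumption: the real values come from the
                         meaning measure, which is not reproduced here).
    """
    # Project each document from term space into class space.
    smoothed = docs @ term_class_weights
    # Inner products of the smoothed vectors; symmetric and positive
    # semi-definite by construction, so it is a valid kernel matrix.
    return smoothed @ smoothed.T


# Toy data: 2 documents, 3 terms, 2 classes (all numbers are illustrative).
docs = np.array([[2.0, 0.0, 1.0],
                 [0.0, 1.0, 1.0]])
S = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])

K = smoothing_kernel(docs, S)
print(K)
```

A matrix built this way could be passed to an SVM implementation that accepts precomputed kernels. Because K is an inner product of transformed vectors, it stays positive semi-definite regardless of the values placed in S.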
