A HowNet-Based Semantic Relatedness Kernel for Text Classification

Exploiting semantic relatedness kernels has long been an appealing subject in text retrieval and information management. In text classification, documents are typically represented in the vector space using the bag-of-words (BOW) approach, which does not take semantic relatedness between terms into account. To improve text classification performance, this paper presents a new semantic kernel for the support vector machine algorithm. The method first selects document features using the CHI (chi-square) method, then computes the feature weights using TF-IDF, and finally classifies the documents with a support vector machine that uses a semantic relatedness kernel combining semantic similarity computation and semantic relevance computation. Experimental results show that, compared with the traditional support vector machine algorithm, the proposed algorithm achieves an improved classification F1-measure.

DOI: http://dx.doi.org/10.11591/telkomnika.v11i4.2361
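
The three stages named in the abstract (CHI feature selection, TF-IDF weighting, and a semantic relatedness kernel) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy corpus, the function names, the smoothed IDF variant, and above all the hand-written `relatedness` matrix are assumptions standing in for HowNet-derived similarity and relevance scores, whose computation the abstract does not specify. The kernel form K = X S Xᵀ is one common way to build a semantic kernel and is likewise assumed here.

```python
import numpy as np

def chi_square_scores(counts, labels):
    """CHI score of each term for a binary label, based on document presence."""
    present = (counts > 0).astype(float)       # docs x terms
    pos = (labels == 1).astype(float)
    n = float(len(labels))
    a = present.T @ pos                        # term present, positive class
    b = present.T @ (1.0 - pos)                # term present, negative class
    c = pos.sum() - a                          # term absent, positive class
    d = (1.0 - pos).sum() - b                  # term absent, negative class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    safe = np.where(denom > 0, denom, 1.0)     # avoid division by zero
    return np.where(denom > 0, n * (a * d - c * b) ** 2 / safe, 0.0)

def tfidf(counts):
    """TF-IDF with a smoothed IDF (an assumed variant) and L2-normalised rows."""
    n_docs = counts.shape[0]
    df = (counts > 0).sum(axis=0)
    idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0
    weights = counts * idf
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    return weights / np.where(norms > 0, norms, 1.0)

def semantic_kernel(x, y, relatedness):
    """K(x, y) = x S y^T; S must be symmetric positive semi-definite."""
    return x @ relatedness @ y.T

# Toy corpus (hypothetical): columns are the terms bank, loan, river, the.
counts = np.array([[2, 1, 0, 1],
                   [1, 2, 0, 1],
                   [0, 0, 2, 1],
                   [0, 0, 1, 1]], dtype=float)
labels = np.array([1, 1, 0, 0])

# Step 1: keep the k terms with the highest CHI scores.
scores = chi_square_scores(counts, labels)
selected = np.sort(np.argsort(-scores, kind="stable")[:3])

# Step 2: TF-IDF weights over the selected terms only.
X = tfidf(counts[:, selected])

# Step 3: semantic kernel. The matrix below is a made-up placeholder for
# HowNet-based relatedness (bank and loan related, river unrelated).
relatedness = np.array([[1.0, 0.8, 0.0],
                        [0.8, 1.0, 0.0],
                        [0.0, 0.0, 1.0]])
K = semantic_kernel(X, X, relatedness)
```

The resulting Gram matrix `K` could then be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Unlike a plain BOW dot product, this kernel gives a non-zero similarity to two documents that share no terms but contain related ones, which is the effect the abstract attributes to the semantic relatedness kernel.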
