Semantic Enrichments in Text Supervised Classification: Application to Medical Domain

The use of semantics in supervised text classification can improve its effectiveness, especially in specialized domains. Most state-of-the-art approaches replace words with concepts, transforming the classical bag of words (BOW) into a bag of concepts (BOC). This transformation is performed by a conceptualization task. The resulting BOC can then be enriched with related concepts drawn from semantic resources, which may further improve classification effectiveness. This paper focuses on two strategies for the semantic enrichment of conceptualized text representations: the first is based on a semantic kernel method, and the second on an enriching vectors method. Both strategies are evaluated experimentally in the medical domain, using Rocchio as the supervised classification method, the UMLS ontology, and the Ohsumed corpus.
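
To make the two enrichment strategies concrete, the following is a minimal sketch under stated assumptions, not the method evaluated in the paper. The concept names, the similarity scores, the 0.5 threshold, and all function names are hypothetical; the similarity table stands in for a measure that would in practice be derived from a resource such as UMLS, and the classifier is a simplified Rocchio-style nearest-centroid rule.

```python
import math

# Hypothetical concept inventory and similarity scores, invented for illustration.
CONCEPTS = ["heart", "cardiac_arrest", "lung"]
SIMILARITY = {("heart", "cardiac_arrest"): 0.8}


def sim(a, b):
    """Symmetric concept similarity: 1.0 on the diagonal, 0.0 if unknown."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def cosine(u, v):
    """Plain cosine similarity between two sparse BOC vectors (dicts)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def kernel_similarity(u, v):
    """Semantic kernel (assumed form): normalized sum of u_i * sim(i, j) * v_j."""
    def k(a, b):
        return sum(wa * sim(ca, cb) * wb for ca, wa in a.items() for cb, wb in b.items())

    norm = math.sqrt(k(u, u) * k(v, v))
    return k(u, v) / norm if norm else 0.0


def enrich(boc, threshold=0.5):
    """Enriching vectors (assumed form): add related concepts, weighted by similarity."""
    enriched = dict(boc)
    for concept, weight in boc.items():
        for other in CONCEPTS:
            s = sim(concept, other)
            if other != concept and s >= threshold:
                enriched[other] = enriched.get(other, 0.0) + weight * s
    return enriched


def centroid(vectors):
    """Rocchio-style class prototype: the mean of the class's training vectors."""
    total = {}
    for vec in vectors:
        for c, w in vec.items():
            total[c] = total.get(c, 0.0) + w
    return {c: w / len(vectors) for c, w in total.items()}


def classify(doc, centroids, similarity):
    """Assign the label whose centroid is most similar to the document."""
    return max(centroids, key=lambda label: similarity(doc, centroids[label]))


# Toy training data: one document per class.
centroids = {
    "cardiology": centroid([{"heart": 1.0}]),
    "pulmonology": centroid([{"lung": 1.0}]),
}
query = {"cardiac_arrest": 1.0}

# Strategy 1: keep the vectors unchanged, compare them with the semantic kernel.
by_kernel = classify(query, centroids, kernel_similarity)

# Strategy 2: enrich the vectors first, then compare them with plain cosine.
enriched_centroids = {label: enrich(c) for label, c in centroids.items()}
by_enrichment = classify(enrich(query), enriched_centroids, cosine)
```

In this toy setting the query shares no concept with either centroid, so plain cosine over the unenriched BOC scores both classes at zero. Both enrichment strategies use the assumed link between the two cardiac concepts to separate the classes. The difference is where the semantic resource is applied: inside the similarity function for the kernel, or inside the representation for the enriched vectors.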
