Using Sentence Semantic Similarity to Improve LMF Standardized Arabic Dictionary Quality

This paper presents a novel algorithm for measuring semantic similarity between sentences. The method takes into account not only semantic knowledge but also syntactico-semantic knowledge, notably semantic predicates, semantic classes and thematic roles. First, semantic similarity between sentences is derived from word synonymy. Second, syntactico-semantic similarity is computed from the semantic classes and thematic roles that the words of the two sentences share; this information is tied to the semantic predicate. Finally, the overall similarity is computed as a combination of lexical, semantic and syntactico-semantic similarity using supervised learning. The proposed algorithm is applied to detect redundant information in an LMF-standardized Arabic dictionary, particularly in the definitions and examples of lexical entries. Experimental results show that the algorithm reduces redundant information and thus improves the content quality of the LMF Arabic dictionary.
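The three-part measure described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not the paper's exact definitions. Synonymy comes from a hypothetical lexicon (`syn`). Lexical similarity is taken as a Jaccard overlap of word forms. Syntactico-semantic similarity is taken as a Jaccard overlap of (semantic class, thematic role) pairs. The combination is a weighted sum, and its `WEIGHTS` and redundancy cut-off `THRESHOLD` are placeholder values standing in for what a supervised learner would estimate from annotated sentence pairs. The English tokens and tag names are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Illustrative token annotation: (word, semantic_class, thematic_role).
# The tag names used below are hypothetical, not the dictionary's tag set.
Token = Tuple[str, str, str]
# Hypothetical synonym lexicon: word -> set of synonyms.
Synonyms = Dict[str, Set[str]]


def jaccard(a: Set, b: Set) -> float:
    """|A & B| / |A | B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_sim(s1: List[Token], s2: List[Token]) -> float:
    """Overlap of surface word forms (assumed definition)."""
    return jaccard({w for w, _, _ in s1}, {w for w, _, _ in s2})


def _expand(word: str, syn: Synonyms) -> Set[str]:
    """A word together with its listed synonyms."""
    return {word} | syn.get(word, set())


def _matched_fraction(src: List[Token], dst: List[Token], syn: Synonyms) -> float:
    """Share of words in src that equal, or share a synonym with, some word in dst."""
    if not src:
        return 0.0
    dst_sets = [_expand(w, syn) for w, _, _ in dst]
    hits = sum(1 for w, _, _ in src if any(_expand(w, syn) & d for d in dst_sets))
    return hits / len(src)


def semantic_sim(s1: List[Token], s2: List[Token], syn: Synonyms) -> float:
    """Synonymy-based similarity, averaged over both directions so it is symmetric."""
    return (_matched_fraction(s1, s2, syn) + _matched_fraction(s2, s1, syn)) / 2


def syntactico_semantic_sim(s1: List[Token], s2: List[Token]) -> float:
    """Overlap of (semantic class, thematic role) pairs (assumed definition)."""
    return jaccard({(c, r) for _, c, r in s1}, {(c, r) for _, c, r in s2})


# Placeholder weights for (lexical, semantic, syntactico-semantic); in the
# described method these would come from supervised learning. Hypothetical.
WEIGHTS = (0.2, 0.4, 0.4)
THRESHOLD = 0.8  # hypothetical redundancy cut-off


def combined_sim(s1: List[Token], s2: List[Token], syn: Synonyms) -> float:
    """Weighted combination of the three similarity features."""
    features = (lexical_sim(s1, s2), semantic_sim(s1, s2, syn),
                syntactico_semantic_sim(s1, s2))
    return sum(w * f for w, f in zip(WEIGHTS, features))


def is_redundant(s1: List[Token], s2: List[Token], syn: Synonyms) -> bool:
    """Flag two definitions/examples as redundant above the cut-off."""
    return combined_sim(s1, s2, syn) >= THRESHOLD


if __name__ == "__main__":
    syn = {"boy": {"child"}, "child": {"boy"}, "eat": {"consume"}, "consume": {"eat"}}
    d1 = [("boy", "Human", "Agent"), ("eat", "Ingestion", "Predicate"), ("apple", "Food", "Patient")]
    d2 = [("child", "Human", "Agent"), ("consume", "Ingestion", "Predicate"), ("apple", "Food", "Patient")]
    print(f"{combined_sim(d1, d2, syn):.2f}", is_redundant(d1, d2, syn))  # 0.84 True
```

In this toy pair the lexical score is only 0.2, yet the synonymy and predicate-structure scores are both 1.0, so the pair is flagged as redundant. This is the kind of duplication between definitions that a purely lexical measure would miss.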
