A new sentence similarity assessment measure based on a three-layer sentence representation

Sentence similarity measures the degree of likeness between sentences. It is used in many natural language applications, such as text summarization, information retrieval, text categorization, and machine translation. Current methods for assessing sentence similarity represent sentences as bag-of-words vectors or by the syntactic information of the words in the sentence. The degree of similarity between sentences is then calculated by composing the similarities between their words. However, these methods do not handle two important concerns in the area: the meaning problem and word order. This paper proposes a new sentence similarity assessment measure that substantially improves and refines a recently published method that takes into account the lexical, syntactic and semantic components of sentences. The proposed method was benchmarked using a publicly available standard dataset. The results obtained show that the proposed similarity assessment measure outperforms state-of-the-art systems and achieves results comparable to evaluations made by humans.
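The word order concern mentioned above can be illustrated with a minimal sketch of a bag-of-words baseline. This is not the measure proposed in the paper; the function name `bow_cosine` and the whitespace tokenization are assumptions made only for illustration.

```python
import math
from collections import Counter


def bow_cosine(s1: str, s2: str) -> float:
    """Cosine similarity between bag-of-words count vectors.

    Hypothetical baseline for illustration only: tokens are obtained by
    lowercasing and splitting on whitespace, and word order is discarded.
    """
    a = Counter(s1.lower().split())
    b = Counter(s2.lower().split())
    # Dot product over the words the two sentences share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


# Same words, opposite meaning: the score is (up to rounding) the maximum.
print(bow_cosine("the dog bit the man", "the man bit the dog"))
```

Because both sentences contain exactly the same words with the same counts, the baseline scores them as essentially identical even though their meanings differ, which is the kind of failure that a representation with syntactic and semantic layers is meant to address.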
