Fusing Syntax and Word Embedding Knowledge for Measuring Semantic Similarity

The explosive growth of information makes it important to mine useful information effectively from massive amounts of data. Text is a major carrier of information, so text processing and analysis have become a focus of data mining and information retrieval. Sentence similarity underlies most text-related tasks. Most current approaches represent text pairs through pairwise similarity features. In contrast, we propose a new method that analyzes and quantifies the semantic textual similarity between sentences by encoding semantic knowledge from word embeddings into the syntax trees of the sentences. We test our approach on the SemEval-2012 task and evaluate its performance with two widely used measures: the Spearman and Pearson correlations. The experimental results show that, compared with the best systems of the semantic textual similarity (STS) task, our method can effectively improve the accuracy of sentence similarity judgment.
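As a rough illustration of the evaluation measures named above, the following sketch computes the Pearson and Spearman correlations between gold similarity scores and system predictions. It is not the authors' evaluation code: the function names and the sample scores are hypothetical, and the gold scores are only assumed to lie on a 0 to 5 scale.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: human-annotated scores and system similarity scores.
gold = [0.0, 1.5, 2.0, 3.5, 5.0]
pred = [0.1, 0.2, 0.5, 0.6, 0.9]

print("Pearson: ", pearson(gold, pred))
print("Spearman:", spearman(gold, pred))
```

Pearson rewards predictions that are linearly related to the gold scores, while Spearman depends only on how the sentence pairs are ordered, so reporting both shows whether a system ranks pairs correctly even when its score scale differs from the annotators'.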
