Human-inspired semantic similarity between sentences

Abstract. According to the Principle of Compositionality, the meaning of a complex expression is determined not only by the meanings of its individual words but also by the structural way in which those words are assembled. Compositionality has been a central research issue for linguists and psycholinguists. However, it remains unclear how syntax influences the meaning of a sentence. In this paper, we propose an interdisciplinary approach to better understand that relation. We present an empirical study that seeks to identify the different weights humans give to different syntactic roles when computing semantic similarity. To test the validity of the hypotheses derived from the psychological study, we use a computational paradigm: we incorporate the results of that study into a psychologically plausible computational measure of semantic similarity. The correlation of this measure with human judgments on a paraphrase recognition task confirms that humans give different importance to different syntactic roles when computing semantic similarity. These results contrast with generative grammar theories but support neurolinguistic evidence.
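As a rough illustration of weighting syntactic roles in a similarity score, the sketch below combines per-role similarity values into a single sentence-level score. It is not the measure proposed in the paper: the role names, the weight values in `ROLE_WEIGHTS`, and the function `weighted_sentence_similarity` are all hypothetical, and the per-role similarities are assumed to have been computed elsewhere (for example, by a word-level similarity measure).

```python
from typing import Dict

# Hypothetical role weights, for illustration only; these are NOT values
# reported by the study.
ROLE_WEIGHTS: Dict[str, float] = {"subject": 0.3, "verb": 0.4, "object": 0.3}


def weighted_sentence_similarity(
    role_sims: Dict[str, float],
    weights: Dict[str, float] = ROLE_WEIGHTS,
) -> float:
    """Weighted average of per-role similarities.

    role_sims maps a syntactic role to a similarity in [0, 1] between the
    phrases filling that role in the two sentences. Roles without a weight
    are ignored; weights are renormalised over the roles actually present.
    """
    shared = [role for role in role_sims if role in weights]
    total = sum(weights[role] for role in shared)
    if total == 0:
        return 0.0
    return sum(weights[role] * role_sims[role] for role in shared) / total


if __name__ == "__main__":
    sims = {"subject": 1.0, "verb": 0.5, "object": 0.0}
    print(weighted_sentence_similarity(sims))
```

Renormalising over the roles present keeps the score in [0, 1] when a sentence pair lacks one of the roles; other choices (such as penalising missing roles) are equally plausible under these assumptions.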
