Sentence paraphrase detection: When determiners and word order make the difference

Researchers working on distributional semantics have recently taken up the challenge of going beyond lexical meaning and tackling the issue of compositionality. Several Compositional Distributional Semantics Models (CDSMs) have been developed, and promising results have been obtained in evaluations carried out against data sets of small phrases as well as data sets of sentences. However, we believe there is a need to further develop good evaluation tasks that show whether CDSMs truly capture compositionality. To this end, we present an evaluation task that highlights some differences among the currently available CDSMs by challenging them to detect semantic differences caused by word order switches and by determiner replacements. As a starting point, we take simple intransitive and transitive sentences describing similar events, which we consider to be paraphrases of each other but not of the foil paraphrases we generate from them. Only models that are sensitive to word order, and to the meaning of determiner phrases and their role in sentence composition, will avoid falling into the foils' trap.
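To illustrate why a word order foil can trap a composition model, here is a minimal sketch in Python. It is not the evaluation setup or any of the models from the paper: the three-dimensional word vectors are invented, and `position_weighted` is a hypothetical order-sensitive composition function introduced only for contrast with plain vector addition, which is commutative and therefore cannot tell a sentence from its subject/object-swapped foil.

```python
import math

# Toy word vectors; the values are invented for illustration only.
VECTORS = {
    "dog": [1.0, 0.0, 2.0],
    "cat": [0.0, 1.0, 2.0],
    "chases": [3.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def additive(words):
    """Commutative composition: sum the word vectors dimension by dimension."""
    return [sum(dims) for dims in zip(*(VECTORS[w] for w in words))]


def position_weighted(words):
    """Hypothetical order-sensitive composition: the i-th word (1-based)
    is scaled by i before summing, so swapping words changes the result."""
    return [
        sum((i + 1) * x for i, x in enumerate(dims))
        for dims in zip(*(VECTORS[w] for w in words))
    ]


original = ["dog", "chases", "cat"]
foil = ["cat", "chases", "dog"]  # word order switch: subject and object swapped

sim_additive = cosine(additive(original), additive(foil))
sim_weighted = cosine(position_weighted(original), position_weighted(foil))

print(f"additive:          {sim_additive:.3f}")
print(f"position-weighted: {sim_weighted:.3f}")
```

Under these assumptions, the additive model assigns the sentence and its foil identical vectors, so it would wrongly treat them as perfect paraphrases, while the order-sensitive variant gives a similarity below 1.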
