UNITOR: Combining Semantic Text Similarity functions through SV Regression

This paper presents the UNITOR system that participated to the SemEval 2012 Task 6: Semantic Textual Similarity (STS). The task is here modeled as a Support Vector (SV) regression problem, where a similarity scoring function between text pairs is acquired from examples. The semantic relatedness between sentences is modeled in an unsupervised fashion through different similarity functions, each capturing a specific semantic aspect of the STS, e. g. syntactic vs. lexical or topical vs. paradigmatic similarity. The SV regressor effectively combines the different models, learning a scoring function that weights individual scores in a unique resulting STS. It provides a highly portable method as it does not depend on any manually built resource (e.g. WordNet) nor controlled, e. g. aligned, corpus.

[1]  Magnus Sahlgren,et al.  The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces , 2006 .

[2]  李幼升,et al.  Ph , 1989 .

[3]  Richard Johansson,et al.  LTH: Semantic Structure Extraction using Nonprojective Dependency Trees , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[4]  Danilo Croce,et al.  Manifold Learning for the Semi-Supervised Induction of FrameNet Predicates: An Empirical Investigation , 2010 .

[5]  Silvia Bernardini,et al.  The WaCky wide web: a collection of very large linguistically processed web-crawled corpora , 2009, Lang. Resour. Evaluation.

[6]  Rebecca Hwa,et al.  Regression for Sentence-Level MT Evaluation with Pseudo References , 2007, ACL.

[7]  Mehran Sahami,et al.  A web-based kernel function for measuring the similarity of short text snippets , 2006, WWW '06.

[8]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[9]  Alessandro Moschitti,et al.  Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees , 2006, ECML.

[10]  Sarel van Vuuren,et al.  Evaluating Questions in Context , 2011, AAAI Fall Symposium: Question Generation.

[11]  Bo Pang,et al.  Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.

[12]  Mirella Lapata,et al.  Composition in Distributional Models of Semantics , 2010, Cogn. Sci..

[13]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[14]  Roberto Basili,et al.  Structured Lexical Similarity via Convolution Kernels on Dependency Trees , 2011, EMNLP.

[15]  Diana Inkpen,et al.  Semantic text similarity using corpus-based word similarity and string similarity , 2008, ACM Trans. Knowl. Discov. Data.

[16]  Julia Hirschberg,et al.  An Unsupervised Approach to Biography Production Using Wikipedia , 2008, ACL.

[17]  T. Landauer,et al.  A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .

[18]  Carlo Strapparava,et al.  Corpus-based and Knowledge-based Measures of Text Semantic Similarity , 2006, AAAI.

[19]  Roberto Basili,et al.  Space Projections as Distributional Models for Semantic Composition , 2012, CICLing.