UNITOR-CORE_TYPED: Combining Text Similarity and Semantic Filters through SV Regression

This paper presents the UNITOR system that participated in the *SEM 2013 shared task on Semantic Textual Similarity (STS). The task is modeled as a Support Vector (SV) regression problem, in which a similarity scoring function between text pairs is learned from examples. The approach has been implemented in a system designed for broad applicability and robustness, in order to reduce the risk of over-fitting to a specific dataset. Moreover, the approach does not require any manually coded resource (e.g. WordNet); it relies mainly on distributional analysis of unlabeled corpora. A good level of accuracy is achieved in the shared task: in the Typed STS task the proposed system ranks in 1st and 2nd position.
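The regression formulation can be illustrated with a minimal sketch. It is not the UNITOR implementation: the two surface features (Jaccard and cosine over tokens) are simple stand-ins for the distributional similarity functions the abstract mentions, the linear epsilon-insensitive regressor trained by subgradient descent is a simplification of a kernel-based SV regressor, and the training pairs, their scores, and all function names are invented for illustration.

```python
import math
from collections import Counter


def tokens(text):
    return text.lower().split()


def jaccard(a, b):
    # Token-set overlap; an assumed stand-in for a real similarity function.
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    # Cosine between bag-of-words count vectors (assumed feature).
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def features(a, b):
    # Each text pair becomes a vector of similarity scores.
    return [jaccard(a, b), cosine(a, b)]


def train_linear_svr(X, y, epsilon=0.1, C=10.0, lr=0.001, epochs=5000):
    # Subgradient descent on
    #   0.5 * ||w||^2 + C * sum_i max(0, |w.x_i + b - y_i| - epsilon)
    # Hyperparameter values are arbitrary choices for this toy example.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the regularizer is w
        for x, target in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, x)) + b - target
            if abs(err) > epsilon:  # only points outside the tube contribute
                sign = 1.0 if err > 0 else -1.0
                grad_w = [g + C * sign * xj for g, xj in zip(grad_w, x)]
                grad_b += C * sign
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, a, b):
    w, bias = model
    return sum(wj * xj for wj, xj in zip(w, features(a, b))) + bias


# Invented toy pairs with invented gold scores on a 0-5 scale.
TRAIN = [
    ("a man is playing a guitar", "a man is playing a guitar", 5.0),
    ("a man is playing a guitar", "a man plays the guitar", 3.0),
    ("a dog runs in the park", "the stock market fell today", 0.0),
    ("a woman is slicing an onion", "a cat sleeps on the sofa", 0.5),
]

X = [features(a, b) for a, b, _ in TRAIN]
y = [score for _, _, score in TRAIN]
model = train_linear_svr(X, y)

print(predict(model, "a man is playing a guitar", "a man is playing a guitar"))
print(predict(model, "a dog runs in the park", "the stock market fell today"))
```

The design point the sketch tries to convey is that individual similarity functions act only as features, while the regressor learns from scored examples how to weight them into a single similarity score.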
