UNITOR: Combining Semantic Text Similarity functions through SV Regression

This paper presents the UNITOR system, which participated in SemEval 2012 Task 6: Semantic Textual Similarity (STS). The task is modeled here as a Support Vector (SV) regression problem, in which a similarity scoring function between text pairs is learned from examples. The semantic relatedness between sentences is modeled in an unsupervised fashion through different similarity functions, each capturing a specific semantic aspect of STS, e.g. syntactic vs. lexical or topical vs. paradigmatic similarity. The SV regressor combines the different models, learning a scoring function that weights the individual scores into a single STS score. The method is highly portable, as it depends neither on manually built resources (e.g. WordNet) nor on controlled (e.g. aligned) corpora.

Roberto Basili | Danilo Croce | Paolo Annesi | Valerio Storch
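
The combination step described in the abstract can be illustrated with a small sketch. It is not the UNITOR implementation: the two similarity functions (`jaccard`, `cosine`), the toy sentence pairs in `TRAIN` with their gold scores, and all hyperparameters are assumptions made up for illustration, and the regressor is a plain linear model trained by subgradient descent on the epsilon-insensitive loss rather than a full kernel-based SV regressor.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Token-set overlap; a stand-in for one unsupervised similarity function."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine over bag-of-words counts; a second stand-in similarity function."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(count * cb[tok] for tok, count in ca.items())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def features(s1, s2):
    """One score per similarity function; the regressor learns how to weight them."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    return [jaccard(t1, t2), cosine(t1, t2)]


def predict(w, b, x):
    """Linear scoring function: weighted sum of the individual scores plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svr(X, y, epsilon=0.1, lam=0.001, lr=0.5, epochs=5000):
    """Subgradient descent on lam/2*||w||^2 + mean(max(0, |f(x) - y| - epsilon))."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for epoch in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, target in zip(X, y):
            err = predict(w, b, x) - target
            if abs(err) > epsilon:  # only points outside the epsilon tube contribute
                sign = 1.0 if err > 0 else -1.0
                for j, xj in enumerate(x):
                    grad_w[j] += sign * xj / n
                grad_b += sign / n
        step = lr / math.sqrt(1 + epoch)  # diminishing step size
        w = [wi - step * g for wi, g in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Hypothetical training pairs with made-up gold scores on a 0-5 scale.
TRAIN = [
    ("a man is playing a guitar", "a man is playing a guitar", 5.0),
    ("a man is playing a guitar", "a man plays a guitar", 4.0),
    ("a dog runs in a park", "the stock market fell today", 0.0),
]

if __name__ == "__main__":
    X = [features(s1, s2) for s1, s2, _ in TRAIN]
    y = [gold for _, _, gold in TRAIN]
    w, b = train_linear_svr(X, y)
    print("weights:", w, "bias:", b)
    print("score:", predict(w, b, features("a man plays a guitar", "a man is playing guitar")))
```

In a setting closer to the one the abstract describes, each entry of the feature vector would come from a richer similarity model (for instance distributional or syntactic), while the regressor's role of learning the weights from scored examples stays the same.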
