CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

We present the systems we submitted to Track 4 of the Semantic Textual Similarity (STS) task at SemEval-2017. Given a Spanish-English sentence pair, each system must estimate the semantic similarity of the two sentences as a score between 0 and 5. Our submission uses syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in both unsupervised and supervised ways. Our best run ranked 1st on Track 4a, with a correlation of 83.02% with human annotations.

Laurent Besacier | Jérémy Ferrero | Didier Schwab | Frédéric Agnès
