Paper information - DPIL@FIRE2016: Overview of the Shared task on Detecting Paraphrases in Indian language

This paper presents an overview of the shared task "Detecting Paraphrases in Indian Languages" (DPIL) conducted at FIRE 2016. Given a pair of sentences in the same language, participants are asked to determine whether the two sentences are semantically equivalent. The shared task covers four Indian languages: Tamil, Malayalam, Hindi, and Punjabi. The dataset created for the shared task has been made available online and is the first open-source paraphrase detection corpus for Indian languages.

K. P. Soman | M. Anand Kumar | Shivkaran Singh | B. Kavirajan
