BITS_PILANI@DPIL-FIRE2016: Paraphrase Detection in Hindi Language using Syntactic Features of Phrase

Paraphrasing means expressing the same meaning or essence of a sentence or text using different words or a different arrangement of words. Paraphrase detection is challenging, especially in Indian languages such as Hindi, because it requires an understanding of the semantics of the language. It is also of practical importance, since applications such as Information Retrieval, Information Extraction and Text Summarization depend on it. This paper applies Machine Learning classification techniques to detect paraphrases in Hindi for the DPIL (Detecting Paraphrases in Indian Languages) task at FIRE 2016. The task is to decide whether a given pair of sentences conveys the same information and meaning even when the sentences are written in different forms. We use a feature-vector-based approach: given a pair of Hindi sentences, the proposed technique labels the pair as Paraphrases (P), Semi-Paraphrases (SP) or Not Paraphrases (NP).
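The feature-vector formulation can be made concrete with a minimal Python sketch. The sketch rests on several assumptions. It uses three hypothetical lexical features (token Jaccard overlap, length ratio and containment) and a nearest-centroid classifier. These are illustrative stand-ins, not the syntactic phrase features or classifiers used in this paper. The three training pairs are toy sentences invented for the example, not DPIL data.

```python
from __future__ import annotations

import math
from collections import Counter

# Toy pairs invented for illustration (NOT from the DPIL corpus).
TRAIN_PAIRS = [
    ("राम ने खाना खाया", "खाना राम ने खाया"),              # reordered words
    ("राम ने खाना खाया और सो गया", "राम ने खाना खाया"),      # partial overlap
    ("आज मौसम अच्छा है", "राम ने खाना खाया"),              # unrelated
]
TRAIN_LABELS = ["P", "SP", "NP"]


def tokens(sentence: str) -> list[str]:
    """Hypothetical tokenizer: drop the Devanagari danda and commas, split on whitespace."""
    return sentence.replace("।", " ").replace(",", " ").split()


def features(s1: str, s2: str) -> list[float]:
    """Hypothetical lexical feature vector for a sentence pair."""
    t1, t2 = tokens(s1), tokens(s2)
    a, b = set(t1), set(t2)
    common = len(a & b)
    union = len(a | b)
    jaccard = common / union if union else 0.0
    longest = max(len(t1), len(t2))
    length_ratio = min(len(t1), len(t2)) / longest if longest else 0.0
    # Containment: share of the smaller token set found in the other sentence.
    smaller = min(len(a), len(b))
    containment = common / smaller if smaller else 0.0
    return [jaccard, length_ratio, containment]


def train_centroids(pairs, labels) -> dict[str, list[float]]:
    """Average the feature vectors of each label (nearest-centroid training)."""
    counts = Counter(labels)
    sums: dict[str, list[float]] = {}
    for (s1, s2), label in zip(pairs, labels):
        vec = features(s1, s2)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: dict[str, list[float]], s1: str, s2: str) -> str:
    """Return the label (P, SP or NP) whose centroid is closest in Euclidean distance."""
    vec = features(s1, s2)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


if __name__ == "__main__":
    model = train_centroids(TRAIN_PAIRS, TRAIN_LABELS)
    print(predict(model, "खाना राम ने खाया।", "राम ने खाना खाया"))
```

A nearest-centroid rule keeps the sketch dependency-free. In practice, any standard classifier (for example an SVM or a decision tree) could be trained on the same kind of pairwise feature vectors, with richer syntactic features in place of the lexical ones assumed here.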