A Hybrid Approach for Detection of Plagiarism using Natural Language Processing

Detecting plagiarism in research papers selected for publication in conferences and journals, and in assignments handed in for evaluation, is important because it ensures that the submitted work is free of copied content. Commercially available plagiarism detection tools have notable drawbacks: they fail to detect plagiarism when the grammatical construction of a sentence is changed or when the words in a sentence are simply replaced by their synonyms. This project attempts to improve the efficacy of plagiarism detection by applying concepts from natural language processing and text mining, so that such tools are not fooled by these changes to the wording and sentence structure of a paper. It proposes a plagiarism detection framework that analyses not only the sentences forming the document but also its structure and semantics.
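The weakness described above can be illustrated with a minimal sketch. It is not the framework proposed in this paper; it only shows, under stated assumptions, how a plain word-overlap score drops when a sentence is reworded, and how mapping synonyms to a common form recovers much of the match. The `SYNONYMS` table, the function names, and the example sentences are all hypothetical, and a real system would likely draw on a lexical resource such as WordNet instead of a hand-written table.

```python
import re

# Hypothetical, hand-written synonym table used only for illustration.
# Each word maps to a single canonical form.
SYNONYMS = {
    "quick": "fast",
    "rapid": "fast",
    "automobile": "car",
    "vehicle": "car",
}


def tokens(sentence, normalize=False):
    """Return the set of lowercase words, optionally mapped to canonical synonyms."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if normalize:
        words = [SYNONYMS.get(w, w) for w in words]
    return set(words)


def jaccard(a, b, normalize=False):
    """Jaccard similarity between the word sets of two sentences."""
    ta, tb = tokens(a, normalize), tokens(b, normalize)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


original = "The quick car passed the gate"
reworded = "The gate was passed by the rapid automobile"

# Plain overlap: only "the", "passed" and "gate" are shared.
print(round(jaccard(original, reworded), 3))                   # 0.333
# After synonym normalization, "fast" and "car" match as well.
print(round(jaccard(original, reworded, normalize=True), 3))   # 0.714
```

Because the score is computed over word sets, it ignores word order, so the switch to the passive voice does not lower it further. A set-based score is still a surface measure, though, and it says nothing about sentence structure or meaning, which is the gap the proposed framework aims to address.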