PlagAL: Plagiarism detection system for Albanian texts

The amount of information available on the web today is considerable. Students who prepare projects, seminars or diploma theses can copy others' ideas, results or papers from the internet, and presenting such copied material as their own without citing the source is plagiarism. To prevent this, several plagiarism detection programs have been developed in recent years, but most universities in Kosovo do not have their own plagiarism detection system. In this paper, we propose a system for detecting cross-language plagiarism and text plagiarism in Albanian and English. By monitoring student work, the system is expected to increase quality and accountability in universities and other educational institutions. The system is web-based and developed in the Python and PHP programming languages. It uses the Rabin-Karp algorithm to identify plagiarism, together with heuristics that determine the location of the copied text and identify the language of the document uploaded to the system. We also use multiprocessing to make the system faster. The system has been evaluated on a dataset from the University “Ukshin Hoti”. The experimental results show that the system can be used to detect both cross-language plagiarism and text plagiarism.
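The abstract names the Rabin-Karp algorithm as the matching technique. The following is a minimal Python sketch of how Rabin-Karp can flag shared passages between a source text and a suspicious text. It assumes character windows of length k over lowercased, whitespace-normalized text. All function names, the base, the modulus and the default k = 20 are illustrative assumptions, not details taken from PlagAL. The sketch also leaves out the translation step that cross-language detection would need. `copied_spans` is a hypothetical location heuristic (merging overlapping matched windows) and is not the heuristic described in the paper.

```python
from collections import defaultdict

BASE = 256            # assumed polynomial base
MOD = 1_000_000_007   # assumed large prime modulus


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting changes do not hide copied text.
    return " ".join(text.lower().split())


def rolling_hashes(text: str, k: int):
    """Yield (start, hash) for every length-k window using a Rabin-Karp rolling hash."""
    if len(text) < k:
        return
    high = pow(BASE, k - 1, MOD)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * BASE + ord(ch)) % MOD
    yield 0, h
    for i in range(1, len(text) - k + 1):
        h = (h - ord(text[i - 1]) * high) % MOD   # drop the outgoing character
        h = (h * BASE + ord(text[i + k - 1])) % MOD  # add the incoming character
        yield i, h


def find_matches(source: str, suspicious: str, k: int = 20):
    """Return (suspicious_start, source_start) pairs for identical length-k windows."""
    src, sus = normalize(source), normalize(suspicious)
    index = defaultdict(list)  # hash -> window start positions in the source
    for i, h in rolling_hashes(src, k):
        index[h].append(i)
    matches = []
    for j, h in rolling_hashes(sus, k):
        for i in index.get(h, ()):
            # Compare the characters to rule out hash collisions.
            if src[i:i + k] == sus[j:j + k]:
                matches.append((j, i))
                break
    return matches


def copied_spans(matches, k: int):
    """Hypothetical heuristic: merge overlapping matched windows into (start, end) ranges."""
    spans = []
    for j, _ in sorted(matches):
        if spans and j <= spans[-1][1]:
            spans[-1][1] = j + k
        else:
            spans.append([j, j + k])
    return [tuple(s) for s in spans]


def similarity(source: str, suspicious: str, k: int = 20) -> float:
    """Fraction of the suspicious text's windows that also occur in the source."""
    total = len(normalize(suspicious)) - k + 1
    if total <= 0:
        return 0.0
    return len(find_matches(source, suspicious, k)) / total


if __name__ == "__main__":
    m = find_matches("abcdefgh", "xxabcdexx", k=3)
    print(copied_spans(m, k=3))  # → [(2, 7)]
```

Each window hash is updated in constant time, so hashing is linear in text length. The explicit character comparison keeps hash collisions from being reported as plagiarism. Comparing one suspicious document against many sources is independent work per source, which is the kind of step that could be spread across processes with Python's `multiprocessing` module.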