Accelerating text-based plagiarism detection using GPUs

Plagiarism is the unauthorized use of another person's writing or ideas without proper acknowledgment. Several tools exist for text-based plagiarism detection, using a variety of methods and techniques. However, these tools become inefficient on large datasets because plagiarism detection involves many computational tasks and has a large memory requirement. Handling large datasets therefore calls for acceleration techniques that optimize the detection process. In response, we have developed a parallel algorithm using Compute Unified Device Architecture (CUDA) and tested it on a Graphics Processing Unit (GPU) platform. An equivalent algorithm was run on a CPU platform for comparison. The results show that the CPU performs better when the number and size of the documents are small, whereas the GPU is the more effective and efficient platform when the documents are numerous and large, because the amount of available parallelism increases. For our dataset, the algorithm ran approximately 6x faster on the GPU platform than on the CPU. Thus, a GPU-based optimization of plagiarism detection offers a practical solution for inter-document plagiarism detection over large amounts of data.
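
To illustrate where the parallelism in inter-document detection comes from, the following is a minimal CPU-side sketch in Python. It is not the algorithm evaluated here: the abstract does not specify the similarity measure, so cosine similarity over whitespace-tokenized term frequencies is an assumption, and the function names (`term_frequencies`, `cosine_similarity`, `pairwise_similarity`) are hypothetical. The point it shows is that every document pair is scored independently, so each pair could in principle be assigned to its own GPU thread, and the number of pairs grows quadratically with the number of documents.

```python
import math
from collections import Counter
from itertools import combinations


def term_frequencies(text):
    """Count whitespace-separated, lowercased tokens (assumed tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity of two term-frequency Counters (assumed measure)."""
    shared = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def pairwise_similarity(docs):
    """Score every unordered document pair.

    Each (i, j) entry depends only on documents i and j, so the pairs are
    independent units of work: n documents give n * (n - 1) / 2 of them.
    """
    vectors = [term_frequencies(d) for d in docs]
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(docs)), 2)
    }


if __name__ == "__main__":
    docs = ["the cat sat", "the cat sat", "dogs bark loudly"]
    for pair, score in pairwise_similarity(docs).items():
        print(pair, round(score, 3))
```

With only a handful of short documents there are few pairs to score, which is consistent with the observation that the CPU performs better on small inputs; the independent work only becomes plentiful as the document count grows.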
