Best Parameter Selection Of Rabin-Karp Algorithm In Detecting Document Similarity

Text mining is commonly used to detect document similarity and plagiarism. Education is one field that is particularly prone to plagiarism. Plagiarism can stifle creativity because it requires little effort and little original thought, so it should be prevented before it harms the parties involved. String matching on documents can be used to detect plagiarism. One applicable method is the Rabin-Karp algorithm, but several previous studies did not test the k-gram value or the base value, both of which should, in theory, affect the algorithm's performance. This study therefore varies the k-gram value and the prime base to determine their effect on the performance of the Rabin-Karp algorithm. The results show that the choice of k-gram value and prime base affects both the processing time on the test data and the similarity values of the documents tested. In this study, k = 5 gave the fastest testing time, both in the tests on data in multiples of 25 and in the test on the full data set of 300.
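To make the role of the two parameters concrete, the following is a minimal Python sketch of k-gram hashing in the Rabin-Karp style. It is not the implementation used in the study: the function names, the preprocessing step, the modulus, the default base of 11, and the use of the Dice coefficient as the similarity measure are all assumptions chosen for illustration.

```python
def kgram_hashes(text, k=5, base=11, modulus=1_000_003):
    """Return the set of rolling hash values of every k-gram in `text`.

    `base` and `modulus` are illustrative values, not taken from the study.
    """
    # Assumed preprocessing: keep letters and digits only, lowercased.
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    if len(cleaned) < k:
        return set()

    # Weight of the leading character in a window of length k.
    high = pow(base, k - 1, modulus)

    # Hash of the first window, computed directly.
    h = 0
    for ch in cleaned[:k]:
        h = (h * base + ord(ch)) % modulus
    hashes = {h}

    # Slide the window: drop the leading character, append the next one.
    for i in range(k, len(cleaned)):
        h = (h - ord(cleaned[i - k]) * high) % modulus
        h = (h * base + ord(cleaned[i])) % modulus
        hashes.add(h)
    return hashes


def dice_similarity(doc_a, doc_b, k=5, base=11):
    """Dice coefficient over the two documents' k-gram hash sets (assumed measure)."""
    hashes_a = kgram_hashes(doc_a, k, base)
    hashes_b = kgram_hashes(doc_b, k, base)
    if not hashes_a and not hashes_b:
        return 0.0
    return 2 * len(hashes_a & hashes_b) / (len(hashes_a) + len(hashes_b))
```

Calling `dice_similarity(doc_a, doc_b, k=k, base=b)` over a range of `k` and `b` values, and timing each call, is one way to reproduce the kind of parameter comparison the abstract describes.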
