The Influence of Stemming on Similarity Detection of Abstracts Written in Indonesian

This paper discusses how stemming with the Nazief and Adriani algorithm affects the results of similarity detection on abstracts written in Indonesian. Detecting similarity between publication abstracts can serve as an early indication of whether plagiarism has occurred in a piece of writing. Text processing usually includes a pre-processing step. One such step is stemming, which reduces each word to its root word to make the search process more effective. The stemmed text is then converted into a set of word n-grams, and Fingerprint Matching is applied to measure the similarity between texts. The F1-score balances precision and recall. Using bigrams, detection that applies both stemming and stopword removal gives the best result, with an average F1-score of 42%. This is higher than detection that uses stemming alone (31%) or detection without any text pre-processing (34%).
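As a rough illustration of the pipeline described above, the following Python sketch tokenizes text, optionally removes stopwords and applies a stemmer, builds word bigrams, hashes them into fingerprints, and compares the resulting fingerprint sets. It also computes the F1-score from precision and recall. Several details are assumptions that do not come from the paper: the small stopword list, the use of MD5 for hashing, and the Dice coefficient as the similarity score. The Nazief and Adriani stemmer is not implemented here and would have to be supplied as the `stem` function.

```python
import hashlib
import re
from typing import Callable, List, Optional, Set

# Hypothetical stopword list for illustration only; the paper's actual list is not given.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "untuk", "dengan", "ini", "itu"}


def preprocess(text: str,
               stem: Optional[Callable[[str], str]] = None,
               remove_stopwords: bool = False) -> List[str]:
    """Lowercase and tokenize, then optionally drop stopwords and stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem is not None:
        # A Nazief-Adriani stemmer would be plugged in here (not implemented in this sketch).
        tokens = [stem(t) for t in tokens]
    return tokens


def word_ngrams(tokens: List[str], n: int = 2) -> List[str]:
    """Contiguous word n-grams; n=2 gives the bigrams used for the reported results."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fingerprints(tokens: List[str], n: int = 2) -> Set[int]:
    """Hash every word n-gram to an integer fingerprint (MD5 is an arbitrary choice here)."""
    return {int(hashlib.md5(g.encode("utf-8")).hexdigest(), 16)
            for g in word_ngrams(tokens, n)}


def dice_similarity(a: Set[int], b: Set[int]) -> float:
    """Assumed similarity score: Dice coefficient over the two fingerprint sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    a = preprocess("Sistem deteksi kemiripan dokumen", remove_stopwords=True)
    b = preprocess("Sistem deteksi kemiripan teks", remove_stopwords=True)
    # The two texts share 2 of their 3 bigrams each, so the Dice score is 2*2/(3+3).
    print(dice_similarity(fingerprints(a), fingerprints(b)))
    print(f1_score(0.4, 0.6))
```

Running the three experimental configurations would amount to calling `preprocess` with no pre-processing, with only `stem`, or with both `stem` and `remove_stopwords=True`, and then comparing the detector's decisions against labelled pairs to obtain precision, recall, and F1.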