Study on a Text Reuse Measurement Method Using Expanded Index Terms

Text reuse has become increasingly common as information content is digitized, owing to the spread of the Internet and smartphones. It takes varied and complex forms, including the insertion, deletion, and replacement of text and changes in word order. Moreover, inspecting reuse in texts drawn from many sources requires a method that runs within a reasonable amount of time and with a reasonable amount of resources. This work attempts to improve the accuracy of text reuse measurement by using expanded index terms, expanding the range of sentences inspected for reuse, and circularizing words, in order to resolve the problem of reused sentences that go undetected when terms are replaced with similar ones. The efficiency of the proposed method was demonstrated through a comparative evaluation against existing reuse inspection methods.
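
As an illustration of why expanding index terms can recover reuse hidden by the replacement of similar terms, the following is a minimal sketch and not the method evaluated in this work. It assumes a small hand-written synonym table (`SYNONYMS`), a naive whitespace tokenizer, and Jaccard overlap as the similarity score; all function names here are hypothetical. Because the comparison is set-based, it is also insensitive to changes in word order.

```python
from typing import Dict, Set

# Hypothetical synonym table (an assumption for illustration only);
# a real system would draw similar terms from a thesaurus or similar resource.
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile"},
    "automobile": {"car"},
    "quick": {"fast"},
    "fast": {"quick"},
}


def index_terms(sentence: str) -> Set[str]:
    """Split on whitespace, strip surrounding punctuation, and lowercase."""
    terms = {word.strip(".,;:!?\"'()").lower() for word in sentence.split()}
    return terms - {""}


def expand(terms: Set[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the terms together with every similar term listed for them."""
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded


def reuse_score(a: str, b: str, synonyms: Dict[str, Set[str]] = SYNONYMS) -> float:
    """Jaccard overlap of the expanded index-term sets of two sentences."""
    terms_a = expand(index_terms(a), synonyms)
    terms_b = expand(index_terms(b), synonyms)
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


if __name__ == "__main__":
    original = "The quick car stopped."
    reworded = "The automobile stopped fast."
    print(reuse_score(original, reworded, {}))  # no expansion
    print(reuse_score(original, reworded))      # with expansion
```

In this toy pair, the reworded sentence replaces two terms and reorders the words; passing an empty synonym table shows the score without expansion, and the default table shows how expansion raises it.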