Research on the Performance of Lucene's Chinese Tokenizers

Almost all websites and content management systems provide full-text search, and developers and customers pay close attention to the full-text search module throughout development and maintenance. To meet customers' demands for full-text search capability, the existing Chinese tokenizers must be studied. This article examines four Chinese tokenizers for Lucene and compares their performance.
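The abstract does not name the four tokenizers. As a minimal sketch of how such a comparison is typically set up, the Java snippet below runs a Chinese sentence through Lucene's bundled SmartChineseAnalyzer (from the lucene-analyzers-smartcn module); swapping in, say, StandardAnalyzer or CJKAnalyzer in the same harness would expose single-character versus bigram segmentation. The field name "content" and the sample sentence are illustrative assumptions, not taken from the paper.

    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class ChineseTokenizerDemo {
        public static void main(String[] args) throws IOException {
            // SmartChineseAnalyzer is one of Lucene's bundled Chinese analyzers;
            // any Analyzer under comparison can be substituted here.
            try (Analyzer analyzer = new SmartChineseAnalyzer()) {
                TokenStream stream =
                        analyzer.tokenStream("content", "我们研究了Lucene的中文分词器");
                CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
                stream.reset();
                // Print each token the analyzer emits for the sample sentence.
                while (stream.incrementToken()) {
                    System.out.println(term.toString());
                }
                stream.end();
                stream.close();
            }
        }
    }

Timing this loop over a fixed corpus for each analyzer is one straightforward way to compare tokenization throughput, alongside inspecting the segmentation quality of the emitted tokens.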
