Construction of a web search engine supporting intelligent Chinese word segmentation

Chinese word segmentation has a vital effect on the precision and recall of a web search engine for Chinese. Based on an analysis of the open source web search engine Nutch, a scalable lexical analyzer is implemented with JavaCC. By integrating this analyzer with Nutch, a web search engine named NutchEnhanced, which supports intelligent Chinese word segmentation, is constructed and used as a platform for testing the effect of various Chinese word segmentation algorithms in a search engine. The experimental results show that, for Chinese queries, NutchEnhanced outperforms Nutch in precision. With a recall of 0.74 and a precision of 0.86 over the top 30 results, its Chinese search quality is in general as good as that of Google and Baidu.
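
For readers unfamiliar with the two reported metrics, the following is a minimal Java sketch of how precision over the top k results and recall are commonly computed for a single query. It is not the evaluation code used for NutchEnhanced. The class name `RetrievalMetrics`, the method names, and the toy document identifiers are all hypothetical, and the sketch assumes that precision is divided by the number of results actually inspected when fewer than k are returned.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; not part of Nutch or NutchEnhanced.
class RetrievalMetrics {

    /** Fraction of the first k ranked results that are judged relevant. */
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        // Assumption: if fewer than k results exist, divide by the number inspected.
        int inspected = Math.min(k, ranked.size());
        if (inspected == 0) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < inspected; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return (double) hits / inspected;
    }

    /** Fraction of all relevant documents that appear among the retrieved ones. */
    static double recall(List<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        Set<String> found = new HashSet<>(retrieved);
        found.retainAll(relevant); // keep only the relevant documents that were retrieved
        return (double) found.size() / relevant.size();
    }

    public static void main(String[] args) {
        // Toy data for illustration only; these are not results from the paper.
        List<String> ranked = Arrays.asList("d1", "d2", "d3", "d4");
        Set<String> relevant = new HashSet<>(Arrays.asList("d1", "d3", "d5"));
        System.out.println("precision@2 = " + precisionAtK(ranked, relevant, 2));
        System.out.println("recall      = " + recall(ranked, relevant));
    }
}
```

Under this reading, the reported figure of 0.86 would correspond to `precisionAtK` with k = 30, presumably averaged over the test queries; the abstract does not state the averaging procedure.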