A fast Chinese web-document clustering method under Pareto’s Principle
Nowadays, most search engines, such as Google and Baidu, present query results ranked by each item's score and spread them across several pages. In an age of information explosion, the number of result pages can be huge, and users have to skim through several of them before finding what they want. Clustering the results can solve this problem. Several clustering methods exist, but they are not sufficiently accurate or efficient, especially when the result sets consist of millions of items. This article describes a fast method based on Pareto's Principle.