A fast Chinese web-document clustering method under Pareto’s Principle

Most search engines today, such as Google and Baidu, order their query results by the value of each item and list them across several pages. In an age of information explosion, the number of result pages can be huge, and users often have to scan several pages before they find what they want. Clustering the results addresses this problem. Several clustering methods exist, but they are not sufficiently accurate or efficient, especially when the result set consists of millions of items. This article describes a fast method under Pareto’s Principle.