Research on a Near-Duplicate Detection Algorithm Based on Feature Keywords of Web Pages

To address the problem of near-replicas among the large-scale Web pages crawled by search engines, a similarity detection algorithm based on terms extracted from the Web pages is proposed. The algorithm reduces the number of Web pages that must be processed and greatly improves efficiency.
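The abstract does not say how terms are extracted or how pages are compared. The following is a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that a page's feature keywords are its k most frequent non-stop-word terms, that similarity is the Jaccard coefficient of two keyword sets, and that the number of comparisons is reduced by an inverted keyword index, so only pages sharing at least one keyword are compared. The names `feature_keywords`, `near_duplicates`, `k`, `threshold`, and the stop-word list are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "at", "in", "is", "for", "on", "by", "that"}

def feature_keywords(text, k=5):
    """Return the k most frequent non-stop-word terms (ties broken alphabetically)."""
    terms = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]
    ranked = sorted(Counter(terms).items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(term for term, _ in ranked[:k])

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two keyword sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(pages, k=5, threshold=0.6):
    """Return sorted index pairs (i, j), i < j, whose keyword sets reach `threshold`."""
    keys = [feature_keywords(p, k) for p in pages]
    # Inverted index: keyword -> indices of pages whose top-k set contains it.
    index = {}
    for i, ks in enumerate(keys):
        for term in ks:
            index.setdefault(term, []).append(i)
    # Only pages sharing at least one keyword become candidate pairs,
    # instead of comparing all n * (n - 1) / 2 pairs.
    candidates = set()
    for ids in index.values():
        candidates.update(combinations(ids, 2))
    return sorted((i, j) for i, j in candidates if jaccard(keys[i], keys[j]) >= threshold)

pages = [
    "Search engines crawl large scale web pages and detect near duplicate pages",
    "Search engines crawl web pages at large scale to detect near duplicate pages",
    "Cooking pasta requires boiling water and adding salt",
]
print(near_duplicates(pages))  # → [(0, 1)]
```

Restricting comparisons to pages that share a keyword never misses a pair with a positive threshold, because two pages with no common keyword have a Jaccard similarity of 0. It is one simple way to shrink the candidate set; the paper's own reduction step may differ.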