Filtered Web Content Crawling Based on Heritrix

Based on an in-depth study of the system framework and source code of the web crawler Heritrix, this paper adds filtering functions for both the target content and the search scope. The extension is realized through object-oriented analysis and design, by constructing classes that inherit from the relevant classes in Heritrix. A website collection experiment, run with Heritrix on a campus network, shows that the newly added functions work well.
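
The inheritance-based approach described above can be illustrated with a minimal, self-contained sketch. It does not use the Heritrix API: `UriFilter`, `HostSuffixScopeFilter`, `ExtensionContentFilter`, and `FilterChain` are hypothetical names introduced here, and the host `example.edu` and the blocked extensions are placeholder assumptions. The sketch only shows the general idea of deriving one subclass that restricts the search scope and another that filters target content from a common base class.

```java
import java.net.URI;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a crawler extension point; NOT a Heritrix class.
abstract class UriFilter {
    // Returns true if the crawler should fetch this URI.
    abstract boolean accepts(URI uri);
}

// Scope filter (assumed design): keep only hosts under a given domain suffix.
class HostSuffixScopeFilter extends UriFilter {
    private final String hostSuffix;

    HostSuffixScopeFilter(String hostSuffix) {
        this.hostSuffix = hostSuffix.toLowerCase();
    }

    @Override
    boolean accepts(URI uri) {
        String host = uri.getHost();
        if (host == null) {
            return false; // no host means the URI cannot be placed in scope
        }
        host = host.toLowerCase();
        return host.equals(hostSuffix) || host.endsWith("." + hostSuffix);
    }
}

// Content filter (assumed design): reject URIs whose file extension is blocked.
class ExtensionContentFilter extends UriFilter {
    private final Set<String> blockedExtensions;

    ExtensionContentFilter(Set<String> blockedExtensions) {
        this.blockedExtensions = blockedExtensions;
    }

    @Override
    boolean accepts(URI uri) {
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            return true; // nothing to inspect, so do not reject
        }
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        if (dot <= slash) {
            return true; // last path segment has no extension
        }
        String ext = path.substring(dot + 1).toLowerCase();
        return !blockedExtensions.contains(ext);
    }
}

// Applies every filter in order; a URI is crawled only if all of them accept it.
class FilterChain {
    static boolean acceptsAll(List<UriFilter> filters, URI uri) {
        for (UriFilter filter : filters) {
            if (!filter.accepts(uri)) {
                return false;
            }
        }
        return true;
    }
}
```

With a chain made of `new HostSuffixScopeFilter("example.edu")` and `new ExtensionContentFilter(Set.of("jpg", "zip"))`, a page such as `http://www.example.edu/index.html` passes both filters, while an off-domain page or an in-domain `.jpg` file is rejected. Keeping each rule in its own subclass means new filtering rules can be added without modifying the crawler core, which is the kind of extension the abstract describes.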