A Mixed Strategy Topic Crawler Based on Network Log Analysis

This article presents a mixed strategy topic crawler based on network log analysis, designed to adapt to the dynamic nature of a topic and to cover the topic completely. First, network log analysis is used to discover new seeds that extend the web community and to mine users' interests, which allows the topic to be described more fully. Then, starting from the new seeds, the crawler applies the mixed strategy and filters pages according to user interest. Experimental results show that the system fetches more topic pages efficiently.
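
The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the log format (one "url<TAB>query terms" record per line), the function names mine_log and mixed_score, the min_visits threshold, and the use of a linear blend with weight alpha as a stand-in for the mixed strategy, which the abstract does not define.

```python
from collections import Counter
from urllib.parse import urlparse


def mine_log(log_lines, known_hosts, min_visits=2):
    """Extract candidate seeds and interest weights from a log.

    Assumed (hypothetical) record format: "url<TAB>term term ...".
    A URL becomes a new seed if it was visited at least `min_visits`
    times and its host is not already in the crawled community.
    """
    visits = Counter()
    term_counts = Counter()
    for line in log_lines:
        url, _, terms = line.partition("\t")
        visits[url] += 1
        term_counts.update(terms.lower().split())

    new_seeds = sorted(
        url for url, count in visits.items()
        if count >= min_visits and urlparse(url).netloc not in known_hosts
    )
    # Normalise term counts so the interest weights sum to 1.
    total = sum(term_counts.values()) or 1
    interest = {term: count / total for term, count in term_counts.items()}
    return new_seeds, interest


def mixed_score(page_terms, topic_terms, interest, alpha=0.5):
    """Blend topic overlap with mined user interest (assumed linear mix)."""
    terms = set(page_terms)
    topic = set(topic_terms)
    topic_part = len(terms & topic) / len(topic) if topic else 0.0
    interest_part = sum(interest.get(term, 0.0) for term in terms)
    return alpha * topic_part + (1 - alpha) * interest_part
```

In such a sketch, a crawler would start from the returned seeds and keep a fetched page only when its mixed_score exceeds a chosen threshold; the threshold and the value of alpha would need to be tuned experimentally.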