Financial topical crawler based on SVM prediction

With the rapid growth of information and the explosion in the number of pages on the World Wide Web, it is becoming harder for general-purpose crawlers to retrieve the information relevant to a user. Topical crawlers are becoming important tools for gathering web pages on a specific topic. The training set of a topical crawler based on classifier prediction is usually drawn from different kinds of Web content, but in actual operation most classifiers can predict only from the link information available on parent Web pages. Because the information used for training differs from the information used for testing, the accuracy of this kind of classifier is low. In this paper, an SVM classifier is trained on the contexts and anchor texts of URLs, using features chosen by two feature selection methods, document frequency (DF) and information gain, and the experimental results are compared across the factors that affect the classifier. Online experiments confirm that the classifier achieves very high accuracy in actual prediction.
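The two feature selection methods named above can be illustrated with a minimal sketch. It uses the standard textbook definitions of document frequency and information gain, which may differ in detail from the formulation used in the paper. The sample anchor and context strings, the labels, and all function names are hypothetical and chosen only for illustration. The SVM training step itself is not shown; the selected terms would serve as the features of the link classifier.

```python
import math
from collections import Counter

# Hypothetical link samples: (anchor text plus surrounding context, label).
# Label 1 = link leads to a financial page, 0 = it does not.
SAMPLES = [
    ("stock market news", 1),
    ("bank interest rates", 1),
    ("stock fund prices", 1),
    ("football match results", 0),
    ("movie reviews news", 0),
    ("weather forecast today", 0),
]

def tokenize(text):
    """Lowercase whitespace tokenizer; returns the set of distinct terms."""
    return set(text.lower().split())

def document_frequency(docs):
    """Number of samples in which each term occurs at least once."""
    df = Counter()
    for terms in docs:
        df.update(terms)
    return df

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h

def information_gain(docs, labels, term):
    """Reduction in class entropy from knowing whether `term` is present."""
    classes = sorted(set(labels))
    present = [l for d, l in zip(docs, labels) if term in d]
    absent = [l for d, l in zip(docs, labels) if term not in d]

    def class_counts(ls):
        return [ls.count(c) for c in classes]

    n = len(docs)
    h_before = entropy(class_counts(labels))
    h_after = (len(present) / n) * entropy(class_counts(present)) \
        + (len(absent) / n) * entropy(class_counts(absent))
    return h_before - h_after

def select_features(docs, labels, min_df=1, top_k=10):
    """Drop terms below the DF threshold, then rank the rest by information gain."""
    df = document_frequency(docs)
    candidates = [t for t, c in df.items() if c >= min_df]
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(candidates, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:top_k]

DOCS = [tokenize(text) for text, _ in SAMPLES]
LABELS = [label for _, label in SAMPLES]

if __name__ == "__main__":
    print(select_features(DOCS, LABELS, min_df=1, top_k=3))
```

In this toy data the term "news" appears equally often in both classes, so its information gain is zero, while "stock" appears only in financial links and ranks highest. A DF threshold alone would keep both terms, which shows why the two criteria can select different features.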