Paper information - A machine learning approach to web page filtering using content and structure analysis

A machine learning approach to web page filtering using content and structure analysis

Web filtering is an inductive process that automatically builds a filter by learning a description of user interest from a set of pre-assigned web pages, and then uses the filter to classify unprocessed web pages. Content similarity analysis is the core problem in web filtering; the automatic-learning and relevance-analysis abilities of machine learning algorithms help solve it, which makes machine learning (ML) useful for this task. In practical applications, however, different filtering tasks imply different user interests and therefore different filtering results. This work studies how to adjust web filtering results to better fit the user interest. The filtering results are divided into three categories according to user interest: relative pages, similar pages, and homologous pages. A Biased Support Vector Machine (BSVM) algorithm is introduced to adjust the filtering result to best fit the user interest; it imports a stimulant function and uses the training example distribution n+/n− and a user-adaptable parameter k to treat the different classes of pre-assigned pages unequally. Experiments show that BSVM can greatly improve web filtering performance.
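The idea of treating the two classes unequally can be illustrated with a class-weighted linear SVM. The sketch below is not the authors' BSVM: the abstract does not give the form of the stimulant function, so the code simply assumes that the positive-class penalty is scaled by k · (n−/n+). The function names, the subgradient-descent trainer, and the toy data in the usage note are all hypothetical choices made for illustration.

```python
import random


def class_costs(labels, k=1.0):
    """Per-class misclassification costs derived from the class ratio.

    Assumption (not from the abstract): the positive cost is k * n_neg / n_pos
    and the negative cost is 1. The real stimulant function may differ.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    return {1: k * n_neg / n_pos, -1: 1.0}


def train_biased_linear_svm(X, y, k=1.0, lam=0.01, lr=0.01, epochs=200, seed=0):
    """Linear SVM with a class-weighted hinge loss, fitted by subgradient descent."""
    rng = random.Random(seed)
    costs = class_costs(y, k)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            margin = y[i] * score
            # L2 regularisation shrinks the weights at every step.
            w = [(1.0 - lr * lam) * wj for wj in w]
            if margin < 1.0:
                # A hinge-loss violation is penalised by the cost of its class.
                c = costs[y[i]]
                w = [wj + lr * c * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * c * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 (page accepted by the filter) or -1 (page rejected)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

As a usage example, with two positive and six negative toy pages described by two made-up features, `class_costs` with k = 1 gives the positive class three times the penalty of the negative class, and raising k increases that bias further:

```python
X = [[2, 2], [3, 3], [-2, -2], [-3, -1], [-1, -3], [-2, -3], [-3, -3], [-3, -2]]
y = [1, 1, -1, -1, -1, -1, -1, -1]
w, b = train_biased_linear_svm(X, y, k=1.0)
labels = [predict(w, b, x) for x in X]
```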

Bin Li | Binxing Fang | A-Ning Du | Hsinchun Chen | M. Chau
