Machine learning used by Personal WebWatcher

This paper describes the design of Personal WebWatcher, a personal browsing assistant that suggests interesting hyperlinks on the Web documents a user requests. Machine learning is used to generate a model of the user's interests. We consider two approaches that differ in the information included in the training examples: (1) include information that was presented to the user, namely the part of the document text that contains a hyperlink, and (2) include information that was not presented to the user, namely the content of the document the hyperlink points to. We compare two classification algorithms, k-Nearest Neighbor and Naive Bayes. Documents are represented as bags of words, and features are selected using information gain. Preliminary experiments show no significant difference between the two classifiers, and show that using only a small number of features gives almost the same results as using all features. In all experiments the achieved classification accuracy is the same as or slightly higher than the default accuracy. Since the default accuracy is higher for approach (1) than for approach (2), approach (1) also yields higher classification accuracy.
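
To make the feature selection step concrete, the following is a minimal sketch of ranking words by information gain over a bag-of-words representation. It is an illustration under stated assumptions, not the system's actual implementation: each document is reduced to a set of words (presence or absence only), the toy documents and labels are invented, and the helper names (`entropy`, `information_gain`, `select_features`, `default_accuracy`) are hypothetical. The default accuracy is taken here to mean the accuracy of always predicting the majority class, which is also an assumption, since the abstract does not define the term.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, word):
    """Reduction in class entropy from knowing whether `word` occurs in a document."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(with_word) / n) * entropy(with_word) \
        + (len(without_word) / n) * entropy(without_word)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k words with the highest information gain (ties broken alphabetically)."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(vocabulary,
                    key=lambda w: information_gain(docs, labels, w),
                    reverse=True)
    return ranked[:k]


def default_accuracy(labels):
    """Accuracy of always predicting the most frequent class (assumed definition)."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


# Hypothetical toy data: each document is a set of words;
# label 1 marks a document the user found interesting, 0 otherwise.
docs = [
    {"machine", "learning", "agent"},
    {"learning", "web"},
    {"sports", "news"},
    {"news", "weather"},
]
labels = [1, 1, 0, 0]

top_words = select_features(docs, labels, k=2)
baseline = default_accuracy(labels)
```

Under these assumptions, a classifier such as k-Nearest Neighbor or Naive Bayes would then be trained on documents restricted to `top_words`, and its accuracy compared against `baseline`.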