Learning from hotlists and coldlists: towards a WWW information filtering and seeking agent

We describe a software agent that learns to find information on the World Wide Web (WWW), deciding what new pages might interest a user. The agent maintains a separate hotlist (for links that were interesting) and coldlist (for links that were not interesting) for each topic. By analyzing the information immediately accessible from each link, the agent learns the types of information the user is interested in. This can be used to inform the user when a new interesting page becomes available or to order the user's exploration of unseen existing links so that the more promising ones are investigated first. We compare four different learning algorithms on this task. We describe an experiment in which a simple Bayesian classifier acquires a user profile that agrees with a user's judgment over 90% of the time.