Learning and Revising User Profiles: The Identification of Interesting Web Sites

We discuss algorithms for learning and revising user profiles that can determine which World Wide Web sites on a given topic would be interesting to a user. We describe the use of a naive Bayesian classifier for this task and demonstrate that it can incrementally learn profiles from user feedback on the interestingness of Web sites. Furthermore, the Bayesian classifier can easily be extended to revise user-provided profiles. In an experimental evaluation, we compare the Bayesian classifier to computationally more intensive alternatives and show that it performs at least as well as these approaches across a range of different domains. In addition, we empirically analyze the effects of providing the classifier with background knowledge in the form of user-defined profiles, and we examine the use of lexical knowledge for feature selection. We find that both approaches can substantially increase prediction accuracy.
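To make the idea of incremental profile learning concrete, the following is a minimal Python sketch of a naive Bayesian classifier over Boolean word features that is updated one rated page at a time and can be seeded with a user-provided profile. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the class name `IncrementalNaiveBayes`, the labels `hot` and `cold`, the `profile` dictionary format, and the `prior_weight` smoothing parameter are all hypothetical choices made for this example.

```python
import math


class IncrementalNaiveBayes:
    """Two-class naive Bayes over Boolean word features (hypothetical sketch)."""

    CLASSES = ("hot", "cold")  # assumed labels: interesting / not interesting

    def __init__(self, vocabulary, profile=None, prior_weight=2.0):
        # profile: optional {class: {word: P(word present | class)}} supplied
        # by the user; words that are not listed default to 0.5.
        # prior_weight: how many "virtual pages" the profile counts for
        # (must be > 0 so that estimates are defined before any feedback).
        self.vocabulary = sorted(set(vocabulary))
        self.profile = profile or {}
        self.prior_weight = prior_weight
        self.class_counts = {c: 0 for c in self.CLASSES}
        self.word_counts = {c: {w: 0 for w in self.vocabulary}
                            for c in self.CLASSES}

    def _prior(self, cls, word):
        # Clamp user-provided probabilities away from 0 and 1.
        p = self.profile.get(cls, {}).get(word, 0.5)
        return min(max(p, 0.01), 0.99)

    def update(self, page_words, label):
        """Incorporate one rated page; only counts change, no retraining."""
        present = set(page_words)
        self.class_counts[label] += 1
        for w in self.vocabulary:
            if w in present:
                self.word_counts[label][w] += 1

    def _p_word(self, cls, word):
        # m-estimate: observed counts blended with the profile's prior.
        m = self.prior_weight
        return ((self.word_counts[cls][word] + m * self._prior(cls, word))
                / (self.class_counts[cls] + m))

    def prob_hot(self, page_words):
        """Return the estimated P(hot | page) under the independence assumption."""
        present = set(page_words)
        total = sum(self.class_counts.values())
        log_score = {}
        for c in self.CLASSES:
            # Laplace-smoothed class prior.
            s = math.log((self.class_counts[c] + 1) / (total + 2))
            for w in self.vocabulary:
                p = self._p_word(c, w)
                s += math.log(p if w in present else 1.0 - p)
            log_score[c] = s
        # Normalize in log space for numerical stability.
        top = max(log_score.values())
        hot = math.exp(log_score["hot"] - top)
        cold = math.exp(log_score["cold"] - top)
        return hot / (hot + cold)
```

In this sketch, a classifier seeded with `profile={"hot": {"jazz": 0.9}, "cold": {"jazz": 0.1}}` already rates a page containing only "jazz" at about 0.9 before any feedback arrives, and each later call to `update` shifts the estimates toward the observed ratings. Treating the profile as a weighted prior is one plausible way to revise a user-provided profile; the weight given to the profile relative to the data is an assumption of the example.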
