Self-adaptive user profiles for large-scale data delivery

Push-based data delivery requires knowledge of user interests to make scheduling, bandwidth allocation, and routing decisions. Such information is maintained as user profiles. We propose a novel incremental algorithm for constructing user profiles based on monitoring and user feedback. In contrast to earlier approaches, which typically represent a profile as a single weighted interest vector, we represent user profiles as multiple interest vectors whose number, size, and elements change adaptively based on user access behavior. This flexible approach allows the profile to represent complex user interests more accurately. Although there has been significant research on user profiles, our approach is unique in that it can be tuned to trade off profile complexity against profile quality. This feature, together with its incremental nature, makes our method suitable for large-scale information filtering applications such as push-based WWW page dissemination. We evaluate the method by experimentally investigating its ability to categorize WWW pages taken from Yahoo! categories. Our results show that the method can provide high filtering effectiveness with modest profile sizes and can effectively adapt to changes in users' interests.
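
The idea of a profile made of several interest vectors can be illustrated with a small sketch. This is not the algorithm from the paper; it is a generic illustration under our own assumptions. The raw term-frequency weighting, cosine similarity, the similarity `threshold`, the feedback step size `alpha`, the closest-pair merge rule, and the names `MultiVectorProfile`, `feedback`, `max_vectors`, and `max_terms` are all hypothetical. Relevant documents that no existing vector covers start a new vector; other feedback moves the closest vector toward or away from the document. `max_vectors` and `max_terms` play the role of the complexity/quality knob mentioned above.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight


def to_vector(text: str) -> Vector:
    """Raw term frequencies (hypothetical stand-in for a real weighting scheme)."""
    return {t: float(c) for t, c in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class MultiVectorProfile:
    """Illustrative multi-vector profile; every parameter here is an assumption."""

    def __init__(self, threshold=0.3, max_vectors=5, max_terms=50, alpha=0.5):
        self.threshold = threshold      # min similarity to update an existing vector
        self.max_vectors = max_vectors  # caps profile complexity (number of vectors)
        self.max_terms = max_terms      # caps the size of each vector
        self.alpha = alpha              # step size for feedback updates
        self.vectors: List[Vector] = []

    def score(self, doc: Vector) -> float:
        """Score a document by its similarity to the closest interest vector."""
        return max((cosine(v, doc) for v in self.vectors), default=0.0)

    def feedback(self, doc: Vector, relevant: bool) -> None:
        sims = [cosine(v, doc) for v in self.vectors]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is None or sims[best] < self.threshold:
            if relevant:
                # No vector covers this document: treat it as a new interest.
                self.vectors.append(self._truncate(dict(doc)))
                if len(self.vectors) > self.max_vectors:
                    self._merge_closest_pair()
            return
        # Move the closest vector toward (relevant) or away from (non-relevant) doc.
        sign = 1.0 if relevant else -1.0
        v = self.vectors[best]
        for t, w in doc.items():
            v[t] = v.get(t, 0.0) + sign * self.alpha * w
        pruned = self._truncate({t: w for t, w in v.items() if w > 0})
        if pruned:
            self.vectors[best] = pruned
        else:
            del self.vectors[best]  # negative feedback emptied this vector

    def _truncate(self, v: Vector) -> Vector:
        """Keep only the max_terms highest-weighted terms."""
        top = sorted(v.items(), key=lambda kv: kv[1], reverse=True)[: self.max_terms]
        return dict(top)

    def _merge_closest_pair(self) -> None:
        """Merge the two most similar vectors to respect max_vectors."""
        i, j = max(combinations(range(len(self.vectors)), 2),
                   key=lambda p: cosine(self.vectors[p[0]], self.vectors[p[1]]))
        merged = dict(self.vectors[i])
        for t, w in self.vectors[j].items():
            merged[t] = merged.get(t, 0.0) + w
        self.vectors[i] = self._truncate(merged)
        del self.vectors[j]  # j > i, so index i is unaffected


p = MultiVectorProfile(threshold=0.2)
p.feedback(to_vector("stock market shares trading"), relevant=True)
p.feedback(to_vector("football league match goals"), relevant=True)
print(len(p.vectors))  # 2
```

Because the two example documents share no terms, the profile keeps them as two separate interest vectors instead of averaging them into one, which is the kind of situation where a single-vector profile loses information. Lowering `max_vectors` forces merges and yields a smaller, coarser profile.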