Discovering user communities on the Internet using unsupervised machine learning techniques

Interest in the analysis of user behaviour on the Internet has been increasing rapidly, especially since the advent of electronic commerce. In this context, we argue for the usefulness of applying machine learning techniques to construct communities of users with common behaviour. In particular, we assume that the users of any Internet service constitute a large community, and we aim to construct smaller communities of users with common characteristics. The paper presents the results of three case studies for three different types of Internet service: a digital library, an information broker and a Web site. Particular attention is paid to the different types of information access involved in the three case studies: query-based information retrieval, profile-based information filtering and Web-site navigation. Each type of access imposes different constraints on the representation of the learning task. Two different unsupervised learning methods are evaluated: conceptual clustering and cluster mining. One of our main concerns is the construction of meaningful communities that can be used to improve information access on the Internet. Analysis of the results of the three case studies brings to the surface some important properties of the task, suggesting that a common methodology is feasible for all three types of information access on the Internet.
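The abstract names the two unsupervised methods without giving their details. The sketch below illustrates both ideas under assumptions of our own, and it is not the authors' implementation. Each user is represented by the set of items they accessed (for example document categories, profile terms or Web pages). Cluster mining is approximated by finding maximal cliques, with the Bron-Kerbosch algorithm, in an item co-occurrence graph whose edges are kept only when at least `min_weight` users share them. Conceptual clustering is represented only by category utility, the partition-scoring function used by COBWEB-style conceptual clustering. The toy data, the threshold and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (not from the paper): each user is represented by the
# set of items, e.g. document categories or Web pages, that they accessed.
USERS = {
    "u1": {"ml", "ir", "nlp"},
    "u2": {"ml", "ir"},
    "u3": {"ml", "ir", "nlp"},
    "u4": {"db", "web"},
    "u5": {"db", "web", "ir"},
    "u6": {"db", "web"},
}


def cooccurrence_graph(users, min_weight):
    """Undirected item graph: an edge links two items accessed together by at
    least `min_weight` users (an assumed thresholding rule)."""
    counts = Counter()
    for items in users.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    graph = {item: set() for items in users.values() for item in items}
    for (a, b), weight in counts.items():
        if weight >= min_weight:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Bron-Kerbosch with pivoting; yields every maximal clique as a set."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


def communities(users, min_weight=2):
    """Cluster-mining sketch: each maximal clique of two or more items is a
    community pattern; its members are the users who accessed every item in it."""
    graph = cooccurrence_graph(users, min_weight)
    found = []
    for clique in maximal_cliques(graph):
        if len(clique) < 2:
            continue
        members = sorted(u for u, items in users.items() if clique <= items)
        found.append((sorted(clique), members))
    return sorted(found)


def category_utility(users, partition, items):
    """Category utility of a partition of users (a list of lists of user ids),
    treating each item as a binary attribute (accessed / not accessed)."""
    def expected_correct(group):
        # Sum over attributes and both of their values of P(A_i = v | group)^2.
        total = 0.0
        for item in items:
            p = sum(item in users[u] for u in group) / len(group)
            total += p ** 2 + (1 - p) ** 2
        return total

    everyone = [u for group in partition for u in group]
    base = expected_correct(everyone)
    n = len(everyone)
    gain = sum(len(g) / n * (expected_correct(g) - base) for g in partition)
    return gain / len(partition)


ITEMS = sorted(set().union(*USERS.values()))
print(communities(USERS))
print(category_utility(USERS, [["u1", "u2", "u3"], ["u4", "u5", "u6"]], ITEMS))
print(category_utility(USERS, [["u1", "u4"], ["u2", "u5"], ["u3", "u6"]], ITEMS))
```

With this toy data, `communities(USERS)` groups u4, u5 and u6 around the pattern {db, web} and u1 and u3 around {ir, ml, nlp}. u2 falls outside both patterns because it never accessed nlp. Unlike the partition scored by category utility, these clique-based patterns may overlap and need not cover every user. The partition that separates the two interest groups also receives a higher category utility than the mixed partition.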
