ARBUD: A Reusable Architecture for Building User Models from Massive Datasets

A single large data source often serves as input to multiple applications, each of which may use a different user model. Each user model is typically built by a separate process, although in many cases it would be more efficient to use a common architecture for building user models across application areas. In this paper, we propose a distributed-computing architecture based on MapReduce that allows for the efficient processing of massive datasets using reusable components that compute different features of the final user model. A metamodel specifies the characteristics of the desired user model, which can include both short-term and long-term user models, and the architecture is responsible for building the user model from the specified data and reusable components. We present an instantiation of the architecture in the context of telecommunications applications and empirically evaluate its scalability with a real dataset. Our results indicate that complex user models for millions of users can be obtained in just a few hours on a small computer cluster.
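
To make the idea of reusable components driven by a metamodel concrete, the following is a minimal single-machine sketch, not the paper's implementation. Everything in it is an assumption made for illustration: the record layout `(caller, callee, duration)`, the feature names, the `METAMODEL` dictionary format, and the in-memory `map_reduce` helper, which stands in for a real MapReduce cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical call record: (caller_id, callee_id, duration_seconds).
Record = Tuple[str, str, int]
Mapper = Callable[[Record], Iterable[Tuple[str, object]]]
Reducer = Callable[[List[object]], object]


def map_reduce(records: List[Record], mapper: Mapper, reducer: Reducer) -> Dict[str, object]:
    """In-memory stand-in for one MapReduce job: map, group by key, reduce."""
    groups: Dict[str, List[object]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# Reusable components (illustrative): each mapper emits (user_id, value) pairs.
def map_duration(record: Record):
    caller, _callee, duration = record
    yield caller, duration


def map_contact(record: Record):
    caller, callee, _duration = record
    yield caller, callee


def count_distinct(values: List[object]) -> int:
    return len(set(values))


# Hypothetical metamodel: feature name -> (mapper, reducer) pair to reuse.
METAMODEL: Dict[str, Tuple[Mapper, Reducer]] = {
    "total_call_seconds": (map_duration, sum),
    "distinct_contacts": (map_contact, count_distinct),
}


def build_user_models(records: List[Record], metamodel) -> Dict[str, Dict[str, object]]:
    """Run one job per feature in the metamodel and merge the results per user."""
    models: Dict[str, Dict[str, object]] = defaultdict(dict)
    for feature, (mapper, reducer) in metamodel.items():
        for user, value in map_reduce(records, mapper, reducer).items():
            models[user][feature] = value
    return dict(models)


records = [
    ("alice", "bob", 60),
    ("alice", "carol", 30),
    ("alice", "bob", 10),
    ("bob", "alice", 5),
]
models = build_user_models(records, METAMODEL)
print(models["alice"])
```

In this sketch, a different application would reuse the same mappers and reducers and change only the metamodel dictionary, which is the kind of reuse the abstract describes; a distributed deployment would replace `map_reduce` with jobs on an actual cluster.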
