A distributed recommender system architecture

In contemporary internet architectures, such as server farms and blog aggregators, web log data may be scattered among multiple cooperating peers. Personalising content on such architectures by providing recommendations requires a recommendation algorithm; however, most such algorithms are centralised. They require excessive data transfers and perform poorly as the number of users or the volume of data increases. In this paper, we propose an approach in which clickstream information is distributed among a number of peers that cooperate to discover frequent patterns and to generate recommendations. We introduce: a) architectures that distribute both the content and the clickstream database among the participating peers; and b) algorithms that let peers decide collaboratively on the recommendations offered to users when log information is scattered. The proposed approach may be applied in various domains, including digital libraries, social data, server farms and content distribution networks.
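To make the idea of peers cooperating on scattered logs concrete, the following is a minimal sketch of one common way to mine frequent patterns over horizontally partitioned clickstreams (count distribution). It is not the algorithm proposed in this paper. It assumes that each peer holds a list of sessions, each a set of page IDs, that only itemsets of one or two pages are counted, and that recommendations are pages forming a frequent pair with the user's current page. The function names (`local_counts`, `merge_counts`, `frequent_patterns`, `recommend`) and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sketch: count distribution over horizontally partitioned
# clickstreams. Each session is a set of page IDs; only itemsets of one
# or two pages are counted, to keep the example small.


def local_counts(sessions, max_size=2):
    """Count itemsets of up to max_size pages in one peer's sessions."""
    counts = Counter()
    for session in sessions:
        pages = sorted(session)  # sort so each itemset has one canonical tuple form
        for size in range(1, max_size + 1):
            for itemset in combinations(pages, size):
                counts[itemset] += 1
    return counts


def merge_counts(peer_counts):
    """Sum the local counts sent by every peer into global support counts."""
    total = Counter()
    for counts in peer_counts:
        total.update(counts)  # Counter.update adds counts instead of replacing them
    return total


def frequent_patterns(global_counts, min_support):
    """Keep itemsets whose global support reaches min_support."""
    return {itemset: n for itemset, n in global_counts.items() if n >= min_support}


def recommend(frequent, current_page, k=3):
    """Rank pages that form a frequent pair with current_page by support."""
    scores = {}
    for itemset, support in frequent.items():
        if len(itemset) == 2 and current_page in itemset:
            other = itemset[1] if itemset[0] == current_page else itemset[0]
            scores[other] = support
    # Highest support first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:k]]


# Two peers, each holding its own fragment of the clickstream (made-up data).
peers = [
    [{"home", "news"}, {"home", "sports"}, {"home", "news", "weather"}],
    [{"home", "news"}, {"news", "weather"}],
]

global_counts = merge_counts(local_counts(sessions) for sessions in peers)
frequent = frequent_patterns(global_counts, min_support=2)
print(recommend(frequent, "home"))  # ['news']
print(recommend(frequent, "news"))  # ['home', 'weather']
```

In this sketch only itemset counts leave each peer, never raw sessions, but shipping every local count is still costly. A common refinement is to exchange only locally frequent itemsets: an itemset whose global relative support is at least s must reach relative support s at one or more peers, so the locally frequent itemsets form a complete candidate set whose exact global counts can be gathered in a second round.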
