Cache-conscious graph collaborative filtering on multi-socket multicore systems

Recommendation systems based on graph collaborative filtering often require both real-time responses and high throughput. Besides recommendation accuracy, it is therefore critical to study high-performance concurrent collaborative filtering on modern platforms. To this end, we study the graph data locality characteristics of collaborative filtering. Our experiments demonstrate that although an individual graph traversal exhibits poor data locality, multiple queries tend to share their data footprints, especially queries with neighboring root vertices. These characteristics give rise to both inter- and intra-thread data locality, which can be exploited to significantly improve collaborative filtering performance. Based on these observations, we present a cache-conscious system for collaborative filtering on modern multi-socket multicore platforms. The system combines a cache-conscious query scheduling technique with an in-memory graph representation, and it addresses both inter- and intra-thread data locality to maximize cache performance and minimize cross-core and cross-socket communication overhead. To address workload balancing, we introduce a dynamic work-stealing mechanism that explores the tradeoff between workload balance and cache-consciousness. We evaluated the proposed system on a Power7+ system using the IBM Knowledge Repository graph dataset. The results demonstrate both good scalability and good throughput. Compared with a basic system that does not perform cache-conscious scheduling, inter-thread scheduling improves throughput by up to 18%, and intra-thread scheduling further improves throughput by as much as 22%. With dynamic work-stealing enabled, the proposed technique balances workloads across all threads, yielding a low standard deviation of the per-thread processing time.
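
The scheduling idea can be illustrated with a minimal sketch. It is not the system's implementation: it assumes that numerically close vertex IDs stand in for neighboring root vertices, and the names `schedule_queries` and `steal` are hypothetical. Queries are sorted by root and split into contiguous per-thread queues, so that queries with nearby roots land on the same thread (inter-thread locality) and run back to back (intra-thread locality). An idle thread then steals from the tail of the longest queue, trading some locality for balance.

```python
from collections import deque


def schedule_queries(roots, num_threads):
    """Assign query roots to per-thread queues so nearby roots share a thread.

    Assumption: close vertex IDs approximate neighboring root vertices.
    """
    ordered = sorted(roots)
    chunk = -(-len(ordered) // num_threads)  # ceiling division
    return [deque(ordered[i * chunk:(i + 1) * chunk])
            for i in range(num_threads)]


def steal(queues, thief):
    """Move one query from the tail of the longest queue to the thief's queue.

    Returns the stolen root, or None if there is nothing to steal.
    """
    victim = max(range(len(queues)), key=lambda t: len(queues[t]))
    if victim == thief or not queues[victim]:
        return None
    query = queues[victim].pop()  # take from the tail, far from the victim's current work
    queues[thief].append(query)
    return query


queues = schedule_queries([42, 7, 8, 40, 9, 41], num_threads=2)
queues[0].clear()           # pretend thread 0 has finished its queries
stolen = steal(queues, 0)   # thread 0 takes work from thread 1
```

Stealing from the tail is one possible design choice: the victim keeps the queries closest to the ones it is currently processing, so the locality it has already built up in cache is disturbed as little as possible.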
