Social Content Matching in MapReduce

Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching for social media. Our goal is to distribute content from information suppliers to information consumers. We seek to maximize the overall relevance of the content matched from suppliers to consumers while regulating the overall activity, e.g., ensuring that no consumer is overwhelmed with data and that all suppliers have a chance to deliver their content. We propose two matching algorithms, GreedyMR and StackMR, geared for the MapReduce paradigm. Both algorithms have provable approximation guarantees, and in practice they produce high-quality solutions. While both algorithms scale well, we show that StackMR requires only a poly-logarithmic number of MapReduce steps, making it an attractive option for applications with very large datasets. We experimentally show the trade-offs between quality and efficiency of our solutions on two large datasets from real-world social-media websites.
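To make the optimization problem concrete, assume suppliers and consumers form a bipartite graph whose edge weights are relevance scores, and that each node has a capacity bounding how many items it may send or receive. The task is then to pick a maximum-weight set of edges that respects every capacity (a weighted b-matching). The minimal sketch below, in Python, implements the classic sequential greedy heuristic for this problem. The name GreedyMR suggests a greedy strategy, but this is not the authors' MapReduce algorithm. The function `greedy_b_matching` and the tuple/dict representation are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (supplier, consumer, relevance weight).
Edge = Tuple[str, str, float]


def greedy_b_matching(edges: List[Edge], capacity: Dict[str, int]) -> List[Edge]:
    """Sequential greedy for weighted b-matching (illustrative sketch).

    Scan edges from heaviest to lightest and keep an edge only if both
    endpoints still have spare capacity. Supplier and consumer names are
    assumed to be distinct keys in `capacity`.
    """
    remaining = dict(capacity)  # copy so the caller's capacities are untouched
    matched: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if remaining.get(u, 0) > 0 and remaining.get(v, 0) > 0:
            matched.append((u, v, w))
            remaining[u] -= 1
            remaining[v] -= 1
    return matched


# Toy instance: two suppliers, two consumers, each with capacity 1.
edges = [("s1", "c1", 0.9), ("s1", "c2", 0.8), ("s2", "c1", 0.7), ("s2", "c2", 0.3)]
capacity = {"s1": 1, "s2": 1, "c1": 1, "c2": 1}
result = greedy_b_matching(edges, capacity)
print(result)  # [('s1', 'c1', 0.9), ('s2', 'c2', 0.3)]
```

On this toy instance greedy keeps (s1, c1) and (s2, c2) for a total relevance of 1.2, while the optimum pairs s1 with c2 and s2 with c1 for 1.5. The sequential greedy rule is known to always reach at least half of the optimal weight, which illustrates the kind of approximation guarantee mentioned above.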
