A Distributed Algorithm for Large-Scale Generalized Matching

Generalized matching problems arise in a number of applications, including computational advertising, recommender systems, and trade markets. Consider, for example, the problem of recommending multimedia items (e.g., DVDs) to users such that (1) users are recommended items that they are likely to be interested in, (2) every user gets neither too few nor too many recommendations, and (3) only items available in stock are recommended to users. State-of-the-art matching algorithms fail to cope with large real-world instances, which may involve millions of users and items. We propose the first distributed algorithm for computing near-optimal solutions to large-scale generalized matching problems like the one above. Our algorithm is designed to run on a small cluster of commodity nodes (or in a MapReduce environment), has strong approximation guarantees, and requires only a polylogarithmic number of passes over the input. In particular, we propose a novel distributed algorithm to approximately solve mixed packing-covering linear programs, which include but are not limited to generalized matching problems. Experiments on real-world and synthetic data suggest that a practical variant of our algorithm scales to very large problem sizes and can be orders of magnitude faster than alternative approaches.
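
To make the three constraints of the recommendation example concrete, the following minimal sketch checks whether a candidate set of recommendations is feasible and computes its total weight. It is not the distributed algorithm proposed here; it only illustrates the problem being solved. The function name `check_matching`, the data layout, and the toy users, items, and integer interest scores are all assumptions made for illustration.

```python
from collections import Counter


def check_matching(edges, chosen, user_bounds, item_stock):
    """Return (feasible, total_weight) for a candidate set of recommendations.

    edges:       dict mapping (user, item) -> predicted interest (hypothetical scores)
    chosen:      iterable of (user, item) pairs, each a key of `edges`
    user_bounds: dict mapping user -> (min_recommendations, max_recommendations)
    item_stock:  dict mapping item -> number of units in stock
    """
    chosen = set(chosen)
    per_user = Counter(user for user, _ in chosen)
    per_item = Counter(item for _, item in chosen)

    # Constraint (2): every user gets neither too few nor too many items.
    users_ok = all(lo <= per_user[u] <= hi for u, (lo, hi) in user_bounds.items())
    # Constraint (3): no item is recommended more often than it is in stock.
    items_ok = all(count <= item_stock.get(i, 0) for i, count in per_item.items())

    # Objective (1): total predicted interest of the chosen recommendations.
    total_weight = sum(edges[e] for e in chosen)
    return users_ok and items_ok, total_weight


# Toy instance (all values are made up for illustration).
edges = {
    ("ann", "dvd1"): 5, ("ann", "dvd2"): 2,
    ("bob", "dvd1"): 4, ("bob", "dvd2"): 3,
}
user_bounds = {"ann": (1, 2), "bob": (1, 1)}
item_stock = {"dvd1": 1, "dvd2": 2}

within_stock = [("ann", "dvd1"), ("bob", "dvd2")]
over_stock = [("ann", "dvd1"), ("bob", "dvd1")]  # dvd1 has only one unit

print(check_matching(edges, within_stock, user_bounds, item_stock))
print(check_matching(edges, over_stock, user_bounds, item_stock))
```

In this toy instance the second candidate has the higher total weight but violates the stock constraint on `dvd1`, which is why the problem cannot be solved by picking each user's top-scoring items independently.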
