Solving Linear Programs in MapReduce

Most interesting discrete optimization problems are NP-hard, so no efficient algorithm for finding optimal solutions to them is likely to exist. Linear programming plays a central role in the design and analysis of many approximation algorithms. However, the linear program instances that arise in real-world applications can grow enormously large. In this thesis, we study the Awerbuch-Khandekar parallel algorithm for approximating linear programs, provide strategies for realizing the algorithm efficiently in MapReduce, and discuss methods to improve its performance in practice. Further, we characterize the numerical properties of the algorithm by comparing it with partially distributed optimization methods. Finally, we evaluate the algorithm on a weighted maximum satisfiability problem generated by the SOFIE knowledge extraction framework on the complete Academic Corpus.
