Parallelization Primitives for Dynamic Sparse Computations

We characterize a general class of algorithms, common in machine learning, scientific computing, and signal processing, whose computational dependencies are both sparse and dynamically defined throughout execution. Existing parallel computing runtimes, such as MapReduce and GraphLab, are a poor fit for this class because they assume statically defined dependencies when making resource allocation and scheduling decisions. As a result, changing load characteristics and straggling compute units degrade performance significantly. We show, however, that the sparsity of the computational dependencies and the natural error tolerance of these algorithms can be exploited to implement a flexible execution model with large efficiency gains, using two simple primitives: selective push-pull and statistical barriers. Taking reconstruction for compressive time-lapse MRI as a motivating application, we deploy a large Orthogonal Matching Pursuit (OMP) computation on Amazon's EC2 cluster and demonstrate a 19x speedup over current static execution models.
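
For readers unfamiliar with OMP, the following is a minimal single-machine sketch of the textbook algorithm in pure Python. It is not the distributed implementation the abstract describes, and it does not model selective push-pull or statistical barriers. The names `omp` and `solve_least_squares` are illustrative, and the atoms are assumed to have unit norm. The point of the sketch is that the support (the set of atoms the computation depends on) is only discovered while the loop runs, which is one way to read "sparse and dynamically defined" dependencies.

```python
# Minimal Orthogonal Matching Pursuit (OMP) sketch, single machine, pure Python.
# Function names are hypothetical; atoms are assumed to be unit-norm columns.

def solve_least_squares(columns, y):
    """Coefficients c minimizing ||y - sum_j c[j] * columns[j]||,
    via the normal equations and Gauss-Jordan elimination with pivoting."""
    k = len(columns)
    # Augmented system [G | b] with G = A^T A and b = A^T y.
    aug = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           + [sum(a * b for a, b in zip(columns[i], y))]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def omp(atoms, y, sparsity, tol=1e-9):
    """Greedy sparse approximation of y using at most `sparsity` atoms.
    `atoms` is a list of dictionary columns (lists of floats).
    Returns a dict mapping selected atom index -> coefficient."""
    residual = list(y)
    support, coeffs = [], []
    for _ in range(min(sparsity, len(atoms))):
        # Selection step: the unused atom most correlated with the residual.
        scores = [abs(sum(a * r for a, r in zip(atom, residual))) for atom in atoms]
        best = max((i for i in range(len(atoms)) if i not in support),
                   key=lambda i: scores[i])
        if scores[best] <= tol:
            break  # residual is (numerically) orthogonal to every unused atom
        support.append(best)
        # Update step: re-fit all selected coefficients, then recompute residual.
        coeffs = solve_least_squares([atoms[i] for i in support], y)
        approx = [sum(c * atoms[i][t] for c, i in zip(coeffs, support))
                  for t in range(len(y))]
        residual = [yt - at for yt, at in zip(y, approx)]
    return dict(zip(support, coeffs))
```

As a usage example, with the dictionary `[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]` and the signal `[2.0, 0.0, 3.0]`, calling `omp(atoms, y, 2)` selects atoms 2 and 0 (in that order) and recovers their coefficients 3 and 2.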
