LLMapReduce: Multi-level map-reduce for high performance data analysis

The map-reduce parallel programming model has become extremely popular in the big data community, and many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce brings the familiar map-reduce programming model to big data users running on a supercomputer: it reduces map-reduce programming to a single line of code, works with programs written in any language, and requires no modification of the underlying application. Furthermore, LLMapReduce can overcome scaling limits in the map-reduce programming model via options that let the user switch to the more efficient single-program-multiple-data (SPMD) parallel programming model, reducing computational overhead by more than 10x compared to standard map-reduce for certain applications. A minimal sketch of the mapper side of such a run is shown below. LLMapReduce is widely used by hundreds of users at MIT and currently works with several schedulers, such as SLURM, Grid Engine, and LSF.
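To make the one-line usage concrete, the sketch below shows what a mapper might look like in this model: an ordinary standalone program that the scheduler launches once per input file. The calling convention assumed here (input and output paths passed as command-line arguments) and the flag names shown in the comment are illustrative assumptions for this sketch; the abstract does not specify LLMapReduce's exact interface.

```python
#!/usr/bin/env python
# Hypothetical mapper for an LLMapReduce-style run: a word-count over
# one input file. The interface assumed here -- one mapper process per
# input file, with input and output paths passed as arguments -- is an
# illustrative assumption, not taken from the source.
#
# Assumed (hypothetical) one-line launch; flag names are not confirmed:
#   LLMapReduce --mapper wordcount.py --input in_dir/ --output out_dir/
import sys
from collections import Counter

def main():
    in_path, out_path = sys.argv[1], sys.argv[2]
    counts = Counter()
    with open(in_path) as f:
        for line in f:
            counts.update(line.split())
    with open(out_path, "w") as f:
        for word, n in sorted(counts.items()):
            f.write(f"{word}\t{n}\n")

if __name__ == "__main__":
    main()
```

Because each mapper is an ordinary standalone program rather than code written against a framework API, a launcher of this kind can wrap applications in any language without modifying them, which is consistent with the language- and application-independence the abstract claims.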
