Languages and Compilers for Parallel Computing

Many recently proposed Big Data processing frameworks make programming easier, but they typically expect datasets to fit in the memory of either a single multicore machine or a cluster of multicore machines; when this assumption does not hold, these frameworks fail. We introduce the InfiniMem framework, which enables size-oblivious processing of large collections of objects that do not fit in memory by making them disk-resident. InfiniMem is easy to program with: the user simply indicates which large collections of objects are to be made disk-resident, and InfiniMem transparently handles their I/O management. The InfiniMem library can manage a very large number of objects in a uniform manner, even though the objects have different characteristics and relationships which, when processed, give rise to a wide range of access patterns requiring different organizations of data on disk. We demonstrate the ease of programming and versatility of InfiniMem with three probabilistic analytics algorithms and three size-oblivious graph processing frameworks; each requires minimal effort, only 6–9 additional lines of code. We show that InfiniMem can successfully generate a mesh with 7.5 million nodes and 300 million edges (4.5 GB on disk) in 40 minutes, and that it performs the PageRank computation on a 14 GB graph with 134 million vertices and 805 million edges at 14 minutes per iteration on an 8-core machine with 8 GB RAM; many graph generators and processing frameworks cannot handle graphs of this size. We also exploit InfiniMem on a cluster to scale up an object-based distributed shared memory (DSM) system.
