Access Declarations: Revealing Data Access Patterns to Hardware

This work addresses the challenge of programming Explicitly Managed Memory (EMM) systems. These systems are characterized by a multi-level memory hierarchy in which small, fast memories are placed closer to the processor than larger, slower ones. Unlike conventional processors, the hardware does not automatically move data along the memory hierarchy; there is no automatic caching or prefetching. Instead, the onus of managing data placement falls on the software. EMM systems are under active research, and some are already in production, because of their potential to achieve high performance per watt. A key impediment to wide deployment of EMM systems is programmability. Having to worry about moving data between the levels of the memory hierarchy is like going back to the pre-virtual-memory days, when the programmer had to manually transfer data between RAM and disk. We argue that the way to overcome this challenge is to use the old idea of separating the interface from the implementation, but to take it to a new level. Our solution introduces a new interface construct, access declarations, which expands the contract between the application and the underlying runtime system. An access declaration augments an abstract data type (ADT) with a meaningful hint about how the application will access the data. Armed with this information, the underlying implementation can pick the optimal memory layout and fetching strategy for the ADT. We demonstrate the effectiveness of this approach on image processing, MapReduce, and graph applications.
