Runtime Support for Memory Adaptation in Scientific Workloads via Local Disk and Remote Memory

The ever-increasing memory demands of many scientific applications and the complexity of today’s shared computational resources still require the occasional use of virtual memory, network memory, or even out-of-core implementations, all of which have well-known drawbacks in performance and usability. In this paper, we present a general framework, based on our earlier MMLIB prototype [21], that enables fully customizable memory malleability in a wide variety of scientific applications. We provide several necessary enhancements to the environment-sensing capabilities of MMLIB and introduce a remote memory capability, based on MPI communication of cached memory blocks between ‘compute nodes’ and designated memory servers. We show experimental results from three important scientific applications that require the general MMLIB framework. Under constant memory pressure, we observe execution-time improvements by factors of three to five over relying solely on the virtual memory system. With remote memory employed, these factors are even larger and significantly better than those of other, system-level remote memory implementations.
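To make the idea of user-level memory malleability more concrete, the following is a minimal Python sketch. It is not MMLIB's actual interface. It shows an application-managed cache of memory blocks whose resident-set bound can be lowered when memory pressure is sensed, with evicted blocks spilled to a backing store. The class `MalleableBlockCache`, its methods, and the choice of LRU eviction are illustrative assumptions. The plain dictionary stands in for either a local-disk file or a remote memory server reached via MPI.

```python
from collections import OrderedDict


class MalleableBlockCache:
    """Hypothetical sketch (not MMLIB's API): keep at most `max_resident`
    blocks in memory and spill least-recently-used blocks to a backing store.
    The backing store is a dict standing in for local disk or a remote
    memory server."""

    def __init__(self, max_resident, backing_store):
        self.max_resident = max_resident
        self.resident = OrderedDict()  # block_id -> data, oldest first (LRU order)
        self.backing = backing_store   # stand-in for disk / remote memory server
        self.faults = 0                # number of fetches from the backing store

    def get(self, block_id):
        """Return a block, fetching it from the backing store if not resident."""
        if block_id in self.resident:
            self.resident.move_to_end(block_id)  # mark as most recently used
            return self.resident[block_id]
        self.faults += 1
        data = self.backing.pop(block_id)       # "page in" the block
        self._insert(block_id, data)
        return data

    def put(self, block_id, data):
        """Store a block as resident, dropping any stale spilled copy."""
        self.backing.pop(block_id, None)
        self._insert(block_id, data)

    def set_bound(self, max_resident):
        """React to a sensed change in available memory by resizing the cache."""
        self.max_resident = max_resident
        self._shrink()

    def _insert(self, block_id, data):
        self.resident[block_id] = data
        self.resident.move_to_end(block_id)
        self._shrink()

    def _shrink(self):
        # Evict least-recently-used blocks until the bound is respected.
        while len(self.resident) > self.max_resident:
            victim, data = self.resident.popitem(last=False)
            self.backing[victim] = data


# Usage: fill the cache, then shrink it as if memory pressure had been detected.
store = {}
cache = MalleableBlockCache(max_resident=4, backing_store=store)
for i in range(8):
    cache.put(i, [float(i)] * 3)   # blocks 0-3 are spilled; 4-7 stay resident
cache.set_bound(2)                 # blocks 4 and 5 are spilled; 6 and 7 stay
block = cache.get(0)               # one fault: block 0 returns, block 6 is spilled
```

The design choice illustrated here is that the application, not the operating system, decides which blocks leave memory and when. This is the property that lets a memory-malleable code adapt its footprint without falling back on the virtual memory system.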

[1] Miron Livny, et al. Memory-Adaptive External Sorting, 1993, VLDB.

[2] L. Iftode, et al. Memory servers for multicomputers, 1993, Digest of Papers, Compcon Spring.

[3] Anna R. Karlin, et al. Implementing global memory management in a workstation cluster, 1995, SOSP.

[4] Joel H. Saltz, et al. Tuning the performance of I/O-intensive parallel applications, 1996, IOPADS '96.

[5] Evangelos P. Markatos, et al. Implementation of a Reliable Remote Memory Pager, 1996, USENIX ATC.

[6] Joel H. Saltz, et al. The utility of exploiting idle workstations for parallel computation, 1997, SIGMETRICS '97.

[7] Liviu Iftode, et al. Home-based shared virtual memory, 1998.

[8] Amnon Barak, et al. Memory ushering in a scalable computing cluster, 1998, Microprocess. Microsystems.

[9] Kai Li, et al. Diskless Checkpointing, 1998, IEEE Trans. Parallel Distributed Syst.

[10] Anna R. Karlin, et al. Implementing cooperative prefetching and caching in a globally-managed memory system, 1998, SIGMETRICS '98/PERFORMANCE '98.

[11] Sanjeev Setia, et al. Dodo: a user-level system for exploiting idle memory in workstation clusters, 1999, Proceedings of the Eighth International Symposium on High Performance Distributed Computing.

[12] Sanjeev Setia, et al. Availability and utility of idle memory in workstation clusters, 1999, SIGMETRICS '99.

[13] Jeffrey Scott Vitter, et al. A theoretical framework for memory-adaptive algorithms, 1999, 40th Annual Symposium on Foundations of Computer Science.

[14] Todd C. Mowry, et al. Taming the memory hogs: using compiler-inserted releases to manage physical memory intelligently, 2000, OSDI.

[15] Yousef Saad, et al. Parallel methods and tools for predicting material properties, 2000, Comput. Sci. Eng.

[16] Fangzhe Chang, et al. User-level resource-constrained sandboxing, 2000.

[17] Mary K. Vernon, et al. Characteristics of a Large Shared Memory Production Workload, 2001, JSSPP.

[18] Evgenia Smirni, et al. Algorithmic modifications to the Jacobi-Davidson parallel eigensolver to dynamically balance external CPU and memory load, 2001, ICS '01.

[19] Dimitrios S. Nikolopoulos, et al. Adaptive Scheduling under Memory Pressure on Multiprogrammed Clusters, 2002, 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'02).

[20] Francine Berman, et al. A Decoupled Scheduling Approach for the GrADS Program Development Environment, 2002, ACM/IEEE SC 2002 Conference (SC'02).

[21] Dimitrios S. Nikolopoulos. Malleable memory mapping: user-level control of memory bounds for effective program adaptation, 2003, Proceedings International Parallel and Distributed Processing Symposium.

[22] Sathish S. Vadhiyar, et al. A performance oriented migration framework for the grid, 2003, CCGrid 2003, 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid.

[23] Paul, et al. Cluster Computing in the SHMOD Framework on the NSF TeraGrid, 2004.

[24] Richard T. Mills, et al. Dynamic adaptation to CPU and memory load in scientific applications, 2004.

[25] Yunhao Liu, et al. Parallel network RAM: effectively utilizing global cluster memory for large data-intensive parallel programs, 2004.

[26] Dimitrios S. Nikolopoulos, et al. Adapting to memory pressure from within scientific applications on multiprogrammed COWs, 2004, 18th International Parallel and Distributed Processing Symposium.

[27] Ian T. Foster, et al. Condor-G: A Computation Management Agent for Multi-Institutional Grids, 2004, Cluster Computing.

[28] Pamela L. Eddy. College of William and Mary, 2004.

[29] Jarek Nieplocha, et al. Exploiting processor groups to extend scalability of the GA shared memory programming model, 2005, CF '05.

[30] Hyun-Wook Jin, et al. Designing Efficient Cooperative Caching Schemes for Multi-Tier Data-Centers over RDMA-enabled Networks, 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).