Runtime Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory

The ever-increasing memory demands of many scientific applications and the complexity of today's shared computational resources still require the occasional use of virtual memory, network memory, or even out-of-core implementations, with well-known drawbacks in performance and usability. In this paper, we present a general framework, based on our earlier MMLIB prototype, that enables fully customizable memory malleability in a wide variety of scientific applications. We provide several necessary enhancements to the environment-sensing capabilities of MMLIB and introduce a remote memory capability, based on MPI communication of cached memory blocks between "compute nodes" and designated memory servers. We show experimental results from three important scientific applications that require the general MMLIB framework. Under constant memory pressure, we observe execution time improvements by factors of three to five over relying solely on the virtual memory system. With remote memory employed, these factors are even larger and significantly better than those of other, system-level remote memory implementations.
