Adapting to memory pressure from within scientific applications on multiprogrammed COWs

Dismal performance often results when the memory requirements of a process exceed the physical memory available to it. The throughput loss is compounded when that process is part of a synchronous parallel job on a nondedicated computational cluster. A possible solution is to develop programs that dynamically adapt their memory usage to the current availability of physical memory. We explore this idea on scientific computations that perform repetitive data accesses. Part of the program's data set is cached in resident memory, while the remainder that cannot fit is accessed in an "out-of-core" fashion from disk, under a user-defined replacement policy. This allows performance to degrade gracefully as memory becomes scarce. To adjust its memory usage dynamically, the program must reliably determine whether the system has a memory shortage or surplus. Because operating systems typically export limited memory information, we develop a parameter-free algorithm that uses no system information beyond the program's own resident set size (RSS). The resulting library can be called from scientific codes with little change to their structure, or with no change at all if their computations are already "blocked" for locality. Experimental results with both sequential and parallel versions of a memory-adaptive conjugate-gradient linear system solver show substantial performance gains over the original version, which relies on the virtual memory system. Furthermore, multiple instances of the adaptive code can coexist on the same node with little mutual interference.
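
To make the mechanism concrete, the sketch below shows one way a program could observe its own RSS and turn that single measurement into a grow/shrink decision for its cache of in-core panels. It is a minimal sketch assuming Linux and the /proc/self/statm interface; the names rss_bytes and adapt, and the 0.95 slack factor, are illustrative inventions, and the decision rule is a simplified approximation of an RSS-based pressure test, not the paper's exact algorithm.

```c
/*
 * Minimal sketch of RSS-based memory-pressure detection, assuming Linux
 * and /proc/self/statm. The probe-and-compare rule in adapt() is an
 * illustrative approximation, not the paper's exact algorithm.
 */
#include <stdio.h>
#include <unistd.h>

/* Read this process's current resident set size, in bytes. */
static long rss_bytes(void)
{
    long pages = 0, resident = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld %ld", &pages, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident < 0 ? -1 : resident * sysconf(_SC_PAGESIZE);
}

typedef enum { SHRINK, HOLD, GROW } cache_action;

/*
 * Decide whether to grow or shrink the in-core panel cache.
 * `expected` is the number of bytes the program believes it holds
 * resident (its cached panels plus fixed overhead). If the observed
 * RSS falls noticeably short of that, the OS is reclaiming pages
 * under memory pressure; otherwise the program may probe for surplus
 * by growing the cache.
 */
static cache_action adapt(long expected)
{
    const double slack = 0.95;   /* tolerate minor bookkeeping noise */
    long rss = rss_bytes();
    if (rss < 0)
        return HOLD;             /* cannot measure: change nothing   */
    if (rss < (long)(slack * expected))
        return SHRINK;           /* pages evicted under our feet     */
    return GROW;                 /* no pressure observed: probe up   */
}
```

A caller would typically invoke adapt() once per outer iteration of the blocked computation and evict or fetch one panel at a time, so that the resident cache tracks availability gradually rather than oscillating.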
