Advanced user environment design and implementation on integrated multi-architecture supercomputers

Scientists and engineers who use supercomputer clusters should be able to focus on their scientific and technical work rather than on operating their user environment. However, creating a convenient and effective user environment on modern supercomputers is increasingly challenging because of the complexity of these large-scale systems. In this report, we discuss the main design issues and goals for a user environment that must support multiple compiler suites, various applications, and diverse libraries on heterogeneous computing architectures. We present our implementation on Yellowstone, a powerful high-performance computing system dedicated to earth system science and deployed by the National Center for Atmospheric Research. Our newly designed user environment is built upon a hierarchical module structure, customized wrapper scripts, pre-defined system modules, the Lmod implementation of environment modules, and several purpose-built tools. The resulting implementation provides streamlined control, versioning, user customization, and automated documentation, among other features, and accommodates both novice and experienced users. The design and implementation also minimize the effort required of administrators and the support team to manage user environments. Smooth adoption and positive feedback from our users indicate that our design and implementation on the Yellowstone system have been well accepted and have served thousands of users all over the world.
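
To make the idea of a hierarchical module structure concrete, the following is a minimal sketch in Python, not the Yellowstone implementation and not Lmod's actual code. It assumes a two-level hierarchy in which compiler modules are always visible and each compiler exposes its own branch of libraries. The module names, version numbers, and the `Environment` class are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical module tree (names and versions are illustrative assumptions).
# Compilers sit at the top level and are always visible.
CORE = {"intel/13.1", "gnu/4.8"}
# Each compiler exposes only the libraries built against it.
COMPILER_BRANCH = {
    "intel/13.1": {"netcdf/4.3", "pio/1.7"},
    "gnu/4.8": {"netcdf/4.3"},
}


@dataclass
class Environment:
    loaded: list = field(default_factory=list)

    def available(self):
        """Return the modules visible given what is currently loaded."""
        visible = set(CORE)
        for name in self.loaded:
            visible |= COMPILER_BRANCH.get(name, set())
        return visible

    def load(self, name):
        if name in CORE:
            # Swapping compilers: keep only the libraries that the new
            # compiler's branch also provides (a deliberate simplification).
            libs = [m for m in self.loaded if m not in CORE]
            self.loaded = [name] + [m for m in libs if m in COMPILER_BRANCH[name]]
            return
        if name not in self.available():
            raise LookupError(f"module {name!r} is not visible; load a compiler first")
        if name not in self.loaded:
            self.loaded.append(name)


env = Environment()
env.load("intel/13.1")
env.load("pio/1.7")
env.load("netcdf/4.3")
env.load("gnu/4.8")  # switch compiler; pio/1.7 has no build in this branch
print(env.loaded)
```

In this sketch a library cannot be loaded before a compiler, and switching compilers drops any library that has no build under the new one, which is the kind of consistency a hierarchy is meant to enforce.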
