HPC System Acceptance: Controlled Chaos
Over the last six decades, Los Alamos National Laboratory (LANL) has acquired, accepted, and integrated over 100 new HPC systems, from MANIAC in 1952 to Trinity in 2016. These systems range from small clusters to large supercomputers. Each type of system poses its own challenges, and a well-established, proven plan for testing, acceptance, and integration helps both the site and the vendor expedite the process. System acceptance is a broad topic; this paper focuses mainly on the system's software and hardware components. Performance testing is discussed briefly, but the paper's primary purpose is to help HPC system administrators with the acceptance process.