Instruction Memory Overhead of In Situ Visualization and Analysis Libraries on HPC Machines

Running visualization in situ, within the simulation that generates the data, is an increasingly common requirement of large-scale scientific computing [8]. To address the diverse needs of the scientific community, multiple projects are adapting general-purpose, large-scale standalone visualization applications and libraries into in situ libraries that can be integrated into existing simulations [1, 11, 6, 13, 14]. Such an integration requires the simulation and visualization components to share system resources, in particular memory. Some work has evaluated in transit operation [10, 3, 4, 9, 7], in which the visualization libraries run at the same time as the simulation but in a different memory space. In this study we consider only in situ operation, for which the memory footprint of the visualization library is a critical metric for its viability and adoption.

The memory overhead can be reduced by sharing data where possible, but some memory cannot be shared. In particular, the program instructions loaded from the visualization libraries are necessarily separate from anything else used in the simulation. The overhead of these program instructions is seldom a concern when running the visualization independently, but it can be a pivotal concern when the visualization is integrated into a large-scale scientific simulation.

Realistic analysis of the memory overhead of libraries is difficult. For example, simply linking against the full set of ParaView Catalyst shared libraries increases the reported virtual memory by 500 MB per process, even though the library files occupy only 150 MB on disk. In addition, the memory mapping on a desktop Linux machine will likely differ from that on Compute Node Linux [12] or another lightweight HPC operating system. How well does performance measured on a desktop system predict performance on the HPC system?

In this study we use the Catalyst library to analyze memory overhead. A recent feature added to Catalyst is Editions, which allow subselection from the full set of ParaView Catalyst filters down to only those needed for certain tasks. We use this feature to compare visualization library configurations of different sizes and with different features. We report both experimental methodologies and design decisions that are generally applicable to visualization and other libraries on HPC systems.
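
As an illustration of the gap between virtual memory and on-disk size mentioned above, the following minimal sketch sums the address ranges that Linux reports for each file-backed mapping in the text of /proc/<pid>/maps, and separately sums the ranges marked executable. It is not the measurement procedure used in this study; the function name `mapped_bytes_by_file` is hypothetical, and the sketch assumes the standard Linux maps line layout (address range, permissions, offset, device, inode, path).

```python
from collections import defaultdict


def mapped_bytes_by_file(maps_text):
    """Sum mapped virtual bytes per backing file from /proc/<pid>/maps text.

    Hypothetical helper for illustration only.
    Returns {path: (total_bytes, executable_bytes)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no path
        addr_range, perms, path = fields[0], fields[1], fields[5]
        if not path.startswith("/"):
            continue  # pseudo-entries such as [heap] or [stack]
        start, end = (int(x, 16) for x in addr_range.split("-"))
        size = end - start
        totals[path][0] += size
        if "x" in perms:
            totals[path][1] += size  # executable, i.e. instruction, pages
    return {path: tuple(sizes) for path, sizes in totals.items()}
```

On a Linux system the function could be applied to a running process with `mapped_bytes_by_file(open("/proc/self/maps").read())`. The totals count reserved address ranges, including ranges mapped with no access permissions, so they can exceed the size of the file on disk; this may be one contributor to a discrepancy like the one quoted above, though not necessarily the only one, and it says nothing about how much of the mapping is resident in physical memory.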