POSITION PAPER: Modularizing Global Variable In Climate Simulation Software

In large-scale simulation codes such as climate models, variables represent a large number of characteristics of the Earth's surface and atmosphere for a single multi-dimensional cell, and they are distributed over a multitude of cores in the supercomputers where these simulations run. These hundreds of variables allow the parts of a simulation that represent specific sub-models, for example photosynthesis, to interact with the other sub-models. The scientists of each domain write the simulation code for the sub-model representing their sub-domain. To integrate their code into the entire simulation, they must deal with hundreds of unfamiliar variables, of which only a small subset is relevant to their work. Designing such variables in a modular fashion, so that scientists interact only with the variables relevant to their sub-model, is likely to increase the scientists' productivity and the accuracy of the simulation codes. A natural way to group variables into modules is to use a language feature that groups them together, such as a struct in C or a class in C++. Each scientist would then need to become familiar with only the small subset of modules containing the variables used in their simulations. For example, the Community Earth System Model (CESM) code v1.06 has 51 such modules (structures) that contain 1479 variables. The methods proposed below can be used to assess the modularity of the existing set of structures and to generate alternative modularizations that improve upon it. In a nutshell, the approach minimizes the number of variables exposed to other domains by the modules used in each domain.
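
To make the objective in the last sentence concrete, the following is a minimal sketch of one way such an exposure count could be computed. It is an illustration under stated assumptions, not the method of this paper: the function names (exposed_variables, total_exposure), the reading of "exposed" as "visible through a module the domain uses but not needed by that domain", and all variable, module, and domain names are hypothetical.

```python
from typing import Dict, Set


def exposed_variables(modules: Dict[str, Set[str]], used: Set[str]) -> Set[str]:
    """Return the variables a domain can see but does not need.

    Assumption: a domain must pull in every module that contains at least
    one variable it uses, and thereby sees all variables of that module.
    """
    visible: Set[str] = set()
    for members in modules.values():
        if members & used:          # the domain touches this module
            visible |= members      # so every member becomes visible
    return visible - used


def total_exposure(modules: Dict[str, Set[str]],
                   domains: Dict[str, Set[str]]) -> int:
    """Sum the unneeded-but-visible variable counts over all domains."""
    return sum(len(exposed_variables(modules, used))
               for used in domains.values())


# Hypothetical toy data: two domains and the variables each one uses.
domains = {
    "photosynthesis": {"leaf_area", "sunlight", "co2"},
    "hydrology": {"soil_moisture", "runoff", "precip"},
}

# Grouping A mixes variables from both domains in each module.
grouping_a = {
    "surface": {"leaf_area", "soil_moisture", "runoff"},
    "atmosphere": {"sunlight", "co2", "precip"},
}

# Grouping B aligns each module with one domain.
grouping_b = {
    "canopy": {"leaf_area", "sunlight", "co2"},
    "water": {"soil_moisture", "runoff", "precip"},
}

if __name__ == "__main__":
    print("grouping A exposure:", total_exposure(grouping_a, domains))
    print("grouping B exposure:", total_exposure(grouping_b, domains))
```

Under this reading, a lower total means each domain sees fewer variables it does not need, so grouping B would be preferred over grouping A in the toy data. A real assessment would need the actual structure definitions and per-domain usage extracted from the simulation code.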