On large Linux clusters, scalability is the ability of a program to use additional processors so that each added node yields a near-linear increase in computational capacity. Without scalability, a cluster may cease to be useful after only a small number of nodes have been added. The Joint Forces Command (JFCOM) Experimentation Directorate (J9) has recently been engaged in Joint Urban Operations (JUO) experiments and counter-mortar analyses. Both required scalable codes to simulate more than 1 million SAF clutter entities using hundreds of CPUs. The JSAF application suite, using the redesigned RTI-s communications system, can run distributed simulations with sites located across the United States, from Norfolk, Virginia, to Maui, Hawaii. Interest-aware routers are essential for scalable communications in large, distributed environments, and the RTI-s framework currently in use by JFCOM provides such routers connected in a basic tree topology. This approach is successful for small- to medium-sized simulations, but its limitations preclude very large simulations. To resolve these issues, the work described herein uses a new software router infrastructure that accommodates more sophisticated, general topologies, including both the existing tree framework and a new generalization of the fully connected mesh topologies. The latter were first used in the SF Express ModSAF simulations of 100,000 fully interacting vehicles. The new software router objects incorporate an augmented set of the scalable features of the SF Express design, while optionally using low-level RTI-s objects to perform the actual site-to-site communications. The limitations of the original MeshRouter formalism have been eliminated, allowing fully dynamic operations. The mesh topology capabilities allow aggregate bandwidth and site-to-site latencies to match actual network performance. The heavy resource load at the root node can now be distributed across routers at the participating sites. Most significantly, realizable point-to-point bandwidths remain stable as the underlying problem size increases, supporting the scalability claims.
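
The contrast between the tree and mesh topologies can be illustrated with a toy model. The sketch below is not the RTI-s or MeshRouter interface; the function `route_loads`, the site names, and the interest labels are all hypothetical, chosen only for illustration. It assumes one router per site, an interest-aware filter that forwards a message only to sites subscribed to its interest, and a single root router that relays all inter-site traffic in the tree case.

```python
from collections import Counter


def route_loads(messages, subscriptions, topology):
    """Count message transmissions (sends plus receives) per router.

    messages: list of (source_site, interest) pairs (hypothetical data).
    subscriptions: dict mapping site -> set of interests it subscribes to.
    topology: "tree" (one root relays all inter-site traffic) or
              "mesh" (site routers send directly to each other).
    """
    load = Counter()
    for src, interest in messages:
        # Interest-aware filtering: forward only to subscribed sites.
        dests = [site for site, wanted in subscriptions.items()
                 if site != src and interest in wanted]
        if not dests:
            continue  # nobody is interested, so nothing leaves the source
        if topology == "tree":
            load[src] += 1                  # one send up to the root
            load["root"] += 1 + len(dests)  # root receives, then fans out
        else:
            load[src] += len(dests)         # source sends to each site directly
        for site in dests:
            load[site] += 1                 # each destination receives once
    return load


subscriptions = {"A": {"armor"}, "B": {"armor", "air"}, "C": {"air"}}
messages = [("A", "armor"), ("B", "air"), ("C", "armor"), ("A", "sea")]

tree = route_loads(messages, subscriptions, "tree")
mesh = route_loads(messages, subscriptions, "mesh")
print("tree:", dict(tree))
print("mesh:", dict(mesh))
```

In this toy model the root handles more transmissions than any site router in the tree case, while in the mesh case the same traffic is spread across the site routers. Real bandwidth and latency behavior depends on the actual network, which this sketch does not model.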