Processor Assignment and Synchronization in Parallel Simulation of

Multistage interconnection networks (MINs), an important class of networks arising in parallel computer architectures, are excellent candidates for parallel time-driven simulation on shared-memory computers because they are large, inherently parallel, regularly structured, discrete-time systems. In this paper we report results from a continuing study on the performance of such simulations on a Sequent Symmetry. Our focus is the class of buffered delta networks consisting of 2 x 2 switches. We report results for simulations of MINs containing 4 to 9 stages and report on the effects that different synchronization techniques, different processor-to-switch allocation strategies, and non-uniform traffic patterns in the workload have on simulation speedup. Briefly, we observe that a two-phase synchronization technique requiring only two barriers during each clock cycle provides the best speedup. We examine two processor-to-switch allocation strategies: a "contiguous" allocation that allocates contiguous rows of switches to each processor, and an "interleaved" allocation that allocates every P-th row to each of the P processors. While the contiguous allocation exhibits better locality of reference, for uniform traffic there is little difference in performance between the two strategies. For non-uniform traffic, whereas the speedup using contiguous allocation degrades, the speedup using the interleaved allocation remains nearly constant. Using