A spatial path scheduling algorithm for EDGE architectures

Growing on-chip wire delays are motivating architectural features that expose on-chip communication to the compiler. EDGE architectures are one example of communication-exposed microarchitectures in which the compiler forms dataflow graphs that specify how the microarchitecture maps instructions onto a distributed execution substrate. This paper describes a compiler scheduling algorithm called spatial path scheduling that, for each placement, factors in previously fixed locations called anchor points. This algorithm extends easily to different spatial topologies. We augment this basic algorithm with three heuristics: (1) local and global ALU and network link contention modeling, (2) global critical path estimates, and (3) dependence chain path reservation. We use simulated annealing to explore possible performance improvements and to motivate the augmented heuristics and their weighting functions. We show that the spatial path scheduling algorithm augmented with these three heuristics achieves a 21% average performance improvement over the best prior algorithm and comes within an average of 5% of the annealed performance for our benchmarks.
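
To make the idea of placing against anchor points concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not give the cost function, so the sketch assumes a 2D grid of ALUs, one cycle per network hop (Manhattan distance), a uniform instruction latency, and a greedy choice of the tile with the earliest estimated completion time. The names `place`, `fixed_anchors`, `slots`, and `latency` are hypothetical, and none of the three heuristics (contention modeling, global critical path estimates, path reservation) is modeled.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two grid positions (assumed routing cost)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def place(instructions, deps, fixed_anchors, rows=2, cols=2, slots=1, latency=1):
    """Greedily place instructions (given in topological order) on a grid.

    deps maps an instruction to the names it consumes. A name is either an
    earlier instruction or a key of fixed_anchors (for example a register
    bank whose position is known before scheduling starts).
    """
    location = dict(fixed_anchors)                 # name -> (row, col)
    finish = {name: 0 for name in fixed_anchors}   # name -> cycle its value is ready
    used = {}                                      # (row, col) -> occupied slots
    for inst in instructions:
        best = None
        for tile in product(range(rows), range(cols)):
            if used.get(tile, 0) >= slots:
                continue  # tile is full
            # Operand arrival: anchor finish time plus one cycle per hop.
            ready = max(
                (finish[d] + manhattan(location[d], tile) for d in deps.get(inst, [])),
                default=0,
            )
            cost = ready + latency
            if best is None or cost < best[0]:
                best = (cost, tile)
        if best is None:
            raise ValueError("no free slot left for " + inst)
        finish[inst], location[inst] = best
        used[best[1]] = used.get(best[1], 0) + 1
    return location, finish


# Hypothetical three-instruction chain fed by a register bank above the grid.
location, finish = place(
    instructions=["a", "b", "c"],
    deps={"a": ["r0"], "b": ["a"], "c": ["a", "b"]},
    fixed_anchors={"r0": (-1, 0)},
)
print(location)
print(finish)
```

Each placed instruction becomes an anchor for its consumers, so later placements are pulled toward the tiles that already hold their operands. A fuller scheduler would replace the single `cost` term with a weighted combination that also accounts for contention and critical path length.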
