Dynamic run-time architecture techniques for enabling continuous optimization

Future computer systems will integrate tens of multithreaded processor cores on a single chip die, resulting in hundreds of concurrent program threads sharing system resources. These designs will be the cornerstone of improving throughput in high-performance computing and server environments. However, to date, appropriate systems software (operating system, run-time system, and compiler) technologies for these emerging machines have not been adequately explored. Future processors will require sophisticated hardware monitoring units to continuously feed back resource utilization information to allow the operating system to make optimal thread co-scheduling decisions and also to software that continuously optimizes the program itselfNevertheless, in order to continually and automatically adapt systems resources to program behaviors and application needs, specific run-time information must be collected to adequately enable dynamic code optimization and operating system scheduling. Generally, run-time optimization is limited by the time required to collect profiles, the time required to perform optimization, and the inherent benefits of any optimization or decisions. Initial techniques for effectively utilizing run-time information for dynamic optimization and informed thread scheduling in future multithreaded architectures are presented

[1]  Scott A. Mahlke,et al.  The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.

[2]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[3]  Josep Torrellas,et al.  A Chip-Multiprocessor Architecture with Speculative Multithreading , 1999, IEEE Trans. Computers.

[4]  Karl Pettis,et al.  Profile guided code positioning , 1990, PLDI '90.

[5]  Margaret Martonosi,et al.  Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data , 2003, MICRO.

[6]  Thomas Ball,et al.  Edge profiling versus path profiling: the showdown , 1998, POPL '98.

[7]  James R. Larus,et al.  Efficient path profiling , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[8]  Michael D. Bond,et al.  Practical path profiling for dynamic optimizers , 2005, International Symposium on Code Generation and Optimization.

[9]  Brad Calder,et al.  Phase tracking and prediction , 2003, ISCA '03.

[10]  Wei-Chung Hsu,et al.  Dynamic trace selection using performance monitoring hardware sampling , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[11]  Wen-mei W. Hwu,et al.  Inline function expansion for compiling C programs , 1989, PLDI '89.

[12]  David W. Wall Predicting program behavior using real or estimated profiles , 1991, PLDI '91.

[13]  Amitabh Srivastava,et al.  Analysis Tools , 2019, Public Transportation Systems.

[14]  Wei-Chung Hsu,et al.  Design and Implementation of a Lightweight Dynamic Optimization System , 2004, J. Instr. Level Parallelism.

[15]  D. Marr,et al.  Hyper-Threading Technology Architecture and MIcroarchitecture , 2002 .

[16]  Steven R. Kunkel,et al.  A multithreaded PowerPC processor for commercial servers , 2000, IBM J. Res. Dev..

[17]  Lance M. Berc,et al.  Continuous profiling: where have all the cycles gone? , 1997, ACM Trans. Comput. Syst..

[18]  A. Snavely,et al.  Symbiotic jobscheduling for a simultaneous multithreaded processor , 2000, ASPLOS IX.

[19]  Michael Franz,et al.  Continuous program optimization , 1999 .

[20]  Dean M. Tullsen,et al.  Supporting fine-grained synchronization on a simultaneous multithreading processor , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[21]  Michael Franz,et al.  Continuous Program Optimization: Design and Evaluation , 2001, IEEE Trans. Computers.

[22]  Dean M. Tullsen,et al.  Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.

[23]  Mark S. Squillante,et al.  Performance Analysis of Simultaneous Multithreading in a PowerPC-based Processor , 2002 .

[24]  Larry Carter,et al.  Symbiotic Jobscheduling on the Tera MTA , 2000 .

[25]  Brinkley Sprunt Pentium 4 Performance-Monitoring Features , 2002, IEEE Micro.

[26]  Wen-mei W. Hwu,et al.  A hardware mechanism for dynamic extraction and relayout of program hot spots , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).