Phase Directed Compiler Optimizations

Profile-guided optimizing compilers learn from representative executions of a program to "tune" transformations so that they benefit frequent paths. However, these optimizations treat the whole run of a program monolithically. It is known that a program execution proceeds in phases, each phase corresponding to an identifiable set of control-flow behaviors. This implies that not all control-flow paths are hot (i.e., executed with high frequency) at all times. Hence, if a program can switch among sets of hot paths to guide the optimizations in its different phases, the optimizations may be more effective. We propose an algorithm that optimizes the clones of a function according to the different phase behaviors exhibited by the function, and dispatches its calls to the (potentially) most beneficial clone at run-time. This makes it possible to use profile information at a finer granularity than existing approaches. We start by identifying critical functions, that is, functions that exhibit a high differential in their control-flow profiles and therefore widely varying phase behavior. For these critical functions, we compile specialized clones, each tuned for one distinct phase behavior. Finally, we build a phase predictor that, at run-time, predicts the phase that a yet-to-be-executed function invocation would exhibit, and directs the invocation to the corresponding clone of the function. We build the predictor by learning a classifier over features extracted from the state of the program, with the distinct phases acting as class labels. We demonstrate our algorithm by building a concrete phase-directed optimizer for register allocation (pdra) within the PBQP-based register allocator in the LLVM compiler infrastructure. We compare our allocator against the base allocator and against a profile-guided allocator (pgra) that uses the profile information in a monolithic manner without extracting phase information.
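The dispatch scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid classifier stands in for whatever classifier the paper learns, and the names `process_small`, `process_large`, `extract_features`, and the phase labels `"small"` and `"large"` are hypothetical. The single feature (input length) is likewise an assumption chosen only to keep the example short.

```python
from math import dist  # available in Python 3.8+


def train_centroids(samples):
    """Learn one centroid per phase label from (feature_vector, phase) pairs."""
    sums, counts = {}, {}
    for features, phase in samples:
        acc = sums.setdefault(phase, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[phase] = counts.get(phase, 0) + 1
    return {p: [s / counts[p] for s in acc] for p, acc in sums.items()}


def predict_phase(centroids, features):
    """Predict the phase whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda p: dist(centroids[p], features))


# Hypothetical clones: semantically identical, but imagined to be compiled
# with optimizations tuned to a different phase's hot paths.
def process_small(data):
    return ("clone_small", sum(data))


def process_large(data):
    return ("clone_large", sum(data))


CLONES = {"small": process_small, "large": process_large}


def extract_features(data):
    """Assumed feature: program state visible before the call (input length)."""
    return [float(len(data))]


def make_dispatcher(centroids, clones):
    """Return a wrapper that routes each call to the predicted phase's clone."""
    def dispatch(data):
        phase = predict_phase(centroids, extract_features(data))
        return clones[phase](data)
    return dispatch


# Training data: features observed before past invocations, labeled with the
# phase that each invocation turned out to exhibit.
training = [([2.0], "small"), ([4.0], "small"),
            ([100.0], "large"), ([140.0], "large")]
centroids = train_centroids(training)
process = make_dispatcher(centroids, CLONES)

print(process([1, 2, 3]))         # routed to the "small" clone
print(process(list(range(200))))  # routed to the "large" clone
```

Because every clone computes the same result, a misprediction in this sketch costs only performance, never correctness, which is why the dispatch can rely on a prediction rather than a guarantee.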
