FEAST: An Automated Feature Selection Framework for Compilation Tasks

Modern machine-learning techniques greatly reduce the effort required to perform high-quality program compilation, which would otherwise rely heavily on manual tuning and expert intervention. The success of machine learning in compilation tasks can be largely attributed to recent advances in program characterization, the process of quantifying a target program numerically or structurally. While great progress has been made in identifying key features for characterizing programs, choosing the right set of features for a specific compilation task remains an ad hoc procedure. To guarantee comprehensive coverage, compiler engineers usually select an excessive number of features. Unfortunately, this can yield multiple similar features, introducing a bias that over-emphasizes certain aspects of a program's characteristics and thereby reduces the accuracy and performance of the target compilation task. In this paper, we propose FEAture Selection for compilation Tasks (FEAST), an efficient, automated framework for determining the most relevant and representative features from a feature pool. Specifically, FEAST employs widely used statistical and machine-learning tools, including LASSO and sequential forward and backward selection, for automatic feature selection, and can in general be applied to any numerical feature set. We further propose an automated approach to compiler parameter assignment for assessing FEAST's performance. Extensive experimental results demonstrate that, on the compiler parameter assignment task, FEAST achieves comparable results using only about 18% of the features, selected automatically from the entire feature pool. We also inspect the selected features and discuss their roles in program execution.
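One of the selection tools the abstract names, sequential forward selection, can be sketched in a few lines. The following is a minimal illustration (not FEAST's exact procedure, whose details are in the paper): greedily add the feature whose inclusion most reduces least-squares fitting error until a budget of k features is reached. All names and the synthetic data are hypothetical.

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy sequential forward selection (illustrative sketch):
    at each step, add the feature that most reduces the residual
    sum of squares of an ordinary-least-squares fit."""
    n, d = X.shape
    selected, remaining = [], list(range(d))
    for _ in range(k):
        best_err, best_j = None, None
        for j in remaining:
            A = X[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = np.sum((A @ coef - y) ** 2)
            if best_err is None or err < best_err:
                best_err, best_j = err, j
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Hypothetical example: y depends only on features 0 and 2,
# so forward selection should recover exactly those two.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2]
print(sorted(forward_select(X, y, 2)))  # → [0, 2]
```

LASSO-based selection works analogously but in one shot: fit an L1-regularized regression and keep the features with nonzero coefficients; sequential backward selection instead starts from the full set and removes the least useful feature at each step.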
