SMARTEST: A Surrogate-Assisted Memetic Algorithm for Code Size Reduction

Compiling source code effectively to meet various criteria is a critical task in software engineering. In particular, code size reduction has attracted much attention from both industry and academia because of resource constraints. Developers generally rely on compiler optimization passes to reduce code size. However, because a compiler integrates a wide variety of optimization passes, selecting a desirable optimization sequence manually is impractical. Evolutionary algorithms offer a promising way to alleviate this problem. Nevertheless, previous approaches fail to balance exploitation and exploration of the search space. Moreover, each fitness evaluation requires an actual compilation, which makes the evolution time-consuming. To tackle these challenges, we propose SMARTEST, a novel approach that systematically exploits a large volume of historical compilation information. Specifically, SMARTEST comprises two components: 1) a local search operator to enhance solution quality; and 2) a data-driven surrogate model to avoid expensive fitness evaluations. We evaluate the effectiveness of SMARTEST on the cBench benchmark suite. Experimental results indicate that SMARTEST outperforms the standard -Os optimization level by 2.17% on average and achieves 1.2 times the code size reduction of the genetic algorithm. Furthermore, on the same benchmark suite, SMARTEST obtains better results and requires fewer actual fitness evaluations than its variants, which demonstrates the contribution of both the local search and the surrogate model.
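The interplay of the two components can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the pass names, the toy `true_size` function that stands in for an actual compilation, the nearest-neighbour surrogate (swapped in because the abstract only specifies a "data-driven surrogate model"), and all operators and parameter values.

```python
import random

# Hypothetical pass names and sequence length, chosen only for illustration.
PASSES = ["inline", "dce", "gvn", "licm", "sroa", "simplifycfg"]
SEQ_LEN = 8

def true_size(seq):
    """Toy stand-in for 'compile and measure code size' (lower is better)."""
    size = 1000
    for i, p in enumerate(seq):
        size -= 10 * (PASSES.index(p) + 1)
        if i > 0 and seq[i - 1] == p:
            size += 25  # in this toy model, repeating a pass back-to-back is wasteful
    return size

class Archive:
    """Stores every real evaluation and doubles as the surrogate's training data."""
    def __init__(self):
        self.data = {}

    def evaluate(self, seq):
        # Expensive path: one "compilation" per distinct sequence, then cached.
        key = tuple(seq)
        if key not in self.data:
            self.data[key] = true_size(key)
        return self.data[key]

    def predict(self, seq, k=3):
        # Cheap path: average size of the k archived sequences that are
        # closest in Hamming distance (an assumed, simplistic surrogate).
        dist = lambda known: sum(a != b for a, b in zip(known, seq))
        nearest = sorted(self.data, key=dist)[:k]
        return sum(self.data[n] for n in nearest) / len(nearest)

def local_search(seq, archive, rng):
    """Screen single-position changes with the surrogate; compile only the best-predicted one."""
    neighbours = []
    for pos in range(SEQ_LEN):
        cand = list(seq)
        cand[pos] = rng.choice(PASSES)
        neighbours.append(cand)
    best = min(neighbours, key=archive.predict)
    return best if archive.evaluate(best) < archive.evaluate(seq) else list(seq)

def run(seed=0, generations=20, pop_size=10):
    rng = random.Random(seed)
    archive = Archive()
    pop = [[rng.choice(PASSES) for _ in range(SEQ_LEN)] for _ in range(pop_size)]
    for ind in pop:
        archive.evaluate(ind)
    initial_best = min(archive.data.values())
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size * 2):
            # Binary tournament selection, one-point crossover, point mutation.
            a, b = (min(rng.sample(pop, 2), key=archive.evaluate) for _ in range(2))
            cut = rng.randrange(1, SEQ_LEN)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:
                child[rng.randrange(SEQ_LEN)] = rng.choice(PASSES)
            offspring.append(child)
        # Rank offspring by predicted size; refine and compile only the top few.
        offspring.sort(key=archive.predict)
        promising = [local_search(c, archive, rng) for c in offspring[:pop_size // 2]]
        # Elitist replacement: the best sequence found so far always survives.
        pop = sorted(pop + promising, key=archive.evaluate)[:pop_size]
    best = pop[0]
    return best, archive.evaluate(best), len(archive.data), initial_best
```

Calling `run()` returns the best sequence found, its toy code size, the number of real evaluations performed, and the best size in the initial population. With the defaults, 400 offspring are generated but at most 210 distinct sequences are ever "compiled", which is the kind of saving a surrogate is meant to provide.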
