CF-TUNE: Collaborative Filtering Auto-Tuning for Energy Efficient Many-Core Processors

Energy efficiency is today considered a first-class design principle of modern many-core computing systems, in the effort to overcome their limited power envelope. However, many-core processors are characterised by high micro-architectural complexity, which propagates up to the application level and affects both performance and energy consumption. In this paper, we present CF-TUNE, an online and scalable auto-tuning framework for energy-aware application mapping on emerging many-core architectures. CF-TUNE extracts an energy-efficient tuning configuration point while characterising the application on only a minimal part of the tuning configuration space. Instead of analyzing every application against every tuning configuration, it adopts a collaborative filtering technique that quickly and accurately configures the application's tuning parameters by identifying similarities with previously optimized applications. We evaluate CF-TUNE's efficiency on a set of demanding and diverse applications mapped on the Intel Many Integrated Core processor, and we show that with minimal characterization, e.g., only two or four evaluations, CF-TUNE recommends a tuning configuration that performs at a level of at least 94 percent of the optimal one.
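
The collaborative filtering idea can be illustrated with a minimal sketch. It is not CF-TUNE's actual implementation: the abstract does not specify the algorithm, so a simple neighbourhood-based predictor is swapped in here. The history table, the score scale (higher means more energy efficient), the smoothing constant `EPS`, and the names `similarity_weight` and `recommend` are all hypothetical. A new application is measured on two configurations only, and its remaining scores are estimated from previously tuned applications that behaved similarly on those two.

```python
# Hypothetical history: energy-efficiency scores (higher is better) of
# previously tuned applications over four tuning configurations.
HISTORY = {
    "app_a": [0.9, 0.6, 0.3, 0.2],
    "app_b": [0.2, 0.4, 0.7, 1.0],
    "app_c": [0.3, 0.5, 1.0, 0.6],
}

EPS = 0.01  # assumed smoothing term; avoids division by zero on exact matches


def similarity_weight(sampled, known):
    """Inverse mean squared difference over the sampled configurations."""
    msd = sum((score - known[c]) ** 2 for c, score in sampled.items()) / len(sampled)
    return 1.0 / (EPS + msd)


def recommend(sampled, history=HISTORY):
    """Predict unsampled scores and return (best_config, predicted_scores).

    `sampled` maps a configuration index to the score measured for the
    new application at that configuration.
    """
    n_configs = len(next(iter(history.values())))
    weights = {app: similarity_weight(sampled, row) for app, row in history.items()}
    total = sum(weights.values())
    predicted = []
    for c in range(n_configs):
        if c in sampled:
            # Keep the real measurement where one exists.
            predicted.append(sampled[c])
        else:
            # Similarity-weighted average of the known applications.
            predicted.append(sum(w * history[app][c] for app, w in weights.items()) / total)
    best = max(range(n_configs), key=lambda c: predicted[c])
    return best, predicted


# Two evaluations of the new application, at configurations 0 and 1.
best, predicted = recommend({0: 0.3, 1: 0.5})
print(best, [round(p, 3) for p in predicted])
```

In this toy data the two measurements match `app_c` most closely, so the recommendation follows `app_c`'s preferred configuration even though the new application was never run on it. Matrix-factorization variants of collaborative filtering pursue the same goal with a latent-factor model instead of explicit neighbours.
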
