Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques

This paper describes several challenges facing programmers of future edge computing systems, the diverse many-core devices that will soon exemplify commodity mainstream systems. To call attention to the programming challenges ahead, it focuses on the most complex of such architectures: integrated, power-conserving systems that are inherently parallel and heterogeneous, with distributed address spaces. Programming such systems raises new concerns: partitioning computation across functional units, data movement and synchronization, managing a diversity of programming models for different devices, and reusing existing legacy and library software. We observe that many of these challenges also arise in programming applications for large-scale heterogeneous distributed computing environments, and that current solutions and future research directions in distributed computing can be adapted to commodity computing environments. Optimization decisions are inherently complex because the search spaces of possible solutions are large and performance is difficult to predict on increasingly complex architectures. We argue that cognitive techniques are well suited to managing systems of such complexity, citing recent trends in the use of cognitive techniques for code mapping and optimization support. Combining these observations, we describe a fundamentally new programming paradigm for complex heterogeneous systems, in which programmers design self-configuring applications and the system automates optimization decisions and manages the allocation of heterogeneous resources.
