Invited: Efficient Reinforcement Learning for Automating Human Decision-Making in SoC Design

The exponential growth in PVT corners due to Moore's law scaling, together with the increasing demand for consumer applications and longer battery life in mobile devices, has ushered in significant cost- and power-related challenges for designing and productizing mobile chips on a predictable schedule. Two main reasons for this are the reliance on human decision-making to achieve the desired performance within the target area and power budget, and the significant increase in the complexity of the human decision-making space. The problem is that, to date, human design experience has not been replaced by design automation tools, and tasks requiring experience with past designs are still performed manually. In this paper we investigate how machine learning may be applied to develop tools that learn from experience just as human designers do, thus automating tasks that still require human intervention. The potential advantage of the machine learning approach is its ability to scale with increasing complexity and therefore hold the design time constant with the same manpower. Reinforcement Learning (RL) is a machine learning technique that allows us to mimic a human designer's ability to learn from experience and to automate human decision-making without loss in design quality, while making the design time independent of the complexity. In this paper we show how manual design tasks can be abstracted as RL problems. Based on our experience applying RL to one of these problems, we show that RL can automatically achieve results similar to human designs, but on a predictable schedule. However, a major drawback is that the RL solution can require a prohibitively large number of training iterations. If efficient training techniques can be developed, RL holds great promise for automating tasks that require human experience. In this paper we present a Bayesian Optimization technique for reducing the RL training time.
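To make the idea concrete, the following is a minimal, hypothetical Python sketch of the kind of loop described above: a toy design-decision task is cast as a tabular Q-learning problem, and a Gaussian-process-based Bayesian optimization loop (here using scikit-learn with an expected-improvement acquisition function) searches over the agent's learning rate and exploration rate, so that good settings are found in far fewer trials than exhaustive search. The environment, reward model, parameter ranges, and all names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: Bayesian optimization of Q-learning hyperparameters
# for a toy "design knob tuning" task. Names and reward model are illustrative.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 8, 4            # e.g. coarse power states x knob settings

def run_q_learning(alpha, epsilon, episodes=200):
    """Train a tabular Q-learning agent; return its average reward.
    The environment stands in for a design-decision task: each action
    perturbs the state and yields a synthetic power/performance reward."""
    q = np.zeros((N_STATES, N_ACTIONS))
    total = 0.0
    for _ in range(episodes):
        s = rng.integers(N_STATES)
        for _ in range(20):                        # short episode horizon
            a = (rng.integers(N_ACTIONS) if rng.random() < epsilon
                 else int(np.argmax(q[s])))
            s_next = (s + a) % N_STATES
            reward = -abs(s_next - N_STATES // 2)  # toy objective: reach a target state
            q[s, a] += alpha * (reward + 0.9 * q[s_next].max() - q[s, a])
            total += reward
            s = s_next
    return total / episodes

def expected_improvement(x, gp, best_y):
    """Expected improvement for maximization, from the GP posterior."""
    mu, sigma = gp.predict(x, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_y) / sigma
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)

# Bayesian optimization over (alpha, epsilon): each probe is one RL training run.
bounds = np.array([[0.01, 1.0],       # learning rate alpha
                   [0.01, 0.5]])      # exploration rate epsilon
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))   # initial random probes
y = np.array([run_q_learning(*x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(15):                   # far fewer trials than a grid search
    gp.fit(X, y)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(256, 2))
    x_next = cand[np.argmax(expected_improvement(cand, gp, y.max()))]
    X = np.vstack([X, x_next])
    y = np.append(y, run_q_learning(*x_next))

best = X[np.argmax(y)]
print(f"best (alpha, epsilon) = {best}, avg reward = {y.max():.3f}")
```

In this sketch the expensive step is each call to run_q_learning; the GP surrogate lets the loop spend those calls only on promising hyperparameter settings, which is the sense in which Bayesian optimization reduces RL training time.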
