Autonomous HVAC Control, A Reinforcement Learning Approach

Recent high-profile developments of autonomous learning thermostats by companies such as Nest Labs and Honeywell have brought to the fore the prospect of ever greater numbers of intelligent devices permeating our homes and working environments. However, the specific learning approaches and methodologies utilised by these devices have never been made public, and little is known about how they operate, learn about their environments, or adapt to their users. This paper proposes a suitable learning architecture for such an intelligent thermostat, in the hope that it will benefit further investigation by the research community. Our architecture comprises a number of different learning methods, each of which contributes to a complete autonomous thermostat capable of controlling an HVAC system. A novel state-action space formalism is proposed to enable a reinforcement learning agent to control the HVAC system by optimising both occupant comfort and energy costs. Our results show that the learning thermostat can achieve cost savings of 10% over a programmable thermostat, whilst maintaining high occupant comfort standards.
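
The abstract does not disclose the state-action formalism itself, so the following is only a minimal tabular Q-learning sketch of the kind of agent it describes: the state features (binned indoor temperature, hour of day, occupancy), the binary heating action set, the reward weights and the learning hyperparameters are all illustrative assumptions rather than details taken from the paper.

    # Hypothetical Q-learning thermostat sketch; state encoding, actions,
    # reward weights and hyperparameters are assumptions, not the paper's.
    import random
    from collections import defaultdict

    ACTIONS = ("HEAT_ON", "HEAT_OFF")        # assumed binary HVAC action set
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # assumed learning hyperparameters
    COMFORT_WEIGHT, SETPOINT = 1.0, 21.0     # assumed comfort weight / target temp (deg C)

    Q = defaultdict(float)                   # tabular Q-function: Q[(state, action)] -> value

    def discretise(temp_c, hour, occupied):
        """Assumed state encoding: binned indoor temperature, hour of day, occupancy flag."""
        return (round(temp_c), hour, occupied)

    def reward(temp_c, occupied, energy_cost):
        """Assumed reward: negative energy cost minus a comfort penalty while occupied."""
        comfort_penalty = COMFORT_WEIGHT * abs(temp_c - SETPOINT) if occupied else 0.0
        return -energy_cost - comfort_penalty

    def choose_action(state):
        """Epsilon-greedy action selection over the tabular Q-function."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def update(state, action, r, next_state):
        """Standard one-step Q-learning backup."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])

    # Example interaction step with hypothetical sensor readings and energy price:
    s = discretise(19.5, hour=7, occupied=True)
    a = choose_action(s)
    r = reward(19.5, occupied=True, energy_cost=0.12 if a == "HEAT_ON" else 0.0)
    s_next = discretise(20.0, hour=7, occupied=True)
    update(s, a, r, s_next)

The design point the abstract hints at is the reward signal: a single scalar that trades off energy cost against a comfort penalty incurred only while the space is occupied, which is what allows one update rule to balance the two objectives.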
