A Large-Scale Study on Predicting and Contextualizing Building Energy Usage

In this paper we present a data-driven approach to modeling end-user energy consumption in residential and commercial buildings. Our model is based on a data set of monthly electricity and gas bills, collected by a utility over the course of several years, for approximately 6,500 buildings in Cambridge, MA. In addition, we use publicly available tax assessor records and geographical survey information to derive features for each building. Using both parametric and non-parametric learning methods, we learn models that predict distributions over energy usage from these features, and we use these models to develop two end-user systems. For utilities or authorized institutions (those who may obtain access to the full data), we provide a system that visualizes energy consumption for each building in the city; this allows companies to quickly identify outliers (buildings that use much more energy than expected even after conditioning on the relevant predictors), for instance to target homes for potential retrofits or tiered pricing schemes. For other end users, we provide an interface for entering their own electricity and gas usage, along with basic information about their home, to see how their consumption compares to that of similar buildings as predicted by our model. Merely allowing users to contextualize their consumption in this way, by relating it to the consumption of similar buildings, can itself produce behavior changes that significantly reduce consumption.
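To make the two uses described above concrete, the following is a minimal sketch, not the model used in the paper. It assumes a single hypothetical feature (log floor area), synthetic data, and a log-normal predictive distribution fitted by ordinary least squares; the function names and the z-score threshold are illustrative choices only.

```python
import math
import numpy as np

def fit_log_linear(X, y):
    """Least-squares fit of log(y) on features X, with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    resid = np.log(y) - A @ coef
    sigma = resid.std(ddof=A.shape[1])  # residual spread in log space
    return coef, sigma

def flag_outliers(coef, sigma, X, y, z_thresh=2.0):
    """Mark buildings whose usage is far above what the features predict."""
    A = np.column_stack([np.ones(len(X)), X])
    z = (np.log(y) - A @ coef) / sigma
    return z > z_thresh

def percentile_of_usage(coef, sigma, x, usage):
    """Where a reported usage falls in the predicted log-normal distribution."""
    mu = coef[0] + np.dot(coef[1:], x)
    z = (math.log(usage) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Synthetic stand-in data (assumption): usage grows with floor area.
rng = np.random.default_rng(0)
floor_area = rng.uniform(50.0, 500.0, 200)
X = np.log(floor_area)[:, None]
y = np.exp(1.0 + 0.8 * X[:, 0] + rng.normal(0.0, 0.3, 200))
y[0] *= 10.0  # one building using far more than similar buildings

coef, sigma = fit_log_linear(X, y)
flags = flag_outliers(coef, sigma, X, y)
pct = percentile_of_usage(coef, sigma, X[1], y[1])
print(f"flagged buildings: {int(flags.sum())}, building 1 percentile: {pct:.2f}")
```

The outlier flag corresponds to the utility-facing view, and the percentile corresponds to the comparison shown to an individual user. A heavier-tailed likelihood or a non-parametric regressor could replace the least-squares fit without changing the structure of the sketch.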