Building Simulation Modelers: Are We Big-Data Ready?

Recent advances in computing and sensor technologies have pushed the amount of data we collect or generate to limits previously unheard of. Sub-minute resolution data from dozens of channels is becoming increasingly common, and its volume is expected to grow with the prevalence of non-intrusive load monitoring. Experts are running larger building simulation experiments and face increasingly complex data sets from which to derive meaningful insight. This paper focuses on the data management challenges that building modeling experts may face with data collected from a large array of sensors or generated by running a large number of building energy/performance simulations. The paper highlights the technical difficulties that were encountered and overcome in order to run 3.5 million EnergyPlus simulations on supercomputers and generate over 200 TB of simulation output. This extreme case involved the development of technologies and insights that will be beneficial to modelers in the immediate future. The paper discusses different database technologies (including relational databases, columnar storage, and schema-less Hadoop) in order to contrast the advantages and disadvantages of employing each for storage of EnergyPlus output. Scalability, analysis requirements, and the adaptability of these database technologies are discussed. Additionally, unique attributes of EnergyPlus output are highlighted which make data entry non-trivial for multiple simulations. Practical experience regarding cost-effective strategies for big-data storage is provided. The paper also discusses network performance issues when transferring large amounts of data across a network to different computing devices. Practical issues involving lag, bandwidth, and methods for synchronizing or transferring logical portions of the data are presented. A cornerstone of big data is its use for analytics; data is useless unless information can be meaningfully derived from it.
In addition to the technical aspects of managing big data, the paper details the design of experiments in anticipation of large volumes of data. The cost of re-reading output into an analysis program is elaborated, and techniques that perform analysis in situ, as the simulations run, are discussed. The paper concludes with an example and elaboration of the tipping point where it becomes more expensive to store the output than to re-run a set of simulations.
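
The store-versus-re-run tipping point can be illustrated with a small calculation. The following is a minimal sketch and not the paper's own cost model: only the 3.5 million simulations and the 200 TB lower bound come from the abstract, while the function names, the storage price, the CPU price, and the CPU hours per simulation are hypothetical placeholders chosen purely for illustration.

```python
# Sketch of a store-vs-re-run break-even estimate.
# Only N_SIMS and OUTPUT_TB come from the abstract; every price and
# runtime below is a hypothetical placeholder, not a measured value.

N_SIMS = 3_500_000   # simulations reported in the abstract
OUTPUT_TB = 200.0    # "over 200 TB" of output, used here as a lower bound


def avg_output_mb(total_tb: float, n_sims: int) -> float:
    """Average output per simulation in MB (decimal units: 1 TB = 1e6 MB)."""
    return total_tb * 1e6 / n_sims


def rerun_cost(n_sims: int, cpu_hours_per_sim: float, usd_per_cpu_hour: float) -> float:
    """Cost in USD of regenerating the whole output set once."""
    return n_sims * cpu_hours_per_sim * usd_per_cpu_hour


def storage_cost(total_tb: float, months: float, usd_per_tb_month: float) -> float:
    """Cost in USD of keeping the output on disk for a number of months."""
    return total_tb * months * usd_per_tb_month


def break_even_months(total_tb: float, usd_per_tb_month: float, rerun_usd: float) -> float:
    """Retention time after which storing costs more than one full re-run."""
    return rerun_usd / (total_tb * usd_per_tb_month)


if __name__ == "__main__":
    # Hypothetical prices and runtime, for illustration only.
    cpu_hours_per_sim = 0.05
    usd_per_cpu_hour = 0.03
    usd_per_tb_month = 20.0

    rerun = rerun_cost(N_SIMS, cpu_hours_per_sim, usd_per_cpu_hour)
    months = break_even_months(OUTPUT_TB, usd_per_tb_month, rerun)
    print(f"average output per simulation: {avg_output_mb(OUTPUT_TB, N_SIMS):.1f} MB")
    print(f"one full re-run: ${rerun:,.0f}")
    print(f"storage exceeds one re-run after {months:.2f} months")
```

Under this simplified model, output that is expected to be read again before the break-even time is cheaper to keep, and output that would sit unread for longer is cheaper to regenerate. A fuller model would also account for the cost of re-reading stored output into an analysis program, which the abstract raises as a separate concern.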
