Skip to main content
SHARE
Publication

Building Simulation Modelers – are we big-data ready?...

by Jibonananda Sanyal, Joshua R New
Publication Type
Conference Paper
Publication Date
Page Numbers
449 to 456
Conference Name
Proceedings of the ASHRAE/IBPSA-USA Building Simulation Conference
Conference Location
Atlanta, Georgia, United States of America
Conference Date
-

Recent advances in computing and sensor technologies have pushed the amount of data we collect or generate to limits previously unheard of. Sub-minute resolution data from dozens of channels is becoming increasingly common and is expected to increase with the prevalence of non-intrusive load monitoring. Experts are running larger building simulation experiments and are faced with an increasingly complex data set to analyze and derive meaningful insight.
This paper focuses on the data management challenges that building modeling experts may face in data collected from a large array of sensors, or generated from running a large number of building energy/performance simulations. The paper highlights the technical difficulties that were encountered and overcome in order to run 3.5 million EnergyPlus simulations on supercomputers and generating over 200 TBs of simulation output. This extreme case involved development of technologies and insights that will be beneficial to modelers in the immediate future.
The paper discusses different database technologies (including relational databases, columnar storage, and schema-less Hadoop) in order to contrast the advantages and disadvantages of employing each for storage of EnergyPlus output. Scalability, analysis requirements, and the adaptability of these database technologies are discussed. Additionally, unique attributes of EnergyPlus output are highlighted which make data-entry non-trivial for multiple simulations. Practical experience regarding cost-effective strategies for big-data storage is provided. The paper also discusses network performance issues when transferring large amounts of data across a network to different computing devices. Practical issues involving lag, bandwidth, and methods for synchronizing or transferring logical portions of the data are presented.
A cornerstone of big-data is its use for analytics; data is useless unless information can be meaningfully derived from it. In addition to technical aspects of managing big data, the paper details design of experiments in anticipation of large volumes of data. The cost of re-reading output into an analysis program is elaborated and analysis techniques that perform analysis in-situ with the simulations as they are run are discussed. The paper concludes with an example and elaboration of the tipping point where it becomes more expensive to store the output than re-running a set of simulations.