Abstract
Deep Reinforcement Learning (DRL) has begun to show success in real-world applications such as building energy optimization. Much of the research in this space relies on simulated environments to train RL agents offline. Very few studies have applied DRL-based control to real-world systems, for two main reasons: 1) the sample efficiency challenge: DRL approaches must perform many interactions with the environment to collect enough experience to learn from, which is difficult in real systems; and 2) comfort- and safety-related constraints: user comfort must never, or at least rarely, be violated. In this work, we propose a novel deep Reinforcement Learning framework with online Data Augmentation (RLDA) to address the sample efficiency challenge of real-world RL. We use a time-series Generative Adversarial Network (TimeGAN) architecture as the data generator and evaluate the proposed RLDA framework through a case study of intelligent HVAC control. With a ≈28% improvement in sample efficiency, the RLDA framework paves the way towards broader adoption of DRL-based intelligent control in real-world building energy management systems.
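To make the core idea concrete, the sketch below illustrates one way a replay buffer could mix real transitions with synthetic ones produced by a generative model, which is the general mechanism the abstract describes. This is only an illustrative sketch, not the paper's implementation: the AugmentedReplayBuffer class, the synthetic_ratio parameter, and the noise-jitter stand-in for a trained TimeGAN generator are assumptions introduced here for clarity.

```python
import numpy as np


class AugmentedReplayBuffer:
    """Replay buffer that mixes real environment transitions with
    synthetic transitions produced by a generative model
    (e.g. a TimeGAN-style generator). Illustrative sketch only."""

    def __init__(self, capacity, synthetic_ratio=0.25):
        self.capacity = capacity
        # Fraction of each training batch drawn from synthetic data (assumed value).
        self.synthetic_ratio = synthetic_ratio
        self.real, self.synthetic = [], []

    def add_real(self, transition):
        self.real.append(transition)
        if len(self.real) > self.capacity:
            self.real.pop(0)

    def add_synthetic(self, transitions):
        self.synthetic.extend(transitions)
        if len(self.synthetic) > self.capacity:
            self.synthetic = self.synthetic[-self.capacity:]

    def sample(self, batch_size, rng=np.random):
        # Draw a mixed batch: mostly real transitions, topped up with synthetic ones.
        n_syn = int(batch_size * self.synthetic_ratio) if self.synthetic else 0
        n_real = batch_size - n_syn
        batch = [self.real[i] for i in rng.choice(len(self.real), size=n_real)]
        if n_syn:
            batch += [self.synthetic[i] for i in rng.choice(len(self.synthetic), size=n_syn)]
        return batch


def generate_synthetic_transitions(real_transitions, n, rng=np.random):
    """Placeholder for a trained TimeGAN generator: here we simply jitter
    real (state, action, reward, next_state) tuples with Gaussian noise."""
    out = []
    for _ in range(n):
        s, a, r, s_next = real_transitions[rng.randint(len(real_transitions))]
        jitter = lambda x: np.asarray(x) + 0.01 * rng.randn(*np.shape(x))
        out.append((jitter(s), a, r, jitter(s_next)))
    return out
```

In such a setup, the agent would periodically call generate_synthetic_transitions on its collected experience and push the results into the buffer via add_synthetic, so each gradient update sees more (and more diverse) data per real environment interaction, which is the intuition behind the sample-efficiency gain reported above.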