
Provenance In Sensor Data Management: A Cohesive, Independent Solution...

by Zachary P Hensley, Jibonananda Sanyal, Joshua R New
Publication Type
Journal
Journal Name
Communications of the ACM
Publication Date
Page Numbers
55 to 62
Volume
57
Issue
2

In today's information-driven workplaces, data is constantly being transformed and moved around. The typical business-as-usual approach relies on email attachments, shared network locations, databases, and now the cloud. More often than not, multiple versions of the data sit in different locations, and users of this data are confounded by the lack of metadata describing its provenance, in other words, its lineage. Our project aims to solve this problem in the context of sensor data. The Oak Ridge National Laboratory's Building Technologies Research and Integration Center has reconfigurable commercial buildings deployed on the Flexible Research Platforms (FRPs). These FRPs are instrumented with a large number of sensors that measure variables such as HVAC efficiency, relative humidity, and temperature gradients across doors, windows, and walls. Data is acquired at sub-minute resolution from hundreds of channels. Traditionally, this sensor data was saved to a shared network location accessible to a number of scientists, who used it for complex simulation and analysis tasks. Because sensors are inherently prone to faults, the data also goes through elaborate quality assurance exercises. Sometimes, faults are deliberately induced to observe building behavior.

It became apparent that proper scientific controls required managing not just data acquisition and delivery but also the metadata associated with temporal subsets of the sensor data. We built a system named ProvDMS (Provenance Data Management System) for the FRPs, which allows researchers both to retrieve data of interest and to trace data lineage. It gives researchers a one-stop shop for comprehensive views of the various transformations applied to the data, allowing them to trace their data back to its source so that experiments, and derivations of experiments, can be reused and reproduced without much overhead. Using these traces, researchers can determine exactly what happens to data as it moves through its life cycle.
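To make the idea of tracing data lineage concrete, here is a minimal sketch of a lineage graph in Python. It assumes a simple model in which each raw dataset is a temporal subset of one sensor channel and each derived dataset records the activity that produced it and the datasets it was derived from, loosely in the spirit of the W3C PROV "was derived from" relation. The names `DataSubset`, `LineageStore`, `record`, `trace`, and `sources`, as well as the channel names, activities, and dates, are hypothetical illustrations, not ProvDMS's actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class DataSubset:
    """Hypothetical temporal slice of one sensor channel (raw data)."""
    channel: str
    start: datetime
    end: datetime


@dataclass
class ProvenanceRecord:
    """One derivation step: `activity` produced `output` from `inputs`."""
    output: str
    activity: str
    inputs: list[str] = field(default_factory=list)


class LineageStore:
    """Minimal in-memory lineage graph (a sketch, not ProvDMS's design)."""

    def __init__(self) -> None:
        self.subsets: dict[str, DataSubset] = {}
        self.derivations: dict[str, ProvenanceRecord] = {}

    def register_raw(self, name: str, subset: DataSubset) -> None:
        # Raw data has metadata about its time window but no parents.
        self.subsets[name] = subset

    def record(self, output: str, activity: str, inputs: list[str]) -> None:
        # Store how a derived dataset was produced.
        self.derivations[output] = ProvenanceRecord(output, activity, list(inputs))

    def trace(self, name: str) -> list[str]:
        """Return the derivation steps leading to `name`, earliest first."""
        steps: list[str] = []
        seen: set[str] = set()

        def visit(node: str) -> None:
            if node in seen:
                return
            seen.add(node)
            rec = self.derivations.get(node)
            if rec is None:
                return  # raw data: nothing further upstream
            for parent in rec.inputs:
                visit(parent)  # emit upstream steps before this one
            steps.append(f"{rec.activity}({', '.join(rec.inputs)}) -> {node}")

        visit(name)
        return steps

    def sources(self, name: str) -> set[str]:
        """Return the raw datasets that `name` ultimately derives from."""
        rec = self.derivations.get(name)
        if rec is None:
            return {name}
        result: set[str] = set()
        for parent in rec.inputs:
            result |= self.sources(parent)
        return result


# Usage with made-up channels: QA-clean one channel, then merge two channels.
store = LineageStore()
day = (datetime(2013, 7, 1), datetime(2013, 7, 2))
store.register_raw("temp_wall_raw", DataSubset("T_wall_01", *day))
store.register_raw("rh_raw", DataSubset("RH_01", *day))
store.record("temp_wall_qa", "fill_gaps", ["temp_wall_raw"])
store.record("sim_input", "merge_channels", ["temp_wall_qa", "rh_raw"])

for step in store.trace("sim_input"):
    print(step)
# fill_gaps(temp_wall_raw) -> temp_wall_qa
# merge_channels(temp_wall_qa, rh_raw) -> sim_input
print(sorted(store.sources("sim_input")))  # ['rh_raw', 'temp_wall_raw']
```

A depth-first walk back through the recorded derivations lists the steps in the order they were applied, and the `seen` set keeps a dataset reached along two paths from being reported twice. A production system would persist these records in a database rather than in memory.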