Abstract
Data aggregation (a.k.a. reduce operations) is an important element of information processing systems, including MapReduce clusters and cyberphysical networks. Unlike in simple sensor networks, all the data in information processing systems must eventually be aggregated. Our goal is to lower overall latency in these systems by intelligently scheduling aggregation on intermediate routing nodes. Unlike previous models, our model explicitly takes into account link latency and computation time. It also accounts for heterogeneous computing capabilities.
To understand the challenges of constructing a distributed scheduler that minimizes latency, we have developed a simulation of our model and tested the results of randomly scheduling nodes. Although these experiments were designed to provide data for a null model, preliminary results have yielded a few interesting observations. We show that when computation time is larger than transmission time, in-network aggregation can have a large effect (reducing latency by 50% or more), but that naive scheduling can be detrimental. Specifically, we show that when the root node (a.k.a. the base station) is faster than the other nodes, latency can increase with increased coverage, and that these effects vary with the number of nodes present.