Abstract
Specialized, transient data services are playing an increasingly prominent role in data-intensive scientific computing. These services offer flexible, on-demand pairing of applications with storage hardware using semantics that are optimized for the problem domain. Concurrent with this trend, upcoming scientific computing and big data systems will be deployed with emerging NVM technology to achieve the highest possible price/productivity ratio. Clearly, therefore, we must develop techniques to facilitate the confluence of specialized data services and NVM technology.
In this work we explore how to enable the composition of NVM resources within transient distributed services while still retaining their essential performance characteristics. Our approach involves eschewing the conventional distributed file system model and instead projecting NVM devices as remote microservices that leverage user-level threads, RPC services, RMA-enabled network transports, and persistent memory libraries in order to maximize performance. We describe a prototype system that incorporates these concepts, evaluate its performance for key workloads on an exemplar system, and discuss how the system can be leveraged as a component of future data-intensive architectures.