Ripple: Home Automation for Research Data Management

Exploding data volumes and acquisition rates, plus ever more complex research processes, place significant strain on research data management processes. It is increasingly common for data to flow through pipelines comprised of dozens of different management, organization, and analysis steps distributed across multiple institutions and storage systems. To alleviate the resulting complexity, we propose a home automation approach to managing data throughout its lifecycle, in which users specify via high-level rules the actions that should be performed on data at different times and locations. To this end, we have developed Ripple, a responsive storage architecture that allows users to express data management tasks via a rules notation. Ripple monitors storage systems for events, evaluates rules, and uses serverless computing techniques to execute actions in response to these events. We evaluate our solution by applying Ripple to the data lifecycles of two real-world projects, in astronomy and light source science, and show that it can automate many mundane and cumbersome data management processes.