dataClay: The Integration of Persistent Data, Parallel Programming Models, and True Sharing

Summary form only given. Persistent and non-persistent data have traditionally been treated as two separate abstractions. A clear example is that the model used to store data in volatile memory (mainly objects and their relations) is completely different from the model used to store the same data in persistent storage (mainly tables or files). This separation has many negative side effects because persistent data cannot be integrated into the programming model. This lack of integration causes, among others, the following problems: i) moving computation to the data becomes a complex task (deployment can become arduous), ii) extracting potential data parallelism through the programming model is very difficult (the programming model is unaware of where the data really is), and iii) offering a mechanism to truly share data without taking control away from the data owner becomes nearly impossible (we will show that today data is not really shared). In this talk, we will present dataClay, a new-generation object store, and its integration with the COMPSs programming model. This new way of handling data (and code), and its perfect fit with a parallel programming model, will eliminate all the aforementioned problems, easing the task of implementing data-centric programs while taking full advantage of the available parallelism.