An in-memory based framework for scientific data analytics

This work presents the in-memory I/O server implemented within the Ophidia framework, a big data analytics stack for the scientific analysis of n-dimensional datasets. The I/O server is a key component of the Ophidia 2.0 architecture proposed in this paper. It exploits (i) a NoSQL approach to manage scientific data at the storage level, (ii) user-defined functions to perform array-based analytics, (iii) the Ophidia Storage API to manage heterogeneous back-ends through a plugin-based approach, and (iv) an in-memory, parallel analytics engine to achieve high scalability and performance. The paper also reports preliminary performance results for a statistical analytics kernel benchmark run on an HPC cluster at the CMCC SuperComputing Centre.
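To make the combination of a plugin-based storage layer and user-defined array functions more concrete, the following is a minimal sketch in Python. It is not the Ophidia Storage API: the names `StorageBackend`, `InMemoryBackend`, `BACKENDS`, and `apply_udf` are hypothetical and chosen only for illustration, and the sketch runs serially rather than in parallel.

```python
from abc import ABC, abstractmethod
from statistics import mean
from typing import Callable, Dict, List, Sequence


class StorageBackend(ABC):
    """Hypothetical plugin interface (an assumption, not the Ophidia Storage API)."""

    @abstractmethod
    def put(self, key: str, array: Sequence[float]) -> None:
        """Store one array under a key."""

    @abstractmethod
    def get(self, key: str) -> List[float]:
        """Return a copy of the array stored under a key."""

    @abstractmethod
    def keys(self) -> List[str]:
        """Return all stored keys in a stable order."""


class InMemoryBackend(StorageBackend):
    """Key-value back-end that keeps every array in a Python dict (in RAM)."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}

    def put(self, key: str, array: Sequence[float]) -> None:
        self._store[key] = list(array)

    def get(self, key: str) -> List[float]:
        return list(self._store[key])

    def keys(self) -> List[str]:
        return sorted(self._store)


# Registry of back-end plugins; other back-ends could be added under new names.
BACKENDS: Dict[str, Callable[[], StorageBackend]] = {"memory": InMemoryBackend}


def apply_udf(backend: StorageBackend,
              udf: Callable[[List[float]], float]) -> Dict[str, float]:
    """Apply a user-defined reduction to every stored array, one key at a time."""
    return {key: udf(backend.get(key)) for key in backend.keys()}


# Select a back-end by name, load two small arrays, and reduce each with a UDF.
backend = BACKENDS["memory"]()
backend.put("cell_0", [1.0, 2.0, 3.0])
backend.put("cell_1", [4.0, 6.0])
results = apply_udf(backend, mean)
```

Because the analytics code only depends on the abstract interface, swapping the storage back-end in this sketch would not require changing `apply_udf` or the user-defined function passed to it.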
