Paper information - An in-memory based framework for scientific data analytics

This work presents the in-memory I/O server implemented within the Ophidia framework, a big data analytics stack for the scientific analysis of n-dimensional datasets. The I/O server is a key component of the Ophidia 2.0 architecture proposed in this paper. It exploits (i) a NoSQL approach to manage scientific data at the storage level, (ii) user-defined functions to perform array-based analytics, (iii) the Ophidia Storage API to manage heterogeneous back-ends through a plugin-based approach, and (iv) an in-memory, parallel analytics engine to achieve high scalability and performance. The paper also provides preliminary performance results from a statistical analytics kernel benchmark run on an HPC cluster at the CMCC SuperComputing Centre.
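The four ingredients listed in the abstract can be illustrated with a toy sketch. Everything in it is hypothetical and is not taken from the Ophidia code base or from the Ophidia Storage API: the registry `BACKENDS`, the class `InMemoryBackend`, the table `UDFS`, and the function `run_udf` are names invented here for illustration. The sketch assumes a key-value layout in which each array is held in memory as a packed binary value, and it uses a thread pool as a stand-in for the parallel engine.

```python
import struct
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical plugin registry: back-end name -> class (not the Ophidia Storage API).
BACKENDS = {}


def register_backend(name):
    """Class decorator that registers a storage back-end under a name."""
    def decorator(cls):
        BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("memory")
class InMemoryBackend:
    """Key-value store that keeps each array in RAM as a packed binary value."""

    def __init__(self):
        self._rows = {}

    def put(self, key, values):
        # Pack the array as native 8-byte doubles.
        self._rows[key] = struct.pack(f"{len(values)}d", *values)

    def get(self, key):
        blob = self._rows[key]
        return list(struct.unpack(f"{len(blob) // 8}d", blob))

    def keys(self):
        return list(self._rows)


# Array-based user-defined functions: each reduces one stored array to a scalar.
UDFS = {"max": max, "min": min, "avg": mean}


def run_udf(backend, udf_name, workers=4):
    """Apply one UDF to every stored array in parallel; return {key: result}."""
    udf = UDFS[udf_name]
    keys = backend.keys()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda k: udf(backend.get(k)), keys))
    return dict(zip(keys, results))


# Usage: select the back-end by name, store two arrays, reduce each one.
store = BACKENDS["memory"]()
store.put("cell_0", [1.0, 4.0, 2.5])
store.put("cell_1", [3.0, 3.0, 9.0])
print(run_udf(store, "max"))
```

Selecting the back-end by name keeps the analytics code independent of where the arrays live, which is the general idea behind a plugin-based storage layer; a real engine would distribute the work across processes or nodes rather than threads.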
