ACTOR: Active Cloud Storage with Energy-Efficient On-Drive Data Processing

Storage systems are indispensable for big data processing and cloud computing services today. The ever-growing size of computation and data analytics results demands larger storage capacity, which challenges data processing and storage scalability. Moreover, the increasing complexity of the storage hierarchy and the use of "passive" storage devices make today's storage systems inefficient, which necessitates the adoption of new storage technologies. In this paper, we explore new Ethernet-connected drives with on-drive embedded CPU and DRAM to develop an active cloud storage system in which data can be processed on the disk drives without data movement. These drives are micro-storage servers that can support software-defined storage. In addition to I/O operations, we test and evaluate on-drive data processing, including data compression, aggregation, and erasure encoding, which provides natural support for data-intensive applications. Our experimental results show that Open Ethernet Drives can significantly lower energy consumption while maintaining data processing throughput and ensuring data availability and storage scalability. Results and findings from this work will facilitate the scheduling of on-drive compute resources for building active and scalable cloud storage systems.
