Capability-aware data placement for heterogeneous active storage systems

By moving computation from compute nodes to storage nodes, active storage technology provides an efficient solution for data-intensive high-performance computing applications. Existing studies have neglected the impact of storage-node heterogeneity on the performance of active storage systems. We introduce CADP, a capability-aware data placement scheme for heterogeneous active storage systems that targets high-performance data processing. The basic idea of CADP is to place data on storage nodes according to their computing capability and storage capability, so that load imbalance among heterogeneous servers is avoided. We have implemented CADP in a parallel I/O system. The experimental results show that the proposed capability-aware data placement scheme can significantly improve the performance of active storage systems.
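
The abstract does not give CADP's placement formula, so the following is only a minimal sketch of the general idea of capability-proportional placement, not the authors' algorithm. It assumes each storage node is described by two hypothetical figures, a sustained read bandwidth and a processing rate, and that reading and computing are not overlapped, so the time per megabyte on a node is the sum of the two costs. The names `StorageNode`, `effective_rate`, `place`, and `finish_times` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    name: str
    io_mb_s: float       # hypothetical sustained read bandwidth (MB/s)
    compute_mb_s: float  # hypothetical rate at which the node processes data (MB/s)


def effective_rate(node: StorageNode) -> float:
    """MB/s the node can read and process, assuming the two phases do not overlap."""
    return 1.0 / (1.0 / node.io_mb_s + 1.0 / node.compute_mb_s)


def place(total_mb: float, nodes: list[StorageNode]) -> dict[str, float]:
    """Split total_mb across nodes in proportion to their effective rate."""
    rates = [effective_rate(n) for n in nodes]
    total_rate = sum(rates)
    return {n.name: total_mb * r / total_rate for n, r in zip(nodes, rates)}


def finish_times(shares: dict[str, float], nodes: list[StorageNode]) -> dict[str, float]:
    """Seconds each node needs to read and process its share."""
    return {n.name: shares[n.name] / effective_rate(n) for n in nodes}


if __name__ == "__main__":
    # Assumed example cluster: one slower and one faster server.
    cluster = [
        StorageNode("slow", io_mb_s=100.0, compute_mb_s=100.0),
        StorageNode("fast", io_mb_s=300.0, compute_mb_s=150.0),
    ]
    shares = place(900.0, cluster)
    print(shares)
    print(finish_times(shares, cluster))
```

Under these assumptions every node finishes its share at about the same time, whereas an even split would leave the faster server idle while the slower one is still working. A real scheme would also have to respect stripe granularity and capacity limits, which this sketch ignores.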
