Cold data eviction using node congestion probability for HDFS based on Hybrid SSD

Data exist in various persistent-storage formats, and the Hadoop Distributed File System (HDFS) has been recognized as effective for distributed storage and processing. Research on hybrid NAND flash-based solid-state drives (SSDs), including hybrid ReRAM/MLC NAND SSDs, has recently been expanding rapidly in the storage field. Most existing research on hybrid SSDs, however, targets a single storage device, while the management of multiple nodes, as in HDFS, is still immature. This paper proposes an efficient cold data eviction scheme for HDFS that is based on the congestion state of nodes equipped with hybrid SSDs. Computer simulation shows that the proposed scheme significantly reduces average recovery time and execution time compared with existing replication schemes.
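The abstract does not say how node congestion probability is computed or which data count as cold. As a purely illustrative sketch, not the paper's method, the code below assumes each DataNode behaves as an M/M/1 queue, so the probability that at least k requests are queued is ρ^k with ρ = λ/μ, and treats blocks with few accesses in the fast ReRAM tier as cold. Nodes whose congestion probability exceeds a threshold are handled first and have their coldest blocks scheduled for eviction to MLC NAND. `Node`, `congestion_probability`, `plan_evictions`, and every threshold are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DataNode model; rates and tier contents are assumptions."""
    name: str
    arrival_rate: float  # lambda: requests per second (assumed)
    service_rate: float  # mu: requests per second (assumed)
    # block id -> access count while resident in the fast (ReRAM) tier
    fast_tier: dict = field(default_factory=dict)


def congestion_probability(node: Node, k: int) -> float:
    """P(queue length >= k) for an M/M/1 queue: rho**k, or 1.0 if unstable."""
    rho = node.arrival_rate / node.service_rate
    return 1.0 if rho >= 1.0 else rho ** k


def plan_evictions(nodes, k=4, threshold=0.3, cold_max_accesses=2):
    """Return {node name: [block ids to move from ReRAM to MLC NAND]}.

    Hypothetical policy: only nodes whose congestion probability reaches
    `threshold` evict, most congested first, coldest blocks first.
    """
    plan = {}
    ranked = sorted(nodes, key=lambda n: congestion_probability(n, k), reverse=True)
    for node in ranked:
        if congestion_probability(node, k) < threshold:
            continue  # node is not congested enough to justify eviction
        cold = sorted(
            (b for b, hits in node.fast_tier.items() if hits <= cold_max_accesses),
            key=lambda b: node.fast_tier[b],  # fewest accesses first
        )
        if cold:
            plan[node.name] = cold
    return plan


nodes = [
    Node("dn1", 80, 100, {"b1": 0, "b2": 7, "b3": 1}),  # rho = 0.8
    Node("dn2", 20, 100, {"b4": 0}),                    # rho = 0.2
]
print(plan_evictions(nodes))  # {'dn1': ['b1', 'b3']}
```

With k = 4, dn1 has congestion probability 0.8^4 ≈ 0.41 and evicts its two cold blocks, while dn2 (0.2^4 = 0.0016) is left alone even though it holds a cold block. The M/M/1 model is chosen only because it gives a closed-form tail probability; a real scheme would need measured load.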