IM-Dedup: An Image Management System Based on Deduplication Applied in DWSNs

In distributed wireless sensor networks (DWSNs), the data gathered by the sink is massive and consumes substantial resources, which makes a cloud computing platform a natural host for the data processing service. In cloud computing, an IaaS platform delivers services and computation to users through virtual machines. Managing virtual machine images, however, consumes a large amount of storage space and puts heavy pressure on network transmission. This paper applies deduplication in OpenStack and presents the design and implementation of IM-Dedup, an image management system that uses static chunking (SC) to divide an image file into data blocks, avoids transmitting duplicate data blocks over the network through fingerprint pretransmission, and reduces storage usage by deploying a kernel-mode file system with deduplication on the image storage server. The experimental results showed that the system reduced virtual machine image storage usage by 80% and saved at least 30% of transmission time. Furthermore, a study of virtual machine image formats showed that the “VMware Virtual Machine Disk Format” (VMDK), “Virtual Disk Image” (VDI), “QEMU Copy On Write 2” (QCOW2), and RAW image formats are more suitable for the IM-Dedup system.
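The combination of static chunking and fingerprint pretransmission described above can be illustrated with a minimal sketch. It is not the IM-Dedup implementation: the 4 KiB block size, the use of SHA-256 as the fingerprint, and the names `ImageStore`, `static_chunks`, `fingerprint`, and `upload` are all assumptions made for illustration, and the in-memory dictionary merely stands in for the image storage server.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed block size; the abstract does not state one


def static_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    """Hash one block. SHA-256 is an assumption, not the paper's stated choice."""
    return hashlib.sha256(chunk).hexdigest()


class ImageStore:
    """Hypothetical stand-in for the image storage server."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes, each stored once
        self.images = {}  # image name -> ordered list of fingerprints

    def missing(self, fingerprints):
        """Answer the pretransmission query: which fingerprints are unknown?"""
        return {fp for fp in fingerprints if fp not in self.blocks}

    def put(self, name, fingerprints, new_blocks):
        """Store only the new blocks plus the recipe that rebuilds the image."""
        self.blocks.update(new_blocks)
        self.images[name] = list(fingerprints)

    def get(self, name):
        """Reassemble an image from its ordered fingerprint list."""
        return b"".join(self.blocks[fp] for fp in self.images[name])


def upload(store: ImageStore, name: str, data: bytes) -> int:
    """Send fingerprints first, then only the blocks the server lacks.

    Returns the number of block payload bytes that had to be sent.
    """
    chunks = static_chunks(data)
    fps = [fingerprint(c) for c in chunks]
    needed = store.missing(fps)
    # Keying by fingerprint also collapses duplicates within the same image.
    new_blocks = {fp: c for fp, c in zip(fps, chunks) if fp in needed}
    store.put(name, fps, new_blocks)
    return sum(len(c) for c in new_blocks.values())
```

In this sketch, uploading a second image that shares most blocks with an earlier one sends only the blocks the store has not seen, while `get` still reconstructs each image in full from its fingerprint list.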
