Leveraging Hadoop framework to develop duplication detector and analysis using MapReduce, Hive and Pig
[1] Sean Matthew Dorward,et al. Venti: A New Approach to Archival Data Storage , 2002 .
[2] Sanjay Ghemawat,et al. The Google file system , 2003 .
[3] Carlos Maltzahn,et al. RADOS: a scalable, reliable storage service for petabyte-scale storage clusters , 2007, PDSW '07.
[4] Darrell D. E. Long,et al. Duplicate Data Elimination in a SAN File System , 2004, MSST.
[5] Philip Hunter. Journey to the centre of big data , 2013 .
[6] Kai Li,et al. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System , 2008, FAST.
[7] Zhanhuai Li,et al. Data deduplication techniques , 2010, 2010 International Conference on Future Information Technology and Management Engineering.
[8] Val Henson,et al. An Analysis of Compare-by-hash , 2003, HotOS.
[9] Gregory R. Ganger,et al. Ursa minor: versatile cluster-based storage , 2005, FAST'05.
[10] John Black,et al. Compare-by-Hash: A Reasoned Analysis , 2006, USENIX Annual Technical Conference, General Track.
[11] K. Bakshi,et al. Considerations for big data: Architecture and approach , 2012, 2012 IEEE Aerospace Conference.
[12] Zhe Sun,et al. P2CP: a new cloud storage model to enhance performance of cloud services , 2011 .
[13] Mark Lillibridge,et al. Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality , 2009, FAST.
[14] Chandramohan A. Thekkath,et al. Petal: distributed virtual disks , 1996, ASPLOS VII.
[15] Piyush Malik,et al. Governing Big Data: Principles and practices , 2013, IBM J. Res. Dev..
[16] Howard Gobioff,et al. The Google file system , 2003, SOSP '03.
[17] Michal Kaczmarczyk,et al. HYDRAstor: A Scalable Secondary Storage , 2009, FAST.
[18] Zhe Sun,et al. DeDu: Building a deduplication storage system over cloud computing , 2011, Proceedings of the 2011 15th International Conference on Computer Supported Cooperative Work in Design (CSCWD).
[19] Hong Jiang,et al. MAD2: A scalable high-throughput exact deduplication approach for network backup services , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[20] Qian Xu,et al. Compression-aware I/O performance analysis for big data clustering , 2012, BigMine '12.
[21] Bin Zhou,et al. Scalable Performance of the Panasas Parallel File System , 2008, FAST.
[22] Arif Merchant,et al. FAB: building distributed enterprise disk arrays from commodity components , 2004, ASPLOS XI.
[23] Irfan Ahmad,et al. Decentralized Deduplication in SAN Cluster File Systems , 2009, USENIX Annual Technical Conference.
[24] Carlos Maltzahn,et al. Ceph: a scalable, high-performance distributed file system , 2006, OSDI '06.
[25] Mark Lillibridge,et al. Extreme Binning: Scalable, parallel deduplication for chunk-based file backup , 2009, 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems.
[26] Randy H. Katz,et al. How Hadoop Clusters Break , 2013, IEEE Software.
[27] David Geer. Reducing the Storage Burden via Data Deduplication , 2008, Computer.
[28] Tom White,et al. Hadoop: The Definitive Guide , 2009 .
[29] Miguel Castro,et al. Farsite: federated, available, and reliable storage for an incompletely trusted environment , 2002, OPSR.