Online data deduplication for in-memory big-data analytic systems
Yushi Sun | Zhe Huang | Catherine Y. Zeng | Jaeyoon Chung
[1] Joseph M. Hellerstein,et al. MapReduce Online , 2010, NSDI.
[2] Marvin Theimer,et al. Reclaiming space from duplicate files in a serverless distributed file system , 2002, Proceedings 22nd International Conference on Distributed Computing Systems.
[3] George Karypis,et al. A Comparison of Document Clustering Techniques , 2000.
[4] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.
[5] Mung Chiang,et al. SAP: Similarity-aware partitioning for efficient cloud storage , 2014, IEEE INFOCOM 2014 - IEEE Conference on Computer Communications.
[6] Dutch T. Meyer,et al. A study of practical deduplication , 2011, TOS.
[7] Jinyang Li,et al. Piccolo: Building Fast, Distributed Programs with Partitioned Tables , 2010, OSDI.
[8] Irfan Ahmad,et al. Decentralized Deduplication in SAN Cluster File Systems , 2009, USENIX Annual Technical Conference.
[9] Sean Matthew Dorward,et al. Venti: A New Approach to Archival Data Storage , 2002.
[10] Brenda S. Baker,et al. On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.
[11] Nasir D. Memon,et al. Cluster-based delta compression of a collection of files , 2002, Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.
[12] Udi Manber,et al. GLIMPSE: A Tool to Search Through Entire File Systems , 1994, USENIX Winter.
[13] Michael J. Franklin,et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.
[14] Torsten Suel,et al. Compressing File Collections with a TSP-Based Approach , 2004.
[15] George Forman,et al. Finding similar files in large document repositories , 2005, KDD '05.
[16] Micah Adler,et al. Towards compressing Web graphs , 2001, Proceedings DCC 2001. Data Compression Conference.
[17] Darrell D. E. Long,et al. Duplicate Data Elimination in a SAN File System , 2004, MSST.
[18] Carlos Guestrin,et al. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud , 2012.
[19] Kai Li,et al. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System , 2008, FAST.
[20] Udi Manber,et al. Finding Similar Files in a Large File System , 1994, USENIX Winter.