Ignem: Upward Migration of Cold Data in Big Data File Systems

This paper investigates whether migrating cold data upward in the storage hierarchy can yield significant speedups for jobs running on modern big data file systems. Our work is motivated by two observations. First, improving the input stage of a job can provide significant speedup because many jobs spend a large fraction of their execution reading inputs. Second, the inputs of many jobs are cold, so common techniques that aim to keep hot data in memory do not benefit them. We analyze Google production cluster traces and find that the key ingredients for effective cold-data migration do exist in such production environments. Encouraged by these findings, we design and implement Ignem, a framework for migrating cold data in big data file systems. We evaluate Ignem in a series of experiments and show that it provides significant speedups for both small and large jobs: Hive queries are accelerated by up to 34%; in a trace-driven workload, mean job duration is reduced by 12% and mean task duration by nearly 40%; and standalone jobs such as sort and wordcount improve by up to 30%.
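
The abstract does not describe Ignem's internal mechanism, so the sketch below is an illustration only: one way an external tool could migrate a cold input directory upward into DataNode memory ahead of a job, using HDFS centralized cache management through the standard Hadoop Java API. The input path /warehouse/sales/2015-q4 and the cache pool name "prewarm" are hypothetical, and the cache pool is assumed to have been created in advance (hdfs cacheadmin -addPool prewarm).

    // Minimal sketch, assuming fs.defaultFS points at the target HDFS cluster.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;

    public class ColdInputPrewarm {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            DistributedFileSystem dfs =
                    (DistributedFileSystem) new Path("/").getFileSystem(conf);

            // Hypothetical cold input directory for an upcoming job.
            Path coldInput = new Path("/warehouse/sales/2015-q4");

            // Ask HDFS to pull the directory's blocks into DataNode memory;
            // one in-memory copy is sufficient to serve the job's reads.
            CacheDirectiveInfo directive = new CacheDirectiveInfo.Builder()
                    .setPath(coldInput)
                    .setPool("prewarm")
                    .setReplication((short) 1)
                    .build();

            // Caching happens asynchronously on the DataNodes, so the
            // directive should be submitted before the job is scheduled.
            long id = dfs.addCacheDirective(directive);
            System.out.println("Submitted cache directive " + id + " for " + coldInput);
        }
    }

A directive like this migrates the data without involving the job itself; whether migration pays off depends on the gap between directive submission and job start, which is the kind of production signal the trace analysis above examines.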

[1]  Tim Kraska,et al.  Tupleware: Redefining Modern Analytics , 2014, ArXiv.

[2]  GhemawatSanjay,et al.  The Google file system , 2003 .

[3]  Anna R. Karlin,et al.  A study of integrated prefetching and caching strategies , 1995, SIGMETRICS '95/PERFORMANCE '95.

[4]  Scott Shenker,et al.  Usenix Association 10th Usenix Symposium on Networked Systems Design and Implementation (nsdi '13) 185 Effective Straggler Mitigation: Attack of the Clones , 2022 .

[5]  Srikanth Kandula,et al.  Efficient queue management for cluster scheduling , 2016, EuroSys.

[6]  Antony I. T. Rowstron,et al.  Rhea: Automatic Filtering for Unstructured Cloud Storage , 2013, NSDI.

[7]  Zheng Shao,et al.  Data warehousing and analytics infrastructure at facebook , 2010, SIGMOD Conference.

[8]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[9]  Chao Li,et al.  Fuxi: a Fault-Tolerant Resource Management and Job Scheduling System at Internet Scale , 2014, Proc. VLDB Endow..

[10]  Frank B. Schmuck,et al.  GPFS: A Shared-Disk File System for Large Computing Clusters , 2002, FAST.

[11]  Ding Yuan,et al.  Don't Get Caught in the Cold, Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead in Data-Parallel Systems , 2016, OSDI.

[12]  Prashant Pandey,et al.  Cloud Analytics: Do We Really Need to Reinvent the Storage Stack? , 2009, HotCloud.

[13]  Raghunath Othayoth Nambiar,et al.  The making of TPC-DS , 2006, VLDB.

[14]  Andrew J. Hutton,et al.  Lustre: Building a File System for 1,000-node Clusters , 2003 .

[15]  Jignesh M. Patel,et al.  Column-Oriented Storage Techniques for MapReduce , 2011, Proc. VLDB Endow..

[16]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[17]  Carlo Curino,et al.  Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications , 2015, SIGMOD Conference.

[18]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[19]  Srikanth Kandula,et al.  PACMan: Coordinated Memory Caching for Parallel Jobs , 2012, NSDI.

[20]  John H. Hartman,et al.  The Zebra striped network file system , 1995, TOCS.

[21]  Chenyang Lu,et al.  Proceedings of the Fast 2002 Conference on File and Storage Technologies Aqueduct: Online Data Migration with Performance Guarantees , 2022 .

[22]  T. S. Eugene Ng,et al.  Pfimbi: Accelerating big data jobs through flow-controlled data replication , 2016, 2016 32nd Symposium on Mass Storage Systems and Technologies (MSST).

[23]  Dean Hildebrand,et al.  Panache: A Parallel File System Cache for Global File Access , 2010, FAST.

[24]  Abhishek Verma,et al.  Large-scale cluster management at Google with Borg , 2015, EuroSys.

[25]  Scott Shenker,et al.  Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks , 2014, SoCC.

[26]  Sanjay Ghemawat,et al.  MapReduce: simplified data processing on large clusters , 2008, CACM.

[27]  T. S. Eugene Ng,et al.  Leaky Buffer: A Novel Abstraction for Relieving Memory Pressure from Cluster Data Processing Frameworks , 2017, IEEE Transactions on Parallel and Distributed Systems.

[28]  Antony I. T. Rowstron,et al.  Scale-up vs scale-out for Hadoop: time to rethink? , 2013, SoCC.

[29]  Jin-Soo Kim,et al.  HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.

[30]  Jon Howell,et al.  Flat Datacenter Storage , 2012, OSDI.

[31]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[32]  Scott Shenker,et al.  Making Sense of Performance in Data Analytics Frameworks , 2015, NSDI.

[33]  Yanpei Chen,et al.  Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads , 2012, Proc. VLDB Endow..

[34]  Robert B. Ross,et al.  On the duality of data-intensive file system design: Reconciling HDFS and PVFS , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[35]  Dhabaleswar K. Panda,et al.  Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture , 2015, 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.