Paper Information - INSTalytics

INSTalytics

We present the design, implementation, and evaluation of INSTalytics, a co-designed stack of a cluster file system and the compute layer, for efficient big data analytics in large-scale data centers. INSTalytics amplifies the well-known benefits of data partitioning in analytics systems; instead of traditional partitioning on one dimension, INSTalytics enables data to be simultaneously partitioned on four different dimensions at the same storage cost, enabling a larger fraction of queries to benefit from partition filtering and joins without network shuffle. To achieve this, INSTalytics uses compute-awareness to customize the 3-way replication that the cluster file system employs for availability. A new heterogeneous replication layout enables INSTalytics to preserve the same recovery cost and availability as traditional replication. INSTalytics also uses compute-awareness to expose a new sliced-read API that improves the performance of joins by enabling multiple compute nodes to read slices of a data block efficiently via coordinated request scheduling and selective caching at the storage nodes. We have implemented INSTalytics in a production analytics stack, and show that recovery performance and availability are similar to physical replication, while providing significant improvements in query performance, suggesting a new approach to designing cloud-scale big-data analytics systems.
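
To make the partition-filtering idea in the abstract concrete, the following is a minimal toy sketch, not the system's actual layout or API. It assumes three replicas of the same rows, each hash-partitioned on a different column, so that an equality filter on any of those columns reads a single partition. The names `bucket`, `build_replicas`, and `filtered_read` are hypothetical, and the sketch does not model the fourth partitioning dimension, recovery, or sliced reads.

```python
import zlib


def bucket(value, num_partitions):
    """Deterministic hash bucket for a value (toy choice: CRC32 of its string form)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_partitions


def build_replicas(rows, columns, num_partitions):
    """Build one full copy of `rows` per column, each partitioned on that column.

    Every replica holds all rows, so the storage cost equals plain replication
    with len(columns) copies; only the partitioning differs per replica.
    """
    replicas = {}
    for column in columns:
        partitions = [[] for _ in range(num_partitions)]
        for row in rows:
            partitions[bucket(row[column], num_partitions)].append(row)
        replicas[column] = partitions
    return replicas


def filtered_read(replicas, column, value, num_partitions):
    """Answer `column == value`; return (matching rows, number of rows scanned)."""
    if column in replicas:
        # A replica is partitioned on this column: read only one partition.
        candidates = replicas[column][bucket(value, num_partitions)]
    else:
        # No replica is partitioned on this column: scan one full replica.
        any_replica = next(iter(replicas.values()))
        candidates = [row for part in any_replica for row in part]
    matches = [row for row in candidates if row[column] == value]
    return matches, len(candidates)


if __name__ == "__main__":
    rows = [
        {"user": i % 7, "region": "r%d" % (i % 3), "day": i % 5, "bytes": i}
        for i in range(60)
    ]
    replicas = build_replicas(rows, ["user", "region", "day"], num_partitions=4)
    for column, value in [("user", 3), ("region", "r1"), ("bytes", 10)]:
        matches, scanned = filtered_read(replicas, column, value, 4)
        print(column, value, "matches:", len(matches), "scanned:", scanned)
```

In this toy model, filters on `user`, `region`, or `day` each touch one partition of the matching replica, while a filter on `bytes` falls back to a full scan because no replica is partitioned on it.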
