Column Cache: Buffer Cache for Columnar Storage on HDFS

Columnar storage is a common data source for data analytics in distributed computing frameworks. For portability and scalability, columnar storage is built on top of existing distributed file systems using columnar data representations such as Parquet, RCFile, and ORC. However, these representations fail to use high-level information (e.g., the columnar format) for low-level disk buffer management in the operating system. As a result, data analytics workloads suffer from redundant memory buffers with expensive garbage collection, unnecessary disk readahead, and cache pollution in the operating system buffer cache.

We propose column cache, which unifies and restructures the buffers and caches of multiple software layers, from columnar storage down to the operating system. Column cache leverages high-level information such as file formats and query plans to enable adaptive disk reads and cache eviction policies. We have developed a column cache prototype for Apache Parquet and observed that it reduces redundant resource utilization in Apache Spark. Specifically, with our prototype, Spark showed a maximum speedup of 1.28x in TPC-DS workloads while increasing Linux page cache size by 18%, reducing total disk reads by 43%, and reducing garbage collection time in the Java virtual machine by 76%.
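To make the idea of format-aware disk reads concrete, the following is a minimal sketch and not the prototype's implementation. It assumes a hypothetical `ColumnChunk` record giving each column chunk's byte offset and length (the kind of information a columnar file footer provides) and a hypothetical `plan_reads` helper that, given the columns a query plan needs, returns only the byte ranges worth reading, so that nothing is fetched for columns the query never touches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnChunk:
    """Hypothetical footer entry: where one column chunk lives in the file."""
    column: str
    offset: int  # byte offset of the chunk within the file
    length: int  # chunk size in bytes


def plan_reads(chunks, needed_columns, max_gap=0):
    """Return merged (offset, length) ranges covering only the needed columns.

    Chunks whose gap to the previous range is at most `max_gap` bytes are
    merged into one sequential read; everything else is skipped entirely.
    """
    wanted = sorted(
        (c for c in chunks if c.column in needed_columns),
        key=lambda c: c.offset,
    )
    ranges = []
    for chunk in wanted:
        chunk_end = chunk.offset + chunk.length
        if ranges:
            start, length = ranges[-1]
            if chunk.offset - (start + length) <= max_gap:
                # Close enough to the previous range: extend that read.
                ranges[-1] = (start, max(start + length, chunk_end) - start)
                continue
        ranges.append((chunk.offset, chunk.length))
    return ranges


# Illustrative layout of four column chunks in one row group (made-up sizes).
chunks = [
    ColumnChunk("a", 0, 100),
    ColumnChunk("b", 100, 50),
    ColumnChunk("c", 150, 200),
    ColumnChunk("d", 350, 10),
]
print(plan_reads(chunks, {"a", "b", "d"}))
```

With this layout, a query that needs columns `a`, `b`, and `d` reads two ranges and skips the 200 bytes of column `c`, which a format-unaware sequential readahead would likely have pulled into the page cache. The `max_gap` parameter is an illustrative knob for trading a few wasted bytes against fewer separate reads.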
