Towards Physical Design Management in Storage Systems

In the post-Moore era, systems and devices with new architectures will arrive at a rapid rate, with significant impacts on the software stack. Applications will not be able to fully benefit from new architectures unless they can delegate adaptation to new devices to lower layers of the stack. In this paper we introduce physical design management, which deals with the problem of identifying and executing transformations on the physical design of stored data, i.e., how data is mapped to storage abstractions such as files, objects, or blocks, in order to improve performance. Physical design is traditionally handled by applications, access libraries, and databases, using hard-wired assumptions about the underlying storage systems. Yet storage systems increasingly not only contain multiple kinds of storage devices with vastly different performance profiles but also move data among those devices, thereby changing the benefit of a particular physical design. We advocate placing physical design management in the storage system, identify interesting research challenges, briefly describe a prototype implementation in Ceph, and discuss the results of initial experiments at scale that are replicable using CloudLab. These experiments show the performance and resource utilization trade-offs associated with choosing different physical designs and with choosing to transform between them.
