Paper Information - MiDas: Containerizing Data-Intensive Applications with I/O Specialization

MiDas: Containerizing Data-Intensive Applications with I/O Specialization

Scientific applications often depend on data produced by computational models, and model-generated data can be prohibitively large. Current mechanisms for sharing and distributing reproducible applications, such as containers, assume that all model data is saved and included with a program to support its successful re-execution. However, including model data increases container size, which in turn increases the cost and time required for deployment and further reuse. We present a framework named MiDas ("Minimizing Datasets") for specializing I/O libraries which, given an application, automates the process of identifying and including only the subset of the data accessed by the program. To do this, MiDas combines static and dynamic analysis techniques to map high-level user inputs to low-level file offsets. We show a reduction in data size of several orders of magnitude via specialization of I/O libraries associated with model-based data-intensive applications, such as those operating on meteorological and geophysical data.
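The idea of keeping only the bytes a program actually touches can be illustrated with a minimal Python sketch. It is not the MiDas implementation: it shows only a simple dynamic-tracing step (the abstract describes a combination of static and dynamic analysis), and every name in it (`TracingFile`, `merge_ranges`, `extract_subset`, `read_from_subset`) is hypothetical. The sketch logs the file offsets read during one run, coalesces them into ranges, packs just those ranges, and then serves later reads at the original offsets from the packed copy.

```python
import io
from typing import BinaryIO, List, Tuple

Range = Tuple[int, int]  # (start offset, end offset), end exclusive


class TracingFile:
    """Hypothetical wrapper that logs which byte ranges a program reads."""

    def __init__(self, raw: BinaryIO) -> None:
        self.raw = raw
        self.accessed: List[Range] = []

    def seek(self, offset: int) -> None:
        self.raw.seek(offset)

    def read(self, size: int) -> bytes:
        start = self.raw.tell()
        data = self.raw.read(size)
        if data:
            self.accessed.append((start, start + len(data)))
        return data


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Coalesce overlapping or adjacent ranges into a sorted minimal list."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def extract_subset(raw: BinaryIO, ranges: List[Range]) -> Tuple[bytes, List[Range]]:
    """Copy only the given ranges; return the packed bytes and the range index."""
    merged = merge_ranges(ranges)
    chunks = []
    for start, end in merged:
        raw.seek(start)
        chunks.append(raw.read(end - start))
    return b"".join(chunks), merged


def read_from_subset(packed: bytes, index: List[Range], offset: int, size: int) -> bytes:
    """Serve a read at an original-file offset from the packed subset."""
    base = 0  # position of the current range inside `packed`
    for start, end in index:
        if start <= offset and offset + size <= end:
            pos = base + (offset - start)
            return packed[pos:pos + size]
        base += end - start
    raise KeyError(f"bytes [{offset}, {offset + size}) were not captured")


if __name__ == "__main__":
    original = io.BytesIO(bytes(range(256)) * 4096)  # 1 MiB stand-in dataset
    traced = TracingFile(original)
    traced.seek(1000)
    traced.read(16)
    traced.seek(1010)
    traced.read(16)  # overlaps the first read
    traced.seek(500_000)
    traced.read(8)
    packed, index = extract_subset(original, traced.accessed)
    print(index, len(packed))
```

In this toy run the three reads collapse into two ranges, and the packed copy holds 34 bytes of the 1 MiB stand-in file. A read outside the captured ranges raises `KeyError`, which reflects the main limitation of a purely trace-based approach: the subset is only valid for inputs that touch the same offsets as the traced run.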
