Detecting I/O Access Patterns of HPC Workloads at Runtime

In this paper, we seek to guide optimization and tuning strategies by identifying the application's I/O access pattern. We evaluate three machine learning techniques for automatically detecting the I/O access pattern of HPC applications at runtime: decision trees, random forests, and neural networks. We focus on detection using file-level access metrics as seen by the clients, the I/O nodes, and the parallel file system servers. We evaluate these detection strategies in a case study in which accurately detecting the current access pattern is essential for adjusting a parameter of an I/O scheduling algorithm. We demonstrate that these approaches correctly classify the access pattern, in terms of file layout and spatiality of accesses, into the patterns most commonly used by the community and by I/O benchmarking tools to test new I/O optimizations, with up to 99% precision. Furthermore, when applied to our case study, the detection guides a tuning mechanism to achieve 99% of the performance of an Oracle solution.
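The abstract does not spell out how file-level metrics become a pattern label. As a minimal sketch, assuming per-request records of (rank, file id, offset, size), the Python below derives two hypothetical features: distinct files per process (for file layout) and the fraction of each rank's follow-up requests that start where its previous request ended (for spatiality). It then fits a small hand-written CART decision tree (Gini impurity) on idealized synthetic traces. The record format, feature names, the four pattern labels, and the `synthetic_trace` generator are illustrative assumptions, not the paper's metrics, data, or trained models.

```python
"""Hypothetical sketch: label an I/O access pattern (file layout x spatiality)
from file-level request records with a small CART decision tree (Gini)."""
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Request = Tuple[int, int, int, int]  # hypothetical record: (rank, file_id, offset, size)
LAYOUTS = ("shared-file", "file-per-process")  # hypothetical label set
SPATIALITIES = ("contiguous", "strided")


def extract_features(requests: List[Request], n_procs: int) -> Tuple[float, float]:
    """Return (distinct files per process, fraction of contiguous follow-up requests)."""
    file_ratio = len({f for _, f, _, _ in requests}) / n_procs
    last_end = {}  # (rank, file_id) -> end offset of that rank's previous request
    contiguous = follow_ups = 0
    for rank, f, off, size in requests:
        key = (rank, f)
        if key in last_end:
            follow_ups += 1
            if off == last_end[key]:
                contiguous += 1
        last_end[key] = off + size
    return file_ratio, (contiguous / follow_ups if follow_ups else 1.0)


def synthetic_trace(layout: str, spatiality: str, n_procs: int,
                    n_req: int = 8, size: int = 4096) -> List[Request]:
    """Generate an idealized, noise-free trace for one hypothetical pattern."""
    trace = []
    for rank in range(n_procs):
        file_id = rank if layout == "file-per-process" else 0
        for i in range(n_req):
            if layout == "file-per-process":
                block = i if spatiality == "contiguous" else 2 * i  # leave a one-block gap
            elif spatiality == "contiguous":
                block = rank * n_req + i  # each rank owns one contiguous region
            else:
                block = i * n_procs + rank  # 1D-strided interleaving across ranks
            trace.append((rank, file_id, block * size, size))
    return trace


def gini(labels: Sequence[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


@dataclass
class Node:
    label: Optional[str] = None  # set only on leaves
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(X: Sequence[Sequence[float]], y: Sequence[str],
               depth: int = 0, max_depth: int = 4) -> Node:
    """Greedy CART: pick the split that minimizes weighted Gini impurity."""
    if len(set(y)) == 1 or depth == max_depth:
        return Node(label=Counter(y).most_common(1)[0][0])
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint keeps both sides non-empty
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # identical feature vectors: fall back to majority label
        return Node(label=Counter(y).most_common(1)[0][0])
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(feature=f, threshold=t,
                left=build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
                right=build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict(node: Node, x: Sequence[float]) -> str:
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def train(max_depth: int = 4) -> Node:
    X, y = [], []
    for layout in LAYOUTS:
        for spatiality in SPATIALITIES:
            for n_procs in (2, 4, 8, 16):
                X.append(extract_features(synthetic_trace(layout, spatiality, n_procs), n_procs))
                y.append(f"{layout}/{spatiality}")
    return build_tree(X, y, max_depth=max_depth)


if __name__ == "__main__":
    tree = train()
    unseen = synthetic_trace("shared-file", "strided", n_procs=32)
    print(predict(tree, extract_features(unseen, 32)))  # shared-file/strided
```

Because these synthetic features are perfectly separable, a two-level tree is enough here; metrics measured at clients, I/O nodes, or servers would not be this clean.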
