On server-side file access pattern matching

In this paper, we propose a pattern matching approach for server-side access pattern detection in the HPC I/O stack. More specifically, our proposal concerns file-level accesses, such as the ones made to I/O libraries, I/O nodes, and parallel file system servers. The goal of this detection is to allow the system to adapt to the current workload. Compared to existing detection techniques, ours differs by working at run-time and on the server side, where detailed application information is not available because HPC I/O systems are stateless, and by not relying on previous traces. We build a time series to represent the spatiality of accesses, and use a pattern matching algorithm, in addition to a heuristic, to compare it to known patterns. We detail our proposal and evaluate it with two case studies, situations where detecting the current access pattern is important to select the best scheduling algorithm or to tune a fixed algorithm parameter. We show that our approach has good detection capabilities, with precision of up to 93% and recall of up to 99%, and discuss all design choices.
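To make the idea of comparing a spatiality time series against known patterns concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the pattern matching algorithm or the heuristic; dynamic time warping (DTW) is assumed here purely for illustration, and the heuristic is omitted. The function names, the use of offset differences as the time series, and the reference patterns are all hypothetical.

```python
from math import inf


def offset_series(offsets):
    """Build a time series from the distance between consecutive request offsets.

    Assumption: spatiality is represented by offset differences; the paper may
    use a different representation.
    """
    return [b - a for a, b in zip(offsets, offsets[1:])]


def dtw_distance(s, t):
    """Classic O(len(s) * len(t)) dynamic time warping distance."""
    n, m = len(s), len(t)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(offsets, known_patterns):
    """Return the name of the known pattern closest to the observed accesses."""
    series = offset_series(offsets)
    return min(known_patterns, key=lambda name: dtw_distance(series, known_patterns[name]))


# Hypothetical reference patterns, expressed as offset differences.
KNOWN_PATTERNS = {
    "contiguous": [4, 4, 4, 4],
    "strided": [16, 16, 16, 16],
}

if __name__ == "__main__":
    observed = [0, 4, 8, 12, 16, 20]  # offsets seen by the server, made up
    print(classify(observed, KNOWN_PATTERNS))
```

Because DTW tolerates series of different lengths, the observed window does not need to contain the same number of requests as the reference patterns, which suits a server that sees only a partial, run-time view of the workload.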
