I/O load balancing for big data HPC applications

High Performance Computing (HPC) big data problems require efficient distributed storage systems. At scale, however, such storage systems often experience load imbalance and resource contention due to two factors: the bursty nature of scientific application I/O, and a complex I/O path that lacks centralized arbitration and control. For example, the widely deployed Lustre parallel file system, which supports many HPC centers, comprises numerous components connected via custom network topologies and serves the varying demands of a large number of users and applications. Consequently, some storage servers become more loaded than others, creating bottlenecks and reducing overall application I/O performance. Existing solutions typically focus on per-application load balancing and are therefore less effective, as they lack a global view of the system. In this paper, we propose a data-driven approach to load balancing the I/O servers at scale, targeted at Lustre deployments. To this end, we design a global mapper on the Lustre Metadata Server, which gathers runtime statistics from key storage components on the I/O path and applies Markov chain modeling and a minimum-cost maximum-flow algorithm to decide where data should be placed. Evaluation using a realistic system simulator and a real deployment shows that our approach yields better load balancing, which in turn can improve end-to-end performance.
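The abstract compresses the two algorithmic ingredients, Markov chain load prediction and min-cost max-flow placement, into one sentence. The sketch below illustrates how they could fit together. It is a hypothetical illustration under stated assumptions, not the paper's implementation: the discretized load states, the OST names, the per-OST stripe budget, and the cost scaling are all invented for the example.

```python
# Hypothetical sketch of the two ingredients described above:
# (1) a Markov chain that predicts a storage server's (OST's) next load
#     state from its observed load-state transitions, and (2) a min-cost
#     max-flow assignment of pending file stripes that steers data toward
#     the OSTs predicted to be least loaded. Names are illustrative only.
import numpy as np
import networkx as nx

# --- (1) Markov-chain load prediction --------------------------------------
# Discretize one OST's observed load into states: 0=low, 1=medium, 2=high.
observed = [0, 0, 1, 2, 1, 0, 1, 1, 2, 2, 1, 0]   # toy load history

n_states = 3
counts = np.ones((n_states, n_states))             # Laplace smoothing
for cur, nxt in zip(observed, observed[1:]):
    counts[cur, nxt] += 1
P = counts / counts.sum(axis=1, keepdims=True)     # row-stochastic transitions

current_state = observed[-1]
expected_next_load = P[current_state] @ np.arange(n_states)

# --- (2) Min-cost max-flow placement ----------------------------------------
# Assign 4 pending stripes across 3 OSTs; the edge cost grows with an OST's
# predicted load, so lightly loaded servers attract more stripes.
predicted_load = {"ost0": 0.4, "ost1": 1.8, "ost2": 0.9}  # from step (1)
stripes = ["s0", "s1", "s2", "s3"]

G = nx.DiGraph()
for s in stripes:
    G.add_edge("src", s, capacity=1, weight=0)      # each stripe placed once
for ost, load in predicted_load.items():
    for s in stripes:
        G.add_edge(s, ost, capacity=1, weight=int(10 * load))
    G.add_edge(ost, "sink", capacity=2, weight=0)   # per-OST stripe budget

flow = nx.max_flow_min_cost(G, "src", "sink")
placement = {s: ost for s in stripes
             for ost, f in flow[s].items() if f == 1}
print(placement)   # e.g. {'s0': 'ost0', 's1': 'ost0', 's2': 'ost2', 's3': 'ost2'}
```

Edge costs are scaled to integers because networkx's min-cost flow routines are exact only with integral weights. In the system the paper describes, the transition matrix would presumably be fitted to the runtime statistics gathered on the metadata server rather than to a toy trace.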
