AIOC2: A deep Q-learning approach to autonomic I/O congestion control in Lustre

[1]  Size Zheng,et al.  FlexTensor , 2020, Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems.

[2]  Ke Zhou,et al.  An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning , 2019, SIGMOD Conference.

[3]  Kirk W. Cameron,et al.  iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[4]  Zenglin Xu,et al.  Superneurons: dynamic GPU memory management for training deep neural networks , 2018, PPoPP.

[5]  Scott Klasky,et al.  Analysis and Modeling of the End-to-End I/O Performance on OLCF's Titan Supercomputer , 2017, 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).

[6]  André Brinkmann,et al.  A Configurable Rule based Classful Token Bucket Filter Network Request Scheduler for the Lustre File System , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.

[7]  Yan Li,et al.  CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.

[8]  Scott Klasky,et al.  Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[9]  Geoffrey J. Gordon,et al.  Automatic Database Management System Tuning Through Large-scale Machine Learning , 2017, SIGMOD Conference.

[10]  Feiyi Wang,et al.  Using Balanced Data Placement to Address I/O Contention in Production Environments , 2016, 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).

[11]  Yang Wang,et al.  Boosting Parallel File System Performance via Heterogeneity-Aware Selective Data Layout , 2016, IEEE Transactions on Parallel and Distributed Systems.

[12]  Sergey Levine,et al.  Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.

[13]  Chris J. Maddison,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[14]  Darrell D. E. Long,et al.  ASCAR: Automating contention management for high-performance storage systems , 2015, 2015 31st Symposium on Mass Storage Systems and Technologies (MSST).

[15]  Franck Cappello,et al.  Scheduling the I/O of HPC Applications Under Congestion , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.

[16]  Galen M. Shipman,et al.  LADS: Optimizing Data Transfers Using Layout-Aware Data Scheduling , 2015, FAST.

[17]  Yang Liu,et al.  Automatic identification of application I/O signatures from noisy server-side traces , 2014, FAST.

[18]  Hari Balakrishnan,et al.  TCP ex machina , 2013 .

[19]  Hari Balakrishnan,et al.  TCP ex machina: computer-generated congestion control , 2013, SIGCOMM.

[20]  Rouven Krebs,et al.  Metrics and techniques for quantifying performance isolation in cloud environments , 2012, QoSA '12.

[21]  Francieli Zanon Boito,et al.  The impact of applications' I/O strategies on the performance of the Lustre parallel file system , 2011, Int. J. High Perform. Syst. Archit..

[22]  Karsten Schwan,et al.  Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS) , 2008, CLADE '08.

[23]  Scott A. Brandt,et al.  Providing Quality of Service Support in Object-Based File System , 2007, 24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007).

[24]  Mark Handley,et al.  Equation-based congestion control for unicast applications , 2000, SIGCOMM.

[25]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[26]  Dries Naudts,et al.  A Q-Learning Scheme for Fair Coexistence Between LTE and Wi-Fi in Unlicensed Spectrum , 2018, IEEE Access.

[27]  Saurabh Gupta,et al.  Improving large-scale storage system performance via topology-aware and balanced data placement , 2014, 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).