Paper information - AIOC2: A deep Q-learning approach to autonomic I/O congestion control in Lustre

AIOC2: A deep Q-learning approach to autonomic I/O congestion control in Lustre

[1] Size Zheng, et al. FlexTensor, 2020, Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems.

[2] Ke Zhou, et al. An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning, 2019, SIGMOD Conference.

[3] Kirk W. Cameron, et al. iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems, 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[4] Zenglin Xu, et al. Superneurons: dynamic GPU memory management for training deep neural networks, 2018, PPoPP.

[5] Scott Klasky, et al. Analysis and Modeling of the End-to-End I/O Performance on OLCF's Titan Supercomputer, 2017, 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).

[6] André Brinkmann, et al. A Configurable Rule based Classful Token Bucket Filter Network Request Scheduler for the Lustre File System, 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.

[7] Yan Li, et al. CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning, 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.

[8] Scott Klasky, et al. Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System, 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[9] Geoffrey J. Gordon, et al. Automatic Database Management System Tuning Through Large-scale Machine Learning, 2017, SIGMOD Conference.

[10] Feiyi Wang, et al. Using Balanced Data Placement to Address I/O Contention in Production Environments, 2016, 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).

[11] Yang Wang, et al. Boosting Parallel File System Performance via Heterogeneity-Aware Selective Data Layout, 2016, IEEE Transactions on Parallel and Distributed Systems.

[12] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.

[13] Chris J. Maddison, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[14] Darrell D. E. Long, et al. ASCAR: Automating contention management for high-performance storage systems, 2015, 2015 31st Symposium on Mass Storage Systems and Technologies (MSST).

[15] Franck Cappello, et al. Scheduling the I/O of HPC Applications Under Congestion, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.

[16] Galen M. Shipman, et al. LADS: Optimizing Data Transfers Using Layout-Aware Data Scheduling, 2015, FAST.

[17] Yang Liu, et al. Automatic identification of application I/O signatures from noisy server-side traces, 2014, FAST.

[18] Hari Balakrishnan, et al. TCP ex machina, 2013.

[19] Hari Balakrishnan, et al. TCP ex machina: computer-generated congestion control, 2013, SIGCOMM.

[20] Rouven Krebs, et al. Metrics and techniques for quantifying performance isolation in cloud environments, 2012, QoSA '12.

[21] Francieli Zanon Boito, et al. The impact of applications' I/O strategies on the performance of the Lustre parallel file system, 2011, Int. J. High Perform. Syst. Archit.

[22] Karsten Schwan, et al. Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS), 2008, CLADE '08.

[23] Scott A. Brandt, et al. Providing Quality of Service Support in Object-Based File System, 2007, 24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007).

[24] Mark Handley, et al. Equation-based congestion control for unicast applications, 2000, SIGCOMM.

[25] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[26] Dries Naudts, et al. A Q-Learning Scheme for Fair Coexistence Between LTE and Wi-Fi in Unlicensed Spectrum, 2018, IEEE Access.

[27] Saurabh Gupta, et al. Improving large-scale storage system performance via topology-aware and balanced data placement, 2014, 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).