Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobs

L. Chen | C. Liu | Rui Han | Shilin Wen

[1] Lijie Wen,et al. MespaConfig: Memory-Sparing Configuration Auto-Tuning for Co-Located In-Memory Cluster Computing Jobs , 2022, IEEE Transactions on Services Computing.

[2] C. Carrión. Kubernetes Scheduling: Taxonomy, Ongoing Issues and Challenges , 2022, ACM Comput. Surv.

[3] Guoren Wang,et al. EdgeTuner: Fast Scheduling Algorithm Tuning for Dynamic Edge-Cloud Workloads and Resources , 2022, IEEE INFOCOM 2022 - IEEE Conference on Computer Communications.

[4] Liang Feng,et al. A Cooperative Coevolution Hyper-Heuristic Framework for Workflow Scheduling Problem , 2022, IEEE Transactions on Services Computing.

[5] Guoren Wang,et al. LegoDNN: block-grained scaling of deep neural networks for mobile vision , 2021, MobiCom.

[6] 2021 5th International Conference on Cloud and Big Data Computing (ICCBDC) , 2021.

[7] Ali Cakmak,et al. Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification , 2021, ICCBDC.

[8] Mehdi Dehghan,et al. Big data classification using heterogeneous ensemble classifiers in Apache Spark based on MapReduce paradigm , 2021, Expert Syst. Appl.

[9] Yi Mei,et al. A Cooperative Coevolution Genetic Programming Hyper-Heuristics Approach for On-Line Resource Allocation in Container-Based Clouds , 2020, IEEE Transactions on Cloud Computing.

[10] Rui Han,et al. Accelerating Deep Learning Systems via Critical Set Identification and Model Compression , 2020, IEEE Transactions on Computers.

[11] Xin Zhou,et al. Efficient Compute-Intensive Job Allocation in Data Centers via Deep Reinforcement Learning , 2020, IEEE Transactions on Parallel and Distributed Systems.

[12] Elisa Bertino,et al. Privacy-preserving Real-time Anomaly Detection Using Edge Computing , 2020, 2020 IEEE 36th International Conference on Data Engineering (ICDE).

[13] Tarun Kulshrestha,et al. Real-Time Crowd Monitoring Using Seamless Indoor-Outdoor Localization , 2020, IEEE Transactions on Mobile Computing.

[14] Shan Zhang,et al. Cooperative Service Caching and Workload Scheduling in Mobile Edge Computing , 2020, IEEE INFOCOM 2020 - IEEE Conference on Computer Communications.

[15] Shusen Yang,et al. SurveilEdge: Real-time Video Query based on Collaborative Cloud-Edge Deep Learning , 2020, IEEE INFOCOM 2020 - IEEE Conference on Computer Communications.

[16] Gaochao Xu,et al. Explore Deep Neural Network and Reinforcement Learning to Large-scale Tasks Processing in Big Data , 2019, Int. J. Pattern Recognit. Artif. Intell.

[17] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[18] Jianfeng Zhan,et al. Workload-Adaptive Configuration Tuning for Hierarchical Cloud Schedulers , 2019, IEEE Transactions on Parallel and Distributed Systems.

[19] Yang Zhang,et al. Fog-enabled Event Processing Based on IoT Resource Models , 2019, IEEE Transactions on Knowledge and Data Engineering.

[20] F. Moutarde,et al. Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field , 2019, arXiv:1908.04683.

[21] Gaith Rjoub,et al. Deep Smart Scheduling: A Deep Learning Approach for Automated Big Data Scheduling Over the Cloud , 2019, 2019 7th International Conference on Future Internet of Things and Cloud (FiCloud).

[22] Chuan Wu,et al. Learning Resource Allocation and Pricing for Cloud Profit Maximization , 2019, AAAI.

[23] Lyes Khoukhi,et al. 5G-Slicing-Enabled Scalable SDN Core Network: Toward an Ultra-Low Latency of Autonomous Driving Service , 2019, IEEE Journal on Selected Areas in Communications.

[24] Anh-Cang Phan,et al. Face Recognition Using Gabor Wavelet in MapReduce and Spark , 2019, WCGO.

[25] Xin Zhou,et al. Toward Efficient Compute-Intensive Job Allocation for Green Data Centers: A Deep Reinforcement Learning Approach , 2019, 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS).

[26] Zhiming Hu,et al. Spear: Optimized Dependency-Aware Task Scheduling with Deep Reinforcement Learning , 2019, 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS).

[27] Xin Zhou,et al. DeepEE: Joint Optimization of Job Scheduling and Cooling Control for Data Center Energy Efficiency Using Deep Reinforcement Learning , 2019, 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS).

[28] Shanhe Yi,et al. Nomad: An Efficient Consensus Approach for Latency-Sensitive Edge-Cloud Applications , 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.

[29] Hao Wang,et al. Distributed Machine Learning with a Serverless Architecture , 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.

[30] Chuan Wu,et al. Deep Learning-based Job Placement in Distributed Machine Learning Clusters , 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.

[31] Chang Zhou,et al. AliGraph: A Comprehensive Graph Neural Network Platform , 2019, Proc. VLDB Endow.

[32] Anh-Cang Phan,et al. Fingerprint Recognition using Gabor wavelet in MapReduce and Spark , 2018, SoICT.

[33] Andreas Gerstlauer,et al. DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[34] Hongzi Mao,et al. Learning scheduling algorithms for data processing clusters , 2018, SIGCOMM.

[35] Quan Zhang,et al. Firework: Data Processing and Sharing for Hybrid Cloud-Edge Analytics , 2018, IEEE Transactions on Parallel and Distributed Systems.

[36] Philip S. Yu,et al. Not Just Privacy: Improving Performance of Private Deep Learning in Mobile Cloud , 2018, KDD.

[37] Rémi Munos,et al. Implicit Quantile Networks for Distributional Reinforcement Learning , 2018, ICML.

[38] Lizy Kurian John,et al. Benchmarking Big Data Systems: A Review , 2018, IEEE Transactions on Services Computing.

[39] Gregory R. Ganger,et al. 3Sigma: distribution-based cluster scheduling for runtime uncertainty , 2018, EuroSys.

[40] Peter R. Pietzuch,et al. Medea: scheduling of long running applications in shared production clusters , 2018, EuroSys.

[41] Chuan Wu,et al. Optimus: an efficient dynamic resource scheduler for deep learning clusters , 2018, EuroSys.

[42] José Merseguer,et al. Towards the Performance Analysis of Apache Tez Applications , 2018, ICPE Companion.

[43] Shiyan Hu,et al. Combating Coordinated Pricing Cyberattack and Energy Theft in Smart Home Cyber-Physical Systems , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[44] David Budden,et al. Distributed Prioritized Experience Replay , 2018, ICLR.

[45] Zhiyuan Xu,et al. Model-free Control for Distributed Stream Data Processing using Deep Reinforcement Learning , 2018, Proc. VLDB Endow.

[46] Ji Li,et al. DRL-cloud: Deep reinforcement learning-based resource provisioning and task scheduling for cloud service providers , 2018, 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC).

[47] Zongpeng Li,et al. Online Job Scheduling in Distributed Machine Learning Clusters , 2018, IEEE INFOCOM 2018 - IEEE Conference on Computer Communications.

[48] Ion Stoica,et al. Ray RLLib: A Composable and Scalable Reinforcement Learning Library , 2017, NIPS 2017.

[49] Ole J. Mengshoel,et al. QoS-Aware Scheduling of Heterogeneous Servers for Inference in Deep Neural Networks , 2017, CIKM.

[50] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.

[51] Michael J. Freedman,et al. SLAQ: quality-driven scheduling for distributed machine learning , 2017, SoCC.

[52] Rui Han,et al. CLAP: Component-Level Approximate Processing for Low Tail Latency and High Result Accuracy in Cloud Online Services , 2017, IEEE Transactions on Parallel and Distributed Systems.

[53] Kritwara Rattanaopas,et al. A performance comparison of Apache Tez and MapReduce with data compression on Hadoop cluster , 2017, 2017 14th International Joint Conference on Computer Science and Software Engineering (JCSSE).

[54] Marco Gribaudo,et al. Fluid Petri Nets for the Performance Evaluation of MapReduce and Spark Applications , 2017, PERV.

[55] Qinru Qiu,et al. A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[56] Jose Miguel Puerta,et al. Learning distributed discrete Bayesian Network Classifiers under MapReduce with Apache Spark , 2017, Knowl. Based Syst.

[57] Srikanth Kandula,et al. Resource Management with Deep Reinforcement Learning , 2016, HotNets.

[58] Weisong Shi,et al. Edge Computing: Vision and Challenges , 2016, IEEE Internet of Things Journal.

[59] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[60] L. Arockiam,et al. Service Level Agreement in cloud computing: An overview , 2015, 2015 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT).

[61] Anja Feldmann,et al. C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection , 2015, NSDI.

[62] Shaolei Ren,et al. Optimal Aggregation Policy for Reducing Tail Latency of Web Search , 2015, SIGIR.

[63] Abhishek Verma,et al. Large-scale cluster management at Google with Borg , 2015, EuroSys.

[64] Chu-Sing Yang,et al. A Hyper-Heuristic Scheduling Algorithm for Cloud , 2014, IEEE Transactions on Cloud Computing.

[65] Moustafa Ghanem,et al. Enabling Cost-aware and Adaptive Elasticity of Multi-tier Cloud Applications , 2022, Future Generation Computer Systems.

[66] Dirk Merkel,et al. Docker: lightweight Linux containers for consistent development and deployment , 2014.

[67] Carlo Curino,et al. Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.

[68] Srikanth Kandula,et al. Speeding up distributed request-response workflows , 2013, SIGCOMM.

[69] David A. Bader,et al. Energy-Efficient Scheduling for Best-Effort Interactive Services to Achieve High Response Quality , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[70] Luiz André Barroso,et al. The tail at scale , 2013, CACM.

[71] James R. Larus,et al. Zeta: scheduling interactive services with partial execution , 2012, SoCC '12.

[72] Moustafa Ghanem,et al. Lightweight Resource Scaling for Cloud Applications , 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).

[73] Sameh Elnikety,et al. Tians Scheduling: Using Partial Processing in Best-Effort Applications , 2011, 2011 31st International Conference on Distributed Computing Systems.

[74] Randy H. Katz,et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.

[75] Benjamin Hindman,et al. Dominant Resource Fairness: Fair Allocation of Multiple Resource Types , 2011, NSDI.

[76] Dror G. Feitelson,et al. Paired Gang Scheduling , 2003, IEEE Trans. Parallel Distributed Syst.

[77] Shijun Liu,et al. DRL-Scheduling: An Intelligent QoS-Aware Job Scheduling Framework for Applications in Clouds , 2018, IEEE Access.

[78] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[79] Hilde van der Togt,et al. Publisher's Note , 2003, J. Netw. Comput. Appl.