Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobs

[1]  Lijie Wen,et al.  MespaConfig: Memory-Sparing Configuration Auto-Tuning for Co-Located In-Memory Cluster Computing Jobs , 2022, IEEE Transactions on Services Computing.

[2]  C. Carrión.  Kubernetes Scheduling: Taxonomy, Ongoing Issues and Challenges , 2022, ACM Comput. Surv..

[3]  Guoren Wang,et al.  EdgeTuner: Fast Scheduling Algorithm Tuning for Dynamic Edge-Cloud Workloads and Resources , 2022, IEEE INFOCOM 2022 - IEEE Conference on Computer Communications.

[4]  Liang Feng,et al.  A Cooperative Coevolution Hyper-Heuristic Framework for Workflow Scheduling Problem , 2022, IEEE Transactions on Services Computing.

[5]  Guoren Wang,et al.  LegoDNN: block-grained scaling of deep neural networks for mobile vision , 2021, MobiCom.

[6]  2021 5th International Conference on Cloud and Big Data Computing (ICCBDC) , 2021 .

[7]  Ali Cakmak,et al.  Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification , 2021, ICCBDC.

[8]  Mehdi Dehghan,et al.  Big data classification using heterogeneous ensemble classifiers in Apache Spark based on MapReduce paradigm , 2021, Expert Syst. Appl..

[9]  Yi Mei,et al.  A Cooperative Coevolution Genetic Programming Hyper-Heuristics Approach for On-Line Resource Allocation in Container-Based Clouds , 2020, IEEE Transactions on Cloud Computing.

[10]  Rui Han,et al.  Accelerating Deep Learning Systems via Critical Set Identification and Model Compression , 2020, IEEE Transactions on Computers.

[11]  Xin Zhou,et al.  Efficient Compute-Intensive Job Allocation in Data Centers via Deep Reinforcement Learning , 2020, IEEE Transactions on Parallel and Distributed Systems.

[12]  Elisa Bertino,et al.  Privacy-preserving Real-time Anomaly Detection Using Edge Computing , 2020, 2020 IEEE 36th International Conference on Data Engineering (ICDE).

[13]  Tarun Kulshrestha,et al.  Real-Time Crowd Monitoring Using Seamless Indoor-Outdoor Localization , 2020, IEEE Transactions on Mobile Computing.

[14]  Shan Zhang,et al.  Cooperative Service Caching and Workload Scheduling in Mobile Edge Computing , 2020, IEEE INFOCOM 2020 - IEEE Conference on Computer Communications.

[15]  Shusen Yang,et al.  SurveilEdge: Real-time Video Query based on Collaborative Cloud-Edge Deep Learning , 2020, IEEE INFOCOM 2020 - IEEE Conference on Computer Communications.

[16]  Gaochao Xu,et al.  Explore Deep Neural Network and Reinforcement Learning to Large-scale Tasks Processing in Big Data , 2019, Int. J. Pattern Recognit. Artif. Intell..

[17]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[18]  Jianfeng Zhan,et al.  Workload-Adaptive Configuration Tuning for Hierarchical Cloud Schedulers , 2019, IEEE Transactions on Parallel and Distributed Systems.

[19]  Yang Zhang,et al.  Fog-enabled Event Processing Based on IoT Resource Models , 2019, IEEE Transactions on Knowledge and Data Engineering.

[20]  F. Moutarde,et al.  Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field , 2019, arXiv:1908.04683.

[21]  Gaith Rjoub,et al.  Deep Smart Scheduling: A Deep Learning Approach for Automated Big Data Scheduling Over the Cloud , 2019, 2019 7th International Conference on Future Internet of Things and Cloud (FiCloud).

[22]  Chuan Wu,et al.  Learning Resource Allocation and Pricing for Cloud Profit Maximization , 2019, AAAI.

[23]  Lyes Khoukhi,et al.  5G-Slicing-Enabled Scalable SDN Core Network: Toward an Ultra-Low Latency of Autonomous Driving Service , 2019, IEEE Journal on Selected Areas in Communications.

[24]  Anh-Cang Phan,et al.  Face Recognition Using Gabor Wavelet in MapReduce and Spark , 2019, WCGO.

[25]  Xin Zhou,et al.  Toward Efficient Compute-Intensive Job Allocation for Green Data Centers: A Deep Reinforcement Learning Approach , 2019, 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS).

[26]  Zhiming Hu,et al.  Spear: Optimized Dependency-Aware Task Scheduling with Deep Reinforcement Learning , 2019, 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS).

[27]  Xin Zhou,et al.  DeepEE: Joint Optimization of Job Scheduling and Cooling Control for Data Center Energy Efficiency Using Deep Reinforcement Learning , 2019, 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS).

[28]  Shanhe Yi,et al.  Nomad: An Efficient Consensus Approach for Latency-Sensitive Edge-Cloud Applications , 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.

[29]  Hao Wang,et al.  Distributed Machine Learning with a Serverless Architecture , 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.

[30]  Chuan Wu,et al.  Deep Learning-based Job Placement in Distributed Machine Learning Clusters , 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.

[31]  Chang Zhou,et al.  AliGraph: A Comprehensive Graph Neural Network Platform , 2019, Proc. VLDB Endow..

[32]  Anh-Cang Phan,et al.  Fingerprint Recognition using Gabor wavelet in MapReduce and Spark , 2018, SoICT.

[33]  Andreas Gerstlauer,et al.  DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[34]  Hongzi Mao,et al.  Learning scheduling algorithms for data processing clusters , 2018, SIGCOMM.

[35]  Quan Zhang,et al.  Firework: Data Processing and Sharing for Hybrid Cloud-Edge Analytics , 2018, IEEE Transactions on Parallel and Distributed Systems.

[36]  Philip S. Yu,et al.  Not Just Privacy: Improving Performance of Private Deep Learning in Mobile Cloud , 2018, KDD.

[37]  Rémi Munos,et al.  Implicit Quantile Networks for Distributional Reinforcement Learning , 2018, ICML.

[38]  Lizy Kurian John,et al.  Benchmarking Big Data Systems: A Review , 2018, IEEE Transactions on Services Computing.

[39]  Gregory R. Ganger,et al.  3Sigma: distribution-based cluster scheduling for runtime uncertainty , 2018, EuroSys.

[40]  Peter R. Pietzuch,et al.  Medea: scheduling of long running applications in shared production clusters , 2018, EuroSys.

[41]  Chuan Wu,et al.  Optimus: an efficient dynamic resource scheduler for deep learning clusters , 2018, EuroSys.

[42]  José Merseguer,et al.  Towards the Performance Analysis of Apache Tez Applications , 2018, ICPE Companion.

[43]  Shiyan Hu,et al.  Combating Coordinated Pricing Cyberattack and Energy Theft in Smart Home Cyber-Physical Systems , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[44]  David Budden,et al.  Distributed Prioritized Experience Replay , 2018, ICLR.

[45]  Zhiyuan Xu,et al.  Model-free Control for Distributed Stream Data Processing using Deep Reinforcement Learning , 2018, Proc. VLDB Endow..

[46]  Ji Li,et al.  DRL-cloud: Deep reinforcement learning-based resource provisioning and task scheduling for cloud service providers , 2018, 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC).

[47]  Zongpeng Li,et al.  Online Job Scheduling in Distributed Machine Learning Clusters , 2018, IEEE INFOCOM 2018 - IEEE Conference on Computer Communications.

[48]  Ion Stoica,et al.  Ray RLLib: A Composable and Scalable Reinforcement Learning Library , 2017, NIPS 2017.

[49]  Ole J. Mengshoel,et al.  QoS-Aware Scheduling of Heterogeneous Servers for Inference in Deep Neural Networks , 2017, CIKM.

[50]  Tom Schaul,et al.  Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.

[51]  Michael J. Freedman,et al.  SLAQ: quality-driven scheduling for distributed machine learning , 2017, SoCC.

[52]  Rui Han,et al.  CLAP: Component-Level Approximate Processing for Low Tail Latency and High Result Accuracy in Cloud Online Services , 2017, IEEE Transactions on Parallel and Distributed Systems.

[53]  Kritwara Rattanaopas,et al.  A performance comparison of Apache Tez and MapReduce with data compression on Hadoop cluster , 2017, 2017 14th International Joint Conference on Computer Science and Software Engineering (JCSSE).

[54]  Marco Gribaudo,et al.  Fluid Petri Nets for the Performance Evaluation of MapReduce and Spark Applications , 2017, PERV.

[55]  Qinru Qiu,et al.  A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[56]  Jose Miguel Puerta,et al.  Learning distributed discrete Bayesian Network Classifiers under MapReduce with Apache Spark , 2017, Knowl. Based Syst..

[57]  Srikanth Kandula,et al.  Resource Management with Deep Reinforcement Learning , 2016, HotNets.

[58]  Weisong Shi,et al.  Edge Computing: Vision and Challenges , 2016, IEEE Internet of Things Journal.

[59]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[60]  L. Arockiam,et al.  Service Level Agreement in cloud computing: An overview , 2015, 2015 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT).

[61]  Anja Feldmann,et al.  C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection , 2015, NSDI.

[62]  Shaolei Ren,et al.  Optimal Aggregation Policy for Reducing Tail Latency of Web Search , 2015, SIGIR.

[63]  Abhishek Verma,et al.  Large-scale cluster management at Google with Borg , 2015, EuroSys.

[64]  Chu-Sing Yang,et al.  A Hyper-Heuristic Scheduling Algorithm for Cloud , 2014, IEEE Transactions on Cloud Computing.

[65]  Moustafa Ghanem,et al.  Enabling Cost-aware and Adaptive Elasticity of Multi-tier Cloud Applications , 2022, Future Generation Computer Systems.

[66]  Dirk Merkel,et al.  Docker: lightweight Linux containers for consistent development and deployment , 2014 .

[67]  Carlo Curino,et al.  Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.

[68]  Srikanth Kandula,et al.  Speeding up distributed request-response workflows , 2013, SIGCOMM.

[69]  David A. Bader,et al.  Energy-Efficient Scheduling for Best-Effort Interactive Services to Achieve High Response Quality , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[70]  Luiz André Barroso,et al.  The tail at scale , 2013, CACM.

[71]  James R. Larus,et al.  Zeta: scheduling interactive services with partial execution , 2012, SoCC '12.

[72]  Moustafa Ghanem,et al.  Lightweight Resource Scaling for Cloud Applications , 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).

[73]  Sameh Elnikety,et al.  Tians Scheduling: Using Partial Processing in Best-Effort Applications , 2011, 2011 31st International Conference on Distributed Computing Systems.

[74]  Randy H. Katz,et al.  Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.

[75]  Benjamin Hindman,et al.  Dominant Resource Fairness: Fair Allocation of Multiple Resource Types , 2011, NSDI.

[76]  Dror G. Feitelson,et al.  Paired Gang Scheduling , 2003, IEEE Trans. Parallel Distributed Syst..

[77]  Shijun Liu,et al.  DRL-Scheduling: An Intelligent QoS-Aware Job Scheduling Framework for Applications in Clouds , 2018, IEEE Access.

[78]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[79]  Hilde van der Togt,et al.  Publisher's Note , 2003, J. Netw. Comput. Appl..