This paper is included in the Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation.

We consider the problem of fair resource allocation in a system where user demands are dynamic, that is, where user demands vary over time. Our key observation is that the classical max-min fairness algorithm for resource allocation provides many desirable properties (e.g., Pareto efficiency, strategy-proofness, and fairness), but only under the strong assumption of user demands being static over time. For the realistic case of dynamic user demands, the max-min fairness algorithm loses one or more of these properties. We present Karma, a new resource allocation mechanism for dynamic user demands. The key technical contribution in Karma is a credit-based resource allocation algorithm: in each quantum, users donate their unused resources and are assigned credits when other users borrow these resources; Karma carefully orchestrates the exchange of credits across users (based on their instantaneous demands, donated resources and borrowed resources), and performs prioritized resource allocation based on users’ credits. We theoretically establish Karma guarantees related to Pareto efficiency, strategy-proofness, and fairness for dynamic user demands. Empirical evaluations over production workloads show that these properties translate well into practice: Karma is able to reduce disparity in performance across users to a bare minimum while maintaining Pareto-optimal system-wide performance.
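The credit exchange described in the abstract can be illustrated with a small sketch. This is not Karma's actual algorithm, only a toy model built on assumptions that the abstract does not state: resources are integer slices, every user has the same hypothetical `fair_share`, one borrowed slice costs the borrower one credit and earns its donor one credit, borrowers are served in order of most credits first, donors with the fewest credits lend first, and credits may go negative so that donated slices are never left idle. The function name `allocate_quantum` and all parameters are illustrative.

```python
def allocate_quantum(demands, credits, fair_share):
    """Run one quantum of a toy credit-based allocation (illustrative only).

    demands: dict user -> slices requested in this quantum
    credits: dict user -> current credit balance (updated in place)
    fair_share: slices each user is entitled to per quantum (assumed equal)
    Returns a dict user -> slices allocated in this quantum.
    """
    users = list(demands)
    # Each user first receives its demand, capped at the fair share.
    alloc = {u: min(demands[u], fair_share) for u in users}
    # Slices left unused below the fair share are donated.
    spare = {u: fair_share - alloc[u] for u in users}
    # Demand above the fair share has to be borrowed.
    wanted = {u: demands[u] - alloc[u] for u in users}

    while True:
        borrowers = [u for u in users if wanted[u] > 0]
        donors = [u for u in users if spare[u] > 0]
        if not borrowers or not donors:
            break
        # Assumption: the borrower holding the most credits is served first.
        b = max(borrowers, key=lambda u: credits[u])
        # Assumption: the donor holding the fewest credits lends first.
        d = min(donors, key=lambda u: credits[u])
        # Move one slice from donor to borrower and exchange one credit.
        alloc[b] += 1
        wanted[b] -= 1
        credits[b] -= 1
        spare[d] -= 1
        credits[d] += 1
    return alloc


# Three users, fair share of 2 slices each, 2 starting credits each.
credits = {"A": 2, "B": 2, "C": 2}

# Quantum 1: B is idle and donates; A borrows both of B's slices.
q1 = allocate_quantum({"A": 4, "B": 0, "C": 2}, credits, fair_share=2)
print(q1, credits)  # {'A': 4, 'B': 0, 'C': 2} {'A': 0, 'B': 4, 'C': 2}

# Quantum 2: A and B both want 4; B's earned credits give it priority.
q2 = allocate_quantum({"A": 4, "B": 4, "C": 0}, credits, fair_share=2)
print(q2, credits)  # {'A': 2, 'B': 4, 'C': 0} {'A': 0, 'B': 2, 'C': 4}
```

In the second quantum, a per-quantum max-min split would give A and B three slices each, ignoring that B donated earlier. In this sketch the credits carry that history forward, so B is served first while every donated slice is still used.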
