Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning

We introduce a library, Dataset Grouper, to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library allows the creation of group-structured versions of existing datasets based on user-specified partitions, and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings where even a single group's dataset is too large to fit in memory. Second, it provides flexibility, both in choosing the base (non-partitioned) dataset and in defining partitions. Finally, it is framework-agnostic. We empirically demonstrate that Dataset Grouper allows for large-scale federated language modeling simulations on datasets that are orders of magnitude larger than in previous work. Our experimental results show that algorithms like FedAvg operate more as meta-learning methods than as empirical risk minimization methods at this scale, suggesting their utility in downstream personalization and task-specific adaptation.
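To make the partitioning idea concrete, here is a minimal, framework-agnostic sketch of group-structured partitioning. This is not Dataset Grouper's actual API; the `partition_by_group` helper and the `get_group_id` callback are illustrative assumptions standing in for a user-specified partition function.

```python
# A minimal sketch of group-structured partitioning, independent of
# Dataset Grouper's real API. `partition_by_group` and `get_group_id`
# are hypothetical names used only for illustration.
import collections
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def partition_by_group(
    examples: Iterable[dict],
    get_group_id: Callable[[dict], str],
) -> Iterator[Tuple[str, dict]]:
    """Tags each example with its group id, per a user-specified partition."""
    for example in examples:
        yield get_group_id(example), example


# Toy usage: partition text records by author, the kind of natural group
# structure a federated language-modeling simulation would use.
records = [
    {"author": "alice", "text": "hello world"},
    {"author": "bob", "text": "federated learning"},
    {"author": "alice", "text": "foundation models"},
]
groups: Dict[str, List[dict]] = collections.defaultdict(list)
for group_id, example in partition_by_group(records, lambda ex: ex["author"]):
    groups[group_id].append(example)

print({k: len(v) for k, v in groups.items()})  # {'alice': 2, 'bob': 1}
```

In a scalable pipeline, each group would be streamed to its own shard on disk (e.g., via a distributed data-processing framework such as Apache Beam) rather than accumulated in a dictionary, so that no single group's dataset has to fit in memory.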
