Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models