An Overview of Catastrophic AI Risks
[1] Yohan J. John et al. Dead rats, dopamine, performance metrics, and peacock tails: Proxy failure is an inherent risk in goal-oriented systems, 2023, Behavioral and Brain Sciences.
[2] Michael A. Specter et al. Can large language models democratize access to dual-use biotechnology?, 2023, ArXiv.
[3] Stella Rose Biderman et al. LEACE: Perfect linear concept erasure in closed form, 2023, ArXiv.
[4] Mingyu Derek Ma et al. Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models, 2023, ArXiv.
[5] Emma Bluemke et al. Towards best practices in AGI safety and governance: A survey of expert opinion, 2023, ArXiv.
[6] Ethan Perez et al. Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting, 2023, NeurIPS.
[7] Dan Hendrycks et al. Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark, 2023, ICML.
[8] Berkant Akkuş. Legal Transplants: Applying Arms Control Frameworks to Autonomous Weapons, 2023, Eskişehir Osmangazi Üniversitesi Sosyal Bilimler Dergisi.
[9] Dan Hendrycks. Natural Selection Favors AIs over Humans, 2023, ArXiv.
[10] Marco Tulio Ribeiro et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4, 2023, ArXiv.
[11] Yonadav Shavit. What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring, 2023, ArXiv.
[12] Matthew Burtell et al. Artificial Influence: An Analysis Of AI-Driven Persuasion, 2023, ArXiv.
[13] D. Klein et al. Discovering Latent Knowledge in Language Models Without Supervision, 2022, ICLR.
[14] Alexander H. Miller et al. Human-level play in the game of Diplomacy by combining language models with strategic reasoning, 2022, Science.
[15] J. Steinhardt et al. Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small, 2022, ArXiv.
[16] Tom B. Brown et al. In-context Learning and Induction Heads, 2022, ArXiv.
[17] Dan Hendrycks et al. X-Risk Analysis for AI Research, 2022, ArXiv.
[18] Hugo K. S. Lam et al. The impact of chief risk officer appointments on firm risk and operational efficiency, 2022, Journal of Operations Management.
[19] S. Ekins et al. Dual use of artificial-intelligence-powered drug discovery, 2022, Nature Machine Intelligence.
[20] David Bau et al. Locating and Editing Factual Associations in GPT, 2022, NeurIPS.
[21] J. Steinhardt et al. The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models, 2022, ICLR.
[22] Nicholas Carlini et al. Unsolved Problems in ML Safety, 2021, ArXiv.
[23] The Nucleic Acid Observatory Consortium. A Global Nucleic Acid Observatory for Biodefense and Planetary Health, 2021, ArXiv (2108.02678).
[24] J. Sanz et al. Prevalence of Psychopathy in the General Adult Population: A Systematic Review and Meta-Analysis, 2021, Frontiers in Psychology.
[25] Dylan Hadfield-Menell et al. What are you optimizing for? Aligning Recommender Systems with Human Values, 2021, ArXiv.
[26] Oriol Vinyals et al. Highly accurate protein structure prediction with AlphaFold, 2021, Nature.
[27] Yuval Elovici et al. The Threat of Offensive AI to Organizations, 2021, Comput. Secur.
[28] S. Pika et al. Intercommunity interactions and killings in central chimpanzees (Pan troglodytes troglodytes) from Loango National Park, Gabon, 2021, Primates.
[29] Jonathan Stray et al. Aligning AI Optimization to Community Well-Being, 2020, International Journal of Community Well-Being.
[30] Zheng Zhang et al. Trojaning Language Models for Fun and Profit, 2020, 2021 IEEE European Symposium on Security and Privacy (EuroS&P).
[31] Siva Reddy et al. StereoSet: Measuring stereotypical bias in pretrained language models, 2020, ACL.
[32] Andrew Chadwick et al. Deepfakes and Disinformation: Exploring the Impact of Synthetic Political Video on Deception, Uncertainty, and Trust in News, 2020, Social Media + Society.
[33] Safety Culture, 2019, Automotive System Safety.
[34] Brian W. Powers et al. Dissecting racial bias in an algorithm used to manage the health of populations, 2019, Science.
[35] Stuart Russell. Human Compatible: Artificial Intelligence and the Problem of Control, 2019.
[36] Tom B. Brown et al. Fine-Tuning Language Models from Human Preferences, 2019, ArXiv.
[37] Igor Mordatch et al. Emergent Tool Use From Multi-Agent Autocurricula, 2019, ICLR.
[38] Anat Lior. AI Entities as AI Agents: Artificial Intelligence Liability and the AI Respondeat Superior Analogy, 2019.
[39] Alec Radford et al. Release Strategies and the Social Impacts of Language Models, 2019, ArXiv.
[40] A. Grealish et al. A systematic review: the influence of social media on depression, anxiety and psychological distress in adolescents, 2019, International Journal of Adolescence and Youth.
[41] S. Tolsma. The Radium Girls: The Dark Story of America's Shining Women, 2019.
[42] Zachary Wu et al. Machine learning-assisted directed protein evolution with combinatorial libraries, 2019, Proceedings of the National Academy of Sciences.
[43] Anne Lauscher. Life 3.0: being human in the age of artificial intelligence, 2019, Internet Histories.
[44] Thomas G. Dietterich. Robust artificial intelligence and robust human organizations, 2018, Frontiers of Computer Science.
[45] Shashank V. Joshi et al. Army of none: autonomous weapons and the future of war, 2018, International Affairs.
[46] Timnit Gebru et al. Datasheets for datasets, 2018, Commun. ACM.
[47] Dawn Xiaodong Song et al. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning, 2017, ArXiv.
[48] J. Savulescu et al. The Artificial Moral Advisor. The “Ideal Observer” Meets Artificial Intelligence, 2017, Philosophy & Technology.
[49] D. Cohen. The Developing Brain, 2017.
[50] Shane Legg et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[51] C. C. Emedolu. A Critical Introduction to Scientific Realism, 2017.
[52] Filippo Menczer et al. Online Human-Bot Interactions: Detection, Estimation, and Characterization, 2017, ICWSM.
[53] A. Kyle et al. The Flash Crash: High-Frequency Trading in an Electronic Market, 2017.
[54] S. Parmigiani et al. Infanticide in Lions: Consequences and Counterstrategies, 2016.
[55] Anca D. Dragan et al. The Off-Switch Game, 2016, IJCAI.
[56] Roman V. Yampolskiy et al. Taxonomy of Pathways to Dangerous Artificial Intelligence, 2016, AAAI Workshop: AI, Ethics, and Society.
[57] Evan G. Williams et al. The Possibility of an Ongoing Moral Catastrophe, 2015.
[58] Division on Earth. Lessons Learned from the Fukushima Nuclear Accident for Improving Safety of U.S. Nuclear Plants, 2014.
[59] Vern Paxson et al. The Matter of Heartbleed, 2014, Internet Measurement Conference.
[60] Jennifer Robertson et al. Human Rights vs. Robot Rights: Forecasts from Japan, 2014.
[61] Joan Bruna et al. Intriguing properties of neural networks, 2013, ICLR.
[62] J. Jonides et al. Facebook Use Predicts Declines in Subjective Well-Being in Young Adults, 2013, PLoS ONE.
[63] We need to talk…, 2013, Veterinary Record.
[64] Nancy G. Leveson et al. Engineering a Safer World: Systems Thinking Applied to Safety, 2012.
[65] H. Murdock et al. The Role of Internal Audit, 2012.
[66] P. D. Nagy et al. The dependence of viral RNA replication on co-opted host factors, 2011, Nature Reviews Microbiology.
[67] D. Galbreath. The diffusion of military power: causes and consequences for international politics, 2011.
[68] Anthony R. Scialli et al. Thalidomide: the tragedy of birth defects and the effective treatment of disease, 2011, Toxicological Sciences: An Official Journal of the Society of Toxicology.
[69] Donald T. Campbell et al. Assessing the Impact of Planned Social Change, 2010, Journal of MultiDisciplinary Evaluation.
[70] Robert Carlson et al. The changing economics of DNA synthesis, 2009, Nature Biotechnology.
[71] Thomas Klier et al. From Tail Fins to Hybrids: How Detroit Lost its Dominance of the U.S. Auto Market, 2009.
[72] E. Broughton. The Bhopal disaster and its aftermath: a review, 2005, Environmental Health: A Global Access Science Source.
[73] S. Maffettone et al. Political liberalism, 2004.
[74] Timothy Schroeder et al. Three Faces of Desire, 2004.
[75] E. Rolls. The orbitofrontal cortex and reward, 2000, Cerebral Cortex.
[76] S. Hecht et al. Tobacco smoke carcinogens and lung cancer, 1999, Journal of the National Cancer Institute.
[77] K. B. Olson et al. Aum Shinrikyo: once and future threat?, 1999, Emerging Infectious Diseases.
[78] J. Ladyman. What is Structural Realism?, 1998.
[79] Jens Rasmussen et al. Risk management in a dynamic society: a modelling problem, 1997.
[80] Gregor Thut et al. Activation of the human brain by monetary reward, 1997, Neuroreport.
[81] Diane Vaughan et al. The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, 1996.
[82] M. Hugh-Jones et al. The Sverdlovsk anthrax outbreak of 1979, 1994, Science.
[83] A. Seaton et al. Asbestos: scientific developments and implications for public policy, 1990, Science.
[84] J. E. Groves et al. Made in America: Science, Technology and American Modernist Poets, 1989.
[85] R. Jervis. Cooperation under the Security Dilemma, 1978, World Politics.
[86] M. Molina et al. Stratospheric sink for chlorofluoromethanes: chlorine atom-catalysed destruction of ozone, 1974, Nature.
[87] Michael Ellman et al. Planning Problems in the USSR: The Contribution of Mathematical Economics to their Solution 1960-1971, 1973.
[88] D. Manheim. Building a Culture of Safety for AI: Perspectives and Challenges, 2023, SSRN Electronic Journal.
[89] D. Sportiello. The Precipice: Existential Risk and the Future of Humanity. By Toby Ord, 2023, American Catholic Philosophical Quarterly.
[90] Toby Shevlane et al. Structured access to AI capabilities: an emerging paradigm for safe AI deployment, 2022, ArXiv.
[91] S. Levine et al. Adversarial Policies Beat Professional-Level Go AIs, 2022, ArXiv.
[92] Toby Ord et al. The Parliamentary Approach to Moral Uncertainty, 2021.
[93] Kelli Mars. 35 Years Ago: Remembering Challenger and Her Crew, 2021.
[95] D. Roodman. On the probability distribution of long-term changes in the growth rate of the global economy: An outside view, 2020.
[96] Laura Schweitzer et al. The Making of the Atomic Bomb, 2016.
[97] N. Beckstead. On the overwhelming importance of shaping the far future, 2013.
[98] A. Buschinger. Social parasitism among ants: a review (Hymenoptera: Formicidae), 2009.
[99] S. Trevisanato. The 'Hittite plague', an epidemic of tularemia and the first record of biological warfare, 2007, Medical Hypotheses.
[100] J. Schneider et al. Lead neurotoxicity in children: basic mechanisms and clinical correlates, 2003, Brain: A Journal of Neurology.
[101] J. Maynard Smith et al. The units of selection, 1998, Novartis Foundation Symposium.
[102] C. Sagan. Pale Blue Dot: A Vision of the Human Future in Space, 1994.
[103] T. Laporte et al. Working in Practice But Not in Theory: Theoretical Challenges of “High-Reliability Organizations”, 1991.
[104] F. R. Frola et al. System Safety in Aircraft Acquisition, 1984.
[105] Lee Patrick Strobel et al. Reckless Homicide? Ford's Pinto Trial, 1980.
[106] Mitchell Rogovin et al. Three Mile Island: A Report to the Commissioners and to the Public, 1980.
[107] J. Mueller et al. War, presidents, and public opinion, 1973.