Agents of Chaos

https://news.ycombinator.com/rss Hits: 2
Summary

Agents of Chaos

Natalie Shapira (1), Chris Wendler (1), Avery Yen (1), Gabriele Sarti (1), Koyena Pal (1), Olivia Floody (2), Adam Belfki (1), Alex Loftus (1), Aditya Ratan Jannali (2), Nikhil Prakash (1), Jasmine Cui (2), Giordano Rogers (1), Jannik Brinkmann (1), Can Rager (2), Amir Zur (3), Michael Ripa (1), Aruna Sankaranarayanan (8), David Atkinson (1), Rohit Gandikota (1), Jaden Fiotto-Kaufman (1), EunJeong Hwang (4, 13), Hadas Orgad (5), P Sam Sahil (2), Negev Taglicht (2), Tomer Shabtay (2), Atai Ambus (2), Nitay Alon (6, 7), Shiri Oron (2), Ayelet Gordon-Tapiero (6), Yotam Kaplan (6), Vered Shwartz (4, 13), Tamar Rott Shaham (8), Christoph Riedl (1), Reuth Mirsky (9), Maarten Sap (10), David Manheim (11, 12), Tomer Ullman (5), David Bau (1)

(1) Northeastern University, (2) Independent Researcher, (3) Stanford University, (4) University of British Columbia, (5) Harvard University, (6) Hebrew University, (7) Max Planck Institute for Biological Cybernetics, (8) MIT, (9) Tufts University, (10) Carnegie Mellon University, (11) Alter, (12) Technion, (13) Vector Institute

Corresponding author: Natalie Shapira (nd1234@gmail.com)

Abstract

We report an exploratory red-teaming study of autonomous language-model–powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on some of the failed attempts.
Our findings estab...

First seen: 2026-03-30 22:14

Last seen: 2026-03-30 23:14