fubarx@lemmy.world to Programmer Humor@programming.dev · 18 hours ago
Killswitch Engineer [image]
AwesomeLowlander@sh.itjust.works · edited 7 hours ago
The model ‘blackmailed’ the person because they provided it with a prompt asking it to pretend to blackmail them. Gee, I wonder what they expected.
Have not heard the one about cancelling active alerts, but I doubt it’s any less bullshit. Got a source about it?
Edit: Here’s a deep dive into why those claims are BS: https://www.aipanic.news/p/ai-blackmail-fact-checking-a-misleading
yannic@lemmy.ca · 7 hours ago
I provided enough information that the relevant source shows up in a search, but here you go:
“In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe.” [Lynch, et al., “Agentic Misalignment: How LLMs Could be an Insider Threat”, Anthropic Research, 2025]
AwesomeLowlander@sh.itjust.works · 7 hours ago
Yes, and I also already edited my comment with a link going into the incidents and why they’re absolute nonsense.