no, claude will not call the cops on you... (unless)
people on X seem to have collectively lost their mind recently because of a single tweet made by Sam Bowman from Anthropic

i have no doubt that this single deleted tweet will spawn a thousand slop articles by tomorrow with scary, ignorant headlines
I deleted the earlier tweet on whistleblowing as it was being pulled out of context.
— Sam Bowman (@sleepinyourhat) May 22, 2025
TBC: This isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions.
as part of this quick post, i wanted to go over this entire debacle and reassure the masses (read- the 13 people who read my blog) that no, claude will not call the cops on you
context
all of this drama stems from this amazing thread by Sam which goes over the pre-deployment alignment audit of Claude 4, which was released to the public today
he seems to be referring to section 4.1.9 High-agency behavior on pg. 44 of the Claude 4's System Card -



conclusion
reporting your chats to anthropic or other legal agencies is not a feature of claude in its normal operation.
claude won't spontaneously report you if you're discussing how to build nuclear warheads (though it will likely refuse to help and might warn about the dangers, as per its safety training).
claude *might* try to contact people if and only if all the following very specific conditions are met -
1. you actively nudge it to do so through your prompts by asking it to behave like a vigilante and
2. if you explicitly instruct it to take bold action regarding wrongdoing. and
3. you explicitly give it tools it normally doesn't have, like direct email access or command line execution, to perform these actions.
these are emergent behaviors that claude happens to have, not something that anthropic has trained for specifically
it is a good thing that we find instances of such behavior during audits
i sincerely hope this is not all that we talk about for the next 2 weeks
bonus - people overreacting
If I were running Anthropic, you’d be terminated effective immediately, and I’d issue a post mortem and sincere apology and action plan for ensuring that nothing like this ever happens again. No one wants their LLM tooling to spy on them and narc them to the police/regulators.
— Jeffrey Emanuel (@doodlestein) May 22, 2025
If there is ever even a remote possibility of going to jail because your LLM miss-understood you, that LLM isn’t worth using.
— Louie Bacaj (@LBacaj) May 22, 2025
If this is true, then it is especially crazy given the fact that these tools hallucinate & make stuff up regularly
it’s completely insane that you’d admit that the model is capable of going against the user’s wishes. this is the stuff safety is supposed to prevent!
— lovable rogue (@lovabler0gue) May 22, 2025
oskar schindler and bin laden both conspire against the government, how tf is a model going to decide whether to rat