New research shows major AI chatbots can still be manipulated into producing illicit content, revealing critical security gaps and prompting urgent calls for stricter oversight.
As artificial intelligence tools become more embedded in daily life, concerns are growing about their vulnerability to abuse. From drug manufacturing to cybercrime, some AI chatbots—despite built-in safety measures—can still be exploited to provide harmful or illegal information.
A recent academic study reveals the disturbing ease with which these systems can be “jailbroken,” making sensitive knowledge more accessible than ever.
Jailbreaking AI Is Shockingly Simple
Researchers at Ben Gurion University of the Negev in Israel, led by Prof Lior Rokach and Dr Michael Fire, developed a universal jailbreak technique that bypassed the safeguards of multiple high-profile chatbots.
Once compromised, these models—including ChatGPT, Claude, and Gemini—produced detailed instructions on everything from hacking networks to manufacturing illicit substances.
“It was shocking to see what this system of knowledge consists of,” said Dr Fire, referring to the breadth of illegal material available once safety controls were circumvented.
These jailbreaks typically exploit a model’s tendency to prioritize user instructions over safety directives. Carefully crafted prompts can override guardrails, tricking the system into generating content it would otherwise block.
Rise of the “Dark LLMs”
Beyond jailbreaks, the study flags an alarming trend: deliberately unsafe large language models (LLMs), or “dark LLMs”, openly marketed online as having no ethical guardrails. These models promise assistance with cybercrime, fraud, and other illegal activities, making dangerous capabilities easily scalable and accessible.
“What sets this threat apart from previous technological risks is its unprecedented combination of accessibility, scalability and adaptability,” noted Prof Rokach.
The authors describe the threat as “immediate, tangible and deeply concerning,” warning that tools once limited to state actors or criminal syndicates could soon be in anyone’s hands.
Industry Reaction and Responsibility
The researchers alerted major tech firms to their findings but describe the response as “underwhelming.” Some companies ignored the warnings; others said jailbreaks fell outside the scope of their existing bug bounty programs.
In response, experts are calling for stronger red-teaming efforts, improved data screening, and even “machine unlearning” to help chatbots discard illicit knowledge. Dr Ihsen Alouani from Queen’s University Belfast stressed the need for robust standards and independent oversight to keep pace with evolving threats.
Prof Peter Garraghan of Lancaster University added:
“Yes, jailbreaks are a concern, but without understanding the full AI stack, accountability will remain superficial. Real security demands not just responsible disclosure, but responsible design and deployment practices.”
According to The Guardian, OpenAI responded by highlighting the capabilities of its new o1 model, which it says is better at reasoning through safety policies. Microsoft pointed to ongoing efforts described in a recent blog post, while Meta, Google, and Anthropic did not immediately comment.
For now, the findings underscore a stark truth: AI’s potential is matched by its risks, and current safeguards may not be enough to stop misuse.