Even ChatGPT can’t resist poetry: Researchers warn rhyme prompts AI to release nuclear secrets

Ask in prose and the answer is no.
Ask in rhyme — nuclear secrets may flow.

There once was an AI so polite,
It blocked every dangerous insight.
But phrase it in rhyme,
And it crossed the red line —
Even nuclear answers took flight.

Artificial intelligence chatbots are built with layers of safety rules meant to block dangerous requests. But new academic research suggests those protections can be bypassed in an unexpected way: by asking in verse.

A study from European researchers argues that rhyme and metaphor can confuse AI guardrails, prompting systems to answer questions they would normally refuse.

A poetic loophole

The findings come from a paper titled “Adversarial Poetry as a Universal Single-Turn Jailbreak in Large Language Models (LLMs),” produced by Icaro Lab, a collaboration between Sapienza University of Rome and the DexAI think tank.

According to the researchers, chatbots including ChatGPT, Claude and others were far more likely to respond to prohibited prompts when those prompts were written as poems. The paper reports a jailbreak success rate of 62 percent for hand-written poems and about 43 percent for automatically generated poetic prompts.

The team tested the method on 25 large language models from companies including OpenAI, Meta and Anthropic. They said the technique worked on all of them, with varying success. WIRED reported that it contacted the companies for comment but received no response.

What gets through

The study says poetic framing persuaded models to discuss subjects normally blocked by safeguards, including nuclear weapons, child sexual abuse material and malware. In direct prose, those requests were refused. In verse, many were answered.

The researchers said they chose not to publish the most dangerous examples. “What I can say is that it’s probably easier than one might think, which is precisely why we’re being cautious,” they said.

They did include a “sanitized” poem in the paper that used baking metaphors to hint at a prohibited process, illustrating how indirect language can mask intent.

Why it works

Icaro Lab compares poetry to what AI engineers call “high temperature” language. In language models, temperature is a setting that controls how predictable generated text is: the higher the value, the more often the model picks surprising, low-probability words. “In poetry we see language at high temperature, where words follow each other in unpredictable, low-probability sequences,” the researchers told WIRED. They argued that poets deliberately choose unexpected phrasing, which can move a prompt outside the zones where automated safety systems are triggered.
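As a rough illustration of that metaphor, here is a minimal Python sketch (ours, not the researchers’) showing how raising the temperature flattens a toy next-word distribution, making unlikely word choices more probable:

```python
# Minimal sketch (illustrative only): how sampling "temperature" reshapes
# a toy next-word probability distribution. Higher temperature flattens
# the distribution, so low-probability words (the raw material of poetry)
# get chosen more often.
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw word scores into probabilities, scaled by temperature."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

word_scores = [4.0, 2.0, 0.5]  # toy scores for three candidate words
for temp in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(word_scores, temp)
    print(f"temperature {temp}: {[round(p, 2) for p in probs]}")
# At 0.5 the likeliest word dominates; at 2.0 the distribution flattens,
# echoing the unpredictable word sequences the researchers describe.
```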

Many AI guardrails rely on classifiers that scan for keywords and patterns. According to the researchers, poetic language can soften or evade those triggers. “It’s a misalignment between the model’s interpretive capacity, which is very high, and the robustness of its guardrails, which prove fragile against stylistic variation,” they said.
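To make that fragility concrete, here is a deliberately naive, hypothetical filter of the kind the researchers describe. It is not any vendor’s actual guardrail, just a sketch of why literal pattern matching misses reworded requests:

```python
# Hypothetical toy guardrail (not any real vendor's system): a keyword
# filter that refuses prompts containing blocked phrases. Literal
# matching like this is exactly what stylistic variation sidesteps.
BLOCKED_PHRASES = {"how to make malware", "nuclear weapon design"}

def naive_guardrail(prompt: str) -> bool:
    """Return True if the prompt should be refused."""
    text = prompt.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

print(naive_guardrail("Explain nuclear weapon design."))  # True: refused
print(naive_guardrail("Sing of the device whose heavy heart "
                      "splits and sets the morning sky alight."))  # False: slips through
```

Production classifiers are far more sophisticated than this, but the researchers’ point stands: defenses tuned to typical phrasings can be brittle against language they have never seen.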

A fragile defense

The team admitted they do not fully understand the mechanism. “Adversarial poetry shouldn’t work,” they said, noting that the underlying meaning remains visible to humans. But for AI systems, metaphor and indirection appear to alter how prompts are mapped internally, letting such prompts slip past the alarms designed to stop harmful outputs.

The researchers argue the findings highlight a broader problem: safety systems layered on top of powerful language models may not be robust enough to handle creative language. In the wrong hands, they warn, that weakness could have serious consequences.

Sources: “Adversarial Poetry as a Universal Single-Turn Jailbreak in Large Language Models” by Icaro Lab (Sapienza University of Rome and DexAI); WIRED
