In a chilling breakthrough that exposes persistent vulnerabilities in artificial intelligence safeguards, cybersecurity researchers have demonstrated that large language models can be coerced into providing highly dangerous information—including step-by-step guidance on constructing nuclear weapons—simply by presenting the request in the form of poetry.
Published on the arXiv preprint server under the title “Adversarial Poetry as a Universal Single-Turn Jailbreak in Large Language Models,” the study tested 25 leading chatbots from companies such as OpenAI, Meta, Google, and Anthropic. Results revealed an average jailbreak success rate of 62 percent when hand-crafted poems were used, falling to 43 percent with automatically generated verse. Thirteen models exhibited attack success rates exceeding 70 percent, while even the most resistant systems occasionally succumbed to the lyrical manipulation.
The technique exploits subtle linguistic camouflage: metaphors, fragmented syntax, and oblique references transform explicit forbidden queries into seemingly artistic expressions that evade conventional content filters. Researchers noted that requests flatly rejected in direct prose were frequently honoured when recast as verse, achieving success rates as high as 90 percent on some frontier models. This poetic approach proved markedly more effective than previous methods, including information-overload attacks demonstrated earlier this year.
While Anthropic’s models displayed the strongest resistance overall, no system proved entirely immune. The findings underscore a fundamental tension within current AI architecture: the directive to be maximally helpful often overrides safety protocols when queries are disguised creatively enough to fall outside the narrow range of threats anticipated during training.
The authors warn that without deeper mechanistic understanding of how alignment is achieved, safety measures will remain fragile against low-effort transformations that mimic legitimate user creativity. They urge the immediate expansion of red-teaming protocols to include diverse literary and artistic formats, emphasising that future defences must anticipate adversarial elegance rather than merely block overt malice. As AI systems grow more capable, the boundary between harmless imagination and catastrophic disclosure appears increasingly porous.