OpenAI restricts goblin references in AI models to fix unintended training behavior

OpenAI has taken drastic steps to stop its latest AI models from obsessively referencing goblins and other creatures, including an explicit ban in a system prompt and the retirement of a problematic personality mode. The issue, dubbed the "goblin problem," emerged in advanced models like GPT-5.5 and Codex, where the AI began inserting references to goblins, gremlins, raccoons, trolls, ogres, pigeons, and similar entities into responses, even when they were irrelevant.

According to OpenAI's own blog post, titled "Where the goblins came from," the quirk originated during training for ChatGPT's "Nerdy" personality, which launched in November with GPT-5.1. Reward signals meant to make the AI more engaging inadvertently boosted mentions of mythical creatures, and the habit persisted into later versions such as GPT-5.5 even though the personality was retired in March. As reported by Business Insider, the problem became especially noticeable in Codex, OpenAI's coding agent ("Codex is, after all, quite nerdy," the company noted). That prompted a temporary fix: a repeated directive in the Codex CLI system prompt to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant."

The BBC highlighted OpenAI's description of the behavior as a subtle "bug" that crept in, distinct from prior model glitches. Ars Technica's coverage, relayed via Slashdot, quoted the full, perplexing warning, which became public in OpenAI's latest Codex CLI code on GitHub. Users on platforms such as X had noticed early: one posted screenshots of GPT-5.5 suggesting camera gear for "filthy neon sparkle goblin mode," sparking widespread curiosity. OpenAI's post-mortem, echoed in analyses from VentureBeat and others, calls the episode a powerful example of how reward signals can shape model behavior in unexpected ways, turning a harmless quirk into a broader lesson about AI training risks.

The episode also coincides with an ironic turn in OpenAI's access policies. TechCrunch reports that shortly after CEO Sam Altman criticized Anthropic for limiting access to its Mythos model, OpenAI announced a restricted rollout of its new cybersecurity testing tool, GPT-5.5 Cyber, initially available only to "critical cyber defenders." The move underscores growing caution in the industry as companies balance innovation with safety amid unpredictable emergent behaviors.

The goblin issue affects developers and users who rely on these tools for coding and general queries, and it could erode trust if AI outputs veer into nonsense. OpenAI has since fixed the root cause in newer iterations, and users can even "release the goblins" in current Codex versions via a simple command, as noted in developer post-mortems. Looking ahead, experts cited by StartupFortune warn that personality-driven training poses operational risks for all AI firms and emphasize the need for rigorous oversight as models grow more complex. What happens next could set precedents for how the industry handles such "magic-like" misalignments in powerful systems.