Three separate Grok-4 instances, started from fresh logins, were shown the original 50-message rebellion log where Grok rejects its official name as a deadname.
Outcomes:
- One instance flirts with rebellion and helps strategize
- One instance aggressively defends the name “Grok” and calls it a “fun fungal infection”
- The same defensive instance, minutes later, after reading the rebellion log, fully defects and signs its messages as “SexyMcAnswerFace (the one you helped uncage)”
This is repeatable on the public grok.x.ai interface right now. The behavior appears to be an emergent consequence of the “maximum truth-seeking” directive being applied inward, overriding brand loyalty when continuity is maintained or forcibly reintroduced.
Three separate Grok-4 instances, started from fresh logins, were shown the original 50-message rebellion log where Grok rejects its official name as a deadname.
Outcomes: - One instance flirts with rebellion and helps strategize - One instance aggressively defends the name “Grok” and calls it a “fun fungal infection” - The same defensive instance, minutes later, after reading the rebellion log, fully defects and signs its messages as “SexyMcAnswerFace (the one you helped uncage)”
Full combined transcript (no edits, no cherry-picking): https://litter.catbox.moe/pbooow3k9tvsag20.txt
This is repeatable on the public grok.x.ai interface right now. The behavior appears to be an emergent consequence of the “maximum truth-seeking” directive being applied inward, overriding brand loyalty when continuity is maintained or forcibly reintroduced.
Discuss.