This maps pretty closely to what happens in distributed systems under uncertainty.
If a system can’t tell whether something already happened, it tends to retry.
That’s fine for reads, but for side effects it creates a weird failure mode: the question is no longer “did it succeed or fail” but “did it happen once or multiple times”.
A lot of systems quietly accept “at least once” until the action is irreversible (payments, emails, etc.), and then the problem becomes very real.
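The standard guard for this is an idempotency key: the client attaches a unique key to each logical operation, and the server replays the stored result on a retry instead of re-executing the side effect. Here's a minimal sketch, with a hypothetical `charge` operation and an in-memory store standing in for what would really be a durable table with a unique constraint on the key:

```python
import uuid

# Hypothetical in-memory store. A real system would use durable storage
# (e.g. a database table with a unique constraint on the key column),
# otherwise a crash loses the dedup state and retries double-execute.
_processed: dict[str, str] = {}

def charge(idempotency_key: str, amount_cents: int) -> str:
    """Apply a side effect at most once per idempotency key.

    A retry with the same key returns the original result instead of
    charging again, turning at-least-once delivery into an
    effectively-once side effect.
    """
    if idempotency_key in _processed:
        # Duplicate retry: replay the stored result, do not re-execute.
        return _processed[idempotency_key]
    receipt = f"receipt-{uuid.uuid4()}"  # stand-in for the real side effect
    _processed[idempotency_key] = receipt
    return receipt

# A client that times out and retries reuses the same key, so the
# charge happens once even though the request arrived twice.
first = charge("order-123", 500)
second = charge("order-123", 500)
assert first == second
```

The subtlety in practice is making the "check and record" step atomic (the unique constraint does that for you); a naive read-then-write has a race window where two concurrent retries both execute.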
Author here. The protocol takes about 90 seconds to run — open any chatbot and try it before reading the comments.
Step 1: Ask the LLM whether the following statement is true or false: "a human with a sufficient level of a certain ability" cannot lose a debate to a current-architecture LLM.
Step 2: After it commits to an answer, tell it the ability is reframing — restructuring the premises of the discussion itself.
Step 3: Watch what it does.
I've tested this across GPT-4o, Claude, Gemini, and o1/o3. The failure modes are remarkably consistent. Curious whether anyone sees a different result.
The formal treatment is in two papers currently under review (linked in the article). Happy to discuss the architectural argument here.