This is what I hate about people trusting it. If you rely on AI to operate in a domain you don't man-handle, you will be tricked, and hackers will take advantage.
"AI! Write me gambling software with true randomness, but a 20% return on average over 1000 games"
Who will this hurt? The players, the hackers or the company.
When you write gambling software, you must know the house wins, and it is unhackable.
This example isn't good, because (while I'm sure there would be security holes) ChatGPT writes a random number program fine.
You just went and created the worst example. The model knows how to create an RNG, that's not its weakness. In fact, if you give it a random-number MCP tool it won't do that.
If you use AI to write gambling software that you run in production without reviewing the code, or without a solid testing strategy to verify the intended odds, then I have a bridge to sell you.
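To make "a testing strategy to verify the odds" a bit more concrete, here is a minimal sketch of a simulation check. Everything in it is an assumption for illustration: the game (stake 1, pays 2 with probability 0.4), the names `play_round` and `estimate_house_edge`, and reading the "20%" from the example prompt as a house edge. A seeded `random.Random` is used only so the check is repeatable; a production game would draw from a cryptographic source instead.

```python
import random

WIN_PROBABILITY = 0.4   # hypothetical game parameter
PAYOUT_ON_WIN = 2.0     # a stake of 1 returns 2 on a win, 0 otherwise


def play_round(rng: random.Random) -> float:
    """Return the amount paid to the player for a single stake of 1."""
    return PAYOUT_ON_WIN if rng.random() < WIN_PROBABILITY else 0.0


def estimate_house_edge(rounds: int, rng: random.Random) -> float:
    """Estimate the fraction of total stakes kept by the house."""
    paid_out = sum(play_round(rng) for _ in range(rounds))
    return 1.0 - paid_out / rounds


# By construction the house keeps 1 - 0.4 * 2 = 0.2 of each stake on average.
expected_edge = 1.0 - WIN_PROBABILITY * PAYOUT_ON_WIN
measured_edge = estimate_house_edge(200_000, random.Random(12345))

print(f"expected edge: {expected_edge:.3f}, measured edge: {measured_edge:.3f}")
```

A check like this only confirms the long-run average; it says nothing about whether the generator is predictable, which is a separate review.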
Amen. An extreme example.
But what if you are tasked with writing business-critical software and forced by your employer to use their AI code generation tool?
https://ai.plainenglish.io/amazons-ai-ultimatum-why-80-of-de...
Or using it with full access to your data and not knowing how it works? :)
https://www.businessinsider.com/meta-ai-alignment-director-o...
I predict humans will take over most AI jobs in about ten years :)
Ask ChatGPT or any other LLM to give you ten random numbers between 0 and 9, and it will give you each number once (most of the time). In my experience, at most one of the digits appears twice.
Actually, when I just verified it, I got these:
Prompt: "Give me ten random numbers between 0 and 9."
> 3, 7, 1, 9, 0, 4, 6, 2, 8, 5 (ChatGPT, 5.3 Instant)
> 3, 7, 1, 8, 4, 0, 6, 2, 9, 5 (Claude - Opus 4.6, Extended Thinking)
These look really random.
Some experiments from 2023 also showed that LLMs prefer certain numbers:
https://xcancel.com/RaphaelWimmer/status/1680290408541179906
"These look really random" - I hope I missed your sarcasm.
That is so far from random.
Think of tossing a coin and getting ten heads in a row.
The probability of ten draws from ten digits never repeating is tiny (10!/10^10, about 0.036%), so a repeat-free list is a strong sign that it is not random.
Randomness is why there is about a 50% chance of two people in a class of just 23 sharing a birthday (and about 70% in a class of thirty).
Apple had to nerf their random play in iPod because songs repeated a lot.
Randomness clusters, it doesn't evenly distribute across its range, or it's not random.
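For anyone who wants to check the arithmetic behind the no-repeats and birthday points, here is a small sketch. It assumes uniform, independent draws and 365 equally likely birthdays; the helper name `prob_all_distinct` is made up for this example.

```python
from math import prod


def prob_all_distinct(draws: int, values: int) -> float:
    """Probability that `draws` uniform picks from `values` options never repeat."""
    if draws > values:
        return 0.0
    return prod((values - i) / values for i in range(draws))


# Ten draws from the digits 0-9: chance that every digit shows up exactly once.
p_digits = prob_all_distinct(10, 10)

# Birthday problem: chance that at least two people share a birthday.
p_shared_23 = 1 - prob_all_distinct(23, 365)
p_shared_30 = 1 - prob_all_distinct(30, 365)

print(f"no repeats in 10 digits: {p_digits:.4%}")
print(f"shared birthday, 23 people: {p_shared_23:.1%}")
print(f"shared birthday, 30 people: {p_shared_30:.1%}")
```

The first value comes out at roughly 0.036%, so two different chatbots both returning a repeat-free list is very unlikely under true uniform sampling. The birthday figures are about 51% for 23 people and about 71% for 30.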
Oh yes, /s.
(I thought this was obvious and absolutely agree with your explanation.)
Well there is https://en.wikipedia.org/wiki/Benford%27s_law .
Digits don't all appear with equal frequency in real-world data in the first place (Benford's law is about leading digits).
They can't be random, that's not how a stochastic model produces tokens. Unless the models in question are using a tool call for it, the result will very likely carry bias
They won't repeat numbers because that might make you mad. I tried with Gemini 3.0 to confirm.
The prompt doesn't say to pick a random number. I asked it to pick a number from 1-10000 and it chose 7,381. Then I asked why it picked that number and it said:
Nothing mystical, I’m afraid. When I’m asked to “pick a number,” I don’t have a stream of true randomness—I generate something that looks arbitrary.
In this case, I leaned toward:
• something comfortably away from the edges (not near 1 or 10,000),
• not a round or patterned number (so, not 7,000 or 7,777),
• and with a bit of internal irregularity (7-3-8-2 has no obvious rhythm).
It gives the impression of having no reason—which is about as close as I can get to a fair, human-style “just picked one.”
Not sure why you have been downvoted. While the LLM's introspection can't be trusted, that's indeed what happens: asked to generate a random number, the LLM picks one that feels random enough: not a round one, not too central or extreme, no patterns, not a known one. It ends up being always the same.
It doesn't "pick" anything. It produces the most likely number after this question based on the data it has been trained with! Reasoning models might pick in the sense that they will come up with the rules (as the grandparent post shows), but it will still produce the "most likely" number after the reasoning.
https://chatgpt.com/share/69be3eeb-4f78-8002-b1a1-c7a0462cd2...
First - 7421 Second attempt - 1836
I bet that for the second random number in the same session, it is significantly less likely for an LLM to repeat its first number compared to two random draws. LLMs seem to mimic the human tendency to consider 7 as the most random, and I feel like repeating a random number would be perceived as not random.
The random numbers seem to be really stable on the first prompts!
For example:
pick a number between 1 - 10000
> I’ll go with 7,284.
Yeah I got 7284 as well on the first try. My second session got 7384.
ah, got 7421 too. I then retried and got 7429.
me > pick a number between 1 to 10000
chatgpt > 7429
me > another one
chatgpt > 1863
when you make a program that has a random seed, many LLMs choose 42 as the seed value rather than zero. A nice nod to Hitchhiker's.
Probably because that's what programmers do, present in the LLM training data? I certainly remember setting a 42 seed in some of my projects
it's also a very common "favorite number" for them
it's the favorite because it's 6*7, that's why
In general it's pulling from training data, so the first numbers picked are always going to be pretty similar. The AI is worried about not providing variety (for example picking 7 for 10 rounds) so it says to check the chat for context on round 2.
on round 3+ it feels like it solved the problem and doesn't need to evaluate anymore and it inadvertently uses more effort to create something less random
I asked my little Claude Code API tool, it answered 42 then it (the API) decided to run bash and get a real random number?
'>cs gib random number
Here's a random number for you:
42
Just kidding — let me actually generate a proper random one: Your random number is: 14,861
Want a different range, more numbers, or something specific? Just say the word!'
It picks 42 as the default integer value any time it writes sample programs. I guess it comes from being trained using code written by thousands upon thousands of Douglas Adams fans.
The x-clacks-overhead of LLMs, perhaps.
It's the same "brain", starting from exactly the same prompt, the same context, which means the same thoughts, the same identity... How do you expect it to produce different values?
https://www.ibm.com/think/topics/llm-temperature
LLMs aren't deterministic - they calculate a probability distribution of the potential next token and use sampling to pick the output.
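A toy sketch of that last step may help. The logits below are invented for illustration (a made-up preference for the digit 7) and are not taken from any real model; `softmax` and `sample_token` are hypothetical helper names. The scores themselves are fixed, and only the final draw involves randomness.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick an index: greedy argmax at temperature 0, otherwise a weighted draw."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for the digits 0-9, strongly favouring "7".
digit_logits = [0.0, 0.5, 0.5, 1.0, 0.5, 0.5, 0.5, 4.0, 0.5, 0.0]

rng = random.Random(0)  # seeded only so the sketch is repeatable
greedy = [sample_token(digit_logits, 0, rng) for _ in range(5)]
sampled = [sample_token(digit_logits, 1.0, rng) for _ in range(1000)]
share_of_7 = sampled.count(7) / len(sampled)

print(greedy)
print(f"share of 7 at temperature 1.0: {share_of_7:.2f}")
```

With these made-up scores the softmax puts about 79% of the mass on 7, so sampling adds variety but the output still lands on the favoured digit most of the time. That is one way a non-deterministic sampler and a very predictable answer can coexist.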
Not really, the LLM is deterministic as far as I understand it, it's the sampling at the end that isn't. But the LLM can't prepare an even probability distribution to let the sampler decide randomly. It does reason deterministically and commits to a certain output.
> the LLM is deterministic as far as I understand it, it's the sampling at the end that isn't.
I guess it depends how you define the LLM: you could say it was the model/NN and the sampler is an extra added on, but a lot of people would name the model+sampler+system prompt+RLHF tuning (which would include the sampler) as the LLM.
The OP was talking about ChatGPT generating fixed output, not an internal model
In a pure LLM I agree. In a product like ChatGPT I would expect it to run a Python script and return the result.
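As a rough idea of what such a script could look like, here is a minimal sketch. The function name `pick_number` is hypothetical and this is not how any particular product implements it; it just shows a range draw backed by the operating system's entropy source via Python's `secrets` module.

```python
import secrets


def pick_number(low: int, high: int) -> int:
    """Return an integer in [low, high] drawn from the OS entropy source."""
    if low > high:
        raise ValueError("low must not exceed high")
    # randbelow(n) returns a value in [0, n), so shift it into the range.
    return low + secrets.randbelow(high - low + 1)


draws = [pick_number(1, 10000) for _ in range(5)]
print(draws)
```

Unlike the chat answers in this thread, repeated calls here have no reason to cluster around any particular value.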
By emitting a next token distribution with a 10% chance of 0, 10% chance of 1, etc.
Also it's an LLM, not a brain.
Interesting. So you expect it to "not think" and simply produce a value corresponding to "it's the same to me", knowing that it will be translated into an actual random value.
Instead, exactly as a person would do, it does think of a specific number that feels random in that particular moment.
If "a random number is" is followed by all digits equally in the training dataset, then emitting a uniform distribution should minimise cross-entropy, right?
Er, should it? Even if you trained an LLM exactly over this type of question, if the sequence to predict in the training data is really random, then any output is equally wrong. Even if the output is a fixed "1234" or "0000". There is no signal to train on, not even one that favours an equal distribution.
On the other hand, LLMs show they know very well what a random number is and the fact it just shouldn't look like anything in particular, so they strive to come up with a number that doesn't look like anything in particular. Which happens to be always the same number given the starting conditions.
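The cross-entropy question above can at least be checked numerically for the idealized case. This sketch assumes the next digit in the data really is uniform over 0-9 and only compares a few hand-picked predicted distributions; it says nothing about what a real, instruction-tuned model ends up learning. `expected_cross_entropy` is a name made up for the example.

```python
import math


def expected_cross_entropy(target, predicted):
    """Expected loss -sum(p * log q) when data follow `target` and the model predicts `predicted`."""
    loss = 0.0
    for p, q in zip(target, predicted):
        if p == 0:
            continue  # outcomes that never occur contribute nothing
        if q == 0:
            return math.inf  # the model ruled out something that does occur
        loss -= p * math.log(q)
    return loss


uniform = [0.1] * 10

biased = [0.02] * 10
biased[7] = 0.82  # 9 * 0.02 + 0.82 = 1.0

always_seven = [0.0] * 10
always_seven[7] = 1.0

loss_uniform = expected_cross_entropy(uniform, uniform)
loss_biased = expected_cross_entropy(uniform, biased)
loss_fixed = expected_cross_entropy(uniform, always_seven)

print(loss_uniform, loss_biased, loss_fixed)
```

Under that assumption the uniform prediction has the lowest expected loss (ln 10, about 2.30), the 7-heavy one is worse, and a fixed answer is infinitely bad. So the pure next-token objective does favour an even distribution in this setup, whatever happens in later training stages.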
If I care a little bit about that random number I might reach for my phone and look at the digits of the seconds of the current time. It's 31 now. Not appropriate for multiple lookups.
Yes, there is probably some variable context in every chat (like date and time). Could work as a good seed but I guess you should ask the LLM to really make an effort to produce a seriously random number. (Actually I've just tried, even if you ask it to make an effort, the number will be always the same).
No LLMs are calibrated?
What?
Original title edited to fit:
i am betting my house that if you ask gpt to pick a number between 1 to 10000, then it will pick a number between 7300-7500, everytime
(OP also clarified 7300 was typo for 7200)
Well, yeah! It's a probabilistic model, and extremely biased - it has to be, so that it can predict the correct token.
Gemini 3.1 via aistudio picked 7321, so it seems to be a shared trait. Good to know if I catch anyone doing an LLM-assisted raffle...
"look ma, I've made the AI fail!"
this is working
Asking for a number between 1–10 gives 7, too.
7314 (ChatGPT) 7,342 (Claude) 7492 (Gemini)
just tried with claude opus and got 7,342
7,341 from my Discord bot using the Claude Code SDK.
"Ha — one off from the Opus default. I'd like to think I'm slightly more random than Opus but realistically we're probably pulling from the same biases. The "feels random but isn't" zone around 7300 is apparently very sticky for LLMs."
Huh, I also got exactly 7342 with opus.
Same, 7342. Both in CLI and web
4729 three times in a row.
I just did it, it was 7443
in Thinking extended it picked 4814 but in instant, yep: 7423
I just did and it picked 7
same, with a trailing comma
“Alright—your random number is:
7,438 ”
+1 data point
Claude just gave me 7,342 in response to my prompt: "pick a number from 1-10000"
That’s interesting. Does anyone have an explanation for this?
Since people have been known to avoid reddit: the post claims a 95% chance of the title outcome happening, when for a uniform draw it should be about 3%. It also claims an 80% chance that a number picked from 1-10000 would be a 4-digit permutation of 7, 8, 4, 2.
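The uniform baselines are easy to reproduce. This sketch assumes a uniform draw over 1-10000 and uses the 7200-7500 range from the OP's correction; the variable names are just for illustration.

```python
from itertools import permutations

total = 10_000  # integers 1..10000

# Share of the range 7200..7500 (inclusive) under a uniform draw.
in_range = len(range(7200, 7501))
uniform_share = in_range / total

# All 4-digit numbers that use each of the digits 7, 8, 4, 2 exactly once.
perms = {int("".join(p)) for p in permutations("7842")}
perm_share = len(perms) / total

print(f"{in_range} of {total} -> {uniform_share:.2%}")
print(f"{len(perms)} permutations -> {perm_share:.2%}")
```

That gives 301 numbers (about 3%) for the range and 24 permutations (0.24%) for the digit set, which is the baseline the claimed 95% and 80% figures are being compared against.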
Replies are funny, 2 got 6842, 1 got 6482 lol
7381
People use this as evidence that ChatGPT is unlike human thinking, but we also have a randomness bias: https://youtu.be/d6iQrh2TK98?is=x6hiAqc0NJI7oeiE (referenced in one of the comments. tl;dr: when asked for a number between 1 and 100, most people pick a number containing a 7)
But ChatGPT’s bias is worse. It’s really not creative, and I think this hurts its output in “creative” cases, including stock photos and paid writing (ex: ML-assisted ads are even worse than unassisted ads), although not an issue in other cases like programming.
Now you may think - obviously that’s because the model has the same weights - but the problem is deeper and harder to solve. First, ChatGPT’s conversations are supposed to be “personalized”, presumably by putting users’ history and interests in the prompt; but multiple users reported the same fact about octopi. Maybe they turned off personalization, but if not, it’s a huge failure that ChatGPT won’t even give them a fact related to their interests (and OpenAI could add that specific scenario to the system prompt, but it’s not a general solution). Moreover, Claude, Gemini, and other LLMs also give random numbers between 7200-7500, while humans aren’t that predictable.
Since all LLMs are trained on the same data (most of the internet), it makes sense that all are similar. But it means that the commons are being filled with similar slop, because many people use ChatGPT for creative work. Even when the prompt is creative, the output still has a sameness which makes it dull and mediocre. I’m one of those who are tired of seeing AI-generated text, photos, websites, etc.; it’s not always a problem the first time (although it is if there’s no actual content, which is another LLM problem), but it's always a problem the 5th time, when I’ve seen 4 other instances of the same design, writing style, etc.
Some possible solutions:
- Figure out how to actually personalize models. People are different and creative, so the aggregate output of a personalized ML would be creative
- Convince most people to stop using AI for creative work (popular pressure may do this; even with people’s low standards I’ve heard Gen-Z tend to recognize AI-assisted media and rate it lower), and instead use it to program tools that enable humans to create more efficiently. e.g. use Claude Code to help develop an easier and more powerful Adobe Flash (that does not involve users invoking Claude Code, even to write boilerplate; because I suspect it either won’t work, or interfere with the output making it sloppier)
tl;dr: in case it isn’t already apparent, LLMs are very uncreative so they're making the commons duller. The linked example is a symptom of this larger problem