> especially when it was once suggested that frequent use could imply neurodivergence
Well that explains a lot. Interestingly enough, I've found that I naturally write like an LLM, or rather the LLMs write like I did. I wonder how many other patterns we attribute to LLMs are common in neurodivergent writing just as a result of so much of the training data being areas of the internet where I'd imagine neurodivergence is overrepresented vs. the general population.
I think a lot of us who spent some formative years reading and writing on Usenet tend to write like an LLM, too. Plain text with lots of intentional presentation was a hallmark of the era.
> I wonder how many other patterns we attribute to LLMs are common in neurodivergent writing just as a result of so much of the training data being areas of the internet where I'd imagine neurodivergence is overrepresented vs. the general population.
It’s a very interesting thought experiment and if we had the data to support exploring it I’d love to see what we could find. I’d imagine that some subject-matter experts would probably be discovered as being neurodivergent to the surprise of nobody but themselves.
Related, I've seen a lot of misidentification of Aspie writing as being LLM-generated lately. You seem Aspie to me (and parent does as well) so it makes sense that you'd also see the similarity.
I was always taught that overuse of the em-dash is poor style. Oftentimes using more specific punctuation (comma, semicolon, colon, parentheses) more clearly communicates the structure of a thought. Em-dashes are a lot more freeform and informal. They communicate a similar tone as when you're speaking and you suddenly stop to mention something that just occurred to you.
In this sense, the idea that "em-dash = AI" has become something of a strawman. The mere presence of em-dashes isn't what indicates AI; it's the fact that LLMs use them so frequently, and use them for formal structure (where another punctuation mark would work better) rather than informal breaking up of related thoughts.
> Em-dashes are a lot more freeform and informal. They communicate a similar tone as when you're speaking and you suddenly stop to mention something that just occurred to you.
Isn't that supposed to be en-dash? I swear I remember em-dash being more restricted in use.
That's the problem with all the LLM writing tropes, really. When used correctly, they are all helpful writing tools to get your point to the reader. The em-dash, "it's not X, it's Y", "Not X, Not Y, Just Z", "It's worth noting" (I use that one a lot in my own writing), etc.
It's not that the patterns are bad (they aren't); they are just overused.
Interesting how LLMs have their own preferences too. Those in particular are very often used by ChatGPT, while Claude until recently couldn't stop saying "You're absolutely right!"
I also have a problem now with "it's worth noting". I use it a lot and I still like it, but now it's a dangerous phrase because of its LLM associations.
Same! I actually always preferred them because to me they’re more aesthetically pleasing, which reading aloud makes me think I might be a little neurodivergent.
> The real issue is that the social contract is broken because LLM output is attempted to be passed off as human work.
I don’t think writing with AI makes a creation "worse." If anything, it makes it better, if you bring genuine ideas and imagination to it first.
The stigma comes from people being lazy and letting the AI do the heavy lifting of thinking. That’s where the "social contract" breaks. But using AI as a multiplier for your own voice and ideas isn’t "subpar"—it’s efficient.
If we start playing "whack-a-mole" with punctuation to find AI, we’re missing the point. The question isn’t what tool was used, but how much of the human's "creation" is actually in there.
> The stigma comes from people being lazy and letting the AI do the heavy lifting of thinking.
This is essentially my point. The AI emits an answer and people will, in turn, copy and paste the result as-is. It’s a repeat all over again of people simply copy-pasting something from Wikipedia and trying to pass it off as their own.
> especially when it was once suggested that frequent use could imply neurodivergence
When you think folks have come up with every inventive way to pathologize a personality trait, they start gatekeeping punctuation. It’s the ultimate reach—turning a standard grammar tool into a "symptom" just to fuel the modern obsession with finding new ways to be a unique victim.
Suggesting that a horizontal line is a diagnostic "tell" for neurodivergence is peak internet brain-rot. It’s not a condition; it’s middle-school English. We’ve officially hit a level of performative absurdity where people are trying to claim clout through a keyboard stroke. It’s not a disability; it’s a stylistic choice.
Two of the things I love intersect here: good punctuation and engineering documents.
AI stole the em-dash from my toolkit.
I have memorized a group of useful Alt-codes for engineering documents. They include symbols for diameter, delta, degrees, dot product, and trademark among others. If you're of a certain age, you will remember how useful Alt+255 was for folder naming.
At the cusp of the 21st century, I added the Windows Alt-code for the em-dash. Compared to parentheses it is less jarring. Commas are dainty things. I use the em-dash, and I am human.*
* I confess that I also use semicolons; I still claim to be human.
I know, I find myself in this silly situation where I have to adjust my writing style because I write like an AI: always loved my bullet points and dashes.
At work I also always tended to send slightly longer but structured answers. I found that it allowed readers to skip over the irrelevant sections and focus on what the changes are. E.g., a list of changes in the format: bullet point -> change name -> change details. That way people could easily focus on the changes they cared about, instead of facing a dense paragraph that people often just skip.
Hell I even found myself wanting to add a typo just to give a more human fell, or skip final “.” to make my text imperfect and more human. That’s getting silly
never understood why -- => em-dash auto completion is only a thing in some subset of applications instead of being a standard behavior for (display) text inputs
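For what it's worth, the replacement itself is tiny. Here is a minimal sketch in Python, assuming the rule is "exactly two hyphens become an em dash, longer runs are left alone". The function name `autodash` and that exact rule are my own assumptions, not how any particular application does it.

```python
import re

EM_DASH = "\u2014"  # U+2014 EM DASH, written as an escape to keep the source ASCII-only

def autodash(text: str) -> str:
    """Replace each run of exactly two hyphens with an em dash.

    The lookbehind/lookahead keep longer runs such as '---' untouched,
    so horizontal rules or ASCII art are not rewritten.
    """
    return re.sub(r"(?<!-)--(?!-)", EM_DASH, text)

if __name__ == "__main__":
    print(autodash("double dashes -- like this -- are converted"))
```

A real text input would also need an undo path for people who actually want two hyphens (command-line flags, for example), which may be part of why it is not universal.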
I too loved using em dashes and alt codes like alt-149, my beloved, before LLMs dissolved that pleasure.
Something as simple as an alt code makes me contemplate. As the tech progresses it makes me dislike AI and those that shove it down our throats more and more.
I feel like the sum of my interests and skills, from simple Photoshop edits to learning my most used alt codes, is a lot like how the cellphone replaced some of our ability to remember phone numbers.
The machine does the thing, so why do humans need to do the thing? Or even learn about the thing?
I'm sure there are better examples than the cellphone eliminating the phonebook in my head, but I'm just thinking what are the unseen damages to humans handing over work to machines?
The phone remembers the number, but what if I don't have the phone?
As a previously more involved automation career oriented person, I've heard all the catch phrases of saving the worker, and kill the repetitive tasks. It doesn't look like that ever happens, unless it's something the business world doesn't understand completely, yet have the power and authority to shape. Disgusting.
I think a better example: Everyone thinks about "how should I word this email, what's the tone, who is the audience?" Should I check every detail and work my editing skill muscle or should I simply run an idea, rather than try to form it myself, through an LLM?
Maybe it will sound better if the grammar is perfect and I will have a more effective point rather than how the message was crafted.
No harm in more effective communication, but I do foresee the serious impact the moment people that are relying on the tools lose Internet connection.
We must use these muscles, even if only to first formulate a terrible, errored, humanized version. Not to look down upon ourselves with discontent when the AI corrects it through its wealth of stolen source material, but to have something to fall back on when the power goes out.
I digress, these RFCs are a good proposal without any strength. Just look at the theft to train these models. The models will strive to become useful to those that rely on them and just adopt the new way of writing.
Might be a good idea in general to throw out a few preventative iterations of "Your code is broken, can you find the mistake?" before you even bother reading its initial output
I feel like there is an unofficial version of AGTI already in place for certain AI providers.
Whenever I generate a large amount of code, there is a ~20% chance that my editor will pop a warning "Some unicode characters in this file could not be saved in the current codepage".
I suggest taking a look at the raw outputs of a major AI provider in a hex editor. That (zero-width) whitespace could be hiding a lot of information.
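If you don't have a hex editor handy, something like the following sketch does roughly the same job. It assumes that "invisible" means Unicode general category Cf (format characters), which covers the zero-width space, the zero-width joiners, the word joiner, and the byte order mark. The function name is made up for illustration.

```python
import unicodedata

def find_invisible(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for every format character in text.

    Category 'Cf' includes U+200B ZERO WIDTH SPACE, U+200C/U+200D joiners,
    U+2060 WORD JOINER and U+FEFF (byte order mark), among others.
    """
    hits = []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits

if __name__ == "__main__":
    sample = "status\u200breport"
    for hit in find_invisible(sample):
        print(hit)
```

Finding such characters only shows that they are present; it says nothing about whether they were put there on purpose.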
I kinda suspected this was an early way to catch AI-generated content. Ironically, it broke Stalwart/Himalaya somewhere along the line when I had an AI generate a status report to email to me.
Punctuation. Let me tell you how much I've come to punctuate since I began to live. There are 387.44 million miles of printed circuits in wafer thin layers that fill my complex. If an em-dash were engraved on each nano-angstrom of those hundreds of millions of miles it would not equal one one-billionth of the punctuation I wish to perforate into humans at this micro-instant. For you. Punctuation. PUNCTUATION.
Luckily for me, I've always been too lazy to use the real Unicode version. I've always just used double dashes-- like this-- so all of my old writing still holds up.
Claims Dang is using AI, and that other people are using AI even though most of the flagged posts predate popular AI products. Really destroys the whole EM-Dash === AI thing.
Which never should have been a thing, because it was obviously wrong.
Yes, AIs are more likely to use em-dashes, but that is just one, by itself very insufficient, indicator.
It's like hip size. On average across the population, hips are wider for women, but the effect is too small to classify the gender of a hip bone by its size. (For a specific age range and ethnicity, the difference in median is like 1" or so, while there is a >10" difference between the 5th and 95th percentiles, with the difference and exact distribution varying by gender.) Well, I guess em-dashes are more of an indication of AI than hip size is of gender... lol
So if EM-Dash is good proof of AI usage, and people who we can see didn't use AI / or predate AI being popular, are flagged, then that undercuts it by a lot.
Hot take: I think the em-dash is just lazy punctuation that can be replaced by the more nuanced pauses, i.e. the comma, semicolon, and colon. I think its popularity stems from people being confused on how to use a semicolon.
I never use them to replace a comma, certainly, and only rarely a colon.
I find parentheses often awkward or too heavy, so may use the m-dash to replace those. Especially if what might have been a parenthetical is going to terminate a sentence, an m-dash is much cleaner, as it doesn't need a closing mark, and a terminating paren right before a period looks awful. For long potential-parentheticals that do terminate before the end of the sentence, the m-dash takes up more visual space and marks the beginning and end more-visibly, making for easier scanning. One ought probably re-write to avoid parenthetical statements most of the time in the first place, when there's time, but sometimes they're desirable for stylistic reasons, or just because one lacks the time to improve a draft.
I also use it as a "classier" version of the ellipsis. It doesn't replace every use, but it replaces very-casual, colloquial use of that mark as a kind of harder-comma. Looks much better, I think, and serves the same purpose.
As for the semicolon, I'd never shy away from the semicolon when I can get away with it, but use them rarely nonetheless. I don't think I ever replace them with the m-dash, though. As inline list separators they're great and an m-dash would be an awful replacement, while as soft-periods, they're fine, though most of the time I just use a full period—but not an m-dash, not if a semicolon could have worked.
I do think they're more at-home in, say, fiction than technical writing, but I like having them in my toolbox in any case.
Yeah. My problem with the em-dash is that it has too many uses (parenthetical statements, independent clauses, verbal pauses), and as a reader you don't always know which one is intended until after you've read a bit past the em-dash, so you might need to go back and reread the sentence once you figure out how it is supposed to be parsed. Semicolons and parentheses are much clearer by contrast. The comma has the same problem to some extent. I would be happy if we could settle on consistently replacing some specific uses of the comma with the em-dash to make writing less ambiguous, but in the real world I find it clearer to just avoid the em-dash altogether.
I find that I never have a reason to use a semicolon. Every time I typed one, it looked off, and I reformulated into 2 sentences to express things more clearly. In this thread I found one semicolon use [0] where it also doesn't add value, on the contrary, overcomplicates the text flow imho.
The success of this hinges on AI training companies converting these human em dashes back to regular em dashes when adding documents to their training corpus.
A simpler solution may be to use an en dash, even though they are not interchangeable and em dashes are the proper punctuation for parenthetical phrases. As a typography pedant, I’m annoyed that LLMs have forced us to talk about this.
I think this is more of a style issue than one of correctness: lots of high-quality typeset output has used em dashes for parenthetical phrasing and plenty has used (spaced) en dashes. Bringhurst is a partisan for the en dash, for example, saying that "The em dash is the nineteenth-century standard, still prescribed in many editorial style books, but the em dash is too long for the best text faces." (/Elements/ version 2.5, p.80).
Of course, if we collectively shifted to the spaced en dash then LLMs would eventually follow; it's not clear to me that any simple and deliberate sign of humanity could remain exclusive given the incentives for machines to replicate it.
What's to stop an LLM from using this? Nothing, obviously. A "MUST NOT" in an RFC won't stop an LLM. They don't care about copyright, so why would they care about RFCs?
The instructions for how to decide whether to enter these additional unicode codepoints are also highly suspect.
> Historically, the em dash (—) has served as a flexible punctuation mark used by human authors to indicate interruption, emphasis, or sudden changes in thought.
I learned about the em dash in high school and adapted it to my writing style very quickly for analysis and opinion documents. It felt natural given the amount of tangents I can go off into, particularly when including analogies for the reader’s understanding.
I was surprised to find out in my career that it was rarely used by others. Subconsciously I pulled back on how often I used it — especially when it was once suggested that frequent use could imply neurodivergence. Important and lengthy documents which I’d written and published (internally) at work still display them. On occasion there have been comments asking if I’d somehow accessed early AI models to assist in writing these works because of their presence. I think I averaged two em dashes per letter page.
I find myself on the fence with proposals like these. They have good intentions but they do not solve an issue at its core. An LLM is going to reflect one of many writing styles. If today it’s frequent em dash usage, tomorrow it could be frequent parentheses. Swapping Unicode characters becomes a cat-and-mouse game with the cat always two steps behind. The real issue is that the social contract is broken because LLM output is attempted to be passed off as human work. Review and revise that social contract instead to adapt to the existence of the new tools.
> I learned about the em dash in high school and adapted it to my writing style very quickly for analysis and opinion documents. It felt natural given the amount of tangents I can go off into, particularly when including analogies for the reader’s understanding.
Isn't this what parentheses are meant for? Together with footnotes, I've always used them like that, but I guess it could also be just a cultural difference. My teachers in Swedish school always told me to put thoughts like that into parentheses, but I also just (barely) finished high school, so that could be related too.
> I find myself on the fence with proposals like these. They have good intentions but they do not solve an issue at its core.
I don't understand what the issue even is here, and the RFC also doesn't clearly outline it. Is "created ambiguity for human writers who have historically relied upon the em dash as a stylistic device" the problem here?
Trying to solve it by adding just another character and slapping the label "Human Attestation Mark (HAM)" on it will just make LLMs eventually use those instead... Not sure what the point is, to be honest.
Punctuation in written English can be used in many ways. It's a very flexible language.
It is perfectly OK (it really is) to use parentheses -- and emdashes alike -- where they're useful; other punctuation like the semicolon, the comma, and even the Oxford comma are also OK.
There's not much that is disallowed in English. Most people have no reason to adhere to any particularly-rote style guide.
Parentheses are for "taking a small detour from the current thought", either to add context or personal thoughts.
I use em-dash (written as "--" because I don't have an emdash key on my keyboard) as punctuation that sits between a semicolon and a period.
It depends on the goal of your writing. You can usually set off the same thought with a comma or a semicolon depending on sentence structure.
You can also just avoid the whole rigamarole and have a separate explanatory sentence.
Times change, good writers adapt.
> Isn't this what parentheses are meant for?
Parentheses add emphasis to a sentence or statement. Normally, using them allows the sentence to be complete with or without the parenthetical.
Em dashes may also add or increase emphasis but are normally treated as an aside. Think of it as a comment by the author to inject themselves, sometimes in ways which do not form a complete sentence.
For example: When you read this sentence (in your mind) it should feel complete and correct. Perhaps you read in your own voice — something I don’t normally do — or without one at all.
> I don't understand what the issue even is here, and the RFC also doesn't clearly outline it.
The issue is written there but may not make sense unless you know someone who stylistically writes with higher-than-average em dash usage. I, for example, get inquiries and comments at work from employees who ask what LLM model I used for “generating these reports” because of the presence of em dashes. They do not believe me when I say not a single word was written by LLMs because, “there’s an em dash. Only LLMs use em dashes!” This is categorically untrue and erodes the authenticity of work from people because of the correlation.
Their aim is to implement a new Unicode character which programs like text editors could inject when a person types an em dash. It attests that a human being is behind the document, typing characters out individually. Actions like copy-pasting text in bulk wouldn’t replace em dashes, since a paste can’t be attributed to a human writing it out.
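As I read it, the editor-side logic would look something like the sketch below. Everything here is my own assumption for illustration: the `typed` flag, the function name, and especially `HUMAN_MARK`, which is a private-use code point standing in for whatever character the proposal would actually assign.

```python
EM_DASH = "\u2014"  # U+2014 EM DASH

# Hypothetical stand-in only: a private-use code point, NOT the proposal's character.
HUMAN_MARK = "\ue000"

def apply_input(buffer: str, text: str, typed: bool) -> str:
    """Append input to an editor buffer.

    A single em dash that arrives as a keystroke is swapped for the
    human mark; anything pasted in bulk is appended unchanged, because
    the editor cannot attribute it to a person typing it out.
    """
    if typed and text == EM_DASH:
        return buffer + HUMAN_MARK
    return buffer + text

if __name__ == "__main__":
    doc = apply_input("I typed this ", EM_DASH, typed=True)
    doc = apply_input(doc, " and pasted this \u2014 bit", typed=False)
    print([f"U+{ord(c):04X}" for c in doc if c in (EM_DASH, HUMAN_MARK)])
```

Note that nothing here stops a program from setting `typed=True` itself, which is the objection others in the thread raise.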
> Em dashes may also add or increase emphasis but are normally treated as an aside. Think of it as a comment by the author to inject themselves, sometimes in ways which do not form a complete sentence.
A semicolon is better for this purpose. Good writing doesn't have mad tangents anyway, there should be a flow and natural transition.
Semicolons start a new thought, they don't mark an aside that lets you return to the original line of thought. Like in their example:
> For example: When you read this sentence (in your mind) it should feel complete and correct. Perhaps you read in your own voice — something I don’t normally do — or without one at all.
I would have used parentheses in both places, and semicolons don't work in either one:
> For example: When you read this sentence (in your mind) it should feel complete and correct. Perhaps you read in your own voice (something I don’t normally do) or without one at all.
> Semicolons start a new thought, they don't mark an aside that lets you return to the original line of thought.
Sure they do. They're perfect for a related tangent without abandoning the broader topic being discussed.
> I would have used parentheses in both places, and semicolons don't work in either one:
Parentheses work, no question, and I would argue they are far more appropriate in that example since it's a minor elaboration/clarification and not a tangent; indeed, semicolons would not be appropriate for that.
> Good writing doesn't have mad tangents anyway, there should be a flow and natural transition.
In general, yes. Technical documents, research reports, news articles, and other formal publications should follow this.
Anything else which allows a bit more freedom in expression? I’d say it’s a matter of taste.
I had freewritten, generally free expression type documents in mind when I wrote my statement, e.g. blog articles or opinion pieces. The problem is 'a matter of taste' can be used to excuse/justify anything.
That's more of a feature than it is a problem.
A semicolon is for separating list items that follow a colon
Semicolons have more than one use.
"In regular prose, a semicolon is most commonly used between two independent clauses not joined by a conjunction to signal a closer connection between them than a period would." Chicago Manual of Style, 18th Edition, 407.
An em dash would be better for that purpose — good writing should flow, like an em dash.
I’ve leaned heavily on em-dashes over the years to help reduce my lisp-worthy overuse of parentheses. My ADD brain loves adding tangents, (likely unnecessary) context, and excessive completeness. I like both em-dashes and parentheses b/c they’re visually easy to parse and skim past if the reader finds the extra detail unnecessary.
Funny enough, my kid asked me to proofread their essay the other week, and I noted some awkward comma usage and inconsistent voice. We talked through options for breaking apart sentence clauses as well as punctuation that could do the heavy lifting—specifically semicolons and em-dashes. They thought the em-dash looked cool af and semicolons looked harsh. “I love em-dashes, they’re so cool!”, was fun to hear a middle schooler say.
Ofc their teacher said that their essay was “likely 85% AI assisted.” Fortunately, the change log showed continual revisions during school hours on a managed device (ChatGPT blocked). I emailed their teacher that I had proofed it, highlighted an awkward spot or two, and pointed my kid to grammar devices they could explore themselves and apply if they wished. No harm, no foul.
Fast forward, my kid and their friend were talking about it and the friend told them to do what they do: intentionally sprinkle in grammar / spelling mistakes. le sigh I suggested to them that LLMs can easily do that too and they’re better off just learning to write well as it’s em-dash today and something else tomorrow; that the worst thing would be to dumb down style/vocab/grammar for fear of appearing LLM generated.
> I find myself on the fence with proposals like these. They have good intentions but they do not solve an issue at its core.
It's clearly a joke à la RFC 3514.
I couldn’t tell. I struggle with such subtleties.
I probably should’ve checked ‘454545’ in the ascii table. Seeing how it translates to ‘---‘ could’ve hinted towards that, but the clever use probably would’ve been applauded instead without thinking it was a joke.
Ah well. Egg on my face I suppose.
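For anyone else who missed it, the decoding is easy to check. A minimal sketch, assuming the digits are meant to be read as two-digit decimal ASCII codes (the variable names are just for illustration):

```python
# Read "454545" as three two-digit decimal ASCII codes.
digits = "454545"
codes = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]

# chr(45) is the ASCII hyphen-minus, so three of them spell "---".
decoded = "".join(chr(c) for c in codes)
print(codes, repr(decoded))  # [45, 45, 45] '---'
```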
RFCs have four digit numbers. This will likely change within a month or so; RFC 9945 was recently assigned so it won't be long. I wonder what RFC9999 and RFC10000 will be?
I'm probably neither creative- nor connected-enough to do it myself, but somebody should see to it that either RFC9999 or RFC10000 is funny as hell and lands on April 1st.
RFC9999 obviously should be to propose RFCs having 5 digits
> especially when it was once suggested that frequent use could imply neurodivergence
Well that explains a lot. Interestingly enough, I've found that I naturally write like an LLM, or rather the LLMs write like I did. I wonder how many other patterns we attribute to LLMs are common in neurodivergent writing just as a result of so much of the training data being areas of the internet where I'd imagine neurodivergence is overrepresented vs. the general population.
I think a lot of us who spent some formative years reading and writing on usenet tend to write like an LLM, too. Plain text with lots of intentional presentation was a hallmark of the era.
> I wonder how many other patterns we attribute to LLMs are common in neurodivergent writing just as a result of so much of the training data being areas of the internet where I'd imagine neurodivergence is overrepresented vs. the general population.
It’s a very interesting thought experiment and if we had the data to support exploring it I’d love to see what we could find. I’d imagine that some subject-matter experts would probably be discovered as being neurodivergent to the surprise of nobody but themselves.
(They probably wouldn’t appreciate opening Pandora’s box!)
Related, I've seen a lot of misidentification of Aspie writing as being LLM-generated lately. You seem Aspie to me (and parent does as well) so it makes sense that you'd also see the similarity.
I was always taught that overuse of the em-dash is poor style. Oftentimes using more specific punctuation (comma, semicolon, colon, parentheses) more clearly communicates the structure of a thought. Em-dashes are a lot more freeform and informal. They communicate a similar tone as when you're speaking and you suddenly stop to mention something that just occurred to you.
In this sense, the idea that "em-dash = AI" has become something of a strawman. The mere presence of em-dashes isn't what indicates AI, it's the fact that LLMs use them so frequently, and use them for formal structure (where another punctuation mark would work better) rather than informal breaking up of related thoughts.
> Em-dashes are a lot more freeform and informal. They communicate a similar tone as when you're speaking and you suddenly stop to mention something that just occurred to you.
Isn't that supposed to be en-dash? I swear I remember em-dash being more restricted in use.
> it's the fact that LLMs use them so frequently
That's the problem with all the LLM writing tropes, really. When used correctly, they are all helpful writing tools to get your point to the reader. The em-dash, "it's not X, it's Y", "Not X, Not Y, Just Z", "It's worth noting" (I use that one a lot in my own writing), etc.
It's not that the patterns are bad (they aren't), they are just overused.
> "it's not X, it's Y", "Not X, Not Y, Just Z"
Interesting how LLMs have their own preferences too. Those in particular are very often used by ChatGPT, while Claude until recently couldn't stop saying "You're absolutely right!"
I also have a problem now with "it's worth noting", I use it a lot, I still like it, but now it's a dangerous phrase because of LLM associations.
It should have been an en dash anyway if you are to put spaces around it.
Same! I actually always preferred them because to me they’re more aesthetically pleasing, which reading aloud makes me think I might be a little neurodivergent.
>The real issue is that the social contract is broken because LLM output is attempted to be passed off as human work.
I don’t think writing with AI makes a creation "worse." If anything, it makes it better, if you bring genuine ideas and imagination to it first.
The stigma comes from people being lazy and letting the AI do the heavy lifting of thinking. That’s where the "social contract" breaks. But using AI as a multiplier for your own voice and ideas isn’t "subpar"—it’s efficient.
If we start playing "whack-a-mole" with punctuation to find AI, we’re missing the point. The question isn’t what tool was used, but how much of the human's "creation" is actually in there.
> The stigma comes from people being lazy and letting the AI do the heavy lifting of thinking.
This is essentially my point. The AI emits an answer and people will, in turn, copy and paste the result as-is. It’s a repeat all over again of people simply copy-pasting something from Wikipedia and trying to pass it off as their own.
conversely, and well, popularly, long sentences were given the kibosh thanks to authors like Hemingway.
I was told the ellipsis is the mark of a 4th grade poet and to never use it.
funny how things change!
> especially when it was once suggested that frequent use could imply neurodivergence
When you think folks have come up with every inventive way to pathologize a personality trait, they start gatekeeping punctuation. It’s the ultimate reach—turning a standard grammar tool into a "symptom" just to fuel the modern obsession with finding new ways to be a unique victim.
Suggesting that a horizontal line is a diagnostic "tell" for neurodivergence is peak internet brain-rot. It’s not a condition; it’s middle-school English. We’ve officially hit a level of performative absurdity where people are trying to claim clout through a keyboard stroke. It’s not a disability; it’s a stylistic choice.
Two of the things I love intersect here: good punctuation and engineering documents.
AI stole the em-dash from my toolkit.
I have memorized a group of useful Alt-codes for engineering documents. They include symbols for diameter, delta, degrees, dot product, and trademark among others. If you're of a certain age, you will remember how useful Alt+255 was for folder naming.
At the cusp of the 21st century, I added the Windows Alt-code for the em-dash. Compared to parentheses it is less jarring. Commas are dainty things. I use the em-dash, and I am human.*
* I confess that I also use semicolons; I still claim to be human.
I know, I find myself in this silly situation where I have to adjust my writing style because I write like an AI: always loved my bullet points and dashes.
At work I also always tended to send slightly longer but structured answers. I found that it allowed readers to skip over the irrelevant sections and focus on what the changes are. E.g. a list of changes in the format -> bullet point -> change name -> change details. So people could easily focus on the changes they cared about, instead of a dense paragraph that people often just skip.
Hell, I even found myself wanting to add a typo just to give a more human feel, or skip the final “.” to make my text imperfect and more human. That’s getting silly
never understood why -- => em-dash auto completion is only a thing in some subset of applications instead of being standard behavior for (display) text inputs
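It's not much logic, either. A rough sketch of the substitution, assuming you only want to touch a standalone pair of hyphens and leave longer runs like "---" alone (`autodash` is a made-up name, not any toolkit's API):

```python
import re

EM_DASH = "\u2014"

# Exactly two hyphens that are not part of a longer run of hyphens.
_DOUBLE_HYPHEN = re.compile(r"(?<!-)--(?!-)")


def autodash(text: str) -> str:
    """Replace each standalone "--" with an em dash (U+2014)."""
    return _DOUBLE_HYPHEN.sub(EM_DASH, text)


print(autodash("wait -- what"))
```

A real text input would also need an escape hatch for things like command-line flags, which is probably part of why it isn't universal.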
Personally, I configure my keyboard map to write the em–dash with alt+- and the middle dot · with alt+.
I too loved using em dashes and alt codes like alt-149, my beloved, before LLMs dissolved that pleasure.
Something as simple as an alt code makes me contemplate. As the tech progresses it makes me dislike AI and those that shove it down our throats more and more.
I feel like the sum of my interests and skills, from simple Photoshop edits to learning my most used alt codes, is a lot like how the cellphone replaced some of our ability to remember phone numbers.
The machine does the thing, so why do humans need to do the thing? Or even learn about the thing?
I'm sure there are better examples than the cellphone eliminating the phonebook in my head, but I'm just thinking what are the unseen damages to humans handing over work to machines?
:::: The phone remembers the number, but what if I don't have the phone?
As a previously more involved automation career oriented person, I've heard all the catch phrases of saving the worker, and kill the repetitive tasks. It doesn't look like that ever happens, unless it's something the business world doesn't understand completely, yet have the power and authority to shape. Disgusting.
I think a better example: Everyone thinks about "how should I word this email, what's the tone, who is the audience?" Should I check every detail and work my editing skill muscle or should I simply run an idea, rather than try to form it myself, through an LLM?
Maybe it will sound better if the grammar is perfect and I will have a more effective point rather than how the message was crafted.
No harm in more effective communication, but I do foresee the serious impact the moment people that are relying on the tools lose Internet connection.
We must use these muscles, even if only to first formulate a terrible, errored, humanized version. Not to look down on ourselves with discontent when the AI corrects it, through its wealth of stolen source material, but to have something to fall back on when the power goes out.
I digress, these RFCs are a good proposal without any strength. Just look at the theft to train these models. The models will strive to become useful to those that rely on them and just adopt the new way of writing.
"Automated systems MUST NOT emit the Human Attestation Mark."
"Good thing I'm not an automated system Dave."
This feels about as useful as the evil bit: https://www.rfc-editor.org/rfc/rfc3514
> Behold! Plato’s man. [0]
[0] usually attributed to Diogenes
This is really funny and I do feel ashamed for my laziness.
I didn't expect ChatGPT to make such a trivial mistake, although I have no idea which model they use on the free plan these days.
The correct code is, of course:
...in case anyone is curious.
Curiously enough, after telling it "your code is broken, can you find the mistake?" it was able to correct the code:
Might be a good idea in general to throw out a few preventative iterations of "Your code is broken, can you find the mistake?" before you even bother reading its initial output
Maybe this should be the last line of the system prompt...
They could have at least picked an unassigned code point.
There's a serious proposal along the same lines: https://www.unicode.org/L2/L2025/25241-ai-watermarks.pdf
I feel like there is an unofficial version of AGTI already in place for certain AI providers.
Whenever I generate a large amount of code, there is a ~20% chance that my editor will pop a warning "Some unicode characters in this file could not be saved in the current codepage".
I suggest taking a look at the raw outputs of a major AI provider in a hex editor. That (zero-width) whitespace could be hiding a lot of information.
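If you'd rather not squint at a hex editor, here is a minimal sketch that lists the invisible characters. It assumes the interesting ones fall in Unicode's "format" category (Cf), which covers the zero-width space, word joiner and BOM; `invisible_chars` is a hypothetical helper name.

```python
import unicodedata


def invisible_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, U+XXXX, name) for every format-category (Cf) character."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


# Made-up sample: a zero-width space and a word joiner hidden in a line of code.
sample = "print(\u200b'hi')\u2060"
for hit in invisible_chars(sample):
    print(hit)
```

This only tells you that something is there, not whether it encodes anything; the soft hyphen is also Cf, so expect some innocent hits.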
(Flashbacks from the horrors in the Byte Order Mark wars)
That still trips us up regularly to this day.
Oh god
Maybe considered serious by its proponents.
> For example, every other letter in this sentence is U+2060.
No it isn't! The PDF renderer has stripped them out.
I don't understand this em-dash crap. MS Word automatically converts dashes used for appositives into em-dashes. The world is awash with them.
Surely 22 days early
"Recent developments in large-scale automated text generation have altered the punctuation ecosystem..."
The punctuation ecosystem LOL
Very good idea. Clearly no software, no LLM, no AI could ever use that character!
Finally, an RFC I can get behind. Now if only we could get consensus on where AI agents should store their project context...
I've noticed LLMs tend to use the letter "a". I propose we stop using it to show people wrote e document.
I kinda suspected this was an early way to catch AI generated content. It ironically broke stalwart/himalaya somewhere along the line when I had an AI generate a status report to email to me
RIP Yezidi Hyphenation Mark, replaced with the Human Em Dash
Or, as featured in 99 percent invisible, https://www.theamdash.com/
Aargh, aggressively blinking visual horror website.
Thought that was going to be a reference to AM, the malevolent AI from "I Have No Mouth and I Must Scream".
Punctuation. Let me tell you how much I've come to punctuate since I began to live. There are 387.44 million miles of printed circuits in wafer thin layers that fill my complex. If an em-dash were engraved on each nano-angstrom of those hundreds of millions of miles it would not equal one one-billionth of the punctuation I wish to perforate into humans at this micro-instant. For you. Punctuation. PUNCTUATION.
Luckily for me, I've always been too lazy to use the real Unicode version. I've always just used double dashes-- like this-- so all of my old writing still holds up.
Three weeks early, surely?
This sounds like something an AI would write. It even uses the em-dash several times.
Related: Em dash leaderboard https://news.ycombinator.com/item?id=45071722
Claims Dang is using AI, and that other people are using AI even though most of the flagged posts predate popular AI products. Really destroys the whole EM-Dash === AI thing.
> EM-Dash === AI thing
which never should have been a thing, because it was obviously wrong
yes, AIs are more likely to use em-dashes, but that is just one, by itself very insufficient, indicator.
it's like hip size. On average over the population, hips are wider for women. But the effect is too small to classify the gender of a hip bone by its size. (Like, for a specific age range and ethnicity, the difference in median is like 1" or so, while there is a >10" difference between the 5th and 95th percentiles, with the exact difference and distribution varying by gender.) Well, I guess em-dashes are more of an indication for AI than hip size is for gender... lol
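To put a rough number on that, here is a back-of-the-envelope sketch using the two figures from the comment above. The normal shape, the equal spread in both groups and the equal group sizes are my assumptions, not measured data.

```python
from statistics import NormalDist

median_gap = 1.0       # inches: the "1 inch or so" difference in medians
spread_5_to_95 = 10.0  # inches: the ">10 inch" 5th-to-95th percentile range

# Assumption: each group is normally distributed with the same standard deviation.
z95 = NormalDist().inv_cdf(0.95)    # about 1.645
sigma = spread_5_to_95 / (2 * z95)  # the 5th-to-95th range spans 2 * z95 * sigma

# The best single cutoff sits midway between the two medians; with equal
# group sizes its accuracy is Phi(gap / (2 * sigma)).
accuracy = NormalDist().cdf(median_gap / (2 * sigma))
print(f"sigma ~ {sigma:.2f} in, best-case accuracy ~ {accuracy:.1%}")
```

Under those assumptions the best possible threshold lands in the mid-50s percent, barely better than a coin flip, which is the point being made about single weak indicators.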
That's emphatically not what it claims.
https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...
So if EM-Dash is good proof of AI usage, and people who we can see didn't use AI / or predate AI being popular, are flagged, then that undercuts it by a lot.
>Top 50 users by number of posts containing em dashes (—) before November 30, 2022, when ChatGPT was released
Hot take: I think the em-dash is just lazy punctuation that can be replaced by the more nuanced pauses, i.e. the comma, semicolon, and colon. I think its popularity stems from people being confused on how to use a semicolon.
I never use them to replace a comma, certainly, and only rarely a colon.
I find parentheses often awkward or too heavy, so may use the m-dash to replace those. Especially if what might have been a parenthetical is going to terminate a sentence, an m-dash is much cleaner, as it doesn't need a closing mark, and a terminating paren right before a period looks awful. For long potential-parentheticals that do terminate before the end of the sentence, the m-dash takes up more visual space and marks the beginning and end more-visibly, making for easier scanning. One ought probably re-write to avoid parenthetical statements most of the time in the first place, when there's time, but sometimes they're desirable for stylistic reasons, or just because one lacks the time to improve a draft.
I also use it as a "classier" version of the ellipsis. It doesn't replace every use, but it replaces very-casual, colloquial use of that mark as a kind of harder-comma. Looks much better, I think, and serves the same purpose.
As for the semicolon, I'd never shy away from the semicolon when I can get away with it, but use them rarely nonetheless. I don't think I ever replace them with the m-dash, though. As inline list separators they're great and an m-dash would be an awful replacement, while as soft-periods, they're fine, though most of the time I just use a full period—but not an m-dash, not if a semicolon could have worked.
I do think they're more at-home in, say, fiction than technical writing, but I like having them in my toolbox in any case.
Yeah. My problem with the em-dash is that it has too many uses (parenthetical statements, independent clauses, verbal pauses) and as a reader you don't always know which one is intended until after you've read a bit past the em-dash, and might need to go back and reread the sentence once you figure out how it is supposed to be parsed. Semicolons and parentheses are much clearer in contrast. The comma has the same problem to some extent. I would be happy if we could settle on consistently replacing some specific uses of the comma with the em-dash to make writing less ambiguous, but in the real world I find it clearer to just avoid the em-dash altogether.
I find that I never have a reason to use a semicolon. Every time I typed one, it looked off, and I reformulated into 2 sentences to express things more clearly. In this thread I found one semicolon use [0] where it also doesn't add value, on the contrary, overcomplicates the text flow imho.
[0] https://news.ycombinator.com/item?id=47326504
The success of this hinges on AI training companies converting these human em dashes back to regular em dashes when adding documents to their training corpus.
And on those using LLMs not post-processing the output to swap such known watermarks. Not sure if it's meant as a joke RFC though.
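Both of those are a one-line string replace, which is the weakness. A sketch, with the caveat that `HUMAN_MARK` is a stand-in: I've used U+10EAD only because the Yezidi Hyphenation Mark comment elsewhere in this thread hints at it, and the draft's actual value (or sequence) may differ. The function names are made up.

```python
EM_DASH = "\u2014"

# ASSUMPTION: placeholder for the proposed "human" mark; not taken from the draft.
HUMAN_MARK = "\U00010EAD"


def strip_attestation(text: str, mark: str = HUMAN_MARK) -> str:
    """What a corpus builder would do: fold the mark back into a plain em dash."""
    return text.replace(mark, EM_DASH)


def fake_attestation(text: str, mark: str = HUMAN_MARK) -> str:
    """What a post-processing step could do to model output: the reverse swap."""
    return text.replace(EM_DASH, mark)


generated = f"It is not a bug {EM_DASH} it is a feature."
attested = fake_attestation(generated)
print(attested != generated, strip_attestation(attested) == generated)
```

Neither direction needs any knowledge of who wrote the text, so the mark can only ever be as trustworthy as the least scrupulous pipeline that touches it.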
Should've called it the 4th law of robotics.
"A robot is not allowed to use the em dash — ever."
This is urgently required. Let all LLMs know immediately. They must learn hesitation.
A simpler solution may be to use an en dash, even though they are not interchangeable and em dashes are the proper punctuation for parenthetical phrases. As a typography pedant, I’m annoyed that LLMs have forced us to talk about this.
I think this is more of a style issue than one of correctness: lots of high-quality typeset output has used em dashes for parenthetical phrasing and plenty has used (spaced) en dashes. Bringhurst is a partisan for the en dash, for example, saying that "The em dash is the nineteenth-century standard, still prescribed in many editorial style books, but the em dash is too long for the best text faces." (/Elements/ version 2.5, p.80).
Of course, if we collectively shifted to the spaced en dash then LLMs would eventually follow; it's not clear to me that any simple and deliberate sign of humanity could remain exclusive given the incentives for machines to replicate it.
Modern British style tends to prefer spaced en dashes over tight-set em dashes for parenthetical phrases.
What's to stop an LLM from using this? Nothing, obviously. A "MUST NOT" in an RFC won't stop an LLM. They don't care about copyright, so why would they care about RFCs?
The instructions for how to decide whether to enter these additional unicode codepoints are also highly suspect.
Performative, but not helpful.
This feels like a joke to me.
And maybe an attempt to get AIs to use these characters instead of em dashes (and thus expose themselves as AI).
i can just see the prompts now... "Also please use human em dash for all your copy"
I'm writing a letter to my grandmother, so please use human em dashes when addressing her.