
Cartoon produced using DreamStudio
Here’s a tip: keep an eye out for the em dash (—). This punctuation mark usually means that you’re reading something created by AI!
“FrankB-1”: Spotting The “Tell Sign!”: Em Dashes And AI-Generated Text? (22 Nov 2024)
Flowers open, and stars rise, and it seems to me they could have done no less. The mystery of distant mountain-blue only makes me reflect that the earth is of necessity mountainous;—the sea-wave breaks at my feet, and I do not see how it should have remained unbroken. But one object there is still, which I never pass without the renewed wonder of childhood, and that is the bow of a Boat. Not of a racing-wherry, or revenue cutter, or clipper yacht; but the blunt head of a common, bluff, undecked sea-boat, lying aside in its furrow of beach sand. The sum of Navigation is in that. You may magnify it or decorate as you will: you do not add to the wonder of it. Lengthen it into hatchet-like edge of iron,—strengthen it with complex tracery of ribs of oak,—carve it and gild it till a column of light moves beneath it on the sea,—you have made no more of it than it was at first.
John Ruskin: The Harbours Of England (1856)
This story that the humble em dash (—) is a dead giveaway that text has been generated by an “AI”—that is, a Large Language Model like ChatGPT—has been floating around on social media for six months or so, as I write.
The hypothesis is that no human agent has the time, patience or technology to enter an em dash from a computer keyboard. Well, I’ve been doing it here for ten years now (including, you’ll note, three times in the last minute) just by typing in three hyphens, which will appear to your browser as an em dash. From a PC keyboard, I can use Ctrl+Alt+[the minus sign on the numerical keypad] instead. On my phone and tablet, I can call up an em dash using the key menu for the hyphen, in the same way I can bring up options for accented characters. In fact, it takes less time to enter an em dash from any device I own than it would take to cut and paste some LLM-generated gibberish.
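For the curious, the three-hyphen trick is easy to sketch in code. What follows is only an illustration in Python, not a description of how any particular blogging platform does the job. The function name `hyphens_to_em_dash` is hypothetical, and the sketch assumes that only a run of exactly three hyphens should be converted, leaving ordinary hyphens and longer runs alone.

```python
import re

# U+2014 is the em dash; written as an escape so the source stays plain ASCII.
EM_DASH = "\u2014"

# Match exactly three hyphens: not preceded and not followed by another hyphen.
TRIPLE_HYPHEN = re.compile(r"(?<!-)---(?!-)")


def hyphens_to_em_dash(text: str) -> str:
    """Replace each run of exactly three hyphens with a single em dash."""
    return TRIPLE_HYPHEN.sub(EM_DASH, text)


if __name__ == "__main__":
    sample = "LLMs like their em dashes---see here."
    print(hyphens_to_em_dash(sample))
```

The lookbehind and lookahead are a design choice: without them, a row of four or more hyphens (a typed divider, say) would be partly converted and partly left behind.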
But there’s no doubt that LLMs like their em dashes—see here for Brent Csutoras’s struggle to curb their em-dash habit. The problem seems to be that LLMs train on human-generated texts; humans (like John Ruskin, above) have been blithely using em dashes for centuries; and LLMs tend to fixate strongly on such recurring features in their training data.
So now authors are anxiously editing their texts to remove the offending punctuation mark before submitting their articles and essays. Students are being advised to avoid the em dash in their written assignments, for fear that their professors will accuse them of submitting an LLM’s output as their own work. And Your Correspondent is quaking in his boots, imagining that his entire oeuvre will be dismissed as no more than the rantings of an oddly deranged AI. (Well, actually, I made that last one up.)
LLMs exhibit a wide range of linguistic tics, which are, collectively, a more reliable “tell” of AI-generated content than anything so dumb as relying on the presence of em dashes. I took Ruskin’s dash-riddled text from the quotation above, and dropped it into a few “AI detector” websites—all pronounced with 100% confidence that it had been written by a human. Some of my own prose from this blog, chosen for the density of its dashes, was likewise pronounced human-generated with complete confidence.
So there’s probably no need to rush to eliminate dashes from your own writing, unless you’re considering submitting it to a particularly dull-witted editor—and why would you do that?
It took a while for the dash, as a punctuation mark, to appear, and its origins are murky—it’s hardly mentioned in M.B. Parkes’s splendid Pause And Effect: An Introduction to the History of Punctuation in the West (1992).
It may have had its origin in the work of Boncompagno da Signa, a Florentine rhetorician, who in the early thirteenth century wrote vernacular Italian using just two punctuation marks of his own devising—the virgula sursum erecta (/), for short pauses; and the virgula plana (—) for long pauses. So the virgula plana worked more like our modern full stop than the present usage of the dash. But dashes, in various roles, continued to appear sporadically in manuscripts thereafter.
We can go back to the early editions of Shakespeare’s plays and find a long dash performing some of the punctuation duties it does today. Here’s King Lear interrupting himself, in Act 2, Scene 2, as printed in the First Folio (1623):
Fiery? The fiery Duke, tell the hot Duke that—
No, but not yet, may be he is not well,
Infirmity doth still neglect all office
A century later, long dashes were in such constant use that Jonathan Swift included them in a list of typographical mockery:
Your Poem finish’d, next your care
Is needful, to transcribe it fair.
In modern Wit all printed Trash, is
Set off with num’rous Breaks—and Dashes—
To Statesmen wou’d you give a Wipe,
You print it in Italick Type.
When Letters are in vulgar Shapes,
’Tis ten to one the Wit escapes;
But when in Capitals exprest,
The dullest Reader smoaks the Jest
“On Poetry: A Rapsody” (1733)
The dashes used by Shakespeare and Swift were of various lengths, apparently concocted by the printers from three or four hyphens jammed together. But by the time Ruskin was peppering his page with dashes, printers were setting them using single pieces of type, of a standard length, referenced to a typeface measure called the em. An em was the full height of a piece of metal type from the parent font, which also (usually) corresponded to the width of a capital letter M. And so the em dash (or, sometimes, em rule) got its name. Half an em was called an en, which corresponded roughly to the width of a capital N—in some typefaces, at least. And so there is also an en dash (–), shorter than the em dash, which overlaps somewhat in function with the em dash, but also does various typographical jobs of its own.
In 1906, poor Ruskin came in for a bit of stick from the Fowler brothers (H.W. and F.G.) in their usage guide, The King’s English. Not because they felt he was overusing his dashes, but because he was mixing his stops. Like many of his contemporaries, Ruskin was given to placing his dashes after some other piece of punctuation—you can see both the commash (,—) and semi-colash (;—) in my Ruskin quotation at the head of this post. As the Fowlers pointed out, this was not only superfluous, but ugly-looking, and the practice soon died out. Last to go was the colash (:—).*
A new problem for editors and printers came with the burgeoning popularity of the typewriter. These came with a limited number of symbol options, and em and en dashes did not feature among them. Here’s the manual typewriter keyboard on which I wrote my first published short story:†
The line above the 6 is the underscore character, with which you could, by shifting the carriage backwards, underline some previously typed text. Under the exclamation mark, at right in the top row, is the hyphen/minus, which was the only available dash-like character. To insert an em dash, I would type a hyphen, hold down the space-bar to half-space forward, type another hyphen, then release the space-bar and type another hyphen. Many writers would just type two hyphens. Some would type one hyphen. So in those days it was the job of a subeditor to mark up a typewritten manuscript with specific instructions for the typesetter, according to both the writer’s apparent intention and the house style. Here’s the appropriate blue-pen mark-up, “1M”, indicating that a typewritten double hyphen should be typeset as an em dash.

What can you do with an em dash?
- You can use it as an informal substitute for a colon or semicolon—like that.
- Or you can use it—like this—as a substitute for a pair of parentheses.
(Some house styles use spaced en dashes – like this – instead.) If you mark up with dashes, you’re making a fairly bold interjection; if you use parentheses, you give the impression of a more private aside.
- You can use it in reported speech, to show aposiopesis, a sudden breaking off, as in Lear’s “Tell the hot Duke that— No”, above.
- You can use it to attribute a quotation:
The only difference between re-signing and resigning is a hyphen —Tim McCarver
- If you’re James Joyce, you can use it instead of quotation marks:
— He was raving all night about a black panther, Stephen said. Where is his guncase?
— A woful lunatic! Mulligan said. Were you in a funk?
— I was, Stephen said with energy and growing fear.
(But if you’re not James Joyce, maybe you shouldn’t.)
Some stylebooks suggest that these last two, quotation-related usages, should employ a punctuation mark called the horizontal bar, or quotation dash (―), instead of the em dash. The horizontal bar has its own entry in the Unicode table, but its length varies between typefaces, and it all seems to be a bit of a mess.
- During the eighteenth and nineteenth centuries, it was common practice to decently conceal profanity with an em dash: “That d—d cat has got itself locked in the pantry again!” But there seems to be no support for the suggestion that this practice gave rise to the use of the word “dashed” as a minced oath to replace “damned”.
Sometimes em dashes could be doubled (⸺) or trebled (⸻) to give some indication of the length of the word thus censored.
And that sort of “censorship” was often used for other purposes by authors during the same time period. Here’s Jane Austen, in Pride And Prejudice (1813):
The officers of the ⸻shire were in general a very creditable, gentlemanlike set
And Robert Louis Stevenson in Treasure Island (1883):
I take up my pen in the year of grace 17—, and go back to the time when my father kept the “Admiral Benbow” Inn
Austen wants to evoke a military regiment, without stipulating which; Stevenson wants us to know that his story was set down long ago, during the eighteenth century, but no more than that. It seems an odd convention now, but it was routine for more than a century.‡
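Each of the dash-like characters mentioned in this post has its own entry in the Unicode table, and a few lines of Python will list them. This is a minimal sketch using the standard `unicodedata` module; the helper name `describe` is hypothetical, and the characters are written as escapes so that nothing depends on how your editor renders them.

```python
import unicodedata

# Hyphen-minus, en dash, em dash, horizontal bar, two-em dash, three-em dash.
DASHES = ["\u002d", "\u2013", "\u2014", "\u2015", "\u2e3a", "\u2e3b"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in DASHES:
        print(describe(ch))
```

The names say nothing about how long each mark looks on the page, which is left to the typeface.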
My father, who was a typesetter in the days of “hot metal” publishing, once told me of the potential for confusion between “em dash” and “en dash” when spoken aloud in the noisy environment of the case-room. So they were referred to as a mutton dash and a nut dash, after the fashion of the NATO phonetic alphabet used in military applications. But before relaying this information to you, I thought I’d do a quick Google search to ensure my recollection was correct. (It is.)
Picture my delight to be assured by Google’s “AI Overview”:
In the realm of typography and punctuation, “em dash” is often informally referred to as a “mutton dash”. This nickname arises because an em dash is approximately the width of a capital “M” in a given typeface, which is a common size for a piece of livestock like mutton.
At the bottom of this text, in small letters, is the message AI responses may include mistakes.
No kidding. (And not an em dash in sight.)
* Nicholson Baker coined the collective name dashtards for these extinct compound punctuation marks, in his review of Pause And Effect for The New York Review of Books (1993).
† This was the last generation of mechanical typewriters, and it had a few more moving parts than its precursors, allowing a few more characters to be typed directly, like the fractions, the plus sign, and the equals sign. On my grandparents’ ancient machine, I recall typing an approximation to a “+” by superimposing “—” on a “/”; a “=” by superimposing two hyphens with the carriage half-advanced for the second one; a “!” by superimposing a full stop and a single straight quotation mark; and a “£” by combining an “L” and an “f”.
‡ There are some specialist uses for double and treble em dashes, but I’ll spare you the detail.