The Missing 23rd of the Month
Previously, I explained why the 11th of most months is mentioned far less than the other days in the Google Ngrams database of English literature from 1800-2008. This was to solve a long-standing question posed in an xkcd comic. While researching this, I encountered another mystery: the 2nd, 3rd, 22nd, and 23rd are unusually low as well—but only until the 1930s, at which point they become perfectly normal days. Last time, I set this question aside to focus on the 11th. In this installment, I explain the strange behavior of these four days.
To remind everyone, the graph below is the mystery we are dealing with. The 2nd, 3rd, 22nd, and 23rd are practically unused in 1800, the earliest point in the database. The first substantial uses appear around 1810; from there they grow at about the same rate as the other days, but stay at about half of what one would expect until about the 1890s. Then the gap suddenly starts to shrink, and it keeps shrinking until the 1930s, when the 2nd, 3rd, 22nd, and 23rd are absorbed into the main group.
Ye old style
So were 2 and 3 unlucky numbers in the 1800s? Did Google's algorithm have a hard time reading the 2s and 3s of old-timey fonts? Nope, it turns out that people used to write these ordinals as 2d, 3d, 22d, and 23d. I took the median over January 2d, February 2d, etc. for each year and did the same for the other old-style ordinals. The graph below shows the use of old-style ordinals, which start as normal days within the main group, but slowly diverge until they drop off exponentially in the 1890s, reaching a tiny residue by the 1930s.
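The per-year median described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the graphs: the function names, the `freq_by_phrase` layout (phrase mapped to a year-to-frequency dictionary), and the choice to count a missing month as zero are all assumptions of mine.

```python
from statistics import median

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

def phrases_for_day(ordinal):
    """Build the twelve phrases for one ordinal, e.g. 'January 2d'."""
    return [f"{month} {ordinal}" for month in MONTHS]

def yearly_median(freq_by_phrase, ordinal):
    """Median frequency across the twelve months, for each year.

    freq_by_phrase is a hypothetical input: phrase -> {year: frequency}.
    A phrase with no entry for a year is counted as 0.0 (an assumption).
    """
    phrases = phrases_for_day(ordinal)
    series = [freq_by_phrase.get(phrase, {}) for phrase in phrases]
    years = sorted(set().union(*series))
    return {
        year: median(s.get(year, 0.0) for s in series)
        for year in years
    }
```

Taking the median rather than the mean keeps a single unusual date (a holiday, say) from dragging the whole ordinal up or down.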
Sometimes you can encounter a modern use of the old-style abbreviations when the ordinal is part of a name with a very long history, like the 3d Marine Division. This is not why the old style has a small residual in the latter half of the twentieth century. If you search through Google Books for modern uses of January 2d, you will only find reprints of old books and publications of old diaries.
Combined graph
The old style falls away as the new style emerges. When we add the old-style and new-style ordinals together, we get the graph below, which shows that once the two styles are accounted for, these four days of the month are actually quite ordinary.
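Adding the two styles together amounts to a year-by-year sum of two series. The sketch below assumes each series is a dictionary from year to frequency and that a year missing from one style contributes zero; `combine_styles` is a name I made up, and whether the sum is taken before or after the median across months is not something this sketch settles.

```python
def combine_styles(old_style, new_style):
    """Sum two {year: frequency} series, e.g. for 'January 2d' and 'January 2nd'.

    A year present in only one series is treated as 0.0 in the other
    (an assumption made for this illustration).
    """
    years = sorted(set(old_style) | set(new_style))
    return {
        year: old_style.get(year, 0.0) + new_style.get(year, 0.0)
        for year in years
    }
```

For example, `combine_styles({1850: 1.0, 1900: 0.5}, {1900: 0.5, 1950: 1.0})` gives a flat series across all three years, which is the shape the combined graph shows once both spellings are counted.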
I don't have a fully satisfying explanation for why the 2nd and 3rd now peek their heads above the main group from time to time. I guess if the 1st of the month is hugely over-represented, it is reasonable to expect that the next smallest ordinals would be slightly over-represented. ("Let's have our meeting on the first of the month." "I have ten other meetings on the first!" "Ok then, the second.") However, if I search Google Books for instances of January 2d or January 2nd, a sizable number of the hits come from lists, where the month ends one item and the ordinal begins the next. Google Books apparently ignores commas. With the 1st, 2nd, 3rd, and 4th being the only regular ordinals for weeks of the month, these might get a boost this way.
Speculation
Why did writers use these one-letter abbreviations? Probably to follow Latin, where this practice originated and where the ordinal indicator is a single letter, o. The Romance languages, like Spanish, Italian, and Portuguese, still use o and a. I expect that we would still be using d if it weren't for 1st, 4th, etc., whose final consonant sound cannot be represented by a single letter. In the end, consistency within English, with two letters for every ordinal, was more attractive than similarity to Latin.