Previously, I explained why the 11th of most months is mentioned far less than the other days in the Google Ngrams database of English literature from 1800-2008. This was to solve a long-standing question posed in an xkcd comic. While researching this, I encountered another mystery: the 2nd, 3rd, 22nd, and 23rd are unusually low as well—but only until the 1930s, at which point they become perfectly normal days. Last time, I set this question aside to focus on the 11th. In this installment, I explain the strange behavior of these four days.
To remind everyone, the graph below is the mystery we are dealing with. The 2nd, 3rd, 22nd, and 23rd are practically unused in 1800, the earliest point in the database. The first substantial uses appear around 1810; from then on they grow at about the same rate as the other days, but stay at roughly half of what one would expect until about the 1890s. Then the gap suddenly begins to shrink, and it keeps shrinking until the 1930s, when the 2nd, 3rd, 22nd, and 23rd are absorbed into the main group.
Ye old style
Were 2 and 3 unlucky numbers in the 1800s? Did Google’s algorithm have a hard time reading the 2s and 3s of old-timey fonts? Nope, it turns out that people used to write these ordinals as 2d, 3d, 22d, and 23d. I took the median over January 2d, February 2d, etc. for each year and did the same for the other old-style ordinals. The graph below shows the use of old-style ordinals, which start as normal days within the main group, but slowly diverge until they drop off exponentially in the 1890s, reaching a tiny residue by the 1930s.
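The median-over-months step can be sketched in a few lines of Python. This is only an illustration, not the code used for the graphs: the `yearly_median` helper, the shortened `MONTHS` list, and the numbers in `toy` are all made up, and the input is assumed to be a mapping from each phrase to its per-year relative frequency.

```python
from statistics import median

# Shortened month list, for illustration only
MONTHS = ["January", "February", "March"]

def yearly_median(freqs, ordinal, months=MONTHS):
    """Return {year: median frequency across months} for one ordinal.

    freqs maps a phrase such as "January 2d" to {year: relative frequency}.
    A year missing from one month's series is treated as zero.
    """
    series = [freqs[f"{month} {ordinal}"] for month in months]
    years = sorted(set().union(*(s.keys() for s in series)))
    return {year: median(s.get(year, 0.0) for s in series) for year in years}

# Hypothetical values, chosen only to show the shape of the calculation
toy = {
    "January 2d": {1850: 3.0, 1900: 1.0},
    "February 2d": {1850: 5.0, 1900: 2.0},
    "March 2d": {1850: 4.0, 1900: 0.5},
}
result = yearly_median(toy, "2d")
print(result)
```

Taking the median rather than the mean keeps one unusual date (a holiday, say) from dragging the whole series up or down.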
Sometimes you can encounter a modern use of the old-style abbreviations when the ordinal is part of a name with a very long history, like the 3d Marine Division. This is not why the old-style has a small residual in the latter half of the twentieth century. If you search through Google Books for modern uses of January 2d, you will only find reprints of old books and publications of old diaries.
The old style falls away as the new style emerges. When we add the old-style and new-style ordinals together, we get the graph below, which shows that once the two styles are accounted for, these four days of the month are actually quite ordinary.
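Adding the two styles together is just a year-by-year sum of the two series. A minimal sketch, assuming each series is a `{year: frequency}` mapping; the `combine_styles` helper and the numbers are hypothetical, not taken from the Ngrams data:

```python
def combine_styles(old, new):
    """Add two {year: frequency} series, treating a missing year as zero."""
    years = sorted(old.keys() | new.keys())
    return {year: old.get(year, 0.0) + new.get(year, 0.0) for year in years}

# Hypothetical values: the old style fades while the new style grows
old_2d = {1850: 4.0, 1900: 1.0, 1950: 0.25}
new_2nd = {1850: 2.0, 1900: 5.0, 1950: 6.0}

combined = combine_styles(old_2d, new_2nd)
print(combined)
```

In this toy data the combined series stays roughly flat even though each style on its own changes a lot, which is the same pattern the combined graph shows.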
I don’t have a fully satisfying explanation for why the 2nd and 3rd now peek their heads above the main group from time to time. I guess if the 1st of the month is hugely over-represented, it is reasonable to expect that the next smallest ordinals would be slightly over-represented. (“Let’s have our meeting on the first of the month.” “I have ten other meetings on the first!” “Ok then, the second.”) However, if I search Google Books for instances of January 2d or January 2nd, there are a sizable number of hits from lists in which a comma separates the month from the ordinal; Google Books apparently ignores commas. With the 1st through 4th being the only regular ordinals for weeks of the month, these might get a boost this way.
Why did writers use these one-letter abbreviations? Probably to follow Latin, where this practice originated and the ordinal indicator is a single letter such as o. The Romance languages, like Spanish, Italian, and Portuguese, still use o and a. I expect that we would still be using a single d if it weren’t for 4th, etc., whose final consonant sound cannot be represented by a single letter. In the end, consistency within the English language by using two letters for all ordinals was more attractive than similarity to Latin.