Previously, I explained why the 11th of most months is mentioned far less than the other days in the Google Ngrams database of English literature from 1800-2008. This was to solve a long-standing question posed in an xkcd comic. While researching this, I encountered another mystery: the 2nd, 3rd, 22nd, and 23rd are unusually low as well—but only until the 1930s, at which point they become perfectly normal days. Last time, I set this question aside to focus on the 11th. In this installment, I explain the strange behavior of these four days.

To remind everyone, the graph below is the mystery we are dealing with. The 2nd, 3rd, 22nd, and 23rd are practically unused in 1800, the earliest point in the database. Around 1810 is when the first substantial uses appear; they grow at about the same rate as the other days, maintaining a substantial gap at about half of what one would expect until about the 1890s. Then suddenly, the gap shrinks and continues to do so until the 1930s when 2nd, 3rd, 22nd, and 23rd are absorbed into the main group.

new_style

Ye old style

So were 2 and 3 unlucky numbers in the 1800s? Did Google’s algorithm have a hard time reading the 2s and 3s of old-timey fonts? Nope, it turns out that people used to write these ordinals as 2d, 3d, 22d, and 23d. I took the median over January 2d, February 2d, etc. for each year and did the same for the other old-style ordinals. The graph below shows the use of old-style ordinals, which start as normal days within the main group, but slowly diverge until they drop off exponentially in the 1890s, reaching a tiny residue by the 1930s.

old_style

Sometimes you can encounter a modern use of the old-style abbreviations when the ordinal is part of a name with a very long history, like the 3d Marine Division. This is not why the old-style has a small residual in the latter half of the twentieth century. If you search through Google Books for modern uses of January 2d, you will only find reprints of old books and publications of old diaries.

Combined graph

The old style falls away as the new style emerges. When we add the old-style and new-style ordinals together, we get the graph below, which shows that once the two styles are accounted for, these four days of the months are actually quite ordinary.

sum_new_old_style

I don’t have a fully satisfying explanation for why the 2nd and 3rd now peek their heads above the main group from time to time. I guess if the 1st on the month is hugely over-represented, it is reasonable to expect that the next smallest ordinals would be slightly over-represented. (“Let’s have our meeting on the first of the month.” “I have ten other meetings on the first!” “Ok then, the second.”) However, if I search Google Books for instances of January 2d or January 2nd, there are a sizable number of hits from lists like this: january_2dGoogle Books apparently ignores commas. With the 1st, 2nd, 3rd, and 4th being the only regular ordinals for weeks of the month, these might get a boost this way.

Speculation

Why did writers use these one-letter abbreviations? Probably to follow Latin, where this practice originated and the ordinal indicator is a single letter o. The Romance languages, like Spanish, Italian, and Portuguese, still use o and a. I expect that we would still be using d if it wasn’t for 1st, 4th, etc. whose final consonant sound cannot be represented by a single letter. In the end, consistency within the English language by using two letters for all ordinals was more attractive than similarity to Latin.

The code used for this analysis is available on Github.
  • I have a suggestion for the why the 2nd and 3rd peek their heads above the main group. We’re assuming that people like to have events on the first of the month, however for business events people obviously avoid weekends. Something that would normally be on the 1st would be on the 2nd or 3rd if the 1st was a Saturday or Sunday. So at least some of the occurrences that would be assigned to the 1st get assigned to the 2nd or 3rd. This effect will apply to all dates, but only shows up on the 2nd and 3rd because the 1st has such a huge share to start with.

    • David Hagen

      I like this explanation.

  • ralokt

    This is awesome :D

    Would you mind posting a graph with just the sums of occurrences where everything is assimilated into the main group? That would make my brain incredibly happy XD

  • dendodge

    Bit late to the party, but the fact that the corrected values are unusually high could have something to do with the fact that “d” is the old abbreviation for pence, so plenty of old British texts will include things costing 2d.

    • David Hagen

      This was also brought up in the Reddit discussion. It seems possible. But remember, not only does “2d” have to appear, but a month has to precede it, like “March 2d”, which is not a common construction. Maybe something like: “In March, 2d was paid per ounce…”. Or tables of prices: “March: 2d. April: 3d.”