Skip to content


Excellent error messages

Error messages are pervasive throughout programming, yet little has been written on the design of error messages for languages, libraries, and APIs. Much good advice can be found via simple web search on good error messages as shown to end users in GUIs, but standards for error messages intended for an audience of programmers is hard to find. This is not due to a lack of attention to error messages. There are certainly places where error messages are neglected, but neglect is far from universal. In fact, some of the best discussions on good error messages come from specific efforts by big projects to improve their error messages.

The sort-within-groups problem

There is an interesting edge case in data grammars when a grouped data table is sorted by non-group columns. For example, what should the following dplyr code produce?


df <- data.frame(
  group = c(2, 2, 1, 1, 2, 2),
  value = c(3, 4, 3, 1, 1, 3)

df %>% group_by(group) %>% arrange(value)

Multidimensional Pairing Functions

In the previous post, I compared ways to take two infinite streams and generate a new stream that is all possible combinations of the elements in those streams. This post takes it up another level and generalizes this procedure to an arbitrarily long list of infinite streams. This is a trickier task than the 2-dimensional case, utilizing recursion into each dimension to cleanly generate all combinations.

Throughout this post, the caret ^ will indicate exponentiation and parentheses list(index) will be used to indicate indexing a list. As before, 0-indexing will be used because it makes the math simpler. 

Superior Pairing Function

Given two sequences of objects, it is often desirable to generate a sequence which is all possible pairwise combinations of those sequences—the Cartesian product. If the sequences are finite in length, then it is a trivial function to write in any programming language. The function even exists in many standard libraries and packages, such as itertools.product in Python. But if the sequences are infinite in length (that is, they are streams rather than arrays, depending on your terminology), the typical approach fails. Finding a way to iterate over all pairs of two infinitely long sequences is called a "pairing function" in mathematics and has practical uses. There are some existing pairing functions, but many have limitations. I describe the properties of a superior pairing function and a couple of methods that satisfy them.

Marginal vs. Conditional Subtyping

In computer programming, a type is a just a set of objects—not a physical set of objects in memory, but a conceptual set of objects. The type Integer is the set {0, 1, -1, ...}. Types are used to reason about the correctness of programs. A function that accepts an argument of type Integer, will work for any value 0, 1, -1, etc., for some definition of "works". Subtyping is used to define relationships between types. Type B is a subset of a type A if the set of objects defined by B is a subset of the objects defined by A. Every function that works on an instance of A also works on an instance of B, for some definition of "works". If we were just doing math, subsets would be the end of subtyping. But types as pure sets exist only conceptually. Actual types must be concretely defined. It is not very efficient to define the Integer class by writing all possible Integers! In most programs, the possible values of a type are constrained by their fields. Subtypes and supertypes typically differ by having more or fewer fields than the other. Sometimes, the subtype has more fields, and sometimes, the supertype has more fields. Many of you may be thinking, "What? It's always one way, not the other!" The funny thing is that some of you think the subtype always has more fields and others think the supertype always has more fields. That's because there are two kinds of subtyping. The first is what I call "marginal subtyping", which is encountered in application programming and is well modeled by inheritance. The second is what I call "conditional subtyping", which is encountered in mathematical programming and is well modeled by implicit conversions. Depending on the genre of programming you work in, the other kind of subtyping may be unknown and the language features needed to implement it may be maligned. But both needs are real and both features are necessary.

Sane Equality

Equality: every programming language has it, the == syntax is as universal as 1+1, and it works almost the same in every language. When the left and right operands are the same type, equality is easy. No one questions that 1==1 evaluates to true or that "a"=="b" evaluates to false. This post is about what to do when the operands are different types. What should 1=="1" be? Or what should Circle(1,2,2)==Point(1,2) be? Or what should ColoredPoint(1,2,red)==Point(1,2) be?

When Sequences Go Bad

In my last post, I talked about the various kinds of syntax for getting and setting elements in sequences. This post will talk about semantics. What exactly should get and mutate do when invoked? What should happen when the index is valid is hopefully obvious. But because we have to handle the case of an invalid index—in particular, an index larger than the length of the sequence, the answer is not as clear-cut as it may seem. If "throw an exception" is the only thing that comes to mind, you have been stuck in procedural programming for too long.

One-Sided Debate over Sequence Syntax

Computers are famously stupid machines. You have to tell them in perfect detail not just what you want them to do but how to do it. A computer may be able to add 1 and 2 faster than I can, but it will take me longer to tell it to do that than for me to do it myself. The more complex the task, the more time it takes to code. Coding is still laborious and entirely not worth it unless such code will be used many times. I posit that a computer is only useful for doing work when the vast majority of the work to be done is repetitive tasks on simple objects. The most common abstraction for representing a bunch of objects is the sequence (also known as a list or array), in which each object in the collection is associated with an integer called its index. There is a wide diversity of syntax and semantics for accessing and changing sequences.

Difficulty of Pattern Matching Syntax

Pattern matching is compact syntax, originating in functional programming languages, that acts a super-charged switch statement, allowing one to concisely branch on the types and components of a value of interest. Like a traditional switch statement, a pattern match takes a single object and compares it to sequence of cases, running the code body associated with the matching case. There are many parts to a pattern matcher. and design concise and unambiguous syntax is a difficult endeavor, one failed by many popular programming languages.

Comparison of Iteration Styles in Programming

It is difficult to overstate the importance of iteration in programming—the process performing an operation on each element of a list. I would rank only variable assignment, functions calling, and branching as more important. Unlike the if statement, which is essentially the same in every language, the semantics of the for loop varies across languages. Mainly for my own future reference, this post compares the various styles of iteration that I have come across. The examples are in pseudo code which is the only way to write so many different iteration styles under similar syntax. Each of the examples is trying to do the same thing: print out each element of a list called list.