Making my life easier with two GADTs

Matt Parsons wrote and commented about Lucas Escot writing about their experience and thoughts about GADTs. The articles are called, respectively, “Making My Life Harder with GADTs” and “Making my life easier with GADTs”. In this article I don’t want to participate in the discussion about advanced and simple Haskell, but I do like GADTs and two of them are even more outrageous or better than one, depending on which “side” you are, respectively!

Event sourcing

Recently I have been extracting a “template” from my existing Haskell back-end for future projects. The original application is built using DDD principles, the Onion architecture and event sourcing. The Onion architecture, in particular, is realized using the “mtl-style”. In the new template I wanted to migrate to the interface-passing design, also known as “services”, “handlers” and so on.

As part of the template extraction, at some point it was the turn of event sourcing modules to get some refactoring/redesign. I came up with the following interface for a journal (I dislike the standard ES terms “event” and “stream”):

data Journal record = Journal {

  journalVersion :: IO Version,

  saveRecord :: record -> Version -> IO (Maybe Version)

  -- Other journal operations here
}

The Journal record interface is a record of functions (although Haskell does not have real records, only record syntax) which specifies operations for working with a journal, consisting of records of type record. One layer of the program defines this interface and another layer provides its implementation, i.a. a value of type Journal record for some record type.

As more records are appended (saved) to the journal, its version changes. Current version of a journal can be obtained using the journalVersion operation. Operation saveRecord appends a new record to the journal. The reason why saveRecord also requires a Version is that it performs concurrent change detection. Given a new record and the expected journal version, it appends the record only if the journal version is still the expected version; only then the result is Just carrying the new version of the journal, now corresponding to the old journal together with the newly appended record. If, on the other hand, another record was added concurrently, the expected version will not match the actual journal version, the new record will be rejected and no new version will be returned, i.e. the result becoming Nothing. Nothing very special so far.

Using this new framework, I proceeded with defining a user activity journal. For all users, this journal stores what a user did in the application, on a very high level. For the template, only one example record was needed, so the type of records looked like this:

data UserActivityRecord =
  LoggedIn (Identifier User)

The API of the user activity journal correspondingly becomes

journalVersion :: IO Version

saveRecord :: UserActivityRecord -> Version -> IO (Maybe Version)

The idea was to store a LoggedIn record every time a user successfully authenticates in the application. However, it quickly became clear that the definition of saveRecord was not convenient: it was not interesting or important to detect concurrent changes to the user activity journal, simply because there are no conflicts to resolve or changes to lose! Crucially, other journals certainly will be subject to these considerations, so completely dropping the concurrent change detection aspect of saveRecord is not an option.

More specifically, saveRecord in this form is not convenient in the HTTP handler for authentication, because I would have to

  1. obtain the expected version of the journal from somewhere;

  2. even before that, define what “expected” even means in this case (again, we are not dealing with some change to business data here);

  3. pass a version to saveRecord and deal with the potential outcome of rejecting the new record (!);

  4. retry in case of a “concurrent change”.

All of this is clearly not applicable and nonsensical for user authentication; the journal API is too inflexible. A straightforward solution would be to just define two saveRecord operations: one with and one without concurrent change detection. I did not like the idea for various objective and subjective reasons, so here comes a change of the API.

Make unnecessary outcomes not required (to handle)

The title of this section is a pun on the well-known “make illegal states unrepresentable”. In a way, this is what GADTs allow us to achieve for this design problem, only there are no illegal states, but unwanted outcomes instead. But first, what is this problem more exactly? The problem is that:

What we want instead is:

The tricky part is achieving this with only one saveRecord operation.

If we look carefully at the requirements above, it is clear we need to deal with two “sides”: the parameters and the result. First, we define a GADT which differentiates asking and not asking for concurrent change detection:

data ConcurrencyCheck (detect :: Bool) where
  NoCheck         ::            ConcurrencyCheck False
  ExpectedVersion :: Version -> ConcurrencyCheck True

The NoCheck constructor is our request to not perform change detection, therefore it does not carry a Version. We now have a way of not having to come up with the expected journal version! But as this is a GADT, we have to talk about the type variable as well: detect tracks whether the concurrent change detection is really requested, which is False in case of NoCheck.

The ExpectedVersion constructor carries the expected version and detect signifies a request for change detection with type True. Notice, that ExpectedVersion v for some v :: Version corresponds exactly to passing v in the current signature saveRecord :: record -> Version -> IO (Maybe Version). In other words, we generalized the input of saveRecord to support both requirements above.

Now for the result type. We introduce another GADT which describes our desired outcomes:

data NewVersion (detect :: Bool) where
  Stored                   :: Version -> NewVersion detect
  ConcurrentChangeDetected ::            NewVersion True

Value Stored nv means that the new record was accepted, a new journal version nv generated and it is now returned to the client module. On the other hand, constructor ConcurrentChangeDetected does not carry a new journal version, because it means that a concurrent change was detected. We have the following correspondence with the original Maybe Version result type:

Stored nv ~ Just nv
ConcurrentChangeDetected ~ Nothing

However, the main trick is that we have more information now, in the form of the detect type variable. In the new signature of saveRecord we can relate the request to perform concurrent change detection to the outcome of the operation like this:

data Journal record = Journal {

  journalVersion :: IO Version,

  saveRecord :: forall detect. record -> ConcurrencyCheck detect -> IO (NewVersion detect)

  -- Other journal operations here
}

The essence here is that

By repeating type variable detect on the input and result sides, we relate certain inputs with certain outcomes, but some combinations are impossible.

What is possible are exactly the cases in “what we want” above. What is not possible is that we are forced to check for concurrent changes even we don’t want to. Which is exactly what we want!

It will become clear if we consider all the cases explicitly. Suppose we pass ExpectedVersion v to saveRecord. Effectively, its type becomes specialized to

saveRecord :: record -> ConcurrencyCheck True -> NewVersion True

True appears on both sides because type variable detect is the same on both sides. Now, what values can saveRecord return in this case? This amounts to the question: which constructors of the NewVersion GADT have type NewVersion True? There are two such constructors:

ConcurrentChangeDetected :: NewVersion True -- Trivially, by definition

-- and

Stored nv :: NewVersion True -- Because `detect` in `Stored` unifies with `True`

The Stored case is more tricky because it works due to being polymorphic in detect, unlike the fixed True in the ConcurrentChangeDetected case. So two outcomes are possible and we must pattern match on them to deal with potential concurrent changes, which is good and is just like the original Maybe Version!

But we are here for the second case: passing NoCheck to saveRecord. Repeating the exercise, its type effectively becomes

saveRecord :: record -> ConcurrencyCheck False -> NewVersion False

What values can saveRecord return now? Which constructors of the NewVersion GADT have type NewVersion False? There is only one such constructor:

Stored nv :: NewVersion False -- Because `detect` in `Stored` unifies with `False`

The ConcurrentChangeDetected case is not possible anymore, because its True type variable instantiation does not unify with False, no magic here. Polymorphism of Stored does its work again, but at the same time the other, unnecessary outcome became not required to handle.

To cement what just happened, the following and only the following combinations of inputs and outputs of saveRecord are possible:

ExpectedVersion v <---> { Stored nv, ConcurrentChangeDetected }

NoCheck           <---> { Stored nv }

Convince yourself that this corresponds to our wishlist formulated above. Other combinations simply do not type check.

Final words

While this article is not exactly short, the final journal interface definition is and fits in several lines:

data Journal record = Journal {

  journalVersion :: IO Version,

  saveRecord :: forall detect. record -> ConcurrencyCheck detect -> IO (NewVersion detect)

  -- ..
}

data ConcurrencyCheck (detect :: Bool) where
  NoCheck         ::            ConcurrencyCheck False
  ExpectedVersion :: Version -> ConcurrencyCheck True

data NewVersion (detect :: Bool) where
  Stored                   :: Version -> NewVersion detect
  ConcurrentChangeDetected ::            NewVersion True

There is a lot to unpack here with several techniques coming together (I did not even discuss the strange Bool, True and False…). However, I would not call GADTs themselves (very) advanced or fancy Haskell, which seems to be a popular opinion.

I also haven’t seen the trick with “matching” of cases belonging to different GADTs with a type variable described explicitly anywhere else. It’s no special though and undoubtedly known to many. Maybe this combination does move the needle towards the “advanced” Haskell side, I don’t know…