Basic principles of record storing

In the previous notes I introduced record storing as an alternative term for event sourcing and summarized some points of critique regarding the de facto approach and advice. In these notes, I sketch out the general principles of record storing, but not yet within a concrete architecture of an application.

Records and journal

An application which uses record storing as a persistence mechanism stores data as a sequence of read-only records in an append-only journal. The order of records in a journal is significant, and the application relies on it when reading the journal (more on this later).

One of the main conceptual benefits of record storing is that the records in a journal can be interpreted in new and unforeseen ways while the application evolves, given that they contain enough information. More precisely, a sub-sequence of records can be used to compute values of type A now, while in the future, given new requirements, another, possibly overlapping sub-sequence of records can be used to compute values of type B.

Furthermore, new requirements leading to the need for computing B (or C, D and so on) do not affect the overall design of data persistence in the application. There is no need for new database tables, foreign keys or other DB artefacts, no new and potentially very different SQL queries with their own optimizations, no new repositories and the like in the inner layers of the application and no new implementations of these repositories in the outer layers (I am alluding to the Onion architecture). That is, the main API of data access/persistence stays the same even if requirements change.

Conceptually, and almost literally in practice, the API for working with a journal is as follows (using F#/Haskell/PureScript-like pseudo-code):

storeRecord : Record -> Version -> IO (Maybe Version)

journalVersion : IO Version

foldRecords : Fold (Record, Version) r -> IO r

Operation storeRecord is for writing to a journal. It takes a new record and the expected version of the journal, and attempts to append the record to the journal. If this succeeds, the new version of the journal is returned, reflecting the journal extended with the new record. It may also happen that the journal contains more records than reflected by the expected version. In this case, the new record will not be appended and no new version is returned. This is the main mechanism of concurrency control, which I will come back to in future notes.

As the notion of journal version introduced above suggests, it makes sense to ask for the current version of the journal. Operation journalVersion does exactly that, for the complete journal. That is, each time a record is stored, journalVersion returns a new (increasing) Version. As one may expect, immediately after appending a record, the result of storeRecord is consistent with journalVersion.
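To make the write side more tangible, here is a minimal in-memory sketch of storeRecord and journalVersion in Haskell. It is only an illustration under several assumptions that are not part of these notes: Record is a plain String, a Version is simply the number of records stored so far, and the journal is passed around as an explicit, hypothetical Journal handle (partially applying it gives back the signatures shown above).

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Placeholder types (assumptions): the notes leave Record and Version abstract.
type Record = String
type Version = Int

-- Hypothetical in-memory journal; records are kept newest first.
newtype Journal = Journal (IORef [Record])

newJournal :: IO Journal
newJournal = Journal <$> newIORef []

-- Assumption: the version is the number of records stored so far.
journalVersion :: Journal -> IO Version
journalVersion (Journal ref) = length <$> readIORef ref

-- Append the record only if the expected version is still the current one.
-- Otherwise leave the journal untouched and return no new version.
storeRecord :: Journal -> Record -> Version -> IO (Maybe Version)
storeRecord (Journal ref) record expected =
  atomicModifyIORef' ref $ \records ->
    if length records == expected
      then (record : records, Just (expected + 1))
      else (records, Nothing)

main :: IO ()
main = do
  journal <- newJournal
  v0 <- journalVersion journal
  first <- storeRecord journal "bought LP79" v0 -- expected version matches
  stale <- storeRecord journal "bought XF1" v0  -- v0 is outdated by now
  current <- journalVersion journal
  print (first, stale, current)
```

Running main prints (Just 1,Nothing,1): the first write succeeds, the second one is rejected because it was based on an outdated version, and journalVersion agrees with the version returned by the successful write.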

Finally, operation foldRecords is for reading the journal and is perhaps the most interesting one. Its name already suggests the main and only pattern of inspecting journal records: by folding the records, in order from the least recently stored to the most recently stored, into a value of type r, which stands for “result”. Crucially, the operation does not predefine the (type of the) result, as witnessed by the type variable r. This corresponds to the central principle that various, independent and potentially completely different values can be computed from the same journal, see A and B above.

What determines the desired result of reading a journal? It is the thing passed to foldRecords, a value of type Fold (Record, Version) r, which “chooses” the concrete type for the type variable r. In functional programming, folding is one of the most basic and well-known “patterns”. It is present in most programming languages under different names: “reducing”, “aggregation” and so on. Normally, it is a higher-order function which takes a “folder function” and a data structure (most typically a list) and “loops” the folder function through the data structure, accumulating a final result from each element of the data structure and some intermediate result.

Conceptually, the same happens here as well. One difference is that there is no “data structure” but a journal, and that the latter is not visible anywhere. Instead, foldRecords folds the journal “internally”, without exposing it. (Some materials about ES show exposed “streams” as actual lists, loaded from the “store” and returned to the application. That is obviously completely different from what happens here.) Another difference is that there is no folding function, but the somewhat strange parameter of type Fold (Record, Version) r. This is the fold(ing) itself, as a value. It should be understood as a computation which “sees” every record in a journal sequentially, does something internally and, in the end, produces an r. Specifically, what a fold sees for each record is a tuple (pair) of the Record and the Version of the journal prefix ending in that record (i.e. the version of the journal up to and including the record). Operation foldRecords accepts a fold, applies it to the complete journal and returns whatever the fold returned in the end. The fold itself does not know how many records there are or anything else about how the journal is implemented. This decoupling of a Fold from the journal implementation details in the outer layers of the application is an important property which fits very well with the Onion architecture. I will discuss this in the future.
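One possible way to represent a fold as a value is sketched below in Haskell. The encoding (a step function, an initial state and a final extraction, with the state type hidden) has the same shape as the Fold type of the foldl package, but whether that is the encoding meant in these notes is an assumption. The names runFold, countRecords, latestVersion and sampleJournal are hypothetical, Record is a plain String, and runFold stands in for foldRecords by folding a pure list instead of a real journal.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

-- Placeholder types (assumptions).
type Record = String
type Version = Int

-- A fold as a value: step function, initial state, final extraction.
-- The internal state type s is hidden from whoever runs the fold.
data Fold a r = forall s. Fold (s -> a -> s) s (s -> r)

-- Stand-in for foldRecords: feeds the elements, oldest first, to the fold.
-- The fold never gets to see the list itself.
runFold :: Fold a r -> [a] -> r
runFold (Fold step begin done) records = done (foldl' step begin records)

-- Two unrelated results computed from the same journal.
countRecords :: Fold (Record, Version) Int
countRecords = Fold (\n _ -> n + 1) 0 id

latestVersion :: Fold (Record, Version) (Maybe Version)
latestVersion = Fold (\_ (_, version) -> Just version) Nothing id

-- Made-up journal contents, each record paired with the journal version
-- up to and including that record.
sampleJournal :: [(Record, Version)]
sampleJournal = zip ["bought LP79", "consumed LP79", "bought XF1"] [1 ..]

main :: IO ()
main = do
  print (runFold countRecords sampleJournal)
  print (runFold latestVersion sampleJournal)
```

Running main prints 3 and then Just 3. Both folds are run by the same runFold, which knows nothing about their result types or about what they do internally.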

(No) identifiers and special queries

Note how the journal API does not include any entity identifiers (in the abstract sense of “entity”, i.e. problem domain entity) and no distinction between or support for different “queries” on a journal. This is also in contrast to common ES explanations, which sometimes put heavy emphasis on identifiers and their relations to streams. At this point, it is time to introduce an example.

Our example will be managing paints for plastic model making. The hypothetical application PaintBro (TM) essentially keeps track of a modeler’s collection of paints: which colors are in stock, how many jars or bottles of each, how much remains in each jar, its condition and so on.

Retrieving a specific color

The first use-case is retrieving an overview of a specific color. Each paint manufacturer has a numbering scheme for their colors, often within a paint type or range. For example, we are interested in LP79, which is flat red in the Tamiya range of paints. LP79 is a unique identifier of a paint, also across different manufacturers, and we use this fact in PaintBro.

There is no obvious place in the journal API where the LP79 identifier could be passed. The answer is that every Fold has all the “query data” it needs baked into it, in this case the color identifier. This translates to the following function:

tamiyaColor : TamiyaNumber -> Fold (Record, Version) (List Jar)

This is just a normal function, but it can be seen as a “factory” which prepares a fold for a given Tamiya color number. It is important to realize that tamiyaColor does not actually do anything with a journal yet. It only returns a Fold value, which may or may not be used on a journal. Because a fold is a regular value, we can store it, pass it around and do even more exciting things with it later. The usual functional programming goodness…

The resulting Fold (Record, Version) (List Jar) works internally as follows. While observing each Record, it:

considers only records which are required for (re)constructing the end result (a list of jars);
ignores all records which are related to manufacturers other than Tamiya;
ignores all Tamiya-related records which are not related to the given TamiyaNumber color identifier.

Finally, to actually obtain all jars of Tamiya flat red, we use the expression

foldRecords (tamiyaColor (TamiyaNumber "LP79"))
-- Result is of type IO (List Jar)

What to buy next

The second example use-case is for the “low stock” functionality of PaintBro. The modeler wants to know which paints are almost used up, for that next sweet visit to the model shop. Suppose the application models paint level in a jar using the following type:

data JarLevel =
    | Empty
    | Low
    | Half
    | Full

Although we are interested in low stock, we can define a more general fold for any given JarLevel and use it to determine specifically low stock:

stockByLevel : JarLevel -> Fold (Record, Version) (List Jar)

The result is also a list of jars, but clearly there is nothing like an identifier involved, unlike TamiyaNumber above. Instead, JarLevel is a general property applicable to all paints, regardless of manufacturer or other properties. Still, we use exactly the same (very compact) journal API also for this use-case:

foldRecords (stockByLevel Low)
-- Result is of type IO (List Jar)

It should be easy to imagine which records of the journal are considered and which are ignored by this fold(ing).
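For readers who prefer to see it spelled out, here is one way such a fold could look in Haskell. This is a sketch resting on a number of assumptions that the notes do not make: records have the BoughtJars and Consumed shapes from the Records section of these notes, HowMuch is a percentage of a full jar, a Consumed amount is taken from the oldest non-empty jar of that color and never spills over into the next jar, and the thresholds in jarLevel are invented. The Fold encoding, runFold (a pure stand-in for foldRecords), the Jar fields and all sample data are hypothetical as well.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

-- Hypothetical encoding of a fold as a value, plus a pure stand-in for foldRecords.
data Fold a r = forall s. Fold (s -> a -> s) s (s -> r)

runFold :: Fold a r -> [a] -> r
runFold (Fold step begin done) = done . foldl' step begin

newtype TamiyaNumber = TamiyaNumber String deriving (Eq, Show)
newtype VallejoNumber = VallejoNumber String deriving (Eq, Show)

type Color = Either TamiyaNumber VallejoNumber
type Date = String   -- assumption: placeholder for a real date type
type HowMuch = Int   -- assumption: percent of a full jar
type Version = Int

data Record
  = BoughtJars [TamiyaNumber] [VallejoNumber] Date
  | Consumed Color HowMuch

data JarLevel = Empty | Low | Half | Full deriving (Eq, Show)

data Jar = Jar { jarColor :: Color, jarRemaining :: HowMuch } deriving (Eq, Show)

-- Invented thresholds for turning a percentage into a JarLevel.
jarLevel :: Jar -> JarLevel
jarLevel jar
  | remaining <= 0 = Empty
  | remaining <= 25 = Low
  | remaining <= 75 = Half
  | otherwise = Full
  where
    remaining = jarRemaining jar

-- Tracks every jar of every manufacturer, then keeps those at the wanted level.
stockByLevel :: JarLevel -> Fold (Record, Version) [Jar]
stockByLevel wanted = Fold step [] (filter ((== wanted) . jarLevel))
  where
    step jars (BoughtJars ts vs _, _) =
      jars ++ [Jar color 100 | color <- map Left ts ++ map Right vs]
    step jars (Consumed color amount, _) = consume color amount jars

-- Takes the amount from the oldest non-empty jar of the given color, if any.
consume :: Color -> HowMuch -> [Jar] -> [Jar]
consume color amount jars =
  case break (\j -> jarColor j == color && jarRemaining j > 0) jars of
    (older, j : newer) ->
      older ++ j { jarRemaining = max 0 (jarRemaining j - amount) } : newer
    (_, []) -> jars

-- Made-up sample data.
sampleJournal :: [(Record, Version)]
sampleJournal =
  zip
    [ BoughtJars [TamiyaNumber "LP79", TamiyaNumber "LP79"] [VallejoNumber "70.950"] "2024-01-05"
    , Consumed (Left (TamiyaNumber "LP79")) 80
    , Consumed (Right (VallejoNumber "70.950")) 30
    ]
    [1 ..]

main :: IO ()
main = print (runFold (stockByLevel Low) sampleJournal)
```

With this sample journal, the only jar reported as Low is the first LP79 jar, with 20 percent left. The second LP79 jar is still full and the Vallejo jar is at 70 percent. Unlike an identifier-based fold, this one cannot ignore any BoughtJars or Consumed record, because every jar may end up at the requested level.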

Records

We turn our attention to the records themselves. In the journal API, they are represented by the type Record. It is a single type, while we obviously need different records, with different data, for record storing to be useful. Unlike OO languages where class inheritance is the obvious choice, in FP languages we have algebraic data types (a.k.a. discriminated unions), which I find a perfect fit. We can have a single type for all records in a journal, without inheritance issues, and different “shapes” of records as needed.

In PaintBro we could have the following records, relevant for the two use-cases above:

data Record =
    ...
    | BoughtJars (List TamiyaNumber) (List VallejoNumber) Date
    | Consumed (Either TamiyaNumber VallejoNumber) HowMuch
    ...

The meaning of the BoughtJars record is self-explanatory. It contains identifiers of the purchased colors, unrealistically limited to two fixed manufacturers (Tamiya and Vallejo). If a color number occurs multiple times, then this means that the modeler bought multiple jars of that color on the same Date.

The tamiyaColor fold would ignore all VallejoNumber values and consider only the presence and the number of TamiyaNumber values. Also, depending on whether this fold returns only non-empty jars, it would have to perform some computations with HowMuch values by also processing Consumed records, but only for the given TamiyaNumber.
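The behaviour described above could be sketched in Haskell roughly as follows. Several things here are assumptions rather than part of these notes: the Fold encoding and runFold (a pure stand-in for foldRecords), the fields of Jar, HowMuch being a percentage of a full jar, the rule that a Consumed amount is taken from the oldest non-empty jar without spilling over into the next one, and the decision to return only non-empty jars. Only the two record shapes shown above are modelled, and the sample journal is made up.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

-- Hypothetical encoding of a fold as a value, plus a pure stand-in for foldRecords.
data Fold a r = forall s. Fold (s -> a -> s) s (s -> r)

runFold :: Fold a r -> [a] -> r
runFold (Fold step begin done) = done . foldl' step begin

newtype TamiyaNumber = TamiyaNumber String deriving (Eq, Show)
newtype VallejoNumber = VallejoNumber String deriving (Eq, Show)

type Date = String   -- assumption: placeholder for a real date type
type HowMuch = Int   -- assumption: percent of a full jar
type Version = Int

data Record
  = BoughtJars [TamiyaNumber] [VallejoNumber] Date
  | Consumed (Either TamiyaNumber VallejoNumber) HowMuch

-- Hypothetical result type: one jar of the requested color.
data Jar = Jar { jarBought :: Date, jarRemaining :: HowMuch } deriving (Eq, Show)

-- The color number is baked into the fold; only non-empty jars are returned.
tamiyaColor :: TamiyaNumber -> Fold (Record, Version) [Jar]
tamiyaColor wanted = Fold step [] (filter ((> 0) . jarRemaining))
  where
    -- One full jar per occurrence of the wanted number; Vallejo numbers are ignored.
    step jars (BoughtJars ts _ date, _) =
      jars ++ [Jar date 100 | t <- ts, t == wanted]
    -- Consumption of the wanted color is taken from the oldest non-empty jar.
    step jars (Consumed (Left t) amount, _)
      | t == wanted = useUp amount jars
    -- Everything else is irrelevant for this fold.
    step jars _ = jars

useUp :: HowMuch -> [Jar] -> [Jar]
useUp amount jars =
  case break ((> 0) . jarRemaining) jars of
    (empties, j : rest) ->
      empties ++ j { jarRemaining = max 0 (jarRemaining j - amount) } : rest
    (_, []) -> jars

-- Made-up sample data.
sampleJournal :: [(Record, Version)]
sampleJournal =
  zip
    [ BoughtJars [TamiyaNumber "LP79", TamiyaNumber "XF1", TamiyaNumber "LP79"] [VallejoNumber "70.950"] "2024-01-05"
    , Consumed (Left (TamiyaNumber "LP79")) 100
    , Consumed (Left (TamiyaNumber "XF1")) 50
    , Consumed (Right (VallejoNumber "70.950")) 10
    , BoughtJars [TamiyaNumber "LP79"] [] "2024-03-01"
    , Consumed (Left (TamiyaNumber "LP79")) 40
    ]
    [1 ..]

main :: IO ()
main = print (map jarRemaining (runFold (tamiyaColor (TamiyaNumber "LP79")) sampleJournal))
```

Running main prints [60,100]: the first LP79 jar was used up completely and is filtered out, the second one has 60 percent left, and the jar bought later is still full. The XF1 and Vallejo records do not influence the result at all.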

Typically, a type like Record contains many more than two kinds of records. For the example PaintBro application I would expect it to be easily more than ten.

Conclusion

The record storing API is surprisingly compact and uniform. It easily supports the principle of being able to load “any kind” of data from a journal, including future needs. The two example use-cases show both identifier-based and arbitrary property-based “queries”.

The treatment of the API is not yet complete. In the upcoming notes I will focus on the typical patterns of API usage within an application, continuing with the PaintBro example. I will also claim that a single journal for everything is the best thing ever.

Lastly: do you even make and paint plastic models, bro?!