Better Data Types a la Carte

To be honest with you, my approach to procedurally generating RPG stories has been stymied until very recently. Recall the command functor:

data StoryF a = Change Character ChangeType (ChangeResult -> a)
              | forall x y. Interrupt (Free StoryF x) (Free StoryF y) (y -> a)
              | -- whatever else

This recursively defined Interrupt command has caused more than its fare share of grief. The idea is that it should represent one potential line of action being interrupted by another. The semantics are rather hazy, but this should result in grafting the Free StoryF y monad somewhere inside of the Free StoryF x monad. Once we’ve done whatever analysis on the original story, we can then forget that the Interrupt bit was ever there in the first place.

In effect, we want this:

data StoryF' a = Change Character ChangeType (ChangeResult -> a)
               | -- whatever else

runInterrupt :: StoryF a -> StoryF' a
runInterrupt = -- ???

where runInterrupt’s job is to remove any instances of the Interrupt command from our story – replacing them with the “canonical” version of what actually happened.

Of course, we could just remove all of the Interrupt data constructors from our Free StoryF a object, and rely on convention to keep track of that for us. However, if you’re like me, whenever the phrase “by convention” is used, big alarm bells should go off in your head. Convention isn’t enforced by the compiler, and so anything maintained “by convention” is highly suspect to bugs.

What would make our lives better is if we could define StoryF and StoryF' somehow in terms of one another, so that there’s no hassle to keep them in sync with one another. Even better, in the future, maybe we’ll want to remove or add other constructors as we interpret our story.

What we really want to be able to do is to mix and match individual constructors into one larger data structure, which we can then transform as we see fit.

Fortunately for us, the machinery for this has already been built. It’s Swierstra’s Data Types a la Carte (henceforth DTalC) – essentially a set of combinators capable of composing data types together, and tools for working with them in a manageable way.

Unfortunately for us, Data Types a la Carte isn’t as type-safe as we’d like it to be. Additionally, it’s missing (though not fundamentally) the primitives necessary to remove constructors.1

This post presents a variation of DTalC which is type-safe, and contains the missing machinery.

But first, we’ll discuss DTalC as it is described in the original paper, in order to get a feeling for the approach and where the problems might lie. If you know how DTalC works already, consider skipping to the next heading.

Data Types a la Carte🔗

Data Types a la Carte presents a novel strategy for building data types out of other data types with kind2 * -> *. A code snippet is worth a thousand words, so let’s dive right in. Our StoryF command functor as described above would instead be represented like this:

data ChangeF    a = Change Character ChangeType (ChangeResult -> a)
data InterruptF a = forall x y.
                    Interrupt (Free StoryF x) (Free StoryF y) (y -> a)

type StoryF = ChangeF :+: InterruptF

Here, (:+:) is the type operator which composes data types together into a sum type (there is a corresponding (:*:) for products, but we’re less interested in it today.)

Because the kindedness of (:+:) lines up with that of the data types it combines, we can nest (:+:) arbitrarily deep:

type Something = Maybe :+: Either Int :+: (,) Bool :+: []

In this silly example, Something a might be any of the following:

  • Maybe a
  • Either Int a
  • (Bool, a)
  • [a]

but we can’t be sure which. We will arbitrary decide that (:+:) is right-associative – although it doesn’t matter in principle (sums are monoidal), part of our implementation will depend on this fact.

Given a moment, if you’re familiar with Haskell, you can probably figure out what the machinery must look like:

data (f :+: g) a = InL (f a)
                 | InR (g a)
                 deriving Functor
infixr 8 :+:

(:+:) essentially builds a tree of data types, and then you use some combination of InL and InR to find the right part of the tree to use.

However, in practice, this becomes annoyingly painful and tedious; adding new data types can completely shuffle around your internal tree structure, and unless you’re careful, things that used to compile will no longer.

But fear not! Swierstra has got us covered!

class (Functor sub, Functor sup) => sub :<: sup where
    inj :: sub a -> sup a

This class (and its instances) say that f :<: fs means that the data type f is nestled somewhere inside of the big sum type fs. Furthermore, it gives us a witness to this fact, inj, which lifts our small data type into our larger one. With some clever instances of this typeclass, inj will expand to exactly the right combination of InL and InRs.

These instances are:

instance Functor f => f :<: f where
    inj = id

instance (Functor f, Functor g) => f :<: (f :+: g) where
    inj = InL

instance {-# OVERLAPPABLE #-}
         (Functor f, Functor g, Functor h, f :<: g) => f :<: (h :+: g) where
    inj = InR . inj

The first one states “if there’s nowhere left to go, we’re here!”. The second: “if our desired functor is on the left, use InL”. The third is: “otherwise, slap a InR down and keep looking”.

And so, we can now write our smart constructors in the style of:

change :: (ChangeF :<: fs) => Character -> ChangeType -> Free fs ChangeResult
change c ct = liftF . inj $ Change c ct id

which will create a Change constructor in any data type which supports it (witnessed by ChangeF :<: fs).

Astute readers will notice immediately that the structural induction carried out by (:<:) won’t actually find the desired functor in any sum tree which isn’t right-associative, since it only ever recurses right. This unfortunate fact means that we must be very careful when defining DTalC in terms of type aliases.

In other words: we must adhere to a strict convention in order to ensure our induction will work correctly.

Better Data Types a la Carte🔗

The problem, of course, is caused by the fact that DTalC can be constructed in ways that the structural induction can’t handle. Let’s fix that by constraining how DTalCs are constructed.

At the same time, we’ll add the missing inverse of inj, namely outj :: (f :<: fs) => fs a -> Maybe (f a)3, which we’ll need later to remove constructors, but isn’t fundamentally restricted in Swiestra’s method.

On the surface, our structural induction problem seems to be that we can only find data types in right-associative trees. But since right-associative trees are isomorphic to lists, the real flaw is that we’re not just using lists in the first place.

With the help of {-# LANGUAGE DataKinds #-}, we can lift lists (among other term-level things) to the type level. Additionally, using {-# LANGUAGE TypeFamilies #-}, we’re able to write type-level functions – functions which operate on and return types!

We define a type class with an associated data family:

class Summable (fs :: [* -> *]) where
    data Summed fs :: * -> *

Here fs is a type, as is Summed fs. Take notice, however, of the explicit kind annotations: fs is a list of things that look like Functors, and Summed fs looks like one itself.

Even with all of our fancy language extensions, a type class is still just a type class. We need to provide instances of it for it to become useful. The obvious case is if fs is the empty list:

instance Summable '[] where
    data Summed '[] a = SummedNil Void
                      deriving Functor

The funny apostrophe in '[] indicates that what we’re talking about is an empty type-level list, rather than the type-constructor for lists. The distinction is at the kind level: '[] :: [k] for all kinds k, but [] :: * -> *.

What should happen if we try to join zero data types together? This is obviously crazy, but since we need to define it to be something we make it wrap Void. Since Void doesn’t have any inhabitants at the term-level, it is unconstructible, and thus so too is SummedNil.

But what use case could an unconstructible type possibly have? By itself, nothing, but notice that Either a Void must be Right a, since the Left branch can never be constructed. Now consider that Either a (Either b Void) is isomorphic to Either a b, but has the nice property that its innermost data constructor is always Left (finding the a is Left, and finding b is Right . Left).

Let’s move to the other case for our Summable class – when fs isn’t empty:

instance Summable (f ': fs) where
    data Summed (f ': fs) a = Here (f a)
                            | Elsewhere (Summed fs a)

Summed for a non-empty list is either Here with the head of the list, or Elsewhere with the tail of the list. For annoying reasons, we need to specify that Summed (f ': fs) is a Functor in a rather obtuse way:

instance Summable (f ': fs) where
    data Summed (f ': fs) a = Functor f => Here (f a)
                            | Elsewhere (Summed fs a)

{-# LANGUAGE StandaloneDeriving #-}
deriving instance Functor (Summed fs) => Functor (Summed (f ': fs))

but this now gives us what we want. Summed fs builds a nested sum-type from a type-level list of data types, and enforces (crucially, not by convention) that they form a right-associative list. We now turn our attention to building the inj machinery a la Data Types a la Carte:

class Injectable (f :: * -> *) (fs :: [* -> *]) where
    inj :: f a -> Summed fs a

We need to write instances for Injectable. Note that there is no instance Injectable '[] fs, since Summable '[] is unconstructible.

instance Functor f => Injectable f (f ': fs) where
    inj = Here

instance {-# OVERLAPPABLE #-} Injectable f fs => Injectable f (g ': fs) where
    inj = Elsewhere . inj

These instances turn out to be very inspired by the original DTalC. This should come as no surprise, since the problem was with our construction of (:+:) – which we have now fixed – rather than our induction on (:<:).

At this point, we could define an alias between f :<: fs and Injectable f fs, and call it a day with guaranteed correct-by-construction data types a la carte, but we’re not quite done yet.

Remember, the original reason we dived into all of this mumbo jumbo was in order to remove data constructors from our DTalCs. We can’t do that yet, so we’ll need to set out on our own.

We want a function outj :: Summed fs a -> Maybe (f a) which acts as a prism into our a la carte sum types. If our Summed fs a is constructed by a f a, we should get back a Just – otherwise Nothing. We define the following type class:

class Outjectable (f :: * -> *) (fs :: [* -> *]) where
    outj :: Summed fs a -> Maybe (f a)

with instances that again strongly resemble DTalC:

instance Outjectable f (f ': fs) where
    outj (Here fa)     = Just fa
    outj (Elsewhere _) = Nothing

instance {-# OVERLAPPABLE #-} Outjectable f fs => Outjectable f (g ': fs) where
    outj (Here _ )      = Nothing
    outj (Elsewhere fa) = outj fa

The first instance says, “if what I’m looking for is the head of the list, return that.” The other says, “otherwise, recurse on an Elsewhere, or stop on a Here.”

And all that’s left is to package all of these typeclasses into something more easily pushed around:

class ( Summable fs
      , Injectable f fs
      , Outjectable f fs
      , Functor (Summed fs)
      ) => (f :: * -> *) :<: (fs :: [* -> *])
instance ( Summable fs
         , Injectable f fs
         , Outjectable f fs
         , Functor (Summed fs)
         ) => (f :<: fs)

This is a trick I learned from Edward Kmett’s great talk on Monad Homomorphisms, in which you build a class that has all of the right constraints, and then list the same constraints for an instance of it. Adding the new class as a constraint automatically brings all of its dependent constraints into scope; f :<: fs thus implies Summable fs, Injectable f fs, Outjectable f fs, and Functor (Summed fs) in a much more terse manner.

As a good measure, I wrote a test that outj is a left-inverse of inj:

injOutj_prop :: forall fs f a. (f :<: fs) => Proxy fs -> f a -> Bool
injOutj_prop _ fa = isJust $ (outj (inj fa :: Summed fs a) :: Maybe (f a))

{-# LANGUAGE TypeApplications #-}
main = quickCheck (injOutj_prop (Proxy @'[ []
                                         , Proxy
                                         , Maybe
                                         , (,) Int
                                         ]) :: Maybe Int -> Bool)

where we use the Proxy fs to drive type checking for the otherwise hidden fs from the type signature in our property.

And there you have it! Data types a la carte which are guaranteed correct-by-construction, which we can automatically get into and out of. In the next post we’ll look at how rewriting our command functor in terms of DTalC solves all of our Interrupt-related headaches.

A working version of all this code together can be found on my GitHub repository.

  1. EDIT 2016-09-14: After re-reading the paper, it turns out that it describes (though doesn’t implement) this functionality.↩︎

  2. For the uninitiated, kinds are to types as types are to values – a kind is the “type” of a type. For example, Functor has kind * -> * because it doesn’t become a real type until you apply a type to it (Functor Int is a type, but Functor isn’t).↩︎

  3. I couldn’t resist the fantastic naming opportunity.↩︎