# Static Analysis of Free Monads

## Motivationđź”—

A common misperception of free monads is that they allow for analysis of an program expressed with them. This is not true, and in fact, monads are too liberal of an abstraction to allow for inspection in general.

In order to see why, consider the following monadic expression:

```
getLine
>>= \str -> if str == "backdoor"
then launchNukes
else pure ()
```

The problem here is that bind is expressed via a continuation, and weâ€™re unable to look inside that continuation without calling the function. So weâ€™re stuck. We canâ€™t determine if the above program will ever call `launchNukes`

unless we just happen to call the lambda with the exact string `"backdoor"`

.

So, in general, weâ€™re unable to statically inspect monads. We can *run* them (not necessarily in the `IO`

monad) and see what happens, but getting a free monad to help with this is equivalent to mocking the exact problem domain. But, even though we canâ€™t do it in general, it seems like we should be able to do it in certain cases. Consider the following monadic expression:

```
getLine
>>= \_ -> launchNukes
```

In this case, the computation doesnâ€™t actually care about the result of `getLine`

, so in theory we can just call the continuation with `undefined`

and find that yes indeed this expression will call `launchNukes`

.

Notice that we *canâ€™t* use this strategy in the first expression we looked at, because that one scrutinized the result of `getLine`

, and branched depending on it. If we tried passing `undefined`

to it, it would crash with an error when we tried to force the final value of the monad (although this might be preferable to actually launching nukes.)

This example of `launchNukes`

is admittedly rather silly. My original motivation for investigating this is in the context of ecstasy in which users can query and manipulate disparate pieces of data. For example, if we wanted to write a physics simulator where each object may or may not have any of a `position :: V2 Double`

, a `velocity :: V2 Double`

and a `hasPhysics :: Bool`

, we could write the following piece of code to update the positions of any entities that have a velocity and are, in fact, affected by physics:

```
$ do
emap <- query position
p <- query velocity
v <- query hasPhysics
h
guard h
pure unchanged
= Set $ p + v ^* timeDelta
{ position }
```

Because objects are not required to have all of the possible data, mapping this function will intentionally fail for any of the following reasons:

- the object did not have a
`position`

field - the object did not have a
`velocity`

field - the object did not have a
`hasPhysics`

field - the object had a
`hasPhysics`

field whose value was`False`

Without being able to statically analyze this monadic code, our only recourse is to attempt to run it over every object in the universe, and be happy when we succeed. While such an approach works, itâ€™s terribly inefficient if the universe is large but any of the `position`

, `velocity`

or `hasPhysics`

fields is sparse.

What would be significantly more efficient for large worlds with sparse data would be to compute the intersection of objects who have all three of `position`

, `velocity`

and `hasPhysics`

, and then run the computation only over those objects. Free applicatives (which *are* amenable to static analysis) are no good here, since our `guard h`

line really-and-truly is necessarily monadic.

Any such static analysis of this monad would be purely an optimization, which suggests we donâ€™t need it to be *perfect;* all that we are asking for is for it to be better than nothing. A best-effort approach in the spirit of our earlier â€śjust pass `undefined`

around and hope it doesnâ€™t crashâ€ť would be sufficient. If we can be convinced it wonâ€™t actually crash.

What weâ€™d *really* like to be able to do is count every occurrence of `query`

in this monad before it branches based on the result of an earlier `query`

. Which is to say weâ€™d like to pass `undefined`

around, do as much static analysis as we can, and then somehow `fail`

our analysis *just before* Haskell would crash due to evaluating an `undefined`

.

## Prospecting Monadsđź”—

Iâ€™ve been playing around with this conceptual approach for some time, but could never seem to get it to work. Laziness can sure be one hell of a bastard when youâ€™re trying to pervert Haskellâ€™s execution order.

However, last week Foner et al. dropped a bomb of a paper Keep Your Laziness in Check, which describes a novel approach for observing evaluations of thunks in Haskell. The gist of the technique is to use `unsafePerformIO`

to build an `IORef`

, and then set its value at the same time you force the thunk. If you (unsafely) read from the `IORef`

and see that it hasnâ€™t been set, then nobody has forced your value yet.

We can use a similar approach to accomplish our optimization goals. For the crux of the approach, consider the follow `verify`

function that will evaluate a pure thunk and return `empty`

if it instead found a bottom:

```
verify :: Alternative f => a -> f b
= unsafePerformIO $ do
verify f a catch
let !_ = a
(in pure $ pure a)
_ :: SomeException) -> pure empty) (\(
```

The bang pattern `!_ = a`

tells Haskell to `seq`

`a`

, which reduces it to WHNF, which, if its WHNF is bottom, will be caught inside of the `catch`

. `unsafePerformIO`

is necessary here, because exceptions can only be caught in `IO`

.

Using this approach, if weâ€™re very careful, we can tear down a free monad by following its continuations using bottom, and doing the `verify`

trick to stop exactly when we need to.

I call such a thing `prospect`

, and you can find it on github. The name comes from the fact that this can lead to gold, but carries with it the intrinsic dangers of plumbing into the depths, such as cave-ins, blackened lungs, or the worse things that dwell in the darkness.

The primary export of `prospect`

is the titular function `prospect :: Free f a -> (Maybe a, [f ()])`

, which tears down a free monad, tells you whether or not it has a pure return value, and spits out as many `f`

constructors as it could before the computation branched. If you got a `Just`

back, it means it found every constructor, but there are no other guarantees.

Huge shoutouts to Vikrem who was a very patient sounding-board for all of my crazy ideas, and to kcsongor who suggested that I pay a lot more attention to where Iâ€™m being strict.