This weekend I spent time with the inimitable David Rusu. The bar for my trip has been set extremely high; not only is David an amazing host, but we hashed out a particularly interesting project in a couple of days. We call it nimic.
Nimic is a freeform macro language, without any real syntax, or operational semantics. We have a super bare bones parser that groups parenthetical expressions, and otherwise admits any tokens, separated by whitespace. The language will attempt to run each of its macros on the deepest, leftmost part of this grouping structure. If nothing matches, the program is stuck and we just leave it.
Therefore, hello world in nimic is just the stuck program:
hello world
which you have to admit is about as small as you can get it. The core language installs four builtin macros, the most interesting of which is macro — allowing you to define new macros. The syntax is macro pattern rewrite, which itself will be rewritten as the stuck term defined.
As a first program, we can use macro to rewrite the defined term:

macro defined (nimic is weird)

which will step to defined via the definition of macro, and then step to nimic is weird via the new defined rule. Here it gets stuck and does no more work.
You can use the special tokens #foo to perform pattern matching in a macro. These bindings are available in the rewrite rule. For example,

macro (nimic is #adjective) (nimic is very #adjective)

will replace our nimic is weird term with nimic is very weird. You can bind as many subterms in a macro as you’d like.
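A macro can also bind several subterms and rearrange them in the rewrite. As a made-up illustration (the swap macro here is my own example, not a builtin):

macro (swap #a #b) (#b #a)

after which the term swap foo bar should step to bar foo and get stuck there.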
Because nimic attempts to run all of its macros on the deepest, leftmost part of the tree it can, we can exploit this fact to create statements. Consider the program:
(macro (#a ; #b) #b)
; ( (macro (what is happening?) magic)
; (what is happening?)
)
Here we’ve built a cons list of the form (head ; tail). Our default evaluation order will dive into the leftmost leaf of this tree, and evaluate the (macro (#a ; #b) #b) term there, replacing it with defined. Our tree now looks like this:
( defined
; ( (macro (what is happening?) magic)
  ; (what is happening?)
  )
)
where our new #a ; #b rule now matches, binding #a to defined, and #b to the tail of this cons cell. This rule will drop the defined, and give us the new tree:
( (macro (what is happening?) magic)
; (what is happening?)
)
whereby we now can match on the leftmost macro again. After a few more steps, our program gets stuck with the term magic
. We’ve defined sequential evaluation!
But writing cons cells by hand is tedious. This brings us to the second of our builtin macros, rassoc #prec #symbol. Evaluating it also results in defined, but it modifies the parser so that #symbol becomes right-associative with precedence #prec. As a result, we can use rassoc 1 ; to modify the parser in order to turn a ; b ; c into a ; (b ; (c)).
Therefore, the following program will correctly get stuck on the term finished
:
(macro (#a ; #b) #b)
; ( (rassoc 1 ;)
  ; ( this is now
    ; parsed correctly as a cons cell
    ; finished
    )
  )
The final primitive macro defined in nimic is bash #cmd, which evaluates #cmd in bash and replaces itself with the resulting output. To illustrate, the following program is another way of writing hello world:
bash (echo "hellozworld" | tr "z" " ")
Note that the bash command isn’t doing any sort of bash parsing here. It just takes the symbols echo, "hellozworld", |, tr, "z" and " ", and dumps them out pretty-printed into bash. There are no string literals.
We can use the bash
command to do more interesting things. For example, we can use it to define an import
statement:
macro (import #file) (bash (cat #file))
which, when you evaluate import some/file.nim, will be replaced with (bash (cat some/file.nim)), which in turn is replaced with the contents of some/file.nim. You have to admit, there’s something appealing about even the module system being defined in library code.
But we can go further. We can push our math runtime into bash!
macro (math #m) (bash (bc <<< " #m "))
which will correctly evaluate any math expressions you shoot its way.
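For instance (assuming bc is installed on the system), the program

math (1 + 2 * 3)

should step to (bash (bc <<< " 1 + 2 * 3 ")), and from there to the stuck term 7.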
Personally, I’m not a big fan of the macro #a #b
notation. So instead I defined my own sequent notation:
rassoc 2 ⊢
; macro (#a ⊢ #b) (macro #a #b)
This thing defines a macro, which, when expanded, will itself define a macro. Now David and I don’t need to have any discussions bikeshedding over syntax. We can just define whatever we want!
For a longer example, I wanted to implement pattern matching a la Haskell.
My first step was to define a lazy if
statement. Because macros are tried most-recently-defined first, I can define my operational semantics first. The rule is to force the condition:
; if #cond then #a else #b
    ⊢ if !#cond then #a else #b
(The exclamation marks here are magic inside of the macro macro; they force macro expansion of whatever they’re attached to.) Next, I give two more expansion rules for what to do if my condition is true and false:
; if True then #a else #b
    ⊢ #a
; if False then #a else #b
    ⊢ #b
Great! We can define syntactic equality of stuck terms by forcing two subterms, and then checking them in bash for string equality:
; judge #a #b
    ⊢ is zero !(bash ([[ " !#a " == " !#b " ]]; echo $?))
; is zero #n
    ⊢ False
; is zero 0
    ⊢ True
We can try this out. judge hello hello and judge (macro (a a a) b) defined both step to True, but judge foo bar steps to False.
Finally, we’re ready to define pattern matching. We start with the base case in which there is only one pattern we can match on:
; match #a of (#m1 -> #r1)
    ⊢ if !(judge #a #m1) then #r1 else (failed match)
We replace our match statement with an equality test, and produce failed match
if the two things aren’t identical.
There’s also an induction case, where we want to match on several possibilities.
; match #a of (#m1 -> #r1 ; #z)
    ⊢ if !(judge #a #m1) then #r1 else (match !#a of #z)
Here we’d like to perform the same rewrite, but the else case should pop off the failed match.
Amazingly, this all just works. Let’s try it:
; rassoc 3 =
; #a = #b
    ⊢ #a ⊢ #b
; not #t =
    match #t of
    ( True  -> False
    ; False -> True
    )
; match (not True) of
    ( True  -> hello
    ; False -> it works!
    )
This program will get stuck at it works!
. Pretty sweet!
The core nimic implementation is 420 lines of Haskell code, including a hand-rolled parser and macro engine. But wait, there’s more! There’s also an additional 291-line interactive debugger, which allows you to step through the macro expansion of your program. It’s even smart enough to colorize the variables in your source tree that are being matched by the current macro.
Not bad for a weekend of work. I can barely contain my excitement for what other cool projects are on the horizon.
I nominated myself because of problems I perceived in the status quo. Lately, there have been a few notable cases of dissatisfaction in the community with the proposal process. I have certainly had my qualms with it in the past, but I decided that rather than complain about it, I should just get involved and try my best to help.
Which takes us to today.
I’m extremely humbled to be included with such a group of incredible individuals as my co-committee members. I don’t claim to know everything about writing industrial-grade compilers, nor about the intricacies of cutting-edge type system research. Most of the time I’m only barely smart enough to keep up with the conversation, let alone have a strongly informed opinion on the topic. With that in mind…
I don’t intend to be a gatekeeper. Instead, I intend to be a facilitator; someone whose job it is to know the right people to bring into the discussion. Nobody in this community is an expert at everything, but everyone has a niche in which they are an expert. I want to make sure that those people are involved in the process wherever their expertise can be used.
That’s not to say I’m not going to do everything in my power to understand each and every proposal under my stewardship. Just that I’ll be the first to admit that I don’t know everything, and I don’t want my understanding to get in the way of good ideas!
I intend to inspire the committee to move faster. I think any process which can take upwards of two years is too slow — it’s a waste of everyone’s time, energy and attention. My preference is fast responses to proposals, whether it be yea or nay. I don’t want to railroad any decisions, but I also don’t want to sit on the fence out of fear of making the wrong move.
Along those lines, I intend to enthusiastically support any quality-of-life improvements to the language that are small in scope, compose well, and don’t get in the way of active, longer-term goals. Delaying a decision on the basis of “there’s probably a more general solution here that nobody has time to think about, nor do they plan to actually propose” seems like a bad strategy to me. I don’t want to let perfect be the enemy of the much better.
I intend to preemptively shut down discussions around bikeshedding. Too often bikeshedding completely derails the conversation. It’s an enormous waste of time. Feelings get hurt, and everyone just talks past one another anyway. I intend to give the proposal author and the Simons the only say on bikeshedding matters. They can canvas for opinions if they’d like, but in my opinion, should do so outside the official proposal channels.
To be clear, I am not saying you shouldn’t offer suggestions if they improve a proposal’s composability, type-safety, elegance or parsimony. But let’s agree as a community that opinions of style are just opinions, and are never going to convince anyone anyway.
I intend to remove as many barriers to entry as humanly possible. The Haskell community is by far the smartest, kindest, most incredible group of people I’ve ever had the good fortune to be a part of. I firmly believe that each and every one of us has great things to contribute to this community, and I suspect the world would be a better place if more of us felt like we had some ownership over the language.
Please take a moment to remember that Haskell is only as great as we make it. There is no “community,” nor is there a “GHC HQ.” There are just people who step up to solve the problems that they see. I encourage you to be one of those people. You don’t need permission. You just need to be kind, genuine and hardworking. Remember that nobody involved in our community is stupid, and if they “just don’t get it,” maybe your argument wasn’t as persuasive as you thought.
All of this goes beyond making proposals to the compiler. Be the community you want to see. Don’t assume somebody else is going to do it for you.
Last but not least, I intend to be a good addition to this committee. I solicit anonymous feedback, so please don’t hesitate to call me out if you’re ever unhappy with my behavior. I promise to not be offended by any constructive feedback you send my way.
I will do my best to stay true to these intentions, but help keep me in line when I slip up.
The GHC steering committee is a thankless job. I’d like to take a moment to explicitly call out every member of the committee, past and present, for their selfless commitments to the community. These are people who are taking time out of their own research and projects in order to make our fantastic language much more transparent than it would be otherwise. Really and truly, these people are heroes, even if we don’t always agree with their decisions. Thank you, from the very bottom of my heart.
Finally, I intend to live up to everyone’s expectations. Thanks for your time.
aka “what the hell is that Tactics stuff?”
This is the second post in a series of implementation details in polysemy — a fast, powerful and low-boilerplate effect-system library.
In the last post we discussed the Yo type, which accumulates weaving functions of the form Functor f => f () -> (∀ x. f (m x) -> n (f x)) -> e m a -> e n (f a). As a quick reminder, the f functor corresponds to some piece of (possibly trivial) state, and the ∀ x. f (m x) -> n (f x) is a distribution function, roughly analogous to a function like runStateT.
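To make that analogy concrete, here is a small self-contained sketch (not polysemy's actual code) of what a distribution function looks like at the pair functor f = (,) s; distributing is literally just runStateT with the tuple shuffled around:

```haskell
-- A sketch of "distribution" at f = (,) s: the carried state f () is just
-- (s, ()), and distributing pushes that state through a monadic computation.
import Control.Monad.Trans.State (StateT, modify, runStateT)
import Data.Functor.Identity (runIdentity)

-- Compare this shape with ∀ x. f (m x) -> n (f x), at f = (,) s.
distrib :: Monad m => (s, StateT s m a) -> m (s, a)
distrib (s, act) = do
  (a, s') <- runStateT act s  -- run the inner computation against the carried state
  pure (s', a)                -- tuck the result back inside the state functor

main :: IO ()
main = print (runIdentity (distrib (10 :: Int, modify (+ 1) >> pure "ok")))
-- prints (11,"ok")
```

The names here (distrib, act) are my own; the point is only that runStateT already has the right shape once you fix the functor.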
Where our story left off, we had accumulated all of our desired weave
s into Yo
, but hadn’t yet used them for anything. The developer experience around Yo
is fraught with peril, and even as the guy who implemented it, I’m often stymied about how to get all the types to line up. Such a detail is not the sort of thing you can expose in a library that you expect people to actually use.
data Yo e m a where
Yo :: Functor f
=> e m a
> f ()
> (∀ x. f (m x) > n (f x))
> (f a > b)
> Yo e n b
At the types Yo usually gets instantiated, it looks something like Yo (State s) (Sem r) Int. Which looks easy enough, until you realize that packed inside of this thing is an existential m (which was originally a Sem r0 for some unknown effect row r0), and an existential functor f which is all of the initial state we’re carrying around from other effects that have already run.
Yo is the free Effect, which means that, like all free structures, it provides dependency injection so you can later decide what that Effect means. It’s not a magic bullet — you still need to actually write code somewhere. Somebody needs to actually use that f () and ∀ x. f (m x) -> n (f x) to actually do something!
As a first attempt, let’s see what happens if we just expose them in the interpretation API. We’ll write a quick interpret
function which can handle an effect e m x
by producing a Sem r (f x)
. The implementation is given below. Don’t worry too much about its body; but pay attention to just how gruesome the type is.
interpret
    :: (∀ x m f
          . Functor f
         => f ()                                  -- initial state
         -> (∀ y. f (m y) -> Sem (e ': r) (f y))  -- distrib function
         -> e m x                                 -- effect constructor
         -> Sem r (f x)
       )
    -> Sem (e ': r) a
    -> Sem r a
interpret f (Sem m) = m $ \u ->
  case decomp u of
    Right (Yo eff state distrib y) ->
      fmap y $ f state distrib eff
    Left another_effect ->
      liftSem $ hoist (interpret f) another_effect
For example, we can use it to implement an interpretation of the Reader
effect:
data Reader i m a where
  Ask   :: Reader i m i
  Local :: (i -> i) -> m a -> Reader i m a

runReader :: i -> Sem (Reader i ': r) a -> Sem r a
runReader i = interpret $ \state distrib -> \case
  Ask        -> pure $ i <$ state
  Local f ma -> runReader (f i) $ distrib $ ma <$ state
Because Ask doesn’t have any embedded computations, it doesn’t need to do anything fancy. It can just ice-cream cone to put i inside of the state it was given, and return that. But Local is a more complicated beast! It must ice-cream cone its ma computation into the state, then distrib that thing into a Sem (Reader i ': r), and then run the Reader effect off of that!
It’s not the end of the world, but it’s a non-trivial amount of boilerplate that needs to be duplicated for every interpreter. Combined with the terrifying types, this feels like a no-go.
Let’s look at an interpretation for the Resource effect (which gives bracket semantics). Resource is more complicated than Reader, and this complexity serves to illustrate some common patterns that come up again and again when writing interpreters.
data Resource m a where
  Bracket
      :: m a         -- allocate
      -> (a -> m b)  -- finalize
      -> (a -> m c)  -- use
      -> Resource m c

runResource
    :: ∀ r a
     . Member (Lift IO) r
    => (∀ x. Sem r x -> IO x)  -- irrelevant to the discussion
    -> Sem (Resource ': r) a
    -> Sem r a
runResource lower = interpret $ \state distrib -> \case
  Bracket alloc finalize use -> do
    let toIO :: Sem (Resource ': r) x -> IO x
        toIO = lower . runResource lower

    sendM $ X.bracket
      (toIO $ distrib $ alloc <$ state)        -- (1)
      (\a -> toIO $ distrib $ fmap finalize a)
      (\a -> toIO $ distrib $ fmap use a)
The bracket function allocates some resource of type a, provides it to the use block for some computation, and ensures that it will be cleaned up via finalize — even if the use block crashed.
There are a few subtleties in the type instantiation here. In the comment marked (1), we run distrib on our m a parameter, which transforms it into a Sem (Resource ': r) (f a). Note that we’ve introduced an f here! This in turn unifies our finalize and use types as f a -> m b and f a -> m c, respectively. Because we later need to distribute to turn those ms into Sem (Resource ': r)s, we also introduce fs into b and c.
In essence, we end up with functions alloc :: Sem (Resource ': r) (f a), finalize :: f a -> Sem (Resource ': r) (f b), and use :: f a -> Sem (Resource ': r) (f c). This threading of f through the types corresponds directly to the fact that we need to keep track of other people’s state. As we’ll see in a future post, it is also indicative of a huge problem with the naive semantics we’ve given to Resource here.
Anyway, looking at runReader and runResource, we see two particular patterns emerge in our interpreters:

1. distrib $ ma <$ state, for the case of an m a argument
2. \fa -> distrib $ fmap mb fa, for the case of an a -> m b argument

The insight here is that maybe we can just make these combinators a part of the interpret interface directly, rather than have people write them by hand for each interpreter. It doesn’t help the horrifying types:
interpret
    :: (∀ x m f
          . Functor f
         => (∀ y. m y -> Sem (e ': r) (f y))
         -> (∀ y z. (y -> m z) -> f y -> Sem (e ': r) (f z))
         -> e m x
         -> Sem r (f x)
       )
    -> Sem (e ': r) a
    -> Sem r a
interpret f (Sem m) = m $ \u ->
  case decomp u of
    Right (Yo eff state distrib y) ->
      fmap y $ f (distrib . (<$ state))
                 (\mf -> distrib . fmap mf)
                 eff
    Left another_effect ->
      liftSem $ hoist (interpret f) another_effect
But it sure as heck improves the ergonomics:
runResource
    :: ∀ r a
     . Member (Lift IO) r
    => (∀ x. Sem r x -> IO x)
    -> Sem (Resource ': r) a
    -> Sem r a
runResource lower = interpret $ \start continue -> \case
  Bracket alloc finalize use -> do
    let toIO :: Sem (Resource ': r) x -> IO x
        toIO = lower . runResource lower

    sendM $ X.bracket
      (toIO $ start alloc)
      (toIO . continue finalize)
      (toIO . continue use)
Much nicer! If only we could do something about those gnarly types, we’d be in business!
The last conceptual step here is to realize that the start :: ∀ y. m y -> Sem (e ': r) (f y) and continue :: ∀ y z. (y -> m z) -> f y -> Sem (e ': r) (f z) parameters are static. That means we could stick them into a reader monad — or perhaps more mind-crushingly, an effect.
And so, we can provide the two following primitive actions in our Tactics effect, and then derive start and continue from them:
data Tactics f n r m a where
  GetInitialState     :: Tactics f n r m (f ())
  HoistInterpretation :: (a -> n b) -> Tactics f n r m (f a -> Sem r (f b))

type WithTactics e f m r = Tactics f m (e ': r) ': r
This thing is a mess of type parameters, but f is exactly what you’d expect. The n corresponds to what m used to be (it’s standard operating procedure in polysemy to use m as the name of the second-last type argument). And we introduce r, which corresponds to the effect row that we’re trying to interpret.
Interpreters for effect actions e m end up running with the ∀ f. Functor f => WithTactics e f m r effect row. This thing gives us access to a Tactics capable of producing Sem (e ': r)s, but doesn’t itself have access to e effects.
Finally, we use a type synonym to hide most of the nasty details.
Given an appropriate runTactics interpreter:
runTactics
   :: Functor f
   => f ()
   -> (∀ x. f (m x) -> Sem r2 (f x))
   -> Sem (Tactics f m r2 ': r) a
   -> Sem r a
runTactics state distrib (Sem m) = m $ \u ->
  case decomp u of
    Left x -> liftSem $ hoist (runTactics state distrib) x
    Right (Yo GetInitialState state' _ y) ->
      pure $ y $ state <$ state'
    Right (Yo (HoistInterpretation na) state' _ y) ->
      pure $ y $ (distrib . fmap na) <$ state'
We can finally implement interpret:
interpret
    :: (∀ x m. e m x -> Tactical e m r x)
    -> Sem (e ': r) a
    -> Sem r a
interpret f (Sem m) = m $ \u ->
  case decomp u of
    Left x -> liftSem $ hoist (interpret f) x
    Right (Yo eff state distrib y) -> do
      a <- runTactics state distrib $ f eff
      pure $ y a
We’ve hidden all of the nasty types inside of that Tactical synonym (which admittedly is still gross, but at least it’s not rank-3). And we’ve created an effect-interpreter effect, in which we can put any combinators people will need for writing interpreters.
After renaming start to runT and continue to bindT for branding purposes, runResource ends up in its final form:
runResource
    :: ∀ r a
     . Member (Lift IO) r
    => (∀ x. Sem r x -> IO x)
    -> Sem (Resource ': r) a
    -> Sem r a
runResource lower = interpret $ \case
  Bracket alloc dealloc use -> do
    let toIO :: Sem (Resource ': r) x -> IO x
        toIO = lower . runResource lower

    a <- runT  alloc
    d <- bindT dealloc
    u <- bindT use

    sendM $ X.bracket (toIO a) (toIO . d) (toIO . u)
I’m unable to properly express the amount of joy I get in using a library to implement core features in itself. The result is one of the most mind-crushingly meta things I’ve ever written, but it elegantly solves a real problem — so why not?
In the next post in this series, we’ll discuss the semantics behind the order in which you interpret effects, and how this can get you into trouble with things like runResource. Stay tuned.
It sounds snide, but it’s true. Here’s a function from my library polysemy:
hoistStateIntoStateT
    :: Sem (State s ': r) a
    -> S.StateT s (Sem r) a
hoistStateIntoStateT (Sem m) = m $ \u ->
  case decomp u of
    Left x -> S.StateT $ \s ->
      liftSem . fmap swap
              . weave (s, ())
                      (\(s', m') -> fmap swap
                                  $ S.runStateT m' s')
                      (Just . snd)
              $ hoist hoistStateIntoStateT x
    Right (Yo Get z _ y _)     -> fmap (y . (<$ z)) $ S.get
    Right (Yo (Put s) z _ y _) -> fmap (y . (<$ z)) $ S.put s
Gee, that’s complicated! I must be really smart to have written such a function, right?
Wrong! I just have a trick!
The technique is called “just use type holes,” and for my money, it’s the most important skill in a Haskeller’s toolbelt. The idea is to implement the tiny part of a function that you know how to do, and then ask the compiler for help on the rest of it. It’s an iterative process. It’s a discussion with the compiler. Each step of the way, you get a little closer to the right answer, and after enough iterations your function has written itself — even if you’re not entirely sure how.
Let’s go through an example together. Consider the random type signature that I just made up:

jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int

If you want a challenge, take a few minutes to try to implement this function. It’s tricky, and most people get lost along the way. When you’re convinced that it’s sufficiently hard, continue reading.
The first step of writing a function is to bind all of the variables we have. That’s the a -> b and (a -> Int) -> Int bits here. I usually give them names that help me remember their types — such as ab and aii, respectively. Then, bang out a _ on the right-hand side. This thing is a placeholder, and is called a type hole.

jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
jonk ab aii = _

Try to compile this (consider using something like ghcid so you don’t need to call ghc by hand). The compiler will yell at you:
• Found hole: _ :: (b -> Int) -> Int
  Where: ‘b’ is a rigid type variable bound by
           the type signature for:
             jonk :: forall a b.
                     (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
           at /home/sandy/Test.hs:3:1-62
• In the expression: _
  In an equation for ‘jonk’: jonk ab aii = _
• Relevant bindings include
    aii :: (a -> Int) -> Int (bound at /home/sandy/Test.hs:4:9)
    ab :: a -> b (bound at /home/sandy/Test.hs:4:6)
    jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
      (bound at /home/sandy/Test.hs:4:1)
  |
4 | jonk ab aii = _
  |               ^
A common complaint from beginners is that GHC’s error messages are noisy. This is true. To a first approximation, the useful bit of this error message is this:
• Found hole: _ :: (b -> Int) -> Int
• Relevant bindings include
    aii :: (a -> Int) -> Int (bound at /home/sandy/Test.hs:4:9)
    ab :: a -> b (bound at /home/sandy/Test.hs:4:6)
    jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
      (bound at /home/sandy/Test.hs:4:1)
There’s no way of getting GHC to shut up about that other stuff, so you just need to train yourself to focus on this core piece of information. That’s not to say the other stuff isn’t helpful, just that this stuff is almost always enough.
So what is the compiler telling us? Two things:

1. The expression we need to replace the _ with must have type (b -> Int) -> Int.
2. There are several bindings in scope (aii, ab, jonk, and their types) that we can use to help with the implementation.

Using this information, our goal is to write the correct expression in place of the type hole. In most cases doing that in one step is unfeasible, but we can often write a little more of the expression, and use a type hole in that.
In this case, we notice that our hole has type (b -> Int) -> Int, which is to say that it’s a function that takes a (b -> Int) and returns an Int. As such, we should bind the (b -> Int) in a lambda:

jonk ab aii = \bi -> _
The resulting error message in full is this:
• Found hole: _ :: Int
• In the expression: _
  In the expression: \ bi -> _
  In an equation for ‘jonk’: jonk ab aii = \ bi -> _
• Relevant bindings include
    bi :: b -> Int (bound at /home/sandy/Test.hs:4:16)
    aii :: (a -> Int) -> Int (bound at /home/sandy/Test.hs:4:9)
    ab :: a -> b (bound at /home/sandy/Test.hs:4:6)
    jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
      (bound at /home/sandy/Test.hs:4:1)
  Valid hole fits include
    maxBound :: forall a. Bounded a => a
      with maxBound @Int
      (imported from ‘Prelude’ at /home/sandy/Test.hs:1:1
       (and originally defined in ‘GHC.Enum’))
    minBound :: forall a. Bounded a => a
      with minBound @Int
      (imported from ‘Prelude’ at /home/sandy/Test.hs:1:1
       (and originally defined in ‘GHC.Enum’))
  |
4 | jonk ab aii = \bi -> _
  |                      ^
GHC now mentions “Valid hole fits”. In my experience, these are almost always useless, so I just exclude them. In GHCi, the following incantation will make them disappear.
:set -fmax-valid-hole-fits=0
(or you can just squint and ignore them manually!)
Again, ignoring the irrelevant pieces of the error message, we can pare GHC’s response down to this:
• Found hole: _ :: Int
• Relevant bindings include
    bi :: b -> Int (bound at /home/sandy/Test.hs:4:16)
    aii :: (a -> Int) -> Int (bound at /home/sandy/Test.hs:4:9)
    ab :: a -> b (bound at /home/sandy/Test.hs:4:6)
    jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
      (bound at /home/sandy/Test.hs:4:1)
OK, great! Now we just need to produce an Int. While we could just put 0 here, that is a clearly wrong solution, since we wouldn’t be using any of ab, aii or bi. Don’t just return 0.
But we notice that both aii and bi will return an Int. Since that’s what we want to return, the odds are good that we want to call one of these functions in this hole. Let’s choose aii as a guess. Feel free to write in your notebook that you are guessing about aii, and that bi could also have been chosen — we have no guarantees that aii is the right call!

jonk ab aii = \bi -> aii _
• Found hole: _ :: a -> Int
• Relevant bindings include
    bi :: b -> Int (bound at /home/sandy/Test.hs:4:16)
    aii :: (a -> Int) -> Int (bound at /home/sandy/Test.hs:4:9)
    ab :: a -> b (bound at /home/sandy/Test.hs:4:6)
    jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
      (bound at /home/sandy/Test.hs:4:1)
Our hole has a function type, so let’s introduce a lambda:

jonk ab aii = \bi -> aii (\a -> _)
• Found hole: _ :: Int
• Relevant bindings include
    a :: a (bound at /home/sandy/Test.hs:4:29)
    bi :: b -> Int (bound at /home/sandy/Test.hs:4:16)
    aii :: (a -> Int) -> Int (bound at /home/sandy/Test.hs:4:9)
    ab :: a -> b (bound at /home/sandy/Test.hs:4:6)
    jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
      (bound at /home/sandy/Test.hs:4:1)
We need to produce an Int again. Since we don’t have one in scope, our only options are again aii and bi. But we’ve already used aii, so let’s try bi this time:

jonk ab aii = \bi -> aii (\a -> bi _)
• Found hole: _ :: b
• Relevant bindings include
    a :: a (bound at /home/sandy/Test.hs:4:29)
    bi :: b -> Int (bound at /home/sandy/Test.hs:4:16)
    aii :: (a -> Int) -> Int (bound at /home/sandy/Test.hs:4:9)
    ab :: a -> b (bound at /home/sandy/Test.hs:4:6)
    jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
      (bound at /home/sandy/Test.hs:4:1)
Great! Now we need to produce a b. We have a function that can do that, ab :: a -> b. So let’s call that:

jonk ab aii = \bi -> aii (\a -> bi (ab _))
• Found hole: _ :: a
• Relevant bindings include
    a :: a (bound at /home/sandy/Test.hs:4:29)
    bi :: b -> Int (bound at /home/sandy/Test.hs:4:16)
    aii :: (a -> Int) -> Int (bound at /home/sandy/Test.hs:4:9)
    ab :: a -> b (bound at /home/sandy/Test.hs:4:6)
    jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
      (bound at /home/sandy/Test.hs:4:1)
Finally, we have a hole whose type is a. And we have an a! Let’s just use that thing!

jonk ab aii = \bi -> aii (\a -> bi (ab a))
[1 of 1] Compiling Main ( /home/sandy/Test.hs, interpreted )
Ok, one module loaded.
Cool! It worked! We just wrote a non-trivial function without doing any thinking, really. Not bad! But can we be confident that our implementation is any good?
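For reference, here is the assembled function with its signature, along with a quick sanity check of what it computes (the example inputs below are my own):

```haskell
-- The finished function, assembled from the hole-driven steps above.
jonk :: (a -> b) -> ((a -> Int) -> Int) -> (b -> Int) -> Int
jonk ab aii = \bi -> aii (\a -> bi (ab a))

main :: IO ()
main = print (jonk (+ 1) (\k -> k 2) id)
-- prints 3: aii supplies 2, ab bumps it to 3, and bi = id returns it
```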
The first line of defense against this is to enable -Wall. In GHCi, you can do this via:

:set -Wall
You’ll notice there are no warnings generated by our definition. This is usually enough of a sanity check that our implementation is fine. For example, let’s see what happens when we try the obviously stupid implementation:

jonk ab aii = \bi -> 0
/home/sandy/Test.hs:4:6: warning: [-Wunused-matches]
    Defined but not used: ‘ab’
  |
4 | jonk ab aii = \bi -> 0
  |      ^^

/home/sandy/Test.hs:4:9: warning: [-Wunused-matches]
    Defined but not used: ‘aii’
  |
4 | jonk ab aii = \bi -> 0
  |         ^^^

/home/sandy/Test.hs:4:16: warning: [-Wunused-matches]
    Defined but not used: ‘bi’
  |
4 | jonk ab aii = \bi -> 0
  |                ^^
Those warnings are pointing out that we haven’t used everything available to us. If we assume that the type of jonk is correct, then any implementation of jonk which doesn’t use all of its variables is extremely suspect.
The other common way to go wrong here is that you’ll notice that jonk comes up in the relevant bindings while trying to write jonk. For example, this thing will happily typecheck:

jonk ab aii = jonk ab aii

But this too is clearly wrong, since we haven’t done any work. The situation becomes more insidious when you call yourself recursively after doing some work, which can be correct. Let’s look at an example of that.
Let’s try this type on for size:

zoop :: (a -> b -> b) -> b -> [a] -> b

The first thing to do is to bind all of our variables:

zoop abb b as = _

But we notice that as has type [a]. Since [a] has two constructors, let’s pattern match on those before going any further:

zoop abb b []       = _
zoop abb b (a : as) = _
• Found hole: _ :: b
• Relevant bindings include
    b :: b (bound at /home/sandy/Test.hs:4:10)
    abb :: a -> b -> b (bound at /home/sandy/Test.hs:4:6)
    zoop :: (a -> b -> b) -> b -> [a] -> b
      (bound at /home/sandy/Test.hs:4:1)

• Found hole: _ :: b
• Relevant bindings include
    as :: [a] (bound at /home/sandy/Test.hs:5:17)
    a :: a (bound at /home/sandy/Test.hs:5:13)
    b :: b (bound at /home/sandy/Test.hs:5:10)
    abb :: a -> b -> b (bound at /home/sandy/Test.hs:5:6)
    zoop :: (a -> b -> b) -> b -> [a] -> b
      (bound at /home/sandy/Test.hs:4:1)
Oh god! Too many holes at once. My brain is already exploding. You honestly expect me to keep this much information in my head at once?? Instead, we can replace one of the holes with undefined in order to get GHC to shut up and let us focus:

zoop abb b []       = _
zoop abb b (a : as) = undefined
• Found hole: _ :: b
• Relevant bindings include
    b :: b (bound at /home/sandy/Test.hs:4:10)
    abb :: a -> b -> b (bound at /home/sandy/Test.hs:4:6)
    zoop :: (a -> b -> b) -> b -> [a] -> b
      (bound at /home/sandy/Test.hs:4:1)
Much easier. We see that we need to produce a b, and hey, look at that, we already have one. Furthermore, we don’t have an a, and so we have no chance of calling abb. So we assume b is correct. Let’s fill it in, and then replace our undefined with a hole again:

zoop abb b []       = b
zoop abb b (a : as) = _
• Found hole: _ :: b
• Relevant bindings include
    as :: [a] (bound at /home/sandy/Test.hs:5:17)
    a :: a (bound at /home/sandy/Test.hs:5:13)
    b :: b (bound at /home/sandy/Test.hs:5:10)
    abb :: a -> b -> b (bound at /home/sandy/Test.hs:5:6)
    zoop :: (a -> b -> b) -> b -> [a] -> b
      (bound at /home/sandy/Test.hs:4:1)
Again we want to produce a b. We could use the b we have, but that would mean abb is completely unused in our function. So let’s assume we want to call abb instead. Since it takes two arguments, let’s give the first one a hole, and the second undefined. One step at a time:

zoop abb b []       = b
zoop abb b (a : as) = abb _ undefined
• Found hole: _ :: a
• Relevant bindings include
    as :: [a] (bound at /home/sandy/Test.hs:5:17)
    a :: a (bound at /home/sandy/Test.hs:5:13)
    b :: b (bound at /home/sandy/Test.hs:5:10)
    abb :: a -> b -> b (bound at /home/sandy/Test.hs:5:6)
    zoop :: (a -> b -> b) -> b -> [a] -> b
      (bound at /home/sandy/Test.hs:4:1)
We want an a
. And we have an a
. Since we have no guarantees that as
isn’t []
, this a
is our only choice. So it’s pretty safe to assume our hole should be filled with a
.
• Found hole: _ :: b
• Relevant bindings include
    as :: [a] (bound at /home/sandy/Test.hs:5:17)
    a :: a (bound at /home/sandy/Test.hs:5:13)
    b :: b (bound at /home/sandy/Test.hs:5:10)
    abb :: a -> b -> b (bound at /home/sandy/Test.hs:5:6)
    zoop :: (a -> b -> b) -> b -> [a] -> b
      (bound at /home/sandy/Test.hs:4:1)
So we need to produce a b
, and we still have the unused as :: [a]
to work with, so it’s unlikely to just be our binding b
. Instead, our only option which takes a [a]
is zoop
itself! This is a recursive call, but we’ve already popped the head off our list, so it’s not going to be an infinite loop.
Let’s fill in our hole with zoop _ _ as
. Or, zoop _ undefined as
if you prefer.
• Found hole: _ :: a -> b -> b
• Relevant bindings include
    as :: [a] (bound at /home/sandy/Test.hs:5:17)
    a :: a (bound at /home/sandy/Test.hs:5:13)
    b :: b (bound at /home/sandy/Test.hs:5:10)
    abb :: a -> b -> b (bound at /home/sandy/Test.hs:5:6)
    zoop :: (a -> b -> b) -> b -> [a] -> b
      (bound at /home/sandy/Test.hs:4:1)
Probably abb
, because we’re recursing, and have no real reason to want to change this function. Fill it in, and, for the same argument, replace our undefined
with b
. Our final function in all its glory is this:
zoop :: (a -> b -> b) -> b -> [a] -> b
zoop abb b [] = b
zoop abb b (a : as) = abb a $ zoop abb b as
And it works! Except that -Wall
yells at us:
/home/sandy/Test.hs:4:6: warning: [-Wunused-matches]
    Defined but not used: ‘abb’
  |
4 | zoop abb b [] = b
  |
This is a little alarming, until we realize that abb
isn’t not used in zoop
, it’s just not used in this branch. We can put a wildcard to match abb
here to get rid of this warning:
(note that this _
on the left-hand side of the equals sign is not a type hole, it’s a wildcard pattern match!)
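Reconstructing the listing that followed, a sketch of the final, warning-free definition (the wildcard appears only in the first clause):

```haskell
-- same function as before, but with the unused abb replaced by a wildcard
zoop :: (a -> b -> b) -> b -> [a] -> b
zoop _   b []       = b
zoop abb b (a : as) = abb a $ zoop abb b as
```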
Finally we’re finished! A little experimentation will convince you that this zoop
thing we just wrote is in fact just foldr
! Pretty impressive for just blindly filling in holes, no?
I’m not going to say that blindly filling in type holes always works, but I’d say maybe 95% of the time? It’s truly amazing just how far you can get by writing down the right type and making sure you use every variable.
The reason why this works is known as theorems for free, which roughly states that we can infer lots of facts about a type signature (assuming it’s correct). One of those facts is often the only possible implementation. It’s cool as fuck, but you don’t need to understand the paper to use this idea in practice.
One question you might have is “what the heck does it mean for a type to be correct?” Good question! It means your type should be as polymorphic as possible. For example, if you want a function that creates a list with length \(n\), where all elements are the same value, then that thing should have type Int -> a -> [a], not Int -> Bool -> [Bool]. Because we can do this operation for any type, we don’t need to give it a monomorphic type. Here we would say Int -> a -> [a] is the correct type for this operation, while Int -> Bool -> [Bool] is not.
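As a quick illustration (rep is a hypothetical name, not from the text), here’s what such a correctly-polymorphic function might look like — the polymorphic element type means the implementation can do nothing with the value except repeat it:

```haskell
-- rep n x builds a list of n copies of x. Because the element type is
-- fully polymorphic, the implementation can't inspect x at all.
rep :: Int -> a -> [a]
rep n x
  | n <= 0    = []
  | otherwise = x : rep (n - 1) x
```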
You know when people say “types are not an alternative to documentation?” I think this is a pretty knockdown argument to that claim. Once you really understand the type system, most of the time, types really are the best documentation — they often tell you exactly what the function does, in a way that English comments never will.
In conclusion, a strong type system is fucking awesome because it’s smart enough to know the necessary type of any given expression. Which means you can slowly use type holes to chip away at a difficult implementation, without ever really knowing what you’re doing. It’s marvelous. Get into the habit of using this technique, and you’ll quickly be amazed at just how good you get at Haskell.
aka “what the hell is that Yo type?”
This is the first post in a series of implementation details in polysemy
— a fast, powerful and low-boilerplate effect-system library.
Even if you’re not particularly interested in polysemy
, there are some functional pearls here — and a crash course on the history of free monad implementations in Haskell.
Critics of free monads often make the claim that higher-order effects aren’t possible. This has historically been true, but Wu, Schrijvers and Hinze’s paper Effect Handlers in Scope gives a technique for lifting the restriction. Today I want to illustrate the problem, discuss Wu et al.’s solution, and then show what changes polysemy
makes to remove the boilerplate. In the process, we’ll look at finding free constructions for tricky typeclasses.
Let’s consider the Error e
effect, in which we’d like to be able to throw
errors of type e
, and catch
any errors thrown within a specific block of code. You’re already familiar with this concept, in transformers
it’s called ExceptT e
, and in mtl
, MonadError e
. A typical usage of this effect might be:
foo =
  catch
    (do -- computation to run
      when (not someBool) $ throw SomeError
      pure True)
    (\SomeError -> -- error handler
      pure False)
We would expect foo
to be pure False
whenever someBool
is False
; and vice versa. The idea is that a throw
should short-circuit the rest of the computation, until it reaches the end of a catch
statement. This is the basis of every exception system of all time, so we won’t belabor the example any further.
Given some appropriate m
, we’d like to model this problem with the following interface:
In first-order effect systems such as freer-simple, our effects have kind * -> *. With such a kind, we can easily model throw, but it’s less clear how to model catch:
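A sketch of the first-order encoding being described (the details differ from freer-simple’s actual types) — notice that Throw is easy, while Catch has nowhere to put the protected computation:

```haskell
{-# LANGUAGE GADTs #-}

-- first-order: the effect has kind * -> *; there is no monad parameter
data Error e a where
  Throw :: e -> Error e a
  -- Catch would need to store an `m a` for the protected computation,
  -- but no m is available at this kind
```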
We simply don’t have an m
available to us in order to write something equivalent to m a -> (e -> m a) -> m a
. There are a few unsatisfactory solutions here — you can either choose a concrete m
and bake it in (which defeats the entire purpose of effect systems), or you can attempt to encode m
somewhere inside of the Error e
part. Neither is fruitful.
freer-simple
actually takes a pretty clever approach to this problem. Instead of modeling catch
in the Error e
effect, it just provides catch
as a function:
catch
    :: Member (Error e) r
    => Eff r a
    -> (e -> Eff r a)
    -> Eff r a
catch ma f = ... -- replace every call to `throw e` in `ma` with `f e`
And what do you know, this solution actually works pretty well. It accurately captures the semantics of catch
for ExceptT
. Success! For most people, most of the time, this implementation of catch
is perfectly fine.
But let’s consider an interpretation of Error e
which isn’t completely analogous to ExceptT
. After all, the whole point of effect systems is to be able to arbitrarily reinterpret the meaning of your programs. So let’s pretend that we’re writing an interpretation of the system which wants to audit the happy code path. As a result, we’d like to log whether or not we successfully got to the end of a catch
block.
In essence, we’d like to replace every call to catch ma f with a version that also logs on the happy path, meaning logSuccessfulExit will be called if and only if ma didn’t contain a throw statement.
Unfortunately, the clever encoding of catch as a separate function outside of Error e means that this interpretation of catch is impossible. The problem is fundamentally that by virtue of being outside the effect, catch must choose its own interpretation of catching effects, and you’re out of luck if its choice isn’t what you want.
This is a bit of a contrived example, but it shows up every time you want to embed a computation; such as doing callbacks, coroutines, asynchronous work, or resource bracketing. It’s a big class of problems that quickly become untenable in the first-order world.
Wu et al. give us a real solution for the problem above. Instead of modeling our effects with kind * -> *, we give them a kind (* -> *) -> * -> *. This extra (* -> *) is enough to hold a monad in. As such, Error e is now modeled as:
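A sketch of what that higher-order model looks like (constructor shapes match the interpreter shown later; the real definition may differ in details):

```haskell
{-# LANGUAGE GADTs #-}

-- higher-order: the effect takes the monad m it is embedded in,
-- so Catch can store whole subcomputations
data Error e m a where
  Throw :: e -> Error e m a
  Catch :: m a -> (e -> m a) -> Error e m a
```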
This extra m parameter lets us write Catch as a constructor, meaning it is now part of the effect algebra. By writing clever constructors (for example, giving catch the type Member (Error e) r => Eff r a -> (e -> Eff r a) -> Eff r a), we can force m to be the effect stack we’re running in, which nicely ties the recursive knot.
This change is pretty straightforward, and has probably occurred to most people who’ve spent any time playing around with the internals of first-order free monads. However, here is where the first problem sets in.
Effect systems model interpretations of effects as functions. For example, let’s assume we have a State s effect to play with. We can give an interpretation of it with the type:
runState :: s -> Eff (State s ': r) a -> Eff r (s, a)
In the first-order world, you can just have runState walk through every action in Eff, and handle the State s ones. In the higher-order world, however, we also need to run runState on all of the embedded computations (like Catch) as well — and then somehow merge the resulting side states back into the main thread.
Recall above that we tied the recursive knot on catch
, so that the m
in Error e m
was always equal to the actual Eff
monad it’s being run in. By calling runState
, we’re promising that that m
is of the form Eff (State s ': r)
. But now we’re eliminating the State s
effect, and we want to maintain the invariant that m
is the same monad. Which means, we need to somehow use runState
to eliminate the State s
inside of Catch
.
It makes my head spin, too. English is not particularly good at describing these kinds of things, so pay attention to the types here:
1. Somewhere in our application code there is a call to catch :: Eff r0 a -> (e -> Eff r0 a) -> Eff r0 a.
2. We want to run runState :: s -> Eff (State s ': r1) a -> Eff r1 (s, a) over it, which forces r0 ~ (State s ': r1).
3. After running runState, we are left only with r1 in our effect stack.
4. But the computation embedded in catch still contains r0. We need to transform it into r1 to maintain our invariant that the computations embedded inside catch are in the same monad as the call to catch.
Doing such a thing is going to require a function:
s -> Error (Eff (State s ': r)) a -> Error (Eff r) (s, a)
which for reasons that will become clearer later, we will uncurry into:
(s, Error (Eff (State s ': r)) a) -> Error (Eff r) (s, a)
The implementation of this function is guided by the types, and looks like this:
call'runState'InsideError
    :: (s, Error (Eff (State s ': r)) a)
    -> Error (Eff r) (s, a)
call'runState'InsideError (_, Throw e) = Throw e
call'runState'InsideError (s, Catch ma f) =
  Catch (runState s ma)
        (\e -> runState s $ f e)
Such an example is helpful for building intuition, but is completely infeasible in the real world. Not only do we need one of these functions for every effect inside of our stack, but we also need one for every interpretation of every effect in our stack! This is O(m*n)
functions in the number of effects and interpretations we have.
The insight of Wu et al. is that we can get this down to O(n)
— one function analogous to call'runState'InsideError
for each effect. Let’s go through the derivation together.
The first thing to notice is that we don’t need to hard-code runState in call'runState'InsideError. It’s fine to just pass it in as a parameter:
elimStateInsideError
    :: (forall x. (s, Eff (State s ': r) x) -> Eff r (s, x))
    -> (s, Error (Eff (State s ': r)) a)
    -> Error (Eff r) (s, a)
elimStateInsideError _ (_, Throw e) = Throw e
elimStateInsideError elimState (s, Catch ma f) =
  Catch (elimState (s, ma))
        (\e -> elimState (s, f e))
Note that the elimState
function must be rank-2 so that we can use it on every instance of Catch
— there’s no guarantee that they’ll all be called to produce the same type.
The next step is to notice that there’s a homomorphism here; we’re transforming a (s, m a)
into m' (s, a)
, by somehow pushing the (,) s
bit through the monad. We can make that a little more clear by explicitly factoring it out:
elimStateInsideError
    :: (f ~ ((,) s))
    => (forall x. f (Eff (State s ': r) x) -> Eff r (f x))
    -> f (Error (Eff (State s ': r)) a)
    -> Error (Eff r) (f a)
This type is identical to before, we’ve just renamed (,) s
to f
. Let’s do the same renaming trick on Eff (State s ': r)
:
elimStateInsideError
    :: ( f ~ ((,) s)
       , m ~ Eff (State s ': r)
       )
    => (forall x. f (m x) -> Eff r (f x))
    -> f (Error m a)
    -> Error (Eff r) (f a)
and then again on Eff r
:
elimStateInsideError
    :: ( f ~ ((,) s)
       , m ~ Eff (State s ': r)
       , n ~ Eff r
       )
    => (forall x. f (m x) -> n (f x))
    -> f (Error m a)
    -> Error n (f a)
As it stands, our current implementation of elimStateInsideError
will actually work for any m
and n
; so we can just get rid of those renames:
elimEffectInsideError
    :: (f ~ ((,) s))
    => (forall x. f (m x) -> n (f x))
    -> f (Error m a)
    -> Error n (f a)
elimEffectInsideError _ (_, Throw e) = Throw e
elimEffectInsideError elim (s, Catch ma f) =
  Catch (elim (s, ma))
        (\e -> elim (s, f e))
Let’s now undo our uncurrying of our s -> Error m a -> ... as (s, Error m a) -> .... But since we’ve renamed s away, we’re not allowed to reference it anymore. Instead, we can use f (), aka (s, ()), which you’ll notice is isomorphic to s.
elimEffectInsideError
    :: (f ~ ((,) s))
    => (forall x. f (m x) -> n (f x))
    -> f ()
    -> Error m a
    -> Error n (f a)
elimEffectInsideError _ _ (Throw e) = Throw e
elimEffectInsideError elim (s, ()) (Catch ma f) =
  Catch (elim (s, ma))
        (\e -> elim (s, f e))
As one last step, we can rewrite the explicit destructuring of the f () parameter using its Functor instance. Given the ice-cream cone function (<$) :: Functor f => a -> f b -> f a, which replaces the contents of a functor, we can rewrite elimEffectInsideError as follows:
elimEffectInsideError
    :: (f ~ ((,) s))
    => (forall x. f (m x) -> n (f x))
    -> f ()
    -> Error m a
    -> Error n (f a)
elimEffectInsideError _ _ (Throw e) = Throw e
elimEffectInsideError elim s (Catch ma f) =
  Catch (elim $ ma <$ s)
        (\e -> elim $ f e <$ s)
and in doing so, are now fully functor-agnostic, so we can get rid of the f
renaming now:
elimEffectInsideError
    :: Functor f
    => (forall x. f (m x) -> n (f x))
    -> f ()
    -> Error m a
    -> Error n (f a)
That was a lot of work! But we’ve bought ourselves a huge amount with this. Now elimEffectInsideError
is general enough that it supports eliminating any effect inside of Error
. The last step is to wrap this thing up into a typeclass, which Wu et al. call weave
:
class (∀ m. Functor m => Functor (e m)) => Effect e where
  weave
      :: (Functor f, Functor m, Functor n)
      => f ()
      -> (∀ x. f (m x) -> n (f x))
      -> e m a
      -> e n (f a)
Don’t worry about the extra mentions of Functor
in this definition; they’re there for reasons we don’t care about today.
By giving an instance of Effect
for e
, we can now thread any other effects through e
. If we give an instance of Effect
for every effect, we get higher-order effects that can be run through one another in any order. Happy days!
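As a self-contained sketch (simplified relative to the paper and to polysemy), here is what the Effect instance for an Error effect looks like — it is exactly the functor-agnostic eliminator derived above, packaged as weave:

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE QuantifiedConstraints #-}
{-# LANGUAGE RankNTypes #-}

data Error e m a where
  Throw :: e -> Error e m a
  Catch :: m a -> (e -> m a) -> Error e m a

instance Functor m => Functor (Error e m) where
  fmap _ (Throw e)    = Throw e
  fmap g (Catch ma f) = Catch (fmap g ma) (fmap g . f)

class (forall m. Functor m => Functor (e m)) => Effect e where
  weave
      :: (Functor f, Functor m, Functor n)
      => f ()
      -> (forall x. f (m x) -> n (f x))
      -> e m a
      -> e n (f a)

-- exactly elimEffectInsideError from above, as a method
instance Effect (Error e) where
  weave _ _    (Throw e)    = Throw e
  weave s elim (Catch ma f) =
    Catch (elim (ma <$ s))
          (\e -> elim (f e <$ s))
```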
This weave
transformation is the major contribution of Effect Handlers in Scope. And while it does indeed solve the problem of higher-order effects, such a thing brings with it a lot of boilerplate; we need to write an instance of Effect for each of our effects, which is non-trivial and can’t be automated via today’s support for generics.
Back in the bad old days of free
, we would have had to model the first-order version of Error e
above (the one that just has Throw
) as follows:
while State s
would look like this:
It’s gross, and you’d need to give Functor
instances for both. AND you can’t even derive Functor
for Error e
due to the existential.
The specifics here aren’t very important, but the point is that this was a bunch of boilerplate that got in the way of doing any work. The main contribution of Kiselyov and Ishii’s paper Freer Monads, More Extensible Effects is that we can use a free functor to automate away this boilerplate. The result is what puts the “simple” in freer-simple¹.
The free functor is called Coyoneda
², and it looks like this:
data Coyoneda f b where
  Coyoneda :: f a -> (a -> b) -> Coyoneda f b

instance Functor (Coyoneda f) where
  fmap f' (Coyoneda fa f) = Coyoneda fa (f' . f)
As you can see, Coyoneda f
is a Functor
, even when f
itself isn’t. Coyoneda
just accumulates all of the fmap
s you wanted to do, and you can choose later what to do with the resulting function.
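A short demonstration of that accumulation, using the standard lifting and lowering helpers (the same names the kan-extensions package uses) — all the deferred fmaps are applied in one go at the end:

```haskell
{-# LANGUAGE GADTs #-}

data Coyoneda f b where
  Coyoneda :: f a -> (a -> b) -> Coyoneda f b

instance Functor (Coyoneda f) where
  fmap f' (Coyoneda fa f) = Coyoneda fa (f' . f)

-- embed any f a, deferring all future fmaps
liftCoyoneda :: f a -> Coyoneda f a
liftCoyoneda fa = Coyoneda fa id

-- once we know f is a Functor, apply the accumulated function at once
lowerCoyoneda :: Functor f => Coyoneda f b -> f b
lowerCoyoneda (Coyoneda fa f) = fmap f fa
```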
This got me to thinking. Maybe there’s a free Effect
that can likewise accumulate all of the weave
ing we’d like to do, so that library users don’t need to write those instances themselves.
The “trick” to making a free construction is to just make a datatype that stores each parameter to the characteristic function. In the Functor example, you’ll notice a similarity between the type of (flipped) fmap, namely f a -> (a -> b) -> f b, and the type of the Coyoneda constructor, f a -> (a -> b) -> Coyoneda f b.
So let’s do the same thing, for weave
, and construct an equivalent datatype. Recall the type of weave
:
weave
    :: (Functor f, Functor m, Functor n)
    => f ()
    -> (∀ x. f (m x) -> n (f x))
    -> e m a
    -> e n (f a)
As a first attempt, let’s just turn this thing into a GADT and see what happens. I called it Yo
a little because it’s sorta like Coyoneda
, but mostly because naming things is hard.
data Yo e m a where
  Yo :: Functor f
     => e m a
     -> f ()
     -> (forall x. f (m x) -> n (f x))
     -> Yo e n (f a)
While this looks right, it turns out to be a no-go. We can’t actually give an instance of Effect
for Yo e
. We can get close, by realizing that the composition of any two functors is also a functor (given via the Compose
newtype). With that in mind, it’s just a little work to make all of the types line up:
instance Effect (Yo e) where
  weave s' elim' (Yo e s elim) =
    Yo e (Compose $ s <$ s')
         (fmap Compose . elim' . fmap elim . getCompose)
Unfortunately, this definition doesn’t quite work. The problem is that weave s elim is supposed to result in an e m a -> e n (f a), but ours has type e m (g a) -> e n (Compose f g a)! By hardcoding that f
into the result of our GADT, we’ve painted ourselves into a corner. Similar problems would crop up if we wanted to give a Functor
instance to Yo e m
.
As is so often the case in this line of work, the solution is to make f
existential, and to take another function which is responsible for producing the desired type. We add a (f a -> b)
parameter to Yo
, and make it return Yo e n b
:
data Yo e m a where
  Yo :: Functor f
     => e m a
     -> f ()
     -> (forall x. f (m x) -> n (f x))
     -> (f a -> b)
     -> Yo e n b
We can now call getCompose
in this last function — in order to undo our trick of packing the two pieces of state together.
instance Effect (Yo e) where
  weave s' elim' (Yo e s elim f) =
    Yo e (Compose $ s <$ s')
         (fmap Compose . elim' . fmap elim . getCompose)
         (fmap f . getCompose)
Giving an instance of Functor (Yo e m)
can also riff on this final parameter, exactly in the same way that Coyoneda
did:
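A sketch of that instance, along with a hypothetical toy effect (Konst) and runner (runYoKonst) invented here purely to exercise it:

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Identity (Identity (..))

data Yo e m a where
  Yo :: Functor f
     => e m a
     -> f ()
     -> (forall x. f (m x) -> n (f x))
     -> (f a -> b)
     -> Yo e n b

-- the Coyoneda trick again: accumulate mapped functions in the continuation
instance Functor (Yo e m) where
  fmap g (Yo e s elim f) = Yo e s elim (g . f)

-- a toy effect carrying a single value, plus a runner that feeds the
-- stored state back through the continuation
data Konst (m :: * -> *) a = Konst a

runYoKonst :: Yo Konst m b -> b
runYoKonst (Yo (Konst a) s _ k) = k (a <$ s)
```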
(The real implementation also needs hoist :: (forall x. m x -> n x) -> e m a -> e n a
, which turns out to be a special case of weave
. This is left as an exercise for the ambitious reader.)
All that’s left is to be able to lift e m as into Yo e m as. In every free construction I’ve ever seen, this operation is to just fill all of your parameters with identity — and this case is no different!
liftYo :: Functor m => e m a -> Yo e m a
liftYo e = Yo e (Identity ()) (fmap Identity . runIdentity) runIdentity
We’re done! This funny Yo
construction is powerful enough to coalesce entire chains of effect interpreters into a single call. We haven’t done anything magical here — someone still needs to figure out what these functions actually mean for their interpretation. By collecting it all into a single place, we can cut down on boilerplate and find easier ways to express these concepts to the end-user.
But that’s a tale for another time, when we talk about polysemy
’s Tactics
machinery.
The approach here, and my original implementation, are both lifted almost entirely from Luka Horvat’s plugin for simple-effects
. All praise should be directed to him.
Last time we chatted about using a GHC plugin to run custom Core-to-Core transformations on the programs that GHC is compiling. Doing so allows us to add custom optimization passes, and even other, more exotic things like rewriting lambda expressions as categorical operations.
Today I want to talk about another sort of GHC plugin: typechecker plugins! TC plugins let you hook into GHC’s constraint machinery and help it solve domain-specific problems that it wouldn’t be able to otherwise. One of the more interesting examples of a TC plugin is nomeata’s ghc-justdoit — which will automatically generate a value of the correct type, essentially letting you leave implementations as “exercises for the compiler.”
Polysemy uses a TC plugin in order to improve type inference. The result is that it can provide type inference that is as good as mtl
’s, without succumbing to the pitfalls that accompany mtl
’s approach.
Consider the following program:
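The program in question is essentially the following (a reconstruction; the name is illustrative):

```haskell
import Control.Monad.State

-- bump a counter in any monad with Int state
foo :: MonadState Int m => m ()
foo = modify (+ 1)
```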
Such a thing compiles and runs no problem. There are no surprises here for any Haskell programmers who have ever run into mtl
. But the reason it works is actually quite subtle. If we look at the type of modify, we see:
modify :: MonadState s m => (s -> s) -> m ()
which suggests that the s -> s function we pass to it should determine the s parameter. But our function (+ 1) has type Num a => a -> a, therefore the type of modify (+ 1) should be this:
modify (+ 1) :: (MonadState s m, Num s) => m ()
So the question is, why the heck is GHC willing to use a MonadState Int m
constraint to solve the wanted (MonadState s m, Num s)
constraint arising from a use of modify (+1)
? The problem feels analogous to writing a function of type Show Bool => a -> String whose body is just show — which doesn’t work.
Just because we have a Show Bool
constraint in scope doesn’t mean that a
is a Bool
! So how come we’re allowed to use our MonadState Int m
constraint, to solve a (MonadState s m, Num s)
? Completely analogously, we don’t know that s
is an Int
!
The solution to this puzzler is in the definition of MonadState
:
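Here is mtl’s definition, abridged to the parts that matter (a minimal State instance is added here so the sketch stands alone):

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Control.Monad.Trans.State (State, runState)
import qualified Control.Monad.Trans.State as S

-- mtl's class, abridged. The fundep `m -> s` is the load-bearing part:
-- knowing m uniquely determines s.
class Monad m => MonadState s m | m -> s where
  get :: m s
  put :: s -> m ()

instance MonadState s (State s) where
  get = S.get
  put = S.put

modify :: MonadState s m => (s -> s) -> m ()
modify f = get >>= put . f

-- (+ 1) is Num a => a -> a, but the fundep pins a ~ Int here
countUp :: MonadState Int m => m ()
countUp = modify (+ 1)
```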
Notice this | m -> s
bit, which is known as a functional dependency or a fundep for short. The fundep says “if you know m
, you also know s
,” or equivalently, “s
is completely determined by m
.” And so, when typechecking foo
, GHC is asked to solve both MonadState Int m
and (Num s, MonadState s m)
. But since there can only be a single instance of MonadState
for m, this means that MonadState Int m
and MonadState s m
must be the same. Therefore s ~ Int
.
This is an elegant solution, but it comes at a cost — namely that we’re only allowed to use a single MonadState
at a time! If you’re a longtime Haskell programmer, this probably doesn’t feel like a limitation to you; just stick all the pieces of state you want into a single type, and then use some classy fields to access them, right? Matt Parsons has a blog post on the pain points, and some bandages, for doing this with typed errors. At the end of the day, the real problem is that we’re only allowed a single MonadError
constraint.
Polysemy “fixes the glitch” by just not using fundeps. This means you’re completely free to use as many state, error, and whatever effects you want all at the same time. The downside? Type inference sucks again. Indeed, the equivalent program to foo
in polysemy
doesn’t compile by default:
• Ambiguous use of effect 'State'
  Possible fix:
    add (Member (State s0) r) to the context of
      the type signature
    If you already have the constraint you want, instead
      add a type application to specify
      's0' directly, or activate polysemy-plugin which
      can usually infer the type correctly.
• In the expression: modify (+ 1)
  In an equation for ‘foo'’: foo' = modify (+ 1)
This situation blows chunks. It’s obvious what this program should do, so let’s just fix it.
Let’s forget about the compiler for a second and ask ourselves how the Human Brain Typechecker(TM) would typecheck this problem. Given the program:
A human would look at the modify here, and probably run an algorithm similar to this:
1. Which State is modify running over here?
2. Well, it’s some sort of Num.
3. And there’s a Member (State Int) r constraint in scope.
4. So modify must be running over State Int.
Pretty great algorithm! Instead, here’s what GHC does:
1. Which State is modify running over here?
2. It’s some sort of Num.
3. Generate a new (Num n, Member (State n) r) constraint. (Was there already a matching Member (State Int) r constraint here? Who cares.)
And then worse, it won’t compile because the generated n type is now ambiguous and not mentioned anywhere in the type signature!
Instead, let’s use a TC plugin to make GHC reason more like a human when it comes to Member constraints. In particular, we’re going to mock the fundep lookup algorithm:
1. Whenever GHC is trying to solve a wanted Member (effect a) r constraint,
2. and there is exactly one given Member (effect b) r constraint for the same effect and the same r,
3. emit a wanted a ~ b constraint, allowing GHC to use the given Member (effect b) r constraint to solve the wanted Member (effect a) r.
At its heart, a TC plugin is a value of type TcPlugin
, a record of three methods:
data TcPlugin = forall s. TcPlugin
  { tcPluginInit  :: TcPluginM s
  , tcPluginSolve :: s -> [Ct] -> [Ct] -> [Ct] -> TcPluginM TcPluginResult
  , tcPluginStop  :: s -> TcPluginM ()
  }
The tcPluginInit
field can be used to allocate a piece of state that is passed to the other two records, and tcPluginStop
finalizes that state. Most plugins I’ve seen use the s
parameter to lookup the GHC representation of classes that they want to help solve. However, the most interesting bit is the tcPluginSolve
function.
tcPluginSolve takes three lists of Cts — the givens, the deriveds, and the wanteds — which are the different varieties of constraints relevant to the problem.
From these three lists, we are expected to provide a TcPluginResult
, which for our purposes is a pair of new Ct
s we’d like GHC to solve; and a list of the Ct
s we solved, along with the corresponding dictionaries. Returning two empty lists here signals to GHC “I can’t do any more work!”
So let’s get to work. The first thing we need to do is get our hands on the Member
class we want to solve. In polysemy
, Member
is actually just a type synonym for a few other typeclasses; so the real typeclass we’d like to solve for is called Find
.
As a brief aside on the Find class: it means “look up the index of t inside r.” In Polysemy, r
is usually left polymorphic, for the same reasons that we leave the m
polymorphic in MonadState s m
.
Anyway, we want to find the Find
class. We can do this by writing a function for our tcPluginInit
function:
findFindClass :: TcPluginM Class
findFindClass = do
  md <- lookupModule
          (mkModuleName "Polysemy.Internal.Union")
          (fsLit "polysemy")
  find_tc <- lookupName md $ mkTcOcc "Find"
  tcLookupClass find_tc
We first lookup the defining module, here Polysemy.Internal.Union
in package polysemy
. We then lookup the Find
name in that module, and then lookup the class with that name. By setting findFindClass
as our tcPluginInit
, our tcPluginSolve
function will receive the Find
class as a parameter.
Before diving into tcPluginSolve
, we’re going to need some helper functions.
allFindCts :: Class -> [Ct] -> [(CtLoc, (Type, Type, Type))]
allFindCts cls cts = do
  ct <- cts
  CDictCan { cc_tyargs = [ _, r, eff ] } <- pure ct
  guard $ cls == cc_class ct
  let eff_name = getEffName eff
  pure (ctLoc ct, (eff_name, eff, r))

getEffName :: Type -> Type
getEffName t = fst $ splitAppTys t
The allFindCts
function searches through the Ct
s for Find
constraints, and unpacks the pieces we’re going to need. We first pattern match on whether the Ct
is a CDictCan
, which corresponds to everyday typeclassy constraints. We ensure it has exactly three type args (Find
takes a kind, and then the two parameters we care about), and ensure that this class is the cls
we’re looking for.
We return four things for each matching Ct:
1. Its CtLoc — corresponding to where the constraint came from. This is necessary to keep around so GHC can give good error messages if things go wrong.
2. The name of the effect; in the ongoing example, State.
3. The t parameter in the Find constraint; in the ongoing example, State s.
4. The effect stack (the r in the Find constraint).
So remember, our idea is “see if there is exactly one matching given Find constraint for any wanted Find constraint — and if so, unify the two.”
findMatchingEffect
    :: (Type, Type, Type)
    -> [(Type, Type, Type)]
    -> Maybe Type
findMatchingEffect (eff_name, _, r) ts =
  singleListToJust $ do
    (eff_name', eff', r') <- ts
    guard $ eqType eff_name eff_name'
    guard $ eqType r r'
    pure eff'

singleListToJust :: [a] -> Maybe a
singleListToJust [a] = Just a
singleListToJust _ = Nothing
findMatchingEffect
takes the output of allFindCts
for a single wanted constraint, and all of the given constraints, and sees if there’s a single match between the two. If so, it returns the matching effect.
We need one last helper before we’re ready to put everything together. We wanted to be able to generate new wanted constraints of the form a ~ b
. Emitting such a thing as a new wanted constraint will cause GHC to unify a
and b
; which is exactly what we’d like in order to convince it to use one given constraint in place of another.
mkWanted :: CtLoc -> Type -> Type -> TcPluginM (Maybe Ct)
mkWanted loc eff eff' = do
  if eqType (getEffName eff) (getEffName eff')
     then do
       (ev, _) <- unsafeTcPluginTcM
                . runTcSDeriveds
                $ newWantedEq loc Nominal eff eff'
       pure . Just $ CNonCanonical ev
     else
       pure Nothing
What’s going on here? Well we check if the two effects we want to unify have the same effect name. Then if so, we use the wanted’s CtLoc
to generate a new, derived wanted constraint of the form eff ~ eff'
. In essence, we’re promising the compiler that it can solve the wanted if it can solve eff ~ eff'
.
And finally we’re ready to roll.
solveFundep :: Class -> [Ct] -> [Ct] -> [Ct] -> TcPluginM TcPluginResult
solveFundep find_cls giv _ want = do
  let wanted_effs = allFindCts find_cls want
      given_effs  = fmap snd $ allFindCts find_cls giv
  eqs <- forM wanted_effs $ \(loc, e@(_, eff, r)) ->
    case findMatchingEffect e given_effs of
      Just eff' -> mkWanted loc eff eff'
      Nothing -> do
        case splitAppTys r of
          (_, [_, eff', _]) -> mkWanted loc eff eff'
          _                 -> pure Nothing
  pure . TcPluginOk [] $ catMaybes eqs
We get all of the Find
constraints in the givens and the wanteds. Then, for each wanted, we see if there is a singularly matching given, and if so, generate a wanted constraint unifying the two.
However, if we don’t find a singularly matching effect, we’re not necessarily in hot water. We attempt to decompose r
into a type constructor and its arguments. Since r
has kind [k]
, there are three possibilities here:
1. r is a polymorphic type variable, in which case we can do nothing.
2. r is '[], so we have no effects to possibly unify, and so we can do nothing.
3. r has the form e ': es, in which case we attempt to unify e with the wanted.
What’s going on with this? Why is this bit necessary? Well, consider the case where we want to run our effect stack. Let’s say we have this program:
foo' :: Member (State Int) r => Sem r ()
foo' = modify (+ 1)
main :: IO ()
main = do
  result <- runM . runState 5 $ foo'
  print result
The type of runM . runState 5
is Num a => Sem '[State a, Lift IO] x -> IO x
. But foo'
still wants a State Int
constraint, however, main
doesn’t have any givens! Instead, the wanted we see is of the form Find '[State a, Lift IO] (State Int)
, and so we’re justified in our logic above to unify State Int
with the head of the list.
Finally we can bundle everything up:
plugin :: Plugin
plugin = defaultPlugin
{ tcPlugin = const $ Just fundepPlugin
}
fundepPlugin :: TcPlugin
fundepPlugin = TcPlugin
{ tcPluginInit = findFindClass
, tcPluginSolve = solveFundep
, tcPluginStop = const $ pure ()
}
and voila, upon loading our module via the -fplugin
flag, GHC will automatically start solving Member
constraints as though they were fundeps!
This isn’t the whole story; there are still a few kinks in the implementation for when your given is more polymorphic than your wanted (in which case they shouldn’t unify), but this is enough to get a feeling for the idea. As always, the full source code is on Github.
As we’ve seen, TC plugins are extraordinarily powerful for helping GHC solve domainspecific problems, and simultaneously quite easy to write. They’re not often the right solution, but they’re a great thing to keep in your tool belt!
polysemy
. Getting it to be fast has been really hard. It’s clearly possible, but for the longest time I was afraid I’d need to fork the compiler. And that didn’t seem like a thing that would attract a large user base.
For example, polysemy
benefits greatly from a late specialization pass, and would benefit further from aggressive inlining after the late specialization pass. Unfortunately, GHC doesn’t do any inlining passes after -flate-specialise
, so it feels like we’re stuck on this front.
Thankfully, the eternally helpful mpickering pointed me at the GHC plugin interface, which has support for directing the optimizer to do things it wouldn’t usually.
Today, I want to talk about how I made the polysemy-plugin
run two optimizations that greatly benefit code written with polysemy
.
The gist of writing a GHC plugin is to import ghc:Plugins
, and to create an exported top-level bind plugin :: Plugin
. Other code can use this plugin by specifying the -fplugin=
option to point at this module.
Plugin
s have a field called installCoreToDos
with type [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
. A CoreToDo
is GHC’s oddly-named concept of a compiler pass over Core. This function receives the list of CoreToDo
s it was planning to do, and you can change that list if you want.
By default there’s a big flowchart of CoreToDo
s that the compiler will run through in order to compile a module. The optimization level (-O
) affects which passes get run, as do many of the individual optimization flags.
By attaching our extra optimization passes to the end of this list, we can make GHC optimize harder than it usually would. But because most code won’t benefit from this extra work, we guard the new optimization passes behind two conditions. The user must be compiling with optimizations turned on, and the module being compiled must import Polysemy
.
Checking for the optimization level is simple enough, we can pull it out of the DynFlags
(GHC’s datatype that stores all of the crazy flags you might have set):
dflags <- getDynFlags
case optLevel dflags of
  0 -> -- corresponds to -O0
  1 -> -- corresponds to -O
  2 -> -- corresponds to -O2
Checking, however, for the presence of the Polysemy
module is less straightforward. Honestly I’m not sure what the “correct” solution to this problem is, but I’m pretty happy with the disgusting hack I came up with.
The CoreM
monad (which is what you’re running in when you install CoreToDo
s) doesn’t exactly have stellar documentation. It has access to the HscEnv
, which in turn has a hsc_mod_graph :: ModuleGraph
— which sounds like the sort of thing that might contain the modules currently in scope. Unfortunately this is not so; hsc_mod_graph
contains the modules defined in the package currently being compiled.
If we could get our hands on the ModGuts
(GHC’s representation of a Haskell module), we could inspect its mg_deps :: Dependencies
field, which would surely have what we need. Unfortunately, I couldn’t find any easy way to get access to the ModGuts
in a CoreM
without jumping through several hoops.
But one thing caught my eye! There is an operation getVisibleOrphanMods :: CoreM ModuleSet
, which after some investigation, turns out to contain any module in scope (directly or otherwise) that defines an orphan instance.
It’s disgusting, but I made an internal module in polysemy
that contains the following definitions:
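The definitions themselves didn’t survive extraction; a minimal reconstruction, consistent with the orphan instance shown below, would be:

```haskell
module Polysemy.Internal.PluginLookup where

-- An empty class and an uninhabited type, existing only so that an orphan
-- instance of the class can mark a module as "visible" to the plugin.
class PluginLookup t
data Plugin
```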
and the corresponding orphan instance in the module I wanted to track in my plugin:
{-# OPTIONS_GHC -fno-warn-orphans #-}
import Polysemy.Internal.PluginLookup
instance PluginLookup Plugin
I know, I know. But because the module that defines these things is internal, there’s no way for anyone else to define instances of this thing. So at least it’s a safe use of orphans.
Sure enough, this little gem is enough to get my module noticed by getVisibleOrphanMods
, and so I can check for the presence of my module via:
mods <- moduleSetElts <$> getVisibleOrphanMods
if any ((== mkModuleName "Polysemy.Internal") . moduleName) mods
  then ...
And voila, we’re now ready to install our extra CoreToDo
s. In this case, I just cargo-culted a few from GHC’s existing passes list. Namely I added a CoreDoSpecialising
, a CoreDoStaticArgs
, yet another CoreDoSpecialising
, and a bevvy of simplification passes. The result might be overkill, but it’s sufficient to massage this scary core into this — and get roughly a 1000x runtime performance improvement in the process.
But this lack of optimization passes wasn’t the only thing slowing polysemy
down. The library depends on several library- and user-written functions that are complicated and necessarily self-recursive.
GHC is understandably hesitant to inline recursive functions — the result would diverge — but as a side effect, it seems to refuse to optimize big recursive functions whatsoever. For my purposes, this meant that most of the crucial machinery in the library was being completely ignored by GHC’s best optimization pass.
I accidentally stumbled upon a fix. To illustrate, let’s pretend that the factorial
function is my complicated self-recursive function. The optimizer would refuse to fire when the function was written like this:
factorial :: Int -> Int
factorial 0 = 1
factorial n = n * factorial (n - 1)
{-# INLINE factorial #-}
But, a minor syntactic tweak was enough to trick the compiler into optimizing it:
factorial :: Int -> Int
factorial 0 = 1
factorial n = n * factorial' (n - 1)
{-# INLINE factorial #-}

factorial' :: Int -> Int
factorial' = factorial
{-# NOINLINE factorial' #-}
Now factorial
is no longer self-recursive. It’s mutually recursive, and for some reason, the NO/INLINE
pragmas are enough to keep GHC off our back. This is an easy fix, but it’s annoying boilerplate. And I hate annoying boilerplate.
Early versions of polysemy
shipped with a function inlineRecursiveCalls :: Q [Dec] -> Q [Dec]
which would use Template Haskell to transform our slow, self-recursive factorial
into the fast, mutually-recursive version above. While this worked, it was unsatisfactory; TH splices don’t play nicely with haddock or with text editors.
But this isn’t something that regular users should need to care about! Optimization concerns should lie solely with library-writers — not with their users. It seemed like a good opportunity to write a custom optimization pass, and like any curious boy, I took it.
We can use the CoreDoPluginPass :: String -> (ModGuts -> CoreM ModGuts) -> CoreToDo
constructor to inject our own ModGuts
transformation as an optimization pass. Recall that ModGuts
is GHC’s definition of a module. For our purposes, we’re interested in its mg_binds
field, which contains all of the value-level things in the module.
A mg_binds
is a [Bind CoreBndr]
, and a Bind CoreBndr
is a pair of a name and its corresponding expression definition. More specifically, the definition for Bind
is:
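The definition itself is missing here; from GHC’s Core syntax (modulo strictness annotations), it is:

```haskell
data Bind b
  = NonRec b (Expr b)   -- a single, non-recursive binding
  | Rec [(b, Expr b)]   -- a group of (self- or mutually-) recursive bindings
```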
A nonrecursive binding is something like x = 5
, while a recursive binding is anything that is self- or mutually-recursive.
So, if we want to transform self-recursive calls into mutually-recursive calls, we first need to identify whether a definition is self-recursive. Fortunately, the incredible syb
library comes in handy here, as it lets us write small queries that get lifted over the entire datatype.
We can write containsName
using everything
, mkQ
and the Any
monoid to determine if the CoreBndr
name is used anywhere in the CoreExpr
^{1}.
containsName :: CoreBndr -> CoreExpr -> Bool
containsName n =
    getAny .
      everything
        (<>)
        (mkQ (Any False) matches)
  where
    matches :: CoreExpr -> Any
    matches (Var n') | n == n' = Any True
    matches _ = Any False
If containsName b e
is True
for any (b, e)
in the mg_binds
, then that function is self-recursive. As such, we’d like to generate a new NOINLINE
bind for it, and then replace the original self-call with a call to this new bind.
Replacing a call is just as easy as finding the recursion:
replace :: CoreBndr -> CoreBndr -> CoreExpr -> CoreExpr
replace n n' = everywhere $ mkT go
  where
    go :: CoreExpr -> CoreExpr
    go v@(Var nn)
      | nn == n   = Var n'
      | otherwise = v
    go x = x
But creating the new binding is rather more work; we need to construct a new name for it, and then fiddle with its IdInfo
in order to set the inlining information we’d like.
loopbreaker :: Uniq -> CoreBndr -> CoreExpr -> [(Var, CoreExpr)]
loopbreaker newUniq n e =
  let Just info = zapUsageInfo $ idInfo n
      info' = setInlinePragInfo info alwaysInlinePragma
      n' = mkLocalVar
             (idDetails n)
             (mkInternalName newUniq (occName n) noSrcSpan)
             (idType n)
             $ setInlinePragInfo vanillaIdInfo neverInlinePragma
   in [ (lazySetIdInfo n info', replace n n' e)
      , (n', Var n)
      ]
First we use zapUsageInfo
to make GHC forget that this binding is self-recursive^{2}, and then use setInlinePragInfo
to spiritually inject a {-# INLINE n #-}
pragma onto it. We then construct a new name (a non-trivial affair; loopbreaker
above is simplified in order to get the new Uniq
to ensure our variable is hygienic), and replace the self-recursive call with a call to the new name. Finally, we need to spit out the two resulting binds.
There’s a little machinery to call loopbreaker
on the mg_binds
, but it’s uninteresting and this post is already long enough. If you’re interested, the full code is available on Github. In total, it’s a little less than 100 lines long; pretty good for adding a completely new optimization pass!
That’s enough about writing plugins for improving performance; in the next post we’ll discuss typechecker plugins, and how they can be used to extend GHC’s constraintsolving machinery. Stay tuned!
polysemy
to be optimized much better. The improvements will be available in 8.10.1.
All of this seems like a good opportunity to share what I’ve learned, so today let’s talk about GHC’s specialization pass. This optimization is more popularly known as “the reason why mtl
is so fast.”
At a high level, the specialization pass is responsible for optimizing away uses of ad-hoc polymorphism (typeclasses) in Haskell source code. When -fspecialise
is enabled, GHC will make a monomorphic copy of every polymorphic method — one for every unique type it’s called with. The result should feel similar to anyone who’s written modern C++, as it’s completely analogous to how templates work.
While polymorphic functions are great for humans to write, they’re significantly slower for machines to execute, since you need to pass around vtables and perform dynamic dispatches, and all sorts of crazy things. This is exactly the purpose of GHC’s specialization pass, to simply get rid of all of that machinery and keep only the pieces that are explicitly used.
Let’s take an example. Consider the following program:
{-# LANGUAGE FlexibleContexts #-}

{-# OPTIONS_GHC
      -ddump-simpl
      -dsuppress-idinfo
      -dsuppress-coercions
      -dsuppress-module-prefixes
      -fforce-recomp
      #-}
import Control.Monad.State.Class
import qualified Control.Monad.Trans.State as S
countdown :: S.StateT Int IO ()
countdown = do
  v <- get
  case v of
    0 -> pure ()
    _ -> do
      put $ v - 1
      countdown

main :: IO ()
main = S.evalStateT countdown 10
When compiled via ghc Example.hs -O -fno-specialise
^{1}, we can look directly at the resulting Core of this program. If you’re unfamiliar with Core, it’s GHC’s intermediate language between source-level Haskell and the generated machine code. Core differs in two notable ways from source Haskell: its evaluation is explicit via case
expressions, and both types and typeclass instances are explicitly passed around.
Anyway, here’s the relevant Core for our above program:
Rec {
-- RHS size: {terms: 14, types: 13, coercions: 0, joins: 0/0}
$wcountdown
  :: Int# -> State# RealWorld -> (# State# RealWorld, ((), Int) #)
$wcountdown
  = \ (ww_s49L :: Int#) (w_s49I :: State# RealWorld) ->
      case ww_s49L of ds_X2I1 {
        __DEFAULT -> $wcountdown (-# ds_X2I1 1#) w_s49I;
        0# -> (# w_s49I, lvl1_r4ap #)
      }
end Rec }

-- RHS size: {terms: 12, types: 29, coercions: 0, joins: 0/0}
main1 :: State# RealWorld -> (# State# RealWorld, () #)
main1
  = \ (s_a2Ks :: State# RealWorld) ->
      case $wcountdown 10# s_a2Ks of { (# ipv_a2Kv, ipv1_a2Kw #) ->
      (# ipv_a2Kv,
         case ipv1_a2Kw of { (a1_a2I6, ds2_a2I7) -> a1_a2I6 } #)
      }
As you can see, this is very short and to the point. Reading Core is a bit of an art, but the gist of it is this: main1
calls $wcountdown
, which recursively calls itself, until the value of ww_s49L
is 0#
when it stops. It’s probably exactly the same code you’d write by hand, if for some reason you were writing Core by hand.
Our program above is written directly against transformers
, but nobody actually writes code against transformers
in the real world. Choosing a concrete monad transformer stack is limiting, and at the same time, prevents you from restricting access to pieces of the stack. Instead, we’re encouraged to write code against abstract monad capabilities, traditionally mtl
.
So let’s subtly change the type of countdown
above:
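The changed signature itself didn’t survive extraction, but from context it is presumably the mtl-style generalization:

```haskell
countdown :: MonadState Int m => m ()
```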
Nothing else in the program needs to change. Let’s now compile this program again via ghc Example.hs -O -fno-specialise
. The result is horrendously worse Core:
Rec {
-- RHS size: {terms: 35, types: 47, coercions: 0, joins: 0/2}
$wcountdown
  :: forall (m :: * -> *).
     Applicative m =>
     (forall a b. m a -> (a -> m b) -> m b)
     -> (forall a b. m a -> m b -> m b)
     -> m Int
     -> (Int -> m ())
     -> m ()
$wcountdown
  = \ (@ (m_s4WK :: * -> *))
      (ww_s4WR :: Applicative m_s4WK)
      (ww1_s4WS :: forall a b. m_s4WK a -> (a -> m_s4WK b) -> m_s4WK b)
      (ww2_s4WT :: forall a b. m_s4WK a -> m_s4WK b -> m_s4WK b)
      (ww3_s4WX :: m_s4WK Int)
      (ww4_s4WY :: Int -> m_s4WK ()) ->
      let {
        lvl6_s4W1 :: m_s4WK ()
        lvl6_s4W1
          = $wcountdown
              @ m_s4WK ww_s4WR ww1_s4WS ww2_s4WT ww3_s4WX ww4_s4WY } in
      let {
        lvl7_s4W2 :: m_s4WK ()
        lvl7_s4W2 = pure @ m_s4WK ww_s4WR @ () () } in
      ww1_s4WS
        @ Int
        @ ()
        ww3_s4WX
        (\ (v_a192 :: Int) ->
           case v_a192 of { I# ds_d3xJ ->
           case ds_d3xJ of ds1_X3xT {
             __DEFAULT ->
               ww2_s4WT @ () @ () (ww4_s4WY (I# (-# ds1_X3xT 1#))) lvl6_s4W1;
             0# -> lvl7_s4W2
           }
           })
end Rec }

-- RHS size: {terms: 17, types: 32, coercions: 21, joins: 0/0}
main1 :: State# RealWorld -> (# State# RealWorld, () #)
main1
  = \ (s_a3z5 :: State# RealWorld) ->
      case (((($wcountdown
                 @ (StateT Int IO)
                 lvl_r4VN
                 lvl1_r50i
                 lvl2_r50j
                 (lvl3_r50k `cast` <Co:13>)
                 lvl4_r50l)
              `cast` <Co:4>)
               lvl5_r50m)
            `cast` <Co:4>)
             s_a3z5
      of
      { (# ipv_a3z8, ipv1_a3z9 #) ->
      (# ipv_a3z8,
         case ipv1_a3z9 of { (a1_a3y3, ds2_a3y4) -> a1_a3y3 } #)
      }
Yikes! What a mess! It’s amazing how much of a difference that one type signature made! Our simple mtl
program above has turned into an unholy mess of passing around overly-polymorphic functions. We’ve paid an awful price to abstract away our monad stack, even though the actual program being run didn’t change!
Of course, this isn’t a real problem in the wild. Compile the program again, this time without the -fno-specialise
flag^{2} — ghc Example.hs -O
:
Rec {
-- RHS size: {terms: 14, types: 13, coercions: 0, joins: 0/0}
$w$scountdown
  :: Int# -> State# RealWorld -> (# State# RealWorld, ((), Int) #)
$w$scountdown
  = \ (ww_s5dY :: Int#) (w_s5dV :: State# RealWorld) ->
      case ww_s5dY of ds_X3xU {
        __DEFAULT -> $w$scountdown (-# ds_X3xU 1#) w_s5dV;
        0# -> (# w_s5dV, lvl1_r5jV #)
      }
end Rec }

-- RHS size: {terms: 12, types: 29, coercions: 0, joins: 0/0}
main1 :: State# RealWorld -> (# State# RealWorld, () #)
main1
  = \ (s_X3Bw :: State# RealWorld) ->
      case $w$scountdown 10# s_X3Bw of { (# ipv_a3z9, ipv1_a3za #) ->
      (# ipv_a3z9,
         case ipv1_a3za of { (a1_a3y4, ds2_a3y5) -> a1_a3y4 } #)
      }
Whew! We’re back to the speedy program we started with. -fspecialise
has done the hard work of transforming our abstract code into fast code for us — exactly as a good compiler should.
It’s amazing how drastic the differences are in the generated code, just from flipping a switch!
Before we can discuss exactly how this transformation helps, we need to first go over some details of how GHC implements a few source-level Haskell features. The first is dictionaries, which are how typeclass dispatch works.
Consider the following program in source-level Haskell:
class Eq a where
  (==) :: a -> a -> Bool

instance Eq () where
  (==) _ _ = True

equate :: Eq a => a -> a -> Bool
equate a1 a2 = a1 == a2

main :: IO ()
main = print $ equate () ()
Internally, GHC will generate the equivalent program:
data Eq a = Eq  -- #1
  (a -> a -> Bool)

(==) :: Eq a -> a -> a -> Bool
(==) dEq'a =  -- #2
  case dEq'a of
    Eq eqMethod -> eqMethod

eqUnit :: Eq ()  -- #3
eqUnit = Eq
  (\_ _ -> True)

equate :: Eq a -> a -> a -> Bool  -- #4
equate dEq'a a1 a2 = (==) dEq'a a1 a2  -- #5

main :: IO ()
main = print $ equate eqUnit () ()  -- #6
Notably, the following changes occur:

1. class Eq a is transformed into data Eq a, with each class method becoming a field.
2. (==) receives a new Eq a parameter, and becomes a function which pattern matches on it.
3. instance Eq () becomes a top-level declaration of an Eq () value.
4. The Eq a constraint on equate becomes a parameter of the new Eq a datatype.
5. The call to (==) in equate receives the new dEq'a parameter.
6. The call to equate at type a ~ () in main receives the new top-level eqUnit :: Eq () value as an argument.

We call the values eqUnit
and dEq'a
dictionaries. More precisely, a dictionary is any value whose type is a data type corresponding to a typeclass. Dictionaries do not exist in sourcelevel Haskell, only in the generated Core. In real Core, dictionaries have names that start with $d
, but we’ll omit the leading $
today, so we don’t get it confused with the ($)
operator.
From all of this we see that, under the hood, class
definitions are just data
definitions, and that constraints are just invisible parameters.
Consider the following program:
Because we’re scrutinizing on a constant value here, the result of this expression must always be foo
. As such, it’s safe to replace the entire pattern match expression with foo
:
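The two snippets didn’t survive extraction; a representative before/after pair (with hypothetical bindings foo and bar, not from the original post) would be:

```haskell
foo, bar :: Int
foo = 1
bar = 2

-- Before: the scrutinee's constructor is statically known.
before :: Int
before =
  case Just True of
    Just _  -> foo
    Nothing -> bar

-- After case-of-known-constructor: the match collapses to the taken branch.
after :: Int
after = foo
```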
This transformation is known as the case of known constructor optimization. While humans would never write such a thing by hand, expressions like these often come up as the result of other optimizing transformations.
One final thing to discuss is GHC’s term rewriting mechanism, known as rewrite rules.
Rewrite rules are little statements that say “this thing can be written as that thing.” Whenever GHC encounters this thing, it will duly rewrite it as that thing. The motivating use case is to allow library authors to implement domainspecific optimizations — such as ensuring composing functions don’t generate intermediate structures. You might have heard of “list fusion,” which is implemented via rewrite rules.
Rewrite rules must preserve the type of the expression, but besides that are free to do anything they’d like. Just as an example, we can write a program which prints hello world
seemingly from nowhere:
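The program itself was lost in extraction; a hedged reconstruction in the same spirit (using unsafePerformIO inside a rule, which you should never do in real code) looks like this:

```haskell
import System.IO.Unsafe (unsafePerformIO)

spooky :: a -> a
spooky a = a
{-# NOINLINE spooky #-}

-- A rule may rewrite a call into any type-preserving expression; here,
-- one that prints a message as a side effect before returning its argument.
{-# RULES "hello"
      forall a.
        spooky a = unsafePerformIO (putStrLn "hello world") `seq` a
      #-}

main :: IO ()
main = print (spooky ())
```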
Compiling this with -O0
won’t print any message when run, but will print hello world
when compiled with -O
. Spooky!
When -XTypeApplications
is enabled, rewrite rules are allowed to match on types too! For example, the following program will print 2 1 1
:
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE RankNTypes          #-}
{-# LANGUAGE TypeApplications    #-}

magic :: forall b a. a -> a
magic = id
{-# NOINLINE magic #-}

{-# RULES "it's magic!"
      forall (a :: Int).
        magic @String a = a + 1
      #-}

main :: IO ()
main = do
  print $ magic @String (1 :: Int)
  print $ magic @Bool (1 :: Int)
  print $ magic @String (1 :: Integer)
Of course, you shouldn’t abuse rewrite rules like this — make sure any rules you write are just more efficient versions of an equivalent program — but it’s helpful to demonstrate what’s going on.
Internally, GHC uses lots of rewrite rules itself! All of its constant-folding (e.g. replacing 2 + 3
with 5
at compile time) is done via rewrite rules, which helps separate that logic from the main compiler.
So with all of that background information out of the way, we’re finally ready to talk about how the specializer works.
Recall our original mtl
program, transformed so it has its dictionaries explicitly passed:
countdown :: Monad m -> MonadState Int m -> m ()
-- There is a `Monad m` constraint on `MonadState s m`, which is where this
-- extra constraint comes from.
countdown dMonad'm dMonadState'm = do dMonad'm
  v <- get dMonadState'm
  case v of
    0 -> pure dMonad'm ()
    _ -> do dMonad'm
      put dMonadState'm $ v - 1
      countdown dMonad'm dMonadState'm

main :: IO ()
main =
  S.evalStateT
    (countdown
      (dMonadStateT dMonadIO)
      (dMonadStateStateT dMonadIO))
    10
When -fspecialise
is set, the specializer will look for any calls to polymorphic functions with all of their dictionaries saturated by “interesting” dictionaries. The dictionaries dMonad'm
and dMonadState'm
in countdown
aren’t interesting, since they’re just opaque dictionary variables; we don’t know anything about them.
However, GHC notices that countdown
is called with m ~ StateT Int IO
, and that all of its dictionaries are statically known. As such, it emits a specialized version of countdown
, monomorphized to StateT Int IO ()
:
scountdown_StateT :: StateT Int IO ()
scountdown_StateT = do (dMonadStateT dMonadIO)
  v <- get (dMonadStateStateT dMonadIO)
  case v of
    0 -> pure (dMonadStateT dMonadIO) ()
    _ -> do (dMonadStateT dMonadIO)
      put (dMonadStateStateT dMonadIO) $ v - 1
      scountdown_StateT
In addition, the specializer will emit a rewrite rule:
{-# RULES "SPEC countdown @ (StateT Int IO)"
      forall (dMonad'm :: Monad (StateT Int IO))
             (dMonadState'm :: MonadState Int (StateT Int IO)).
        countdown @(StateT Int IO) dMonad'm dMonadState'm =
          scountdown_StateT
      #-}
This rewrite rule will find any call to countdown at m ~ StateT Int IO
, ignore the dictionaries passed to it, and replace the entire expression with the specialized scountdown_StateT
function.
In particular, this means that main
becomes:
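The rewritten main wasn’t preserved here; after the rule fires it is presumably just:

```haskell
main :: IO ()
main = S.evalStateT scountdown_StateT 10
```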
The rule takes advantage of the fact that dictionaries are known to be consistent (all expressions for a dictionary of a given type eventually evaluate to the same record), so it can completely ignore its two dictionary arguments. However, in principle there’s absolutely no reason this same technique couldn’t be used to specialize on other, nondictionary, arguments!
Notice now that pure
, get
, and the two do-blocks in scountdown_StateT
are now called with interesting dictionaries, so pure
, get
and >>=
can now all also be specialized at StateT Int IO
.
Eventually the concrete dictionaries and corresponding specializations have propagated throughout the entire program. The optimizer can take advantage of two other properties now, namely that class methods were already transformed into pattern matches, and that all of the dictionaries are statically known. Which means, we have created several places in which we can now case-of-known-case!
For example, let’s consider the get
in countdown
. It now looks something like this:
v <- case MonadState (StateT $ \s -> implOfPureForIO (s, s)) ... of
       MonadState getMethod _ _ -> getMethod
which can obviously be simplified to
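That is, using the same hypothetical names as in the snippet above:

```haskell
v <- StateT $ \s -> implOfPureForIO (s, s)
```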
This is already a great improvement! But it gets better: recall that we’re binding in the StateT
monad, which in turn is calling bind in IO
. But bind in IO
is itself implemented as a pattern match, and so case-of-known-constructor applies there too!
The end result is that GHC spins for a while, alternatingly specializing, inlining, case-of-known-casing, and performing a few other optimizations. Each of these in turn opens up additional opportunities for the others to fire. After a few iterations of this, the resulting code is often orders of magnitude faster!
Everything described above is how the compiler behaves today in GHC 8.6.5 (and has, since like 2007 or something.) However, when digging into the performance of my free monad library polysemy
, I noticed that code written against my library wasn’t benefiting from the specialization pass! As a result, my library was performing anywhere between 10x and 1000x worse than mtl
, even though the eventual code being run was identical to mtl
.
Like our experiments above into mtl
, I was paying a performance cost for abstraction, even though the concrete program was identical.
Some investigation by the indefatigable mpickering pointed out that the specializer was failing to specialize. As it happens, the specializer is more than happy to optimize away dictionaries that are passed as the first non-type arguments to a function, but no others.
That means it will go home early if it runs into a function whose signature is of the form:
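The exact signature didn’t survive extraction; the shape in question is one where a dictionary argument appears after a value argument, something like:

```haskell
foo :: Int -> forall m. Monad m => m ()
```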
Again, humans would never write such a thing, but the optimizer is more than happy to spit these things out. Additionally, code like this often shows up whenever you use a newtype to get around GHC’s annoying error that it “does not (yet) support impredicative polymorphism”.
Anyway, all of this is to say that in 8.10.1, the specialization pass is now smart enough to specialize functions like foo
. As a result, we should see very real performance improvements in libraries like polysemy
and lens
, and, excitingly, any programs which use them!
I want to announce my new library polysemy
— it’s higher-order, no-boilerplate, zero-cost free monads.
I’ve been working nonstop on this for a few months now, so the commentary for what the heck is going on will come after a little vacation.
Stay tuned.
As of today I have free, higher-order effects working. Unfortunately, they are not fast. I don’t think this is a fundamental limitation, merely that whatever code I’ve written isn’t amenable to GHC’s optimization process.
I’ve been hammering on this for about 50 hours now. It’s been driving me slowly crazy, and I promised myself I’d stop if I hadn’t solved it by now. That being said, before putting this project to rest, I wanted to do a quick writeup detailing what I’ve learned, how everything fits together, and where I’m hoping someone will pick up the ball. Here’s the repository.
In the freer-simple
model, effects are first-order—meaning they are unable to embed Eff
computations inside of them. This is occasionally annoying, primarily when trying to write bracket-like effects.
You can sort of work around the problem by encoding your scoped computation as an interpretation of an effect, but this often comes at the cost of fixing the interpretations of the other effects you’re dealing with.
This fundamental limitation comes from the fact that freer-simple
effects have kind * -> *
. There’s nowhere in here to stick the Eff
stack you’re working in. You can kind of hack it in, but it never plays nicely.
The solution is given in the paper Effect Handlers in Scope, and implemented in the fused-effects
package. The idea is to parameterize each of your effects with a monad—i.e. they have kind (* -> *) -> * -> *
. This parameter gets instantiated at the entire Eff
stack, as seen by the type of send :: Member e r => e (Eff r) a -> Eff r a.
While it’s an obvious insight, actually getting everything to play nicely is tricky. The primary issue is how to push interpretations through these additional monadic contexts. For example, let’s consider a bracket-esque effect:
data Bracket m x where
  Bracket :: m a          -- ^ allocate
          -> (a -> m ())  -- ^ deallocate
          -> (a -> m x)   -- ^ use
          -> Bracket m x
Assume we want to push a State
effect through these m’s. What are the correct semantics for how the state is accumulated? Should any state changed in the deallocate
block count outside of the bracket? What should happen in the use
case if an exception is thrown and the rest of the block is shortcircuited?
Not only are we concerned with the semantics, but also the actual mechanism of propagating this state throughout the computation.
Effect Handlers in Scope introduces weave
in a typeclass, which is responsible for this statepropagation behavior. Statefulness for an effect is encoded as some arbitrarily chosen functor f
, and weave
describes how the effect should move that state through the effect. Behold:
class Effect e where
  weave
      :: Functor f
      => f ()
      -> (forall x. f (m x) -> n (f x))
      -> e m a
      -> e n (n (f a))
The f ()
parameter is the current “state of the world”, and the rank-2 thing is this distribution law. You can intuit the m
parameter being an effect stack with all of the effects present, and the n
parameter as being the same effect stack—but with the top of it taken off. To clarify, we could monomorphize it thusly:
weave
    :: Functor f
    => f ()
    -> (forall x. f (Eff (g ': r) x) -> Eff r (f x))
    -> e (Eff (g ': r)) a
    -> e (Eff r) (Eff r (f a))
This janky return type: e (Eff r) (Eff r (f a))
comes from the fact that Effect Handlers in Scope describes a traditional “free” (as opposed to freer) approach to a free monad. The last parameter of an effect is actually a continuation for the next piece of the computation. By mangling it, we’re ensuring the caller (which will be library code) respects the evaluation semantics.
weave
implicitly defines the evaluation semantics of an effect—it pins how state propagates through them, which in turn defines which pieces of the effect are observable to the outside.
Warning: This next section describes a jankyass solution to the problem. It’s clearly a hack and clearly not the right answer. But maybe by writing out what I did, someone with a keen eye can point out where I went wrong.
So this is all well and good. It works, but requires a lot of boilerplate. As presented in the paper, a new effect requires:

1. A Functor instance
2. An MFunctor instance (providing hoist :: forall x. (f x -> g x) -> e f a -> e g a)
3. An Effect instance as above, with an additional method not described here

If you’re following in the fused-effects
tradition, for each interpretation you additionally need a new Carrier
type, with its own Functor
, Applicative
and Monad
instances, and then another typeclass tying the effect to its carrier.
fused-effects
improves the \(O(n^2)\) MTL instance problem to \(O(n)\)—albeit with a big constant factor :(
This is a huge amount of work! I’ve said it before and I’ll say it again: ain’t nobody got time for that. If it feels like too much work, people aren’t going to do it. A solution that depends on humans not being lazy isn’t one that’s going to take off.
So wouldn’t it be nice if we could just get all of this effect stuff for free?
Here’s where I admittedly went a little off the rails. The first step towards getting a freer, Functor-less Monad
instance for Eff
is to define it in terms of its final encoding. I made the obvious changes to last time’s encoding without thinking too much about it:
newtype Freer f a = Freer
  { runFreer
        :: forall m
         . Monad m
        => (forall x. f (Freer f) x -> m x)
        -> m a
  }
I have no idea if this is right, but at least it gives a Monad
instance for free. One limitation you’ll notice is that the continuation in runFreer
is a natural transformation, and thus it’s unable to change its return type.
That means interpretations like runError :: Eff (Error e ': r) a -> Eff r (Either e a)
are surprisingly difficult to implement. More on this later—I just wanted to point out this flaw.
From here I followed Oleg and implemented Eff
as a type synonym, making sure to tie the knot and instantiate the m
parameter correctly:
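The synonym itself is missing here; following the runFreer type above, it is presumably:

```haskell
type Eff r = Freer (Union r)
```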
But how can we get a free implementation for weave
?
It’s this mental thing I came up with, which is sort of like Coyoneda
but for weave
:
data Yo e m a where
  Yo :: (Monad n, Functor f)
     => e m a
     -> f ()
     -> (forall x. f (m x) ~> n (f x))
     -> (f a -> b)
     -> Yo e n b
In retrospect, I would not spend any more time on this approach—I’d just make people give an instance of weave
for higher-order effects, and machinery to derive it automatically for first-order ones. But then how can we get an MFunctor
instance for free? You can’t just derive it generically—lots of higher-order effects want existentials, and thus can’t have Generic
instances.
This Yo
thing mirrors the definition of weave
pretty closely. The idea is that it can accumulate arbitrarily many weave
s into a single Yo
, and then dispatch them all simultaneously.
Some interesting points to note are that the state functor f
is existentialized, and that there is this final f a > b
parameter to make it play nicely with the Union
(more on this in a second.) We can implement weave
now by replacing the existing state functor with a Compose
of the new one and the old one.
weave
    :: (Monad m, Monad n, Functor f)
    => f ()
    -> (forall x. f (m x) -> n (f x))
    -> Union r m a
    -> Union r n (f a)
weave s' distrib (Union w (Yo e s nt f)) =
  Union w $
    Yo e (Compose $ s <$ s')
         (fmap Compose . distrib . fmap nt . getCompose)
         (fmap f . getCompose)
We can also use Yo
to get a free MFunctor
instance:
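That instance was lost in extraction; since Yo already carries a natural transformation, hoisting is presumably just post-composition onto it (a sketch, assuming mmorph’s MFunctor class, with Union delegating to it):

```haskell
instance MFunctor (Yo e) where
  hoist f (Yo e s nt z) = Yo e s (f . nt) z
```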
OK, all of this works I guess. But what’s with this weird f a -> b
thing in Yo
that I mentioned earlier? Well recall the type of runFreer
, when instantiated at Union
:
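The instantiated type isn’t shown; specializing the runFreer field at f ~ Union r gives, presumably:

```haskell
runFreer
    :: forall m
     . Monad m
    => (forall x. Union r (Eff r) x -> m x)
    -> m a
```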
The only way we can produce an m a
is via this rank-2 thing, which is a natural transformation from Union r (Eff r)
to m
. In other words, it’s not allowed to change the type. We can’t just stuff the f
into the result and return an m (f a)
instead—this thing doesn’t form a Monad
! Fuck!
All of this comes to a head when we ask ourselves how to actually get the state out of such a contraption. For example, when we call runState, we want the resulting state at the end of the day!
The trick is the same one I used in the last post—we're able to instantiate the m inside runFreer at whatever we like, so we just choose StateT s (Eff r) and then run that thing afterwards. Again, this is very clearly a hack.
Because weave is given freely, interpretations must eventually actually decide what that thing should look like. Some combinators can help; for example, this is the interface I came up with for implementing runBracket:
```
runBracket
    :: Member (Lift IO) r
    => (Eff r ~> IO)
    -> Eff (Bracket ': r) a
    -> Eff r a
runBracket finish = deep $ \start continue -> \case
  Bracket alloc dealloc use -> sendM $
    X.bracket
      (finish $ start alloc)
      (finish . continue dealloc)
      (finish . continue use)
```
The deep combinator gives you start and continue (which are the cleverly disguised results of weave), and asks you to give a natural transformation from your effect into Eff r.
The actual implementation of deep isn't going to win any awards in understandability, or in inlineability:
```
deep
    :: (∀ m f y
         . Functor f
        => (forall x. m x -> Eff r (f x))
        -> (∀ i o. (i -> m o) -> f i -> Eff r (f o))
        -> e m y
        -> Eff r (f y)
       )
    -> Eff (e ': r) a
    -> Eff r a
deep transform (Freer m) = m $ \u ->
  case decomp u of
    Left x -> liftEff $ hoist (deep transform) x
    Right (Yo eff state nt f) -> fmap f $
      transform
        (deep transform . nt . (<$ state))
        (\ff -> deep transform . nt . fmap ff)
        eff
```
Notice that whoever implements transform needs to give an equivalent of an implementation of weave anyway. Except that instead of only writing it once per effect, they need to write it per interpretation.
We can also give an implementation for runState in terms of a StateT s:
```
runState :: forall s r a. s -> Eff (State s ': r) a -> Eff r (s, a)
runState s = flip runStateT s . go
  where
    go :: forall x. Eff (State s ': r) x -> StateT s (Eff r) x
    go (Freer m) = m $ \u ->
      case decomp u of
        -- W T F
        Left x -> StateT $ \s' ->
          liftEff . weave (s', ()) (uncurry (flip runStateT))
                  $ hoist go x
        Right (Yo Get state nt f) -> fmap f $ do
          s' <- get
          go $ nt $ pure s' <$ state
        Right (Yo (Put s') state nt f) -> fmap f $ do
          put s'
          go $ nt $ pure () <$ state
```
This is also completely insane. Notice the aptly-named section W T F, where, for no reason I can discern other than satisfying the typechecker, we convert from a StateT s to a (s, ()) and back again. But why?? Because this is what weave wants—and we need to satisfy weave because it's the only way to change the type of a Union—and we need to do that in order to reinterpret everything as a StateT s.
There is a similarly WTF implementation for runError. But worse is the combinator I wrote that generalizes this pattern for running an effect in terms of an underlying monad transformer:
```
shundle
    :: ∀ a f t e r
     . ( MonadTrans t
       , ∀ m. Monad m => Monad (t m)
       , Functor f
       )
    => (∀ x. Eff r (f x) -> t (Eff r) x)
    -> (∀ x. t (Eff r) x -> Eff r (f x))
    -> (∀ x. f (t (Eff r) x) -> Eff r (f x))
    -> f ()
    -> (∀ m tk y
         . Functor tk
        => (∀ x. f () -> tk (m x) -> Eff r (f (tk x)))
        -> tk ()
        -> e m y
        -> t (Eff r) (tk y)
       )
    -> Eff (e ': r) a
    -> Eff r (f a)
shundle intro finish dist tk zonk = finish . go
  where
    go :: ∀ x. Eff (e ': r) x -> t (Eff r) x
    go (Freer m) = m $ \u ->
      case decomp u of
        Left x -> intro . liftEff . weave tk dist $ hoist go x
        Right (Yo e sf nt f) -> fmap f $
          zonk (\r -> shundle intro finish dist r zonk . nt) sf e
```
Don’t ask why it’s called shundle or why it has a variable called zonk. It was funny to me at the time and I needed whatever smiles I could get to continue making progress. Believe it or not, this abomination does indeed generalize runState:
```
statefully
    :: (∀ m. e m ~> StateT s (Eff r))
    -> s
    -> Eff (e ': r) a
    -> Eff r (s, a)
statefully f s =
  shundle
    (StateT . const)
    (flip runStateT s)
    (uncurry $ flip runStateT)
    (s, ()) $ \_ tk -> fmap (<$ tk) . f

runState :: s -> Eff (State s ': r) a -> Eff r (s, a)
runState = statefully $ \case
  Get    -> get
  Put s' -> put s'
```
runState is actually quite reasonable! At this point I was willing to concede “worse is better”—that most of the time people only really care about StateT-esque state in their effects. And really the only thing they’ve been missing is bracket. And we now have bracket, and an easy way of doing StateT-esque state.
So I was going to hand it in to one library or another—if not for full marks, then at least for the participation trophy.
But then I ran my benchmark, and saw that it performs 10x slower than freer-simple. Even for effect stacks that don’t need to weave.
SHAME.
SORROW.
And I guess this is where I leave it. The machinery is clearly wrong, but amazingly it actually does what it says on the tin. Unfortunately it does it so slowly that I think the people who complain about the performance of free monads might actually have a point this time.
I’ve got my heart crossed that someone will swoop in here and say “here’s what you’ve done wrong” and it will be a minor change and then everything will optimize away and we will make merry and the freer monad revolution will be complete.
I’m very tired.
I think I have enough familiarity with the problem at this point to know what a solution would look like—even if I can’t find it myself:
- runFreer could produce an f a if you asked for it.
- Weave would be a typeclass that effects authors would need to implement. But they could derive it for free if the m parameter was unused. This would be the only instance necessary to implement by hand.
- A handleRelayS-esque interface for defining interpreters.
- cata.

I’d also be happy if an existing solution (probably fused-effects) were given the tools to satisfy the above list. In particular, the handleRelayS thing.