Proving Equivalence of Polysemy Interpreters
Let’s talk more about polysemy-check
. Last week we looked at how to do property-testing for a polysemy
effects’ laws. Today, we’ll investigate how to show that two interpretations are equivalent.
To continue with last week’s example, let’s say we have an effect that corresponds to having a Stack
that we can push and pop:
data Stack s m a where
Push :: s -> Stack s m ()
Pop :: Stack s m (Maybe s)
RemoveAll :: Stack s m ()
Size :: Stack s m Int
deriving instance Show s => Show (Stack s m a)
'Stack
deriveGenericK '
'Stack makeSem '
Since we’d like to prove the equivalence of two interpretations, we’ll need to first write two interpretations. But, to illustrate, we’re going simulate multiple interpreters via a single interpretation, parameterized by which bugs should be present in it.
purposes of brevity, we’ll write a single interpretation of Stack s
in terms of State [s]
, and then interpret that in two different ways. In essence, what we’re really testing here is the equivalence of two State
interpretations, but it’s good enough for an example.
We’ll start with the bugs:
data Bug
= PushTwice
| DontRemove
deriving stock (Eq, Ord, Show, Enum, Bounded)
instance Arbitrary Bug where
= elements [minBound..maxBound]
arbitrary
hasBug :: [Bug] -> Bug -> Bool
= flip elem hasBug
The PushTwice
bug, as you might expect, dispatched a Push
command so that it pushes twice onto the stack. The DontRemove
bug causes RemoveAll
to be a no-op. Armed with our bugs, we can write a little interpreter for Stack
that translates Stack s
commands into State [s]
commands, and then immediately runs the State
effect:
runStack :: [Bug]
-> Sem (Stack s ': r) a
-> Sem r ([s], a)
=
runStack bugs .) $ reinterpret $ \case
(runState [] Push s -> do
:)
modify (s PushTwice) $
when (hasBug bugs :)
modify (s
Pop -> do
<- gets listToMaybe
r drop 1)
modify (pure r
RemoveAll ->
DontRemove) $
unless (hasBug bugs
put []
Size ->
length gets
For our efforts we are rewarded: runState
gives rise to four interpreters for the price of one. We can now ask whether or not these interpreters are equivalent. Enter propEquivalent
:
With these interpreters out of the way, it’s time to answer our original question: are pureStack
and ioStack
equivalent? Which is to say, do they get the same answer for every possible program? Enter propEquivalent
:
prepropEquivalent :: forall effs r1 r2 f
. ( forall a. Show a => Show (f a)
forall a. Eq a => Eq (f a)
,
)=> ( Inject effs r1
Inject effs r2
, Arbitrary (Sem effs Int)
,
)=> (forall a. Sem r1 a -> IO (f a))
-> (forall a. Sem r2 a -> IO (f a))
-> Property
All of the functions in polysemy-check
have fun type signatures like this one. But despite the preponderance of forall
s, it’s not as terrible as you might think. The first ten lines here are just constraints. There are only two arguments to prepropEquivalent
, and they are the two interpreters you’d like to test.
This type is crazy, and it will be beneficial to understand it. There are four type variables, three of which are effect rows. We can distinguish between them:
effs
: The effect(s) you’re interested in testing. In our case, our interpreter handlesStack s
, so we leteffs ~ Stack s
.r1
: The effects handled by interpreter 1. Imagine we had an interpreter forStack s
that ran it viaIO
instead. In that case,r1 ~ '[State s, Embed IO]
.r2
The effects handled by interpreter 2.
The relationships that must between effs
, r1
and r2
are \(effs \subset r1\) and \(effs \subset r2\). When running prepropEquivalent
, you must type-apply effs
, because Haskell isn’t smart enough to figure it out for itself.
The other type variable to prepropEquivalent
is f
, which allows us to capture the “resulting state” of an interpreter. In runStack :: [Bug] -> Sem (Stack s ': r) a -> Sem r ([s], a)
, you’ll notice we transform a program returning a
into one returning ([s], a)
, and thus f ~ (,) [s]
. If your interpreter doesn’t produce any resulting state, feel free to let f ~ Identity
.
We’re finally ready to test our interpreters! For any equivalence relationship, we should expect something to be equivalent to itself. And this is true regardless of which bugs we enable:
prop_reflexive :: Property
= do
prop_reflexive <- arbitrary
bugs pure $
@'[Stack Int]
prepropEquivalent pure . run . runStack bugs) -- pure is getting us into IO
(pure . run . runStack bugs) (
So what’s happening here? Internally, prepropEquivalent
is generating random programs of type Sem '[Stack Int] Int
, and lifting that into Sem r1 Int
and Sem r2 Int
, and then running both interpreters and ensuring the result is the same for every program. Note that this means any fundamental non-determinism in your interpretation will break the test! Make sure to use appropriate interpreters for things like clocks and random values!
To strengthen our belief in prepropEquivalent
, we can also check that runStack
is not equivalent to itself if different bugs are enabled:
prop_bugsNotEquivalent :: Property
=
prop_bugsNotEquivalent $
expectFailure @'[Stack Int]
prepropEquivalent pure . run . runStack [PushTwice])
(pure . run . runStack []) (
Running this test will give us output like:
+++ OK, failed as expected. Falsified (after 3 tests):
([0,0],1) /= ([0],1)
The counterexample here isn’t particularly helpful (I haven’t yet figured out how to show
the generated program that fails,) but you can get a hint here by noticing that the stack (the [0,0]
) is twice as big in the first result as in the second.
Importantly, by specifying @'[Stack Int]
when calling prepropEquivalent
, we are guaranteed that the generated program will only use actions from Stack Int
, so it’s not too hard to track down. This is another win for polysemy
in my book — that we can isolate bugs with this level of granularity, even if we can’t yet perfectly point to them.
All of today’s code (and more!) is available as a test in polysemy-check
, if you’d like to play around with it. But that’s all for now. Next week we’ll investigate how to use polysemy-check
to ensure that the composition of your effects themselves is meaningful. Until then!