Review: Sorting with Bialgebras and Distributive Laws
Today’s paper is Sorting with Bialgebras and Distributive Laws by Hinze et al. The main thrust of the paper is that many well-known sorting algorithms are categorical duals of one another! As seems to be usual for the papers I review, there’s a lot of recursion scheme stuff here, so go read that first if you’re unfamiliar with it.
Something that’s stymied me while working through Sorting with Bialgebras is that whatever it is we’re doing here, it’s not observable. All sorting functions are extentionally equal — so the work being done here is necessarily below the level of equality. This doesn’t jive well with how I usually think about programming, and has made it very hard for me to see exactly what the purpose of all of this is. But I digress.
Bubblesort and Naive Insertion Sort
Hinze et al. begin by showing us that insertion sort and bubble sort have terse implementations:
insertSort :: Ord a => [a] -> [a]
= foldr insert []
insertSort
selectSort :: forall a. Ord a => [a] -> [a]
= unfoldr select
selectSort where
select :: [a] -> Maybe (a, [a])
= Nothing
select [] =
select as let x = minimum as
= delete x as
xs in Just (x, xs)
and that there are two dualities here, foldr
is dual to unfoldr
, and insert :: Ord a => a -> [a] -> [a]
is dual to select :: Ord a => [a] -> Maybe (a, [a])
.
The rest of the paper is pulling on this thread to see where it goes. As a first step, it’s noted that foldr
and unfoldr
are hiding a lot of interesting details, so instead we will divide the sorting problem into two halves: a catamorphism to tear down the unsorted list, and an anamorphism to build up the sorted version.
Begin by defining Mu
and Nu
, which are identical in Haskell. The intention here is that we can tear down Mu
s, and build up Nu
s:
newtype Mu f = Mu { unMu :: f (Mu f) }
newtype Nu f = Nu { unNu :: f (Nu f) }
as witnessed by cata
and ana
:
cata :: Functor f => (f a -> a) -> Mu f -> a
= f . fmap (cata f) . unMu
cata f
ana :: Functor f => (a -> f a) -> a -> Nu f
= Nu . fmap (ana f) . f ana f
We’ll also need a pattern functor to talk about lists:
data ListF (t :: Tag) a k = Nil | a :> k
deriving (Eq, Ord, Show, Functor)
infixr 5 :>
This Tag
thing is of my own devising, it’s a phantom type to track whether or not our list is sorted:
data Tag = UnsortedTag | SortedTag
type Unsorted = ListF 'UnsortedTag
type Sorted = ListF 'SortedTag
Note that in Haskell, nothing ensures that Sorted
values are actually sorted! This is just some extra machinery to get more informative types.
With everything in place, we can now write the type of a sorting function:
type SortingFunc a = Ord a => Mu (Unsorted a) -> Nu (Sorted a)
that is, a sorting function is something that tears down an unsorted list, and builds up a sorted list in its place. Makes sense, and the extra typing helps us keep track of which bits are doing what.
Most of the paper stems from the fact that we can implement a SortingFunc
in two ways. We can either:
- write a
cata
that tears down theMu
by building up aNu
viaana
, or - write an
ana
that builds up theNu
that tears down theMu
viacata
Let’s look at the first case:
naiveInsertSort :: SortingFunc a
= cata $ ana _ naiveInsertSort
this hole has type
Unsorted a (Nu (Sorted a))
-> Sorted a (Unsorted a (Nu (Sorted a)))
which we can think of as having stuck an element on the front of an otherwise sorted list, and then needing to push that unsortedness one layer deeper. That does indeed sound like insertion sort: take a sorted list, and then traverse through it, sticking the unsorted element in the right place. It’s “naive” because the recursion doesn’t stop once it’s in the right place — since the remainder of the list is already sorted, it’s OK to stop.
The paper deals with this issue later.
Let’s write a function with this type:
naiveIns :: Ord a
=> Unsorted a (Nu (Sorted a))
-> Sorted a (Unsorted a (Nu (Sorted a)))
Nil = Nil
naiveIns :> Nu Nil) = a :> Nil
naiveIns (a :> Nu (b :> x))
naiveIns (a | a <= b = a :> b :> x
| otherwise = b :> a :> x
The first two cases are uninteresting. But the cons-cons case is — we need to pick whichever of the two elements is smaller, and stick it in front. In doing so, we have sorted the first element in the list, and pushed the unsortedness deeper.
This all makes sense to me. But I find the dual harder to think about. Instead of making a cata . ana
, let’s go the other way with an ana . cata
:
bubbleSort :: SortingFunc a
= ana $ cata _ bubbleSort
this hole now has type:
Unsorted a (Sorted a (Mu (Unsorted a)))
-> Sorted a (Mu (Unsorted a))
which is now an unsorted element in front of a sorted element, in front of the remainder of an unsorted list. What does it mean to be a single sorted element? Well, it must be the smallest element in the otherwise unsorted list. Thus, the smallest element in a list bubbles its way to the front.
On my first reading of this, I thought to myself “that sure sounds a lot like selection sort!” But upon close reading later, it’s not. Insertion sort knows where to put the smallest element it’s found, and does that in constant time. Bubble sort instead swaps adjacent elements, slowly getting the smallest element closer and closer to the front.
Let’s implement a function with this type:
bub :: Ord a
=> Unsorted a (Sorted a (Mu (Unsorted a)))
-> Sorted a (Mu (Unsorted a))
Nil = Nil
bub :> Nil) = a :> Mu Nil
bub (a :> b :> x)
bub (a | a <= b = a :> Mu (b :> x)
| otherwise = b :> Mu (a :> x)
While naiveIns
pushes unsorted elements inwards, bub
pulls sorted elements outwards. But, when you look at the implementations of bub
and naiveIns
, they’re awfully similar! This is the main thrust of the paper — we can factor out a common core of naiveIns
and bub
:
swap :: Ord a
=> Unsorted a (Sorted a x)
-> Sorted a (Unsorted a x)
Nil = Nil
swap :> Nil) = a :> Nil
swap (a :> b :> x)
swap (a | a <= b = a :> b :> x
| otherwise = b :> a :> x
It wasn’t immediately clear to me why this works, since the types of bub
and ins
seem to be more different than this. But when we compare them, this is mostly an artifact of the clunky fixed-point encodings:
-- type of bub
Unsorted a (Sorted a (Mu (Unsorted a)))
-> Sorted a (Mu (Unsorted a))
-- unroll a Mu:
Unsorted a (Sorted a (Mu (Unsorted a)))
-> Sorted a (Unsorted a (Mu (Unsorted a)))
-- let x ~ Mu (Unsorted a)
Unsorted a (Sorted a x)
-> Sorted a (Unsorted a x)
-- let x ~ Nu (Sorted a)
Unsorted a (Sorted a (Nu (Sorted a))
-> Sorted a (Unsorted a (Nu (Sorted a)))
-- unroll a Nu
Unsorted a (Sorted a (Nu (Sorted a)))
-> Sorted a (Unsorted a (Nu (Sorted a)))
-- type of naiveIns
The only difference here is we are no longer packing Mu
s and unpacking Nu
s. We can pull that stuff out:
bubbleSort'' :: SortingFunc a
= ana $ cata $ fmap Mu . swap
bubbleSort''
naiveInsertSort'' :: SortingFunc a
= cata $ ana $ swap . fmap unNu naiveInsertSort''
and thus have shown that bubbleSort''
and naiveInsertSort''
are duals of one another.
Bialgebras
Allegedly, this stuff is all “just a bialgebra.” So, uh, what’s that? The authors draw a bunch of cool looking commutative diagrams that I would love to try to prove, but my attempts to do this paper in Agda were stymied by Mu
and Nu
being too recursive. So instead we’ll have to puzzle through it like peasants instead.
The universal mapping property of initial algebras (here, Mu
) is the following:
. Mu = f . fmap (cata f) cata f
and dually, for terminal coalgebras (Nu
):
. ana f = fmap (ana f) . f unNu
Let’s work on the cata
diagram, WLOG. This UMP gives us:
fmap (cata bub)
Unsorted (Mu Unsorted) ---------> Unsorted (Sorted (Mu Unsorted))
| |
Mu | | bub
v v
Mu Unsorted ----------------------> Sorted (Mu Unsorted)
cata bub
but as we saw in bubbleSort''
, bub = fmap Mu . swap
, thus:
fmap (cata bub)
Unsorted (Mu Unsorted) ---------> Unsorted (Sorted (Mu Unsorted))
| |
| | swap
| v
Mu | Sorted (Unsorted (Mu Unsorted))
| |
| | fmap Mu
v v
Mu Unsorted ----------------------> Sorted (Mu Unsorted)
cata bub
If we let c = cata bub
and a = Mu
, this diagram becomes
fmap c
Unsorted (Mu Unsorted) ---------> Unsorted (Sorted (Mu Unsorted))
| |
| | swap
| v
a | Sorted (Unsorted (Mu Unsorted))
| |
| | fmap a
v v
Mu Unsorted ----------------------> Sorted (Mu Unsorted)
c
and allgedly, this is the general shape of an f
-bialgebra:
. a = fmap a . f . fmap c c
where a : forall x. F x -> x
and c : forall x. x -> G x
, thus f : forall x. F (G x) -> G (F x)
. In Agda:
record Bialgebra
{F G : Set → Set}
{F-functor : Functor F}
{G-functor : Functor G}
(f : {X : Set} → F (G X) → G (F X)) : Set where
field
: {X : Set} → F X → X
a : {X : Set} → X → G X
c
bialgebra-proof: {X : Set}
→ c {X} ∘ a ≡ map G-functor a ∘ f ∘ map F-functor c
where we can build two separate Bialgebra swap
s:
: Bialgebra swap
bubbleSort .a bubbleSort = cata bub
Bialgebra.c bubbleSort = Mu
Bialgebra.bialgebra-proof bubbleSort = -- left as homework Bialgebra
and
: Bialgebra swap
naiveInsertSort .a naiveInsertSort = unNu
Bialgebra.c naiveInsertSort = ana bub
Bialgebra.bialgebra-proof naiveInsertSort = -- left as homework Bialgebra
I’m not entirely confident about this, since as said earlier, I don’t have this formalized in Agda. It’s a shame, because this looks like it would be a lot of fun to do. We’re left with a final diagram, equaqting cata (ana naiveIns)
and ana (cata bub)
:
?fmap (cata (ana naiveIns))?
Unsorted (Mu Unsorted) - - - - -> Unsorted (Nu Sorted)
| |
Mu | | ana naiveIns
v cata (ana naiveIns) v
Mu Unsorted - - - - - -|| - - - - -> Nu Sorted
| ana (cata bub) |
cata bub | | unNu
v v
Sorted (Mu Unsorted) - - - - - -> Sorted (Nu Sorted)
?fmap (ana (cata bub)?
The morphisms surrounded by question marks aren’t given in the paper, but I’ve attempted to fill them in. The ones I’ve given complete the square, but they’re the opposite of what I’d expect from the initial algebra / terminal coalgebra UMPs. This is something to come back to, I think, but is rather annoying since Agda would just tell me the damn answer.
Paramorphisms and Apomorphisms
Standard recursion scheme machinery:
para :: Functor f => (f (Mu f, a) -> a) -> Mu f -> a
= f . fmap (id &&& para f) . unMu
para f
apo :: Functor f => (a -> f (Either (Nu f) a)) -> a -> Nu f
= Nu . fmap (either id (apo f)) . f apo f
The idea is that para
s can look at all the structure that hasn’t yet been folded, while apo
s can exit early by giving a Left
.
Insertion and Selection Sort
The paper brings us back to insertion sort. Instead of writing the naive version as a cata . ana
, we will now try writing it as a cata . apo
. Under this new phrasing, we get the type:
Unsorted a (Nu (Sorted a))
-> Sorted a (Either (Nu (Sorted a))
Unsorted a (Nu (Sorted a)))) (
which is quite a meaningful type. Now, our type can signal that the resuling list is already sorted all the way through, or that we had to push an unsorted value inwards. As a result, ins
looks exactly like bub
, except that we can stop early in most cases, safe in the knowledge that we haven’t changed the sortedness of the rest of the list. The b < a
case is the only one which requires further recursion.
ins :: Ord a
=> Unsorted a (Nu (Sorted a))
-> Sorted a (Either (Nu (Sorted a))
Unsorted a (Nu (Sorted a))))
(Nil = Nil
ins :> Nu Nil) = a :> Left (Nu Nil)
ins (a :> Nu (b :> x))
ins (a | a <= b = a :> Left (Nu (b :> x)) -- already sorted!
| otherwise = b :> Right (a :> x)
Let’s think now about selection sort. Which should be an ana . para
by duality, with the resulting type:
Unsorted a ( Mu (Unsorted a)
Sorted a (Mu (Unsorted a))
,
)-> Sorted a (Mu (Unsorted a))
It’s much harder for me to parse any sort of meaning out of this type. Now our input has both all the unsorted remaining input, as well as a single term bubbling up. I actually can’t figure out how this helps us; presumably it’s something about laziness and not needing to do something with the sorted side’s unsorted tail? But I don’t know. Maybe a reader can drop a helpful comment.
Anyway, the paper gives us sel
which implements the type:
sel
:: Ord a
=> Unsorted a ( Mu (Unsorted a)
, Sorted a (Mu (Unsorted a))
)
-> Sorted a (Mu (Unsorted a))
sel Nil = Nil
sel (a :> (x, Nil)) = a :> x
sel (a :> (x, b :> x'))
| a <= b = a :> x
| otherwise = b :> Mu (a :> x')
Getting an intution here as to why the otherwise
case uses x'
instead of x
is an exercise left to the reader, who can hopefully let me in on the secret.
As before, we can pull a bialgebra out of ins
and sel
. This time, the input side uses the (,)
, and the output uses Either
, and I suppose we get the best of both worlds: early stopping, and presumably whatever caching comes from (,)
:
swop :: Ord a
=> Unsorted a (x, Sorted a x)
-> Sorted a (Either x (Unsorted a x))
Nil = Nil
swop :> (x, Nil)) = a :> (Left x)
swop (a :> (x, (b :> x')))
swop (a | a <= b = a :> Left x
| otherwise = b :> Right (a :> x')
This time our bialgebras are:
: Bialgebra swop
insertSort .a insertSort = apo ins
Bialgebra.c insertSort = id &&& unNu
Bialgebra.bialgebra-proof insertSort = -- left as homework Bialgebra
and
: Bialgebra swop
selectSort .a selectSort = para sel
Bialgebra.c selectSort = either id Mu
Bialgebra.bialgebra-proof selectSort = -- left as homework Bialgebra
Quicksort and Treesort
Lots of the same techniques, and I’m running out of time, so we’ll go quickly. The key insight thus far is that select sort and insert sort both suck. How do we go faster than \(O(n^2)\)? Quicksort, and Treesort!
What’s interesting to me is I never considered Quicksort to be a tree-sorting algorithm. But of course it is; it’s recursively dividing an array in half, sorting each, and then putting them back together. But that fact is obscured by all of this “pivoting” nonsense; it’s just a tree algorithm projected onto arrays.
Hinze et al. present specialized versions of Quicksort and Treesort, but we’re just going to skip to the bialgebra bits:
data Tree a k = Empty | Node k a k
deriving (Eq, Ord, Show, Functor)
sprout :: Ord a
=> Unsorted a (x, Tree a x)
-> Tree a (Either x (Unsorted a x))
Nil = Empty
sprout :> (x, Empty)) = Node (Left x) a (Left x)
sprout (a :> (x, (Node l b r)))
sprout (a | a <= b = Node (Right (a :> l)) b (Left r)
| otherwise = Node (Left l) b (Right (a :> r))
This is the creation of a binary search tree. Left
trees don’t need to be manipulated, and Right
ones need to have the new unsorted element pushed down. The other half of the problem is to extract elements from the BST:
wither :: Tree a (x, Sorted a x)
-> Sorted a (Either x (Tree a x))
Empty = Nil
wither Node (_, Nil) a (r, _)) = a :> Left r
wither (Node (_, b :> l') a (r, _)) = b :> Right (Node l' a r) wither (
I think I understand what’s going on here. We have a tree with nodes a
and “subtrees” Sorted a x
, where remember, x
ties the knot. Thus, in the first level of the tree, our root node is the pivot, and then the left “subtree” is the subtree itself, plus a view on it corresponding to the smallest element in it. That is, in (x, Sorted a x)
, the fst
is the tree, and the snd
is the smallest element that has already been pulled out.
So, if we have a left cons, we want to return that, since it’s necessarily smaller than our root. But we continue (via Right
) with a new tree, using the same root and right sides, letting the recursion scheme machinery reduce that into its smallest term.
But I must admit that I’m hand-waving on this one. I suspect better understanding would come from getting better intutions behind para
and apo
.
Let’s tie things off then, since I’ve clearly hit my limit of understanding on this paper for this week. While having a deadline is a nice forcing function to actually go through papers, it’s not always the best for deeply understanding them! Alas, something to think about for the future.
We’re given two implementations of grow
:
grow' :: Mu Unsorted -> Nu Tree
grow,= ana . para $ fmap (either id unNu) . sprout
grow = cata . apo $ sprout . fmap (id &&& Mu) grow'
as well as two for flatten
:
flatten' :: Mu Tree -> Nu Sorted
flatten,= cata . apo $ wither . fmap (id &&& unNu)
grow = ana . para $ fmap (either id Mu) . wither grow'
and then finally, give us quickSort
and treeSort
:
treeSort :: SortingFunc a
quickSort,= flatten . downcast . grow
quickSort = flatten' . downcast . grow' treeSort
where downcast
was given earlier as:
downcast :: Functor f => Nu f -> Mu f
= Mu . fmap downcast . unNu downcast
This is interesting, but comes with an obvious questions: what if we intermix flatten
with grow'
, and vice versa? Rather unexcitingly, they still sort the list, and don’t seem to have different asymptotics. As a collorary, we must thus be excited, and assume that these are two “new” sorting functions, at least, ones without names. I guess that’s not too surprising; there are probably infinite families of sorting functions.
Conclusions / Notes to Self
What a fun paper! I did a bad thing by jumping into Agda too quickly, hoping it would let me formalize the “this is a sorted list” stuff. But that turned out to be premature, since the Sorted
wrapper is only ever a pair, and exists only to signal some information to the reader. Thus, I spent six hours working through the Agda stuff before realizing my deadline was coming up sooner than later.
Implicit in that paragraph is that I started implementing before I had read through the entire paper, which was unwise, as it meant I spent a lot of time on things that turned out to be completely unrelated. Note to self to not do this next time.
Also, it turns out I’m not as firm on recursion schemes as I thought! It’d be valuable for me to go through para
s in much more depth than I have now, and to work harder at following the stuff in this paper. How do the authors keep everything straight? Do they just have more experience, or are they using better tools than I am?