Contrast this to the internet of yore. By virtue of being hard to access, the internet filtered away the mass appeal it has today. It was hard and expensive to get on, and in the absence of authoring tools, you were only creating internet content if you *had something to say.* Which meant that, as a consumer, if you found something, you had good reason to believe it was well-informed. Why would someone go through the hassle of making a website about something they weren’t interested in?

In 2022, we have a resoundingly sad answer to that question: advertising. The primary purpose of the web today is “engagement,” which is Silicon Valley jargon for “how many ads can we push through someone’s optical nerve?” Under the purview of engagement, it makes sense to publish webpages on every topic imaginable, regardless of whether or not you know what you’re talking about. In fact, engagement goes up if you *don’t* know what you’re talking about; your poor reader might mistakenly believe that they’ll find the answer they’re looking for elsewhere on your site. That’s twice the advertising revenue, baby!

But the spirit of the early web isn’t gone: the bookmarks I’ve kept these long decades mostly still work, and many of them still receive new content. There’s still weird, amateur, passion-project stuff out there. It’s just hard to find. Which brings us to our main topic: search.

Google is inarguably the front page of the internet. Maybe you already know where your next destination is, in which case you probably search for the website on Google and click on the first link, rather than typing in the address yourself. Or maybe you don’t already know your destination, and you search for it. Either way, you hit Google first.

When I say the internet is getting worse, what I really mean is that the Google search results are significantly less helpful than they used to be. This requires some qualification. Google has gotten exceedingly good at organizing everyday life. It reliably gets me news, recipes, bus schedules, tickets for local events, sports scores, simple facts, popular culture, official regulations, and access to businesses. It’s essentially the yellow pages and the newspaper put together. For queries like this, which are probably 95% of Google’s traffic, Google does an excellent job.

The difficulties come in for that other 5%, the so-called “long tail.” The long tail is all those other things we want to know about. Things without well-established, factual answers. Opinions. Abstract ideas. Technical information. If you’re cynical, perhaps it’s all the stuff that doesn’t have wide-enough appeal to drive engagement. Whatever the reason, the long tail is the stuff that’s hard to find on the modern internet.

Notice that the long tail is exactly the stuff we need search for. Mass-appeal queries are, almost by definition, not particularly hard to find. If I need a bus schedule, I know to talk to my local transit authority. If I’m looking to keep up with the Kardashians, I’m not going to have any problems (at least, no *search* problems). On the other hand, it’s much less clear where to get information on why my phone starts overheating when I open the chess app.

So what happens if you search for the long tail on Google? If you’re like me, you flail around for ten minutes wasting your time reading crap articles before you remember that Google is awful for the long tail, and you come away significantly more frustrated, not having found what you were looking for in the first place.

Let’s look at some examples. One of my favorite places in the world is Koh Lanta, Thailand. When traveling, I’m always on the lookout for places that give off the Koh Lanta vibe. What does that mean? Hard to say, exactly, but having tourist amenities without being touristy. Charming, slow, cheap. I don’t know exactly; if I did, it’d be easier to find. Anyway, forgetting that Google is bad at long tails, I search for `what is the koh lanta of croatia?` and get:

- Koh-Lanta - Wikipedia [note: not the island, the game show]
- Top 15 Unique Things to Do in Koh Lanta
- Visit Koh Lanta on a trip to Thailand
- Beautiful places to travel, Koh lanta, Sunset
- Holiday Vacation to Koh Lanta: Our favourite beaches and …
- Koh Lanta Activities: 20 Best Things to Do
- etc

With the exception of “find a flight from Dubrovnik to Koh Lanta” on page two, you need to get to page five before you see any results that even acknowledge I *also* searched for `croatia`. Not very impressive.

When you start paying attention, you’ll notice it on almost every search — Google isn’t actually giving you answers to the things you searched for. Now, maybe the reason here is that there *aren’t* any good results for the query, but that’s a valuable thing to know as well. Don’t just hit me with garbage; it’s an insult to my intelligence and time.

I wanted to figure out why exactly the internet is getting worse. What’s going on with Google’s algorithm that leads to such a monotonous, boring, corporate internet landscape? I thought I’d dig into search engine optimization (SEO) — essentially, techniques that improve a website’s ranking in Google searches. I’d always thought SEO was better at selling itself than it was at improving search results, but my god was I wrong.

SEO techniques are extremely potent, and their widespread adoption is what’s wrong with the modern web.

For example, have you ever noticed that the main content of most websites is something like 70% down the page? Every recipe site I’ve ever seen is like this — nobody cares about how this recipe was originally your great-grandmother’s. Just tell us what’s in it. Why is this so prevalent on the web?

Google rewards a website for how long a user stays on it, with the reasoning being that a bad website has the user immediately hit the back button. Seems reasonable, until you notice the problem of incentives here. Websites aren’t being rewarded for having good content under this scheme, they’re rewarded for wasting your time and making information hard to find. Outcome: websites that answer questions, but hide the information somewhere on a giant (ad-filled) page.

Relatedly, have you noticed how every website begins with a stupid paragraph overviewing the thing you’re searching for? It’s always followed by a stupid paragraph describing why you should care about the thing. For example, I just searched for `garden irrigation`, and the first result is:

Water is vital to plant health, but watering by hand can be a hassle. You have to drag hoses between gardens, move sprinklers around, or take the time to water each plant. Our innovative watering systems take the hassle out of watering. They’re the easiest way to give plants the consistent moisture they need for your biggest harvest and most beautiful blooms.

*Water is vital to plant health.* Wow, who knew! Why in god’s name would I be searching for garden irrigation if I didn’t know that water was vital to plant health? Why is copy like this so prevalent on the web?

Things become clearer when you look at some of the context of this page:

Url: https://[redacted]/how-to/how-to-choose-a-watering-system/8747.html

Title: How to Choose a Garden Irrigation System

Heading: Soak, Drip or Spray: Which is right for you?

Subheading: Choose the best of our easy, customizable, irrigation systems to help your plants thrive and save water

As it happens, Google rewards websites which use keywords in their URL, title, headings, and first 100 words. Just by eyeballing, we can see that this particular website is targeting the keywords “water”, “system”, “irrigation”, and “garden”. Pages like these are hyper-optimized to come up for particular searches. The stupid expository stuff exists only to pack “important keywords” into the first 100 words.
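As a toy illustration of why that expository paragraph exists, here’s a crude keyword-placement scorer in Python. This is my own sketch of the *kind* of signal being gamed, not Google’s actual algorithm; the page data is a stand-in modeled on the irrigation example above (the real URL was redacted).

```python
# Crude sketch of keyword-placement scoring -- an invented toy model,
# not Google's actual ranking algorithm.

def keyword_score(page: dict, keywords: list[str]) -> int:
    """Count keyword hits across url, title, heading, and first 100 words."""
    first_100 = " ".join(page["body"].lower().split()[:100])
    fields = [page["url"].lower(), page["title"].lower(),
              page["heading"].lower(), first_100]
    return sum(kw in field for kw in keywords for field in fields)

page = {
    "url": "https://example.com/how-to/how-to-choose-a-watering-system",
    "title": "How to Choose a Garden Irrigation System",
    "heading": "Soak, Drip or Spray: Which is right for you?",
    "body": "Water is vital to plant health, but watering by hand "
            "can be a hassle. Our innovative watering systems ...",
}

print(keyword_score(page, ["water", "irrigation", "garden", "system"]))  # → 7
```

Stuffing “water … systems” into the opening sentence bumps the score; under any signal shaped like this, the vapid intro paragraph is rational behavior.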

But keyword targeting doesn’t stop there. As I was reading through this SEO stuff (that is, the first page of a Google search for `seo tricks`), every single page offered 15-25 great, technical SEO tricks. And then, without fail, the final point on each page was “but really, the best SEO strategy is having great content!” That’s weird. “Great content” isn’t something an algorithm can identify; if it were, you wouldn’t be currently reading the ravings of a madman, angry about the state of the internet.

So, why do all of these highly-optimized SEO pages ubiquitously break form, switching from concrete techniques to platitudes? You guessed it, it’s an SEO technique! Google offers a keyword dashboard, where you can see which keywords group together, and (shudder) which keywords are *trending.* Google rewards you for having other keywords in the group on your page. And it extra rewards you for having trending keywords. You will not be surprised to learn that “quality content” is a keyword that clusters with “seo,” nor that it is currently a trending keyword.

Think about that for a moment. Under this policy, Google is incentivizing pages to become *less focused,* by adding text that is only tangentially related. But, how do related keywords come about? The only possible answer here is to find keywords that often cluster on other pages. But this is a classic death spiral, pulling every page in a topic to have the same content.

Another way of looking at it is that whatever is being incentivized, something else is being *disincentivized.* Webpages are being penalized for including original information, because original information can’t possibly be in the keyword cluster.

There are a multitude of perverse incentives from Google, but I’ll mention only two more. The first is that websites are penalized for having low-ranking pages. The conventional advice here is to delete “underperforming” pages, which only makes the search problem worse — sites are being rewarded for deleting pages that don’t align with the current search algorithm.

My last point: websites are penalized for even *linking* to low-ranking pages!

It’s not hard to put all of the pieces together and see why the modern web is so bland and monotonous. Not only is the front-page of the internet aggressively penalizing websites which *aren’t* bland and monotonous, it’s also punishing any site which has the audacity to link to more interesting parts of the web.

So the discoverable part of the web sucks. But is that really Google’s fault? I’d argue no. By virtue of being the front page, Google’s search results are under extreme scrutiny. In the eyes of the non-technical population, especially the older generations, the internet and Google are synonymous. The fact is that Google gets unfairly targeted by legislation because it’s a big, powerful tech company, and we as a society are uncomfortable with that.

Worse, the guys doing the regulation don’t exactly have a grasp on how internet things work.

Society at large has been getting very worried about disinformation. Whose problem is that? Google’s — duh. Google is how we get information on the internet, so it’s up to them to defend us from disinformation.

Unfortunately it’s really hard to spot disinformation. Sometimes even the *government* lies to us (gasp!). I can think of two ways of avoiding getting in trouble with respect to disinformation. One: link only to *official sites,* thus changing the problem of trustworthiness to one of authority. If there is no authority, just give back the consensus. Two: don’t return any information whatsoever.

Google’s current strategy seems to be somewhere between one and two. For example, we can try a controversialish search like `long covid doesn't exist`. The top results at time of writing are:

- The search for Long Covid (science.org)
- Small Study Finds No Obvious Physical Causes for Long COVID (medscape.com)
- Fact Check-‘Long COVID’ is not fake, quoted French study did … (reuters.com)
- Harvard Medical School expert explains ‘long COVID’ (harvard.edu)
- Claim that French study showed long COVID doesn’t exist … (healthfeedback.org)
- What doctors wish patients knew about long COVID (ama-assn.org)

I’m not particularly in the know, but I recognize most of these organizations. Science.org sounds official. Not only is one of the pages from Harvard, but also it’s from a Harvard Medical School *expert.* I especially like the fifth one, the metadata says:

Claim: Long COVID is “mostly a mental disease”; the condition long COVID is solely due to a person’s belief, not actual disease; long COVID doesn’t exist

Fact check by Health Feedback: Inaccurate

Every one of these websites comes off as *authoritative* — not in the sense of “knowing what they’re talking about,” because that’s hard to verify — but in the sense of being the sort of organization we’d trust to answer this question for us. Or, in the case of number five, at least telling us that they fact-checked it.

Let’s try a search for something requiring less authority, like “best books.” In the past I would get a list of books considered the best. But now I get:

- The Greatest Books: The Best Books of All Time - 1 to 50
- The Best Books of All Time | chapters.indigo.ca
- 100 Best Books of All Time - Reader’s Digest
- Best Book Lists - Goodreads
- Best Books 2022: Books We Love : NPR

You’ll notice there are no actual books here. There are only *lists* of best books. Cynical me notes that if you were to actually list a book, someone could find it controversial. Instead, you can link to institutional websites, and let them take the controversy for their picks.

This isn’t the way the web needs to be. Google could just as well have given me personal blogs of people talking about long COVID and their favorite books, except (says cynical me) that these aren’t authoritative sources, and thus, linking to them could be considered endorsement. And the web is too big and too fast-moving to risk linking to anything that hasn’t been vetted in advance. It’s just too easy to accidentally give a *good* result to a controversial topic, and have the lawmakers pounce on you. Instead, punt the problem back to authorities.

The web promised us a democratic, decentralized public forum, and all we got was the stinking yellow pages in digital format. I hope the crypto people can learn a lesson here.

Anyway, all of this is to say that I think lawmakers and liability concerns are the real reason the web sucks. All things being equal, Google would like to give us good results, but it prefers making boatloads of money, and that would be hard to do if it got regulated into nothingness.

Google isn’t the only search engine around. There are others, but it’s fascinating that none of them compete on the basis of providing better results. DDG claims to have better privacy. Ecosia claims to plant trees. Bing exists to keep Microsoft relevant post-2010, and for some reason, ranks websites for being highly-shared on social media (again, things that are, by definition, not hard to find.)

Why don’t other search engines compete on search results? It can’t be hard to do better than Google for the long tail.

It’s interesting to note that the problems of regulatory-fear and SEO-capture are functions of Google’s cultural significance. If Google were smaller or less important, there’d be significantly less negative-optimization pressure on it. Google is a victim of its own success.

That is to say, I don’t think all search engines are doomed to fail in the same way that Google has. A small search engine doesn’t need to be authoritative, because nobody is paying attention to it. And it doesn’t have to worry about SEO for the same reason — there’s no money to be made in manipulating its results.

What I dream of is Google circa 2006. A time where a search engine searched what you asked for. A time before aggressive SEO. A time before social media, when the only people on the internet had a reason to be there. A time before sticky headers and full-screen modal pop-ups asking you to subscribe to a newsletter before reading the article. A time before click-bait and subscription-only websites which tease you with a paragraph before blurring out the rest of the content.

These problems are all solvable by a search engine. But that search engine isn’t going to be Google. Let’s de-rank awful sites, and boost personal blogs of people with interesting things to say. Let’s de-rank any website that contains ads. Let’s not index any click-bait websites, which unfortunately in 2022 includes most of the news.

What we need is a search engine, by the people, and for the people. Fuck the corporate interests and the regulatory bullshit. None of this is hard to do. It just requires someone to get started.

A few months ago, the excellent David Rusu gave me an impromptu lecture on ring signatures, which are a way of signing something as an anonymous member of a group. That is, you can show someone in the signing pool was actually responsible for signing the thing, but can’t determine *which member of the pool actually signed it.* David walked me through all the math as to how that actually happens, but I was unable to follow it, because the math was hard and, perhaps more importantly, it felt like hand-compiling a proof.

What do I mean by “hand-compiling” a proof? Well, we have some mathematical object, something like

```
postulate
  Identity : Set
  Message : Set
  SignedBy : Message → Identity → Set
  use-your-imagination : {A : Set} → A

record SignedMessage {n : ℕ} (pool : Vec Identity n) : Set where
  field
    message : Message
    @erased signer : Fin n
    signature : SignedBy message (lookup pool signer)
```

where `@erased` is Agda’s runtime irrelevance annotation, meaning the signer field won’t exist at runtime. In fact, attempting to write a function that would extract it results in the following error:

```
Identifier signer is declared erased, so it cannot be used here
when checking that the expression signer x has type Fin n
```

Nice one, Agda!

Hand-compiling this thing is thus constructing some object that has the desired properties, but doing it in a way that requires BEING VERY SMART, and throwing away any chance at composability in the process. For example, it’d be nice to have the following:

```
open SignedMessage

weakenL : ∀ {n pool new-id}
        → SignedMessage {n} pool
        → SignedMessage (new-id ∷ pool)
weakenL x = use-your-imagination

weakenR : ∀ {n pool new-id}
        → SignedMessage {n} pool
        → SignedMessage (pool ++ [ new-id ])
weakenR x = use-your-imagination
```

which would allow us to arbitrarily extend the pool of a signed message. Then, we could trivially construct one:

```
sign : Message → (who : Identity) → SignedMessage [ who ]
message (sign msg who) = msg
signer (sign msg who) = zero
signature (sign msg who) = use-your-imagination
```

and then obfuscate who signed it via some random sequence of subsequent `weakenL`s and `weakenR`s.

Unfortunately, this is not the case with ring signatures. Ring signatures require you to “bake in” the signing pool when you construct your signature, and you can never again change that pool, short of doing all the work again. This behavior is non-composable, and thus, in my reckoning, unlikely to be a true solution to the problem.

The paper I chose to review this week is Proof-Carrying Code by George Necula, in an attempt to understand if the PL literature has anything to say about this problem.

PCC is an old paper (from 1997, egads!) but it was the first thing I found on the subject. I should really get better at vetting my literature before I go through the effort of going through it, but hey, what are you going to do?

The idea behind PCC is that we want to execute some untrusted machine code. But we don’t want to sacrifice our system security to do it. And we don’t want to evaluate some safe language into machine code, because that would be too slow. Instead, we’ll send the machine code, as well as a safety proof that verifies it’s safe to execute this code. The safety proof is tied to the machine code, such that you can’t just generate a safety proof for an unrelated problem, and then attach it to some malicious code. But the safety proof isn’t obfuscated or anything; the claim is that if you can construct a safety proof for a given program, that program is necessarily safe to run.

On the runtime side, there is a simple algorithm for checking the safety proof, and it is independent of the arguments that the program is run with; therefore, we can get away with checking code once and evaluating it many times. It’s important that the algorithm be simple, because it’s a necessarily trusted piece of code, and it would be bad news if it were to have bugs.

PCC’s approach is a bit… unimaginative. For every opcode we’d like to allow in the programs, we attach a safety precondition and a postcondition. Then, we map the vector of opcodes we’d like to run into its pre/post conditions, and make sure they are confluent. If they are, we’re good to go. This vector of conditions is called the verification condition (VC) in the paper.
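To get a feel for the shape of this check (and nothing more), here’s a toy Python sketch. The opcodes, the conditions, and the set-of-facts representation are all invented for illustration; the actual paper works over real machine code and first-order logic formulas.

```python
# Toy sketch of per-opcode pre/postconditions, in the spirit of PCC.
# The opcodes and the "set of facts" representation are invented for
# illustration; the paper works over machine code and logic formulas.

# opcode -> (precondition facts required, postcondition facts guaranteed)
OPCODES = {
    "load_ptr":      (set(),                        {"ptr_valid"}),
    "check_nonnull": ({"ptr_valid"},                {"ptr_nonnull"}),
    "deref":         ({"ptr_valid", "ptr_nonnull"}, {"value_loaded"}),
}

def check_program(program: list[str]) -> bool:
    """Accept only if every instruction's precondition is established
    by the postconditions of the instructions before it."""
    facts: set[str] = set()
    for op in program:
        pre, post = OPCODES[op]
        if not pre <= facts:   # precondition not yet established: reject
            return False
        facts |= post
    return True

print(check_program(["load_ptr", "check_nonnull", "deref"]))  # → True
print(check_program(["load_ptr", "deref"]))                   # → False (no null check)
```

Each instruction may only run once its precondition has been established by what came before; that’s the flavor of the confluence check, minus all the actual logic.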

So, the compiler computes the VC and attaches it to the code. Think of the VC as a proposition of safety (that is, a type), and a proof of that proposition (the VC itself.) In order to validate this, the runtime does a safety typecheck, figuring out what the proposition of safety would have to be. It compares this against the attached proof, and if they match, it typechecks the VC to ensure it has the type it says. If it does, our code is safe.

The PCC paper is a bit light on details here, so it’s worth thinking about exactly what’s going on here. Presumably determining the safety preconditions is an easy problem if we can do it at runtime, but proving some code satisfies it is hard, *or else we could just do that at runtime too.*

I’m a bit hesitant to dive into the details here, because I don’t really care about determining whether some blob of machine code is safe to run. It’s a big ball of poorly typed typing judgments about memory usage. Why do I say poorly typed? Well consider one of the rules from the paper:

$\frac{m \vdash e : \tau \text{list} \quad \quad e \neq 0} {m \vdash e : \text{addr} \wedge \ldots}$

Here we have that from `e : List τ` (and that `e` isn’t 0) we can derive `e : addr`. At best, if we are charitable in assuming $e \neq 0$ means that `e` isn’t `nil`, there is a type preservation error here. If we are less charitable, there is also some awful type error here involving 0, which might be a null check or something? This seems sufficiently messy that I don’t care enough to decipher it.

How applicable is any of this to our original question around ring signatures? Not very, I think, unfortunately. We already have the ring signature math if we’d like to encode a proof, and the verification of it is easy enough. But it’s still not very composable, and I doubt this paper will add much there. Some more promising approaches would be to draw the mystery commutative diagrams ala Adders and Arrows, starting from a specification and deriving a chain of proofs that the eventual implementation satisfies the specification. The value there is in all the intermediary nodes of the commutative diagram, and whether we can prove weakening lemmas there.

But PCC isn’t entirely a loss; I learned about `@erased` in Agda.

I was describing my idea from last week to automatically optimize programs to Colin, who pointed me towards Syntax-Guided Synthesis by Alur et al.

Syntax-Guided Synthesis is the idea that free-range program synthesis is really hard, so instead, let’s constrain the search space with a grammar of allowable programs. We can then enumerate those possible programs, attempting to find one that satisfies some constraints. The idea is quite straightforward when you see it, but that’s not to say it’s unimpressive; the paper has lots of quantitative results about exactly how well this approach does.

The idea is we want to find programs with type `I → O` that satisfy some specification. We’ll do that by picking some Language of syntax, and trying to build our programs there.

All of this is sorta moot, because we assume we have some oracle which can tell us if our program satisfies the spec. But the oracle is probably some SMT solver, and is thus expensive to call, so we’d like to try hard not to call it if possible.

Let’s take an example, and say that we’d like to synthesize the `max` of two `Nat`s. There are lots of ways of doing that! But we’d like to find a function that satisfies the following:

```
data MaxSpec (f : ℕ × ℕ → ℕ) : ℕ × ℕ → Set where
  is-max
    : {x y : ℕ}
    → x ≤ f (x , y)
    → y ≤ f (x , y)
    → ((f (x , y) ≡ x) ⊎ (f (x , y) ≡ y))
    → MaxSpec f (x , y)
```

If we can successfully produce an element of `MaxSpec f`, we have a proof that `f` is an implementation of `max`. Of course, actually producing such a thing is rather tricky; it’s equivalent to determining if `MaxSpec f` is Decidable for the given input.

In the first three cases, we have some conflicting piece of information, so we are unable to produce a MaxSpec:

```
decideMax : (f : ℕ × ℕ → ℕ) → (i : ℕ × ℕ) → Dec (MaxSpec f i)
decideMax f i@(x , y) with f i | inspect f i
... | o | [ fi≡o ] with x ≤? o | y ≤? o
... | no ¬x≤o | _ =
  no λ { (is-max x≤o _ _) →
           contradiction (≤-trans x≤o (≤-reflexive fi≡o)) ¬x≤o }
... | yes _ | no ¬y≤o =
  no λ { (is-max x y≤o x₂) →
           contradiction (≤-trans y≤o (≤-reflexive fi≡o)) ¬y≤o }
... | yes x≤o | yes y≤o with o ≟ x | o ≟ y
... | no x≠o | no y≠o =
  no λ { (is-max x x₁ (inj₁ x₂)) → contradiction (trans (sym fi≡o) x₂) x≠o
       ; (is-max x x₁ (inj₂ y))  → contradiction (trans (sym fi≡o) y) y≠o }
```

Otherwise, we have a proof that `o` is equal to either `y` or `x`:

```
... | no proof | yes o≡y =
  yes (is-max (≤-trans x≤o (≤-reflexive (sym fi≡o)))
              (≤-trans y≤o (≤-reflexive (sym fi≡o)))
              (inj₂ (trans fi≡o o≡y)))
... | yes o≡x | _ =
  yes (is-max (≤-trans x≤o (≤-reflexive (sym fi≡o)))
              (≤-trans y≤o (≤-reflexive (sym fi≡o)))
              (inj₁ (trans fi≡o o≡x)))
```

`MaxSpec` is a proof that our function is an implementation of `max`, and `decideMax` is a proof that “we’d know one if we saw one.” So that’s the specification taken care of. The next step is to define the syntax we’d like to use to guide our search.

The paper presents this syntax as a BNF grammar, but my thought is why use a grammar when we could instead use a type system? Our syntax is a tiny little branching calculus, capable of representing Terms and branching Conditionals:

```
mutual
  data Term : Set where
    var-x : Term
    var-y : Term
    const : ℕ → Term
    if-then-else : Cond → Term → Term → Term

  data Cond : Set where
    leq : Term → Term → Cond
    and : Cond → Cond → Cond
    invert : Cond → Cond
```

All that’s left for our example is the ability to “compile” a Term down to a candidate function. Just pattern match on the constructors and push the inputs around until we’re done:

```
mutual
  eval : Term → ℕ × ℕ → ℕ
  eval var-x (x , y) = x
  eval var-y (x , y) = y
  eval (const c) (x , y) = c
  eval (if-then-else c t f) i =
    if evalCond c i then eval t i else eval f i

  evalCond : Cond → ℕ × ℕ → Bool
  evalCond (leq m n) i = Dec.does (eval m i ≤? eval n i)
  evalCond (and c1 c2) i = evalCond c1 i ∧ evalCond c2 i
  evalCond (invert c) i = not (evalCond c i)
```

So that’s most of the idea; we’ve specified what we’re looking for, via MaxSpec, what our syntax is, via Term, and a way of compiling our syntax into functions, via eval. This is the gist of the technique; the rest is just algorithms.
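Before diving into the Agda algorithm, the whole loop fits in a few lines of Python. This is my own sketch of the counterexample-guided flavor of the technique, with a heavily cut-down term language and an “oracle” that just exhaustively checks a small domain (a real implementation would call an SMT solver):

```python
from itertools import product

# Cut-down term language: "x", "y", or ("if-leq", a, b, t, f), meaning
# "if a <= b then t else f". Only depth-1 terms are enumerated -- enough
# to express max, but nothing like the paper's full grammars.

def eval_term(term, x, y):
    if term == "x":
        return x
    if term == "y":
        return y
    _tag, a, b, t, f = term
    if eval_term(a, x, y) <= eval_term(b, x, y):
        return eval_term(t, x, y)
    return eval_term(f, x, y)

def enumerate_terms():
    yield "x"
    yield "y"
    for a, b, t, f in product(["x", "y"], repeat=4):
        yield ("if-leq", a, b, t, f)

def spec(f, x, y):
    """f computes max: the output bounds both inputs and equals one of them."""
    o = f(x, y)
    return x <= o and y <= o and (o == x or o == y)

def oracle(f, domain=range(4)):
    """Stand-in for the SMT solver: exhaustively check a small domain,
    returning a counterexample if one exists, else None."""
    for x, y in product(domain, repeat=2):
        if not spec(f, x, y):
            return (x, y)
    return None

def synthesize():
    cases = []  # counterexamples collected so far
    for term in enumerate_terms():
        f = lambda x, y, t=term: eval_term(t, x, y)
        if not all(spec(f, x, y) for x, y in cases):
            continue  # fails a cached counterexample; skip the expensive oracle
        cex = oracle(f)
        if cex is None:
            return term  # passed every check
        cases.append(cex)

print(synthesize())  # → ("if-leq", "x", "y", "y", "x"), i.e. if x ≤ y then y else x
```

The candidates `x` and `y` each get rejected by one oracle call, and their counterexamples then screen out most of the conditional terms for free, which is exactly the call-the-oracle-as-little-as-possible behavior the paper is after.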

The paper presents several algorithms and evaluates their performances. But one is clearly better than the others in the included benchmarks, so we’ll just go through that one.

Our algorithm to synthesize code corresponding to the specification takes a few parameters. We’ve seen the first few:

```
module Solver {Lang I O : Set}
    (spec : (I → O) → I → Set)
    (decide : (f : I → O) → (i : I) → Dec (spec f i))
    (compile : Lang → I → O)
```

However, we also need a way of synthesizing terms in our Language. For that, we’ll use enumerate, which maps a natural number to a term:

```
    (enumerate : ℕ → Lang)
```

Although it’s not necessary for the algorithm, we should be able to implement exhaustive over enumerate, which states every Lang is eventually produced by enumerate:

```
    (exhaustive : (x : Lang) → Σ[ n ∈ ℕ ] (enumerate n ≡ x))
```

Finally, we need an oracle capable of telling us if our solution is correct. This might sound a bit like cheating, but behind the scenes it’s just a magic SMT solver. The idea is that SMT can either confirm that our program is correct, or produce a counterexample that violates the spec. The type here is a bit crazy, so we’ll take it one step at a time.

An oracle is a function that takes a Lang…

```
    (oracle : (exp : Lang)
```

and either gives back a function that can produce a `spec (compile exp)` for every input:

```
        → ((i : I) → spec (compile exp) i)
```

or gives back some input which is not a `spec (compile exp)`:

```
        ⊎ Σ[ i ∈ I ] ¬ spec (compile exp) i)
    where
```

The algorithm here is actually quite clever. The idea is to try each enumerated value in order, attempting to minimize the number of calls we make to the oracle, because they’re expensive. So instead, we’ll keep a list of every counterexample we’ve seen so far, and ensure that our synthesized function passes all of them before sending it off to the oracle. First, we’ll need a data structure to store our search progress:

```
record SearchState : Set where
  field
    iteration : ℕ
    cases : List I

open SearchState
```

The initial search state is one in which we start at the beginning, and have no counterexamples:

```
start : SearchState
iteration start = 0
cases start = []
```

We can try a function by testing every counterexample:

```
try : (I → O) → List I → Bool
try f = all (Dec.does ∘ decide f)
```

and finally, we can now attempt to synthesize some code. Our function `check` takes a SearchState, and either gives back the next step of the search, or some program, and a proof that it’s what we’re looking for.

```
check : SearchState
      → SearchState
      ⊎ (Σ[ exp ∈ Lang ] ((i : I) → spec (compile exp) i))
check ss
```

We begin by getting and compiling the next enumerated term:

```
    with enumerate (iteration ss)
... | exp with compile exp
```

check if it passes all the previous counterexamples:

```
... | f with try f (cases ss)
```

if it doesn’t, just fail with the next iteration:

```
... | false = inj₁ (record { iteration = suc (iteration ss)
                           ; cases = cases ss })
```

Otherwise, our proposed function might just be the thing we’re looking for, so it’s time to consult the oracle:

```
... | true with oracle exp
```

which either gives a counterexample that we need to record:

```
... | inj₂ (y , _) = inj₁ (record { iteration = suc (iteration ss)
                                  ; cases = y ∷ cases ss })
```

or it confirms that our function satisfies the specification, and thus that we’re done:

```
... | inj₁ x = inj₂ (exp , x)
```

Pretty cool! The paper gives an optimization that caches the result of every counterexample on every synthesized program, and reuses these whenever that program appears as a subprogram of a larger one. The idea is that we can trade storage so we only ever need to evaluate each subprogram once — important for expensive computations.

Of course, pumping check by hand is annoying, so we can instead package it up as solve which takes a search depth, and iterates check until it runs out of gas or gets the right answer:

```
solve : ℕ → Maybe (Σ[ exp ∈ Lang ] ((i : I) → spec (compile exp) i))
solve = go start
  where
    go : SearchState → ℕ → Maybe (Σ Lang (λ exp → (i : I) → spec (compile exp) i))
    go ss zero = nothing
    go ss (suc n) with check ss
    ... | inj₁ ss' = go ss' n   -- continue from the *updated* search state
    ... | inj₂ y = just y
```

Today we’re heading back into the Elliottverse — a beautiful world where programming is principled and makes sense. The paper of the week is Conal Elliott’s Generic Parallel Functional Programming, which productively addresses the duality between “easy to reason about” and “fast to run.”

Consider the case of a right-associated list; we can give a scan of it in linear time and constant space:

```
module ExR where
  data RList (A : Set) : Set where
    RNil : RList A
    _◁_ : A → RList A → RList A
  infixr 5 _◁_

  scanR : ⦃ Monoid A ⦄ → RList A → RList A
  scanR = go mempty
    where
      go : ⦃ Monoid A ⦄ → A → RList A → RList A
      go acc RNil = RNil
      go acc (x ◁ xs) = acc ◁ go (acc <> x) xs
```

This is a nice functional algorithm that runs in $O(n)$ time, and requires $O(1)$ space. However, consider the equivalent algorithm over left-associative lists:

```
module ExL where
  data LList (A : Set) : Set where
    LNil : LList A
    _▷_  : LList A → A → LList A
  infixl 5 _▷_

  scanL : ⦃ Monoid A ⦄ → LList A → LList A
  scanL = proj₁ ∘ go
    where
      go : ⦃ Monoid A ⦄ → LList A → LList A × A
      go LNil = LNil , mempty
      go (xs ▷ x) = let xs' , acc = go xs in xs' ▷ acc , x <> acc
```

While scanL is also $O(n)$ in its runtime, it is not amenable to tail call optimization, and thus also requires $O(n)$ *space.* Egads!

You are probably not amazed to learn that different ways of structuring data lead to different runtime and space complexities. But it’s a more interesting puzzle than it sounds, because RList and LList are isomorphic! So what gives?
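To make the asymmetry concrete, here is a Python translation of the two scans (my own sketch, using integer addition as the monoid). The right-associated version threads an accumulator through a loop, while the left-associated version must recurse all the way down before producing anything:

```python
def scan_r(xs):
    """Mirrors scanR: thread the accumulator left to right.
    A single loop, so O(n) time and O(1) extra space."""
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out

def scan_l(xs):
    """Mirrors scanL: recurse on the init of the list, then extend.
    Still O(n) combine steps, but the call stack grows to O(n) depth."""
    if not xs:
        return [], 0
    init, acc = scan_l(xs[:-1])
    return init + [acc], acc + xs[-1]
```

Both compute the same exclusive scan; only the shape of the recursion, and hence the space behavior, differs.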

Reed’s pithy description here is:

> Computation time doesn’t respect isos

Exploring that question with him has been very illuminating. Math is deeply about extensionality; two mathematical objects are equivalent if their abstract interfaces are indistinguishable. Computation… doesn’t have this property. When computing, we care a great deal about runtime performance, which depends on fiddly implementation details, even if those aren’t externally observable.

In fact, as he goes on to state, this is the whole idea of denotational design. Figure out the extensional behavior first, and then figure out how to implement it.

This all harkens back to my review of another of Elliott’s papers, Adders and Arrows, which starts from the extensional behavior of natural addition (encoded as the Peano naturals), and then derives a chain of proofs showing that our everyday binary adders preserve this behavior.

Anyway, let’s switch topics and consider a weird fact of the world. Why do so many parallel algorithms require gnarly array indexing? Here’s an example I found by googling for “parallel c algorithms cuda”:

```
__global__ void stencil_1d(int *in, int *out) {
  __shared__ int temp[BLOCK_SIZE + 2 * RADIUS];
  int gindex = threadIdx.x + blockIdx.x * blockDim.x;
  int lindex = threadIdx.x + RADIUS;
  temp[lindex] = in[gindex];
  if (threadIdx.x < RADIUS) {
    temp[lindex - RADIUS] = in[gindex - RADIUS];
    temp[lindex + BLOCK_SIZE] = in[gindex + BLOCK_SIZE];
  }
  __syncthreads();
  int result = 0;
  for (int offset = -RADIUS ; offset <= RADIUS ; offset++)
    result += temp[lindex + offset];
  out[gindex] = result;
}
```

and here’s another, expressed as an “easy induction” recurrence relation, from Richard E. Ladner and Michael J. Fischer’s *Parallel Prefix Computation.*

Sweet lord. No wonder we’re all stuck pretending our computer machines are single threaded behemoths from the 1960s. Taking full advantage of parallelism on modern CPUs must require a research team and five years!

But it’s worth taking a moment and thinking about what all of this janky indexing must be doing. Whatever algorithm is telling the programmer which indices to write where necessarily must be providing a view on the data. That is, the programmer has some sort of “shape” in mind for how the problem should be subdivided, and the indexing is an implementation of accessing the raw array elements in the desired shape.

At risk of beating you on the head with it, this array indexing is *a bad implementation of a type system.* Bad because it’s something the implementer needed to invent by hand, and is not in any form that the compiler can help ensure the correctness of.

That returns us to the big contribution of *Generic Functional Parallel Algorithms,* which is a technique for decoupling the main thrust of an algorithm from extensionally-inconsequential encodings of things. The idea is to implement the algorithm on lots of trivial data structures, and then compose those small pieces together to get a *class* of algorithms.

The first step is to determine which trivial data structures we need to support. Following in the footsteps of Haskell’s `GHC.Generics` module, we can decompose any Haskell98 data type as compositions of the following pieces:

```
data Rep : Set₁ where
  V     : Rep
  U     : Rep
  K     : Set → Rep
  Par   : Rep
  Rec   : (Set → Set) → Rep
  _:+:_ : Rep → Rep → Rep
  _:*:_ : Rep → Rep → Rep
  _:∘:_ : Rep → Rep → Rep
```

which we can embed in Set via Represent:

```
open import Data.Empty
open import Data.Sum
open import Data.Unit hiding (_≤_)

record Compose (F G : Set → Set) (A : Set) : Set where
  constructor compose
  field
    composed : F (G A)
open Compose

Represent : Rep → Set → Set
Represent V a = ⊥
Represent U a = ⊤
Represent (K x) a = x
Represent Par a = a
Represent (Rec f) a = f a
Represent (x :+: y) a = Represent x a ⊎ Represent y a
Represent (x :*: y) a = Represent x a × Represent y a
Represent (x :∘: y) a = Compose (Represent x) (Represent y) a
```

If you’ve ever worked with `GHC.Generics`, none of this should be very exciting. We can bundle everything together, plus an iso to transform to and from the Represented type:

```
record Generic (F : Set → Set) : Set₁ where
  field
    RepOf : Rep
    from  : F A → Represent RepOf A
    to    : Represent RepOf A → F A
open Generic ⦃ ... ⦄

GenericRep : (F : Set → Set) → ⦃ Generic F ⦄ → Set → Set
GenericRep _ = Represent RepOf
```

Agda doesn’t have any out-of-the-box notion of `-XDeriveGeneric`, which seems like a headache at first blush. It means we need to explicitly write out a RepOf and from/to pairs by hand, *like peasants.* Surprisingly, however, needing to implement these by hand is beneficial, as it reminds us that RepOf *is not uniquely determined!*

A good metaphor here is the number 16, which stands for some type we’d like to generify. A RepOf for 16 is an equivalent representation for 16. Here are a few:

- $2+(2+(2+(2+(2+(2+(2+2))))))$
- $((2+2)*2)+(((2+2)+2)+2)$
- $2 \times 8$
- $8 \times 2$
- $(4 \times 2) \times 2$
- $(2 \times 4) \times 2$
- $4 \times 4$
- $2^4$
- $2^{2^2}$

And there are lots more! Each of $+$, $\times$ and exponentiation corresponds to a different way of building a type, so every one of these expressions is a distinct (if isomorphic) type with 16 values. Every single possible factoring of 16 corresponds to a different way of dividing-and-conquering, which is to say, a different (but related) algorithm.
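Restricting attention to products alone (ignoring sums and exponents), we can enumerate these shapes mechanically. A small Python sketch, entirely my own, where each nested tuple is one way of parenthesizing a product, i.e. one divide-and-conquer tree:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def product_shapes(n):
    """All fully-parenthesized ways to write n as a product of
    factors >= 2. Each shape is a distinct divide-and-conquer tree."""
    out = [n]  # the trivial shape: one flat block of size n
    for d in range(2, n):
        if n % d == 0:
            for left in product_shapes(d):
                for right in product_shapes(n // d):
                    out.append((left, right))
    return out
```

For 16, this already yields fifteen distinct shapes (and the sum and exponent forms in the list above are further ones still).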

The trick is to define our algorithm inductively over each Set that can result from Represent. We can then pick different algorithms from the class by changing the specific way of factoring our type.

Let’s consider the case of left scans. I happen to know it’s going to require Functor capabilities, so we’ll also define that:

```
record Functor (F : Set 𝓁 → Set 𝓁) : Set (lsuc 𝓁) where
  field
    fmap : {A B : Set 𝓁} → (A → B) → F A → F B

record LScan (F : Set → Set) : Set₁ where
  field
    overlap ⦃ func ⦄ : Functor F
    lscan : ⦃ Monoid A ⦄ → F A → F A × A

open Functor ⦃ ... ⦄
open LScan ⦃ ... ⦄
```

What’s with the type of lscan? This thing is an exclusive scan, so the first element is always mempty, and thus the last element is always returned as proj₂ of lscan.

We need to implement LScan for each Representation, and because there is no global coherence requirement in Agda, we can define our Functor instances at the same time.

The simplest case is void, which we can scan because we have a ⊥ in a negative position:

```
instance
  lV : LScan (\a → ⊥)
  lV .func .fmap f x = ⊥-elim x
  lV .lscan ()
```

⊤ is also trivial. Notice that there isn’t any `a` inside of it, so our final accumulated value must be mempty:

```
  lU : LScan (\a → ⊤)
  lU .func .fmap f x = x
  lU .lscan x = x , mempty
```

The identity functor is also trivial. Except this time, we *do* have a result, so it becomes the accumulated value, and we replace it with how much we’ve scanned thus far (nothing):

```
  lP : LScan (\a → a)
  lP .func .fmap f = f
  lP .lscan x = mempty , x
```

Coproducts are uninteresting; we merely lift the tag:

```
  l+ : ⦃ LScan F ⦄ → ⦃ LScan G ⦄ → LScan (\a → F a ⊎ G a)
  l+ .func .fmap f (inj₁ y) = inj₁ (fmap f y)
  l+ .func .fmap f (inj₂ y) = inj₂ (fmap f y)
  l+ .lscan (inj₁ x) = let x' , y = lscan x in inj₁ x' , y
  l+ .lscan (inj₂ x) = let x' , y = lscan x in inj₂ x' , y
```

And then we come to the interesting cases. To scan the product of `F` and `G`, we notice that every left scan of `F` is a prefix of `F × G` (because `F` is on the left.) Thus, we can use `lscan F` directly in the result, and need only adjust the results of `lscan G` with the accumulated value from `F`:

```
  l* : ⦃ LScan F ⦄ → ⦃ LScan G ⦄ → LScan (\a → F a × G a)
  l* .func .fmap f x .proj₁ = fmap f (x .proj₁)
  l* .func .fmap f x .proj₂ = fmap f (x .proj₂)
  l* .lscan (f-in , g-in) =
    let f-out , f-acc = lscan f-in
        g-out , g-acc = lscan g-in
     in (f-out , fmap (f-acc <>_) g-out) , f-acc <> g-acc
```

l* is what makes the whole algorithm parallel. It says we can scan `F` and `G` in parallel, and need only a single join node at the end to stick `f-acc <>_` on. This parallelism is visible in the `let` expression, where there is no data dependency between the two bindings.
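In list-of-numbers terms, the product case looks like the following Python sketch (my own, with addition as the monoid): the two scans are independent, and one join offsets the right half by the left half’s total.

```python
def exclusive_scan(xs):
    """Exclusive prefix sum of one piece, plus its total."""
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out, acc

def scan_product(f_part, g_part):
    """Scan F and G independently (no data dependency between the
    two calls), then a single join: offset G's results by F's total."""
    f_out, f_acc = exclusive_scan(f_part)
    g_out, g_acc = exclusive_scan(g_part)
    return (f_out, [f_acc + y for y in g_out]), f_acc + g_acc
```

Note that `f_out` is used unmodified, exactly as the prose above promises.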

Our final generic instance of LScan is over composition. However, we can’t implement LScan for every composition of functors, since we require the ability to “zip” two functors together. The paper is pretty cagey about exactly what `Zip` is, but after some sleuthing, I think it’s this:

```
record Zip (F : Set → Set) : Set₁ where
  field
    overlap ⦃ func ⦄ : Functor F
    zip : {A B : Set} → F A → F B → F (A × B)
open Zip ⦃ ... ⦄
```

That looks a lot like being an applicative, but it’s missing `pure`, and has some weird idempotent laws that are not particularly relevant today. We can define some helper functions as well:

```
zipWith : ∀ {A B C} → ⦃ Zip F ⦄ → (A → B → C) → F A → F B → F C
zipWith f fa fb = fmap (uncurry f) (zip fa fb)

unzip : ⦃ Functor F ⦄ → {A B : Set} → F (A × B) → F A × F B
unzip x = fmap proj₁ x , fmap proj₂ x
```

Armed with all of this, we can give an implementation of lscan over functor composition. The idea is to lscan each inner functor, which gives us a `G (F A × A)`. We can then unzip that, whose second projection is then the totals of each inner scan. If we scan these *totals*, we’ll get a running scan for the whole thing; and all that’s left is to adjust each inner scan.

```
instance
  l∘ : ⦃ LScan F ⦄ → ⦃ LScan G ⦄ → ⦃ Zip G ⦄ → LScan (Compose G F)
  l∘ .func .fmap f = fmap f
  l∘ .lscan (compose gfa) =
    let gfa' , tots = unzip (fmap lscan gfa)
        tots' , tot = lscan tots
        adjustl t   = fmap (t <>_)
     in compose (zipWith adjustl tots' gfa') , tot
```
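The same recipe, sketched in Python over a list of chunks (my own illustration, with addition as the monoid): scan every chunk independently, scan the per-chunk totals, then shift each chunk by the running total that precedes it.

```python
def exclusive_scan(xs):
    """Exclusive prefix sum of one chunk, plus its total."""
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out, acc

def scan_composed(chunks):
    """lscan each inner chunk (independent, parallelizable work),
    lscan the per-chunk totals, then adjust each chunk by the
    running total of everything before it."""
    pieces = [exclusive_scan(c) for c in chunks]
    outs = [out for out, _ in pieces]
    tots = [tot for _, tot in pieces]
    tot_scan, total = exclusive_scan(tots)
    adjusted = [[t + x for x in out] for t, out in zip(tot_scan, outs)]
    return adjusted, total
```

Flattening the result gives exactly the exclusive scan of the flattened input, which is the content of the l∘ instance.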

And we’re done! We now have an algorithm defined piece-wise over the fundamental ADT building blocks. Let’s put it to use.

Let’s pretend that Vecs are random access arrays. We’d like to be able to build array algorithms out of our algorithmic building blocks. To that end, we can make a typeclass corresponding to types that are isomorphic to arrays:

```
open import Data.Nat
open import Data.Vec hiding (zip; unzip; zipWith)

record ArrayIso (F : Set → Set) : Set₁ where
  field
    Size : ℕ
    deserialize : Vec A Size → F A
    serialize : F A → Vec A Size
    -- also prove it's an iso
open ArrayIso ⦃ ... ⦄
```

There are instances of ArrayIso for the functor building blocks (though none for :+: since arrays are big records.) We can now use an ArrayIso and an LScan to get our desired parallel array algorithms:

```
genericScan
    : ⦃ Monoid A ⦄
    → (rep : Rep)
    → ⦃ d : ArrayIso (Represent rep) ⦄
    → ⦃ LScan (Represent rep) ⦄
    → Vec A (Size ⦃ d ⦄)
    → Vec A (Size ⦃ d ⦄) × A
genericScan _ ⦃ d = d ⦄ x =
  let res , a = lscan (deserialize x)
   in serialize ⦃ d ⦄ res , a
```

I think this is the first truly dependent type I’ve ever written. We take a Rep corresponding to how we’d like to divvy up the problem, see if its Represent has ArrayIso and LScan instances, and give back an algorithm that scans over arrays of the correct Size.

Finally we’re ready to try this out. We can give the RList implementation from earlier:

```
▷_ : Rep → Rep
▷_ a = Par :*: a

_ : ⦃ Monoid A ⦄ → Vec A 4 → Vec A 4 × A
_ = genericScan (▷ ▷ ▷ Par)
```

or the LList instance:

```
_◁ : Rep → Rep
_◁ a = a :*: Par

_ : ⦃ Monoid A ⦄ → Vec A 4 → Vec A 4 × A
_ = genericScan (Par ◁ ◁ ◁)
```

But we can come up with more interesting strategies as well. For example, we can divvy up the problem by left-associating the first half, and right-associating the second:

```
_ : ⦃ Monoid A ⦄ → Vec A 8 → Vec A 8 × A
_ = genericScan ((Par ◁ ◁ ◁) :*: (▷ ▷ ▷ Par))
```

This one probably isn’t an *efficient* algorithm, but it’s cool that we can express such a thing so succinctly. Probably of more interest is a balanced tree over our array:

```
_ : ⦃ Monoid A ⦄ → Vec A 16 → Vec A 16 × A
_ = let ⌊_⌋ a = a :*: a
     in genericScan (⌊ ⌊ ⌊ ⌊ Par ⌋ ⌋ ⌋ ⌋)
```

The balanced tree over products is interesting, but what if we make a balanced tree over *composition?* In essence, we can split the problem into chunks of $2^{2^n}$ work via Bush:

```
{-# NO_POSITIVITY_CHECK #-}
data Bush : ℕ → Set → Set where
  twig : A × A → Bush 0 A
  bush : {n : ℕ} → Bush n (Bush n A) → Bush (suc n) A
```

We won’t use it directly, but we can use its Rep:

```
_ : ⦃ Monoid A ⦄ → Vec A 16 → Vec A 16 × A
_ = let pair = Par :*: Par
     in genericScan ((pair :∘: pair) :∘: (pair :∘: pair))
```

The paper compares several of these strategies for dividing-and-conquering. In particular, it shows that we can minimize total work via a left-associated ⌊_⌋ strategy, but maximize parallelism with a *right*-associated ⌊_⌋. And using the `Bush` from earlier, we can get a nice middle ground.

The paper follows up by applying this approach to implementations of the fast Fourier transform. There, the Bush approach gives constant factor improvements for both *work* and *parallelism,* compared to all previously known algorithms.

Results like these are strong evidence that Elliott is *actually onto something* with his seemingly crazy ideas that computation should be elegant and well principled. Giving significant constant factor improvements to well-known, extremely important algorithms *mostly for free* is a true superpower, and is worth taking extremely seriously.

Andrew McKnight and I tried to use this same approach to get a nice algorithm for sorting, hoping that well-known sorting algorithms would fall out as special cases of our more general functor building blocks. We completely failed on this front, mainly because we couldn’t figure out how to give an instance for product types. Rather alarmingly, we’re not entirely sure *why* the approach failed there; maybe we just weren’t thinking hard enough.

Another plausible idea is that sorting requires branching, and that this approach only works for statically-known codepaths.

Andrew and I spent a good chunk of the week thinking about this problem, and we figure there are solid odds that you could *automatically* discover these generic algorithmic building blocks from a well-known algorithm. Here’s the sketch:

Use the well-known algorithm as a specification, instantiate all parameters at small types and see if you can find instances of the algorithm for the functor building blocks that agree with the spec. It seems like you should be able to use factorization of the input to target which instances you’re looking for.

Of course, once you have the algorithmic building blocks, conventional search techniques can be used to optimize any particular goal you might have.

We might as well dive in. Since all of this complexity analysis stuff shouldn’t *change* anything at runtime, we really only need to stick the analysis in the types, and can erase it all at runtime.

The paper thus presents its main tools in an `abstract` block, which is a new Agda feature for me. And wow, does Agda ever feel like it’s Haskell but from the future. An `abstract` block lets us give some definitions, which *inside* the block can be normalized. But outside the block, they are opaque symbols that are just what they are. This is a delightful contrast to Haskell, where we need to play a game of making a new module, and carefully not exporting things in order to get the same behavior. And even then, in Haskell, we can’t give opaque `type` synonyms or anything like that.

Anyway, the main type in the paper is Thunk, which tracks how many computation steps are necessary to produce an eventual value:

```
abstract
  Thunk : ℕ → Set → Set
  Thunk n a = a
```

Because none of this exists at runtime, we can just ignore the `n` argument, and use the `abstract`ion barrier to ensure nobody can use this fact in anger. Thunk is a *graded* monad, that is, a monad parameterized by a monoid, which uses `mempty` for `pure`, and `mappend` for binding. We can show that Thunk does form a graded monad:

```
pure : a → Thunk 0 a
pure x = x

infixl 1 _>>=_
_>>=_ : Thunk m a → (a → Thunk n b) → Thunk (m + n) b
x >>= f = f x

infixr 1 _=<<_
_=<<_ : (a → Thunk n b) → Thunk m a → Thunk (m + n) b
f =<< x = f x
```

We’ll omit the proofs that Thunk really is a monad, but it’s not hard to see; Thunk is truly just the identity monad.

Thunk is also equipped with two further operations: the ability to mark a computation cycle, and the ability to extract the underlying value by throwing away the complexity analysis:

```
infixr 0 !_
!_ : Thunk n a → Thunk (suc n) a
!_ a = a

force : {a : Set} → Thunk n a → a
force x = x
```

Here, !_ is given a low, right-spanning precedence, which means it’s relatively painless to annotate with:

```
_ : Thunk 3 ℕ
_ = ! ! ! pure 0
```

Our definitions are “opt-in,” in the sense that the compiler won’t yell at you if you forget to call !_ somewhere a computational step happens. Thus, we require users to adhere to the following conventions:

- Every function body must begin with a call to !_.
- force may not be used in a function body.
- None of pure, _>>=_ nor !_ may be called partially applied.

The first convention ensures we count everything that should be counted. The second ensures we don’t cheat by discarding complexity information before it’s been counted. And the third ensures we don’t accidentally introduce uncounted computation steps.

The first two are pretty obvious, but the third is a little subtler. Under the hood, partial application gets turned into a lambda, which introduces a computation step to evaluate. But that step won’t be ticked via !_, so we will have lost the bijection between our programs and their analyses.
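The whole discipline can be modeled in Python as a value paired with a step count (a sketch of my own, with nothing erased since Python has no types to erase): pure costs zero, bind adds costs, and the tick charges one step.

```python
def pure(x):
    return (0, x)          # pure costs zero steps

def bind(thunk, f):
    m, x = thunk
    n, y = f(x)
    return (m + n, y)      # costs add, as in the graded monad

def tick(thunk):
    n, x = thunk
    return (n + 1, x)      # the !_ operator: charge one step

# mirrors `_ : Thunk 3 ℕ ; _ = ! ! ! pure 0`
three = tick(tick(tick(pure(0))))
```

The conventions above amount to: call `tick` at every function body, never project out the pair mid-computation, and never hide a step behind an uncounted closure.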

The paper shows us how to define a lazy vector. `VecL a c n` is a vector of `n` elements of type `a`, where the cost of forcing each subsequent tail is `c`:

```
{-# NO_POSITIVITY_CHECK #-}
data VecL (a : Set) (c : ℕ) : ℕ → Set where
  []  : VecL a c 0
  _∷_ : a → Thunk c (VecL a c n) → VecL a c (suc n)
infixr 5 _∷_
```

Let’s try to write fmap for VecL. We’re going to need a helper function, which delays a computation by artificially inflating its number of steps:

```
abstract
  wait : {n : ℕ} → Thunk m a → Thunk (n + m) a
  wait m = m
```

(The paper follows its own rules and ensures that we call !_ every time we wait; thus it comes with an extra suc in the type of wait. It gets confusing, so we’ll use this version instead.)

Unfortunately, the paper also plays fast and loose with its math. It’s fine, because the math is right, but the code presented in the paper doesn’t typecheck in Agda. As a workaround, we need to enable rewriting:

```
open import Agda.Builtin.Equality.Rewrite
{-# REWRITE +-suc +-identityʳ #-}
```

We’ll also need to be able to lift equalities over the `Thunk` time bounds:

```
cast : m ≡ n → Thunk m a → Thunk n a
cast eq x rewrite eq = x
```

Finally, we can write fmap:

```
fmap : {c fc : ℕ}
     → (a → Thunk fc b)
     → VecL a c n
     → Thunk (2 + fc) (VecL b (2 + fc + c) n)
fmap f [] = wait (pure [])
fmap {c = c} f (x ∷ xs) =
  ! f x >>= \x' →
  ! pure (x' ∷ cast (+-comm c _) (xs >>= fmap f))
```

This took me about an hour to write; I’m not convinced the approach here is as “lightweight” as claimed. Of particular challenge was figuring out the actual time bounds on this thing. The problem is that we usually reason about asymptotics via Big-O notation, which ignores all of these constant factors. What would be nicer is the hypothetical type:

```
fmap
  : {c fc : ℕ}
  → (a → Thunk (O fc) b)
  → VecL a c n
  → Thunk (O c) (VecL b (O (fc + c)) n)
```

where every thunk is now parameterized by `O x`, saying our asymptotics are bounded by `x`. We’ll see about fleshing this idea out later. For now, we can power through the paper and write vector insertion. Let’s assume we have a constant-time comparison function for a:

postulate _<=_ : a → a → Thunk 1 Bool

First things first, we need another waiting function to inflate the times on every tail:

```
waitL : {c' : ℕ} {c : ℕ} → VecL a c' n → Thunk 1 (VecL a (2 + c + c') n)
waitL [] = ! pure []
waitL (x ∷ xs) = ! pure (x ∷ wait (waitL =<< xs))
```

and a helper version of if_then_else_ that does its accounting in Thunk:

```
if_then_else_ : Bool → a → a → Thunk 1 a
if false then t else f = ! pure f
if true  then t else f = ! pure t
infixr 2 if_then_else_
```

We can thus write vector insertion:

```
insert : {c : ℕ} → a → VecL a c n → Thunk 4 (VecL a (4 + c) (suc n))
insert x [] = wait (pure (x ∷ wait (pure [])))
insert x (y ∷ ys) =
  ! x <= y >>= \b →
  ! if b
      then x ∷ wait (waitL (y ∷ ys))
      else y ∷ (insert x =<< ys)
```

The obvious followup to insert is insertion sort:

```
open import Data.Vec using (Vec; []; _∷_; tail)

sort : Vec a n → Thunk (1 + 5 * n) (VecL a (4 * n) n)
sort [] = ! pure []
sort (x ∷ xs) = ! insert x =<< sort xs
```

This thing looks linear, but insertion sort is $O(n^2)$, so what gives? The thing to notice is that the cost of each *tail* is linear, but we have $O(n)$ tails, so forcing the whole thing indeed works out to $O(n^2)$. We can now show head runs in constant time:

```
head : {c : ℕ} → VecL a c (suc n) → Thunk 1 a
head (x ∷ _) = ! pure x
```

and that we can find the minimum element in linear time:

```
minimum : Vec a (suc n) → Thunk (8 + 5 * n) a
minimum xs = ! head =<< sort xs
```

Interestingly, Agda can figure out the bounds on minimum by itself, but not any of our other functions.

The paper goes on to show that we can define last, and then get a quadratic-time `maximum` using it:

```
last : {c : ℕ} → VecL a c (suc n) → Thunk (1 + suc n * suc c) a
last (x ∷ xs) = ! last' x =<< xs
  where
    last' : {c : ℕ} → a → VecL a c n → Thunk (1 + n * suc c) a
    last' a [] = ! pure a
    last' _ (x ∷ xs) = ! last' x =<< xs
```

Trying to define `maximum` makes Agda spin, probably because of one of my rewrite rules. But here’s what it should be:

```
maximum : Vec a (suc n) → Thunk (13 + 14 * n + 4 * n ^ 2) a
maximum xs = ! last =<< sort xs
```

The paper goes on to say some things about partially evaluating thunks, and then shows how its approach can be used to measure some popular libraries. But I’m more interested in making the experience better.

Clearly this is all too much work. When we do complexity analysis by hand, we are primarily concerned with *complexity classes,* not exact numbers of steps. How hard would it be to generalize all of this so that `Thunk` takes a function bounding the runtime necessary to produce its value?

First, a quick refresher on what big-O means. A function $f : \mathbb{N} \to \mathbb{N}$ is said to be in $O(g)$ for some $g : \mathbb{N} \to \mathbb{N}$ iff:

$\exists (C k : \mathbb{N}). \forall (n : \mathbb{N}, k \leq n). f(n) \leq C \cdot g(n)$

That is, there is some point $k$ after which $C \cdot g(n)$ stays above $f(n)$. This is the formal definition, but in practice we usually play rather fast and loose with our notation. For example, we say “quicksort is $O(n\cdot\log{n})$ in the length of the list”, or “$O(n\cdot\log{m})$, where $m$ is the size of the first argument.”

We need to do a bit of elaboration here to turn these informal statements into a formal claim. In both cases, there are implicit binders inside the $O(-)$, binding $n$ in the first, and $m, n$ in the second. These functions then get instantiated with the actual sizes of the lists. It’s a subtle point, but it needs to be kept in mind.

The other question is how the hell do we generalize that definition to multiple variables? Easy! We replace $n : \mathbb{N}, k \leq n$ with a vector of natural numbers, subject to the constraint that they’re *all* at least $k$.
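The one-variable definition is easy to spot-check numerically. A small Python sketch of my own (the finite range stands in for “for all $n$”, so this is evidence, not a proof):

```python
def witnesses_big_o(f, g, C, k, upto=1000):
    """Check the definition on a finite range: f(n) <= C * g(n)
    for every n from k up to `upto`."""
    return all(f(n) <= C * g(n) for n in range(k, upto))

# 4 + 5n + 3n^2 is O(n^2): the witnesses C = 12, k = 1 work here
quadratic_ok = witnesses_big_o(lambda n: 4 + 5 * n + 3 * n * n,
                               lambda n: n * n, C=12, k=1)
```

The multi-variable version replaces `n` with a tuple, all of whose components must exceed `k`, exactly as the record below encodes with `All (k ≤_) n`.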

OK, let’s write some code. We can give the definition of O:

```
open import Data.Vec.Relation.Unary.All
  using (All; _∷_; []) renaming (tail to tailAll)

record O {vars : ℕ} (g : Vec ℕ vars → ℕ) : Set where
  field
    f : Vec ℕ vars → ℕ
    C : ℕ
    k : ℕ
    def : (n : Vec ℕ vars) → All (k ≤_) n → f n ≤ C * g n
```

The generality of O is a bit annoying for the common case of being a function over one variable, so we can introduce a helper function O':

```
hoist : {a b : Set} → (a → b) → Vec a 1 → b
hoist f (x ∷ []) = f x

O' : (ℕ → ℕ) → Set
O' f = O (hoist f)
```

We can trivially lift any function `f` into O `f`:

```
O-build : {vars : ℕ} → (f : Vec ℕ vars → ℕ) → O f
O-build f .O.f = f
O-build f .O.C = 1
O-build f .O.k = 0
O-build f .O.def n x = ≤-refl
```

and also trivially weaken an O into using more variables:

```
O-weaken : ∀ {vars} {f : Vec ℕ vars → ℕ} → O f → O (f ∘ tail)
O-weaken o .O.f = o .O.f ∘ tail
O-weaken o .O.C = o .O.C
O-weaken o .O.k = o .O.k
O-weaken o .O.def (_ ∷ x) (_ ∷ eq) = o .O.def x eq
```

More interestingly, we can lift a given O' into a higher power, witnessing the fact that, e.g., something of $O(n^2)$ is also $O(n^3)$:

```
O-^-suc : {n : ℕ} → O' (_^ n) → O' (_^ suc n)
O-^-suc o .O.f = o .O.f
O-^-suc o .O.C = o .O.C
O-^-suc o .O.k = suc (o .O.k)
O-^-suc {n} o .O.def xs@(x ∷ []) ps@(s≤s px ∷ []) = begin
  f xs               ≤⟨ def xs (≤-step px ∷ []) ⟩
  C * (x ^ n)        ≤⟨ *-monoˡ-≤ (x ^ n) (m≤m*n C (s≤s z≤n)) ⟩
  (C * x) * (x ^ n)  ≡⟨ *-assoc C x (x ^ n) ⟩
  C * (x * (x ^ n))  ∎
  where
    open O o
    open ≤-Reasoning
```

However, the challenge is and has always been to simplify the construction of Thunk bounds. Thus, we’d like the ability to remove low-order terms from Os. We can do this by eliminating $n^k$ whenever there is a $n^{k'}$ term around with $k \leq k'$:

```
postulate
  O-drop-low
    : {z x y k k' : ℕ}
    → k ≤ k'
    → O' (\n → z + x * n ^ k + y * n ^ k')
    → O' (\n → z + n ^ k')
```

The `z` variable here lets us compose O-drop-low terms, by subsequently instantiating it.
As a special case, we can eliminate constant terms via O-drop-low by first expanding constant terms to be coefficients of $n^0$:

```
O-drop-1 : {x y k : ℕ} → O' (\n → x + y * n ^ k) → O' (\n → n ^ k)
O-drop-1 {x} {y} {k} o
  rewrite sym (*-identityʳ x) =
    O-drop-low {0} {x} {y} {k = 0} {k} z≤n o
```

With these functions, we can now easily construct O' values for arbitrary one-variable functions:

```
_ : O' (_^ 1)
_ = O-drop-1 {4} {5} {1}
  $ O-build
  $ hoist \n → 4 + 5 * n ^ 1

_ : O' (_^ 2)
_ = O-drop-1 {4} {1} {2}
  $ O-drop-low {4} {5} {3} {1} {2} (s≤s z≤n)
  $ O-build
  $ hoist \n → 4 + 5 * n ^ 1 + 3 * n ^ 2
```

Finally, we just need to build a version of Thunk that is adequately lifted over the same functions we use for O:

```
abstract
  OThunk : {vars : ℕ} → (Vec ℕ vars → ℕ) → Set → Set
  OThunk _ a = a

OThunk' : (ℕ → ℕ) → Set → Set
OThunk' f = OThunk (hoist f)
```

The limit function can be used to lift a Thunk into an OThunk:

```
limit
  : {vars : ℕ} {f : Vec ℕ vars → ℕ} {a : Set}
  → (v : Vec ℕ vars)
  → (o : O f)
  → Thunk (o .O.f v) a
  → OThunk f a
limit _ _ x = x
```

and we can now give an asymptotic bound over sort:

```
o2 : O' (_^ 1)
o2 = O-drop-1 {1} {5} {1} $ O-build $ hoist \n -> 1 + 5 * n

linearHeadSort : Vec a n → OThunk' (_^ 1) (VecL a (4 * n) n)
linearHeadSort {n} v = limit (n ∷ []) o2 $ sort v
```

I’m traveling right now, and ran out of internet on publication day, which means I don’t have a copy of the paper in front of me as I write this (foolish!). Overall, the paper is slightly interesting, though I don’t think there’s anything especially novel here. Sticking the runtime behavior into the type is pretty much babby’s first example of graded monads, and we don’t even get asymptotics out of it! Instead we need to push big polynomials around, and explicitly call wait to make different branches work out.

The O stuff I’ve presented here alleviates a few of those problems, as it allows us to relatively easily throw away the polynomials and just work with the highest-order terms. A probably better approach would be to throw away the functions, and use a canonical normalizing-form to express the asymptotics. Then we could define a $\sqcup$ operator over OThunks, and define:

`_>>=_ : OThunk f a → (a → OThunk g b) → OThunk (f ⊔ g) b`

to let us work compositionally in the land of big O.

My biggest takeaway here is that the techniques described in this paper are probably not powerful enough to be used in anger. Or, at least, not if you actually want to get any work done. Between the monads, polynomials, and waiting, the experience could use a lot of TLC.

A while back I reviewed some paper (maybe codata? — too lazy to check) and came away thinking “I should learn more about presheaves.” The first paper I found is A Very Elementary Introduction to Sheaves by Mark Agrios, and mildly interestingly, was published less than three weeks ago.

The paper is called “very elementary,” and in the first sentence states it “is a very non-rigorous, loose, and extremely basic introduction to sheaves,” and it delivers on these promises. There is a section on metaphorically what a sheaf is, and then two somewhat-worked examples.

After reading through the paper, I feel like I have a very rough idea of what a sheaf is, and thought that this would be an excellent opportunity to flex my category theory muscles. That is, can I correctly generalize from these two examples to a solid category theoretical definition of a sheaf? I’m not sure, but this is a unique opportunity, so it’s worth a shot.

The central metaphor of the paper is that a sheaf enriches some mathematical structure, much like a garden enriches a plot of dirt. There are lots of gardens you could make on a plot of dirt, and then you can harvest things from them. I guess this makes sense to the author, but it doesn’t particularly help me. I suspect this is an example of the monad tutorial fallacy in the wild: after thinking really hard about an idea for a while, the author came up with a metaphor that really works for them. But this metaphor is more an artifact of their thinking process than it is descriptive of the idea itself. Either way, I wasn’t able to extract much meaning here.

We can build a (pre-?)sheaf over a graph. By playing fast and loose with our types like mathematicians are so wont to do, we can model the edge $e_{ij} : V_i \to V_j$ in a graph as an “intersection of the nodes it connects.” The paper writes $e_{ij} < v_i, v_j$. I’m not super sure what that means, but I think it’s saying that given some graph $G = (V, E)$, we can say $e_{ij} \subseteq v_i \cup v_j$? Except that this doesn’t typecheck, since $v_i$ is an element of a set, not a set itself. I don’t know.

Anyway, the important thing here seems to be that there is a preorder between edges and vertices. So let’s quickly define a `Preorder`:

```
record Preorder : Set where
  field
    Carrier : Set
    _<_ : Carrier → Carrier → Set
    <-refl : (a : Carrier) → a < a
    <-trans : {a b c : Carrier} → a < b → b < c → a < c
```

and then just forget about the whole graph thing, because I am not convinced it is a meaningful presentation. Instead, we’ll cheat, and just build exactly the object we want to discuss.

```
data Ex : Set where
  v1  : Ex
  v2  : Ex
  e12 : Ex
```

corresponding to this rather boring graph:

We can then build a Preorder on Ex with explicit cases for e12 being less than its vertices:

```
data Ex< : Ex → Ex → Set where
  e12<v1 : Ex< e12 v1
  e12<v2 : Ex< e12 v2
```

and two cases to satisfy the preorder laws:

ex<-refl : (x : Ex) → Ex< x x

and then mechanically hook everything up:

```
module _ where
  open Preorder

  ex-preorder : Preorder
  ex-preorder .Carrier = Ex
  ex-preorder ._<_ = Ex<
  ex-preorder .<-refl = ex<-refl
  ex-preorder .<-trans e12<v1 (ex<-refl .v1) = e12<v1
  ex-preorder .<-trans e12<v2 (ex<-refl .v2) = e12<v2
  ex-preorder .<-trans (ex<-refl _) e12<v1 = e12<v1
  ex-preorder .<-trans (ex<-refl _) e12<v2 = e12<v2
  ex-preorder .<-trans (ex<-refl x) (ex<-refl _) = ex<-refl x
```

The paper goes on to say we have some sheaf `F`, which maps Exs to “just about anything,” this codomain being called the *stalk.* For now, let’s assume it maps to Set.

Furthermore, the sheaf `F` also has a “second mechanism,” which in our example maps an edge $e_{ij} : v_i \to v_j$ to two functions:

$F_{v_i;e_{ij}} : F(v_i) \to F(e_{ij}) \\ F_{v_j;e_{ij}} : F(v_j) \to F(e_{ij})$

This is where some of the frustration in only being given examples comes in. Why are these in the definition of a sheaf? The only thing that could possibly make any sense to me is that this comes from a more general construction:

`restrict : (x y : Ex) → x < y → Stalk y → Stalk x`

which states we have a mapping from $F(y)$ to $F(x)$ if and only if we have $x < y$. These `restrict` things are called *restriction maps*.

What’s further confusing is the following point:

Since each stalk is a vector space, it is natural to have our restriction maps be linear transformations described by matrices.

Why linear transformations, and not just arbitrary functions? When I hear “linear transformation” I think homomorphism, or more probably, morphism in some category. Which then probably means the `Stalk` isn’t a function to Set, it’s a mapping into a category.

OK, so that all seems straightforward enough. Let’s try to formalize it.

```
module Sheaf (pre : Preorder) (C : Category) where
  open Preorder pre
  open Category C

  record Sheaf : Set where
    field
      Stalk : Carrier → Obj
      restrict : {x y : Carrier} → x < y → Stalk y ~> Stalk x
```

which seems reasonable. The paper now gives us a specific sheaf, with `restrict e12<v1` being the linear map encoded by the matrix:

$\begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$

which we can write as a morphism in LIN (the category of linear algebra, with objects as vector spaces, and morphisms as linear maps):

```
e12~>v1 : 2 ~> 2
e12~>v1 .linmap (x ∷ y ∷ []) = (x - y) ∷ (+ 2 * y) ∷ []
e12~>v1 .preserves-+ u v = trustMe
e12~>v1 .preserves-* a v = trustMe
```

and `restrict e12<v2` being the linear map encoded by the matrix:

$\begin{bmatrix} 3 & 1 & -1 \\ 2 & 0 & 2 \end{bmatrix}$

written as:

```
e12~>v2 : 3 ~> 2
e12~>v2 .linmap (x ∷ y ∷ z ∷ []) = (+ 3 * x + y - z) ∷ (+ 2 * x + + 2 * z) ∷ []
e12~>v2 .preserves-+ u v = trustMe
e12~>v2 .preserves-* a v = trustMe
```

Thus, we can finally build the example `Sheaf`:

```
ex : Sheaf
ex .Stalk v1 = 2
ex .Stalk v2 = 3
ex .Stalk e12 = 2
ex .restrict e12<v1 = e12~>v1
ex .restrict e12<v2 = e12~>v2
ex .restrict (ex<-refl z) = id
```

What’s with the Stalk of v1 being 2, you might ask? Remember, the stalk is an object in some category, in this case LIN. Objects in LIN are natural numbers, corresponding to the length of vectors.

Here’s where our categorical generalization of the paper goes a bit haywire. The paper defines a *section* as picking an element from each Stalk of the sheaf. He picks, for example:

$\begin{bmatrix} 2 \\ 1 \end{bmatrix} \in \text{Stalk } v1$

$\begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix} \in \text{Stalk } v2$

and

$\begin{bmatrix} 1 \\ -1 \end{bmatrix} \in \text{Stalk } e12$

which is all fine and dandy, except that when we categorize, our objects no longer have internal structure. Fortunately, we can use “generalized elements,” a.k.a., morphisms out of the terminal object.

```
Section : Carrier → Set
Section c = terminal ~> Stalk c
```

That is, a Section is a mapping from every element in the Preorder to a generalized element of its Stalk. We can evaluate a Section by checking the commutativity of all restricts. That is, we’d like the following diagram to commute:

Doing this in Agda is hard because it wants lots of dumb arithmetic proofs, so instead we’ll make ourselves content with some by-hand math:

$r \circ S v1 = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \neq \begin{bmatrix} 1 \\ -1 \end{bmatrix}$
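To double-check the arithmetic, here’s the same computation as throwaway Python (a hand-rolled matrix-vector product; nothing here is from the paper): applying the restriction map for `e12<v1` to the section at `v1` does not give back the section at `e12`.

```python
def matvec(m, v):
    # Naive matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

restrict_v1 = [[1, -1],
               [0,  2]]

section_v1 = [2, 1]
section_e12 = [1, -1]

result = matvec(restrict_v1, section_v1)
assert result == [1, 2]          # the restriction of the v1 value...
assert result != section_e12     # ...disagrees with the e12 value
```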

So, our chosen Section doesn’t commute. That is, it doesn’t respect the global equalities, thus it is not a *global section.* Sounds like something worth formalizing:

```
record GlobalSection : Set where
  field
    section : forall (c : Carrier) → Section c
    commutes
      : {x y : Carrier}
      → (x<y : x < y)
      → restrict x<y ∘ section y ≈ section x
```

All that’s left is to find a GlobalSection of our weird graph category:

Unfortunately, this formalization doesn’t quite work out; there are no interesting arrows out of terminal:

```
boring-arrows
    : (f : 0 ~> 1)
    → (x : Vec ℤ 0)
    → f .linmap x ≡ + 0 ∷ []
boring-arrows f [] with f .linmap [] in eq
... | x ∷ [] rewrite sym eq =
  begin
    f .linmap []
  ≡⟨⟩
    f .linmap (map (+ 0 *_) [])
  ≡⟨ f .preserves-* (+ 0) _ ⟩
    map (+ 0 *_) (f .linmap [])
  ≡⟨ cong (map (+ 0 *_)) eq ⟩
    map (+ 0 *_) (x ∷ [])
  ≡⟨⟩
    (+ 0 * x) ∷ []
  ≡⟨ cong (_∷ []) (*-zeroˡ +0) ⟩
    +0 ∷ []
  ∎
  where open Eq.≡-Reasoning
```

So, that’s no good. We’ve modeled Section incorrectly: the generalized-element approach doesn’t work, and it leaves us unable to follow the example.

What are some other ways to go from an Obj to a Set? Maybe we could try modeling this as a functor to SET instead:

```
ex-func : LIN => SET
ex-func .F-Obj x = Vec ℤ x
ex-func .F-map f = f .linmap
ex-func .F-map-id _ _ = refl
ex-func .F-map-∘ g f a = refl
```

And we can try again with `Section`s; this time, we say a `Section` is an element of the action of Func:

```
Section : Carrier → Set
Section c = F-Obj (Stalk c)
```

and a `GlobalSection`, which recall, is a globally-coherent assignment of sections:

```
record GlobalSection : Set where
  field
    section : forall (c : Carrier) → Section c
    commutes
      : {x y : Carrier}
      → (x<y : x < y)
      → F-map (restrict x<y) (section y) ≡ section x
```

```
soln : GlobalSection
soln .section v1 = + 2 ∷ + 1 ∷ []
soln .section v2 = -[1+ 1 ] ∷ + 10 ∷ + 3 ∷ []
soln .section e12 = + 1 ∷ + 2 ∷ []
soln .commutes e12<v1 = refl
soln .commutes e12<v2 = refl
soln .commutes (ex<-refl _) = refl
```

Sure enough, this is a global section:

$\begin{bmatrix} 2 \\ 1 \end{bmatrix} \in \text{Stalk } v1$

$\begin{bmatrix} -2 \\ 10 \\ 3 \end{bmatrix} \in \text{Stalk } v2$

and

$\begin{bmatrix} 1 \\ 2 \end{bmatrix} \in \text{Stalk } e12$
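And the same kind of throwaway Python arithmetic confirms the new assignment: both restriction maps send their vertex’s value to exactly the value chosen at `e12`.

```python
def matvec(m, v):
    # Naive matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

restrict_v1 = [[1, -1],
               [0,  2]]
restrict_v2 = [[3, 1, -1],
               [2, 0,  2]]

section_v1 = [2, 1]
section_v2 = [-2, 10, 3]
section_e12 = [1, 2]

# Both restrictions agree with the chosen value on the edge,
# so the assignment is globally consistent.
assert matvec(restrict_v1, section_v1) == section_e12
assert matvec(restrict_v2, section_v2) == section_e12
```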

The paper presents a second example as well. Maybe it’s just that I’m less well-versed in the subject matter, but this example feels significantly more incoherent than the first. I tried to work through it, and the formalization above was sufficiently powerful to do what I needed, but I didn’t understand the example or what it was trying to accomplish. There was some Abelian group stuff that never actually got used.

Rather than clean this section up, I’m instead going to spend the time before my publication deadline writing about what I learned about presheaves after hitting the wall, and asking for help.

So let’s talk about what all of this sheaf business above is trying to do. The ever helpful Reed Mullanix came to my rescue with a few helpful intuitions. To paraphrase him (if there are any mistakes in the following, they are my mistakes, not his):

Think about a sensor network. You have some physical space, with a series of sensors attached in specific places. Maybe you have a microphone in the hallway, and a camera at the front door, and a thermometer in the bedroom. Each of these sensors is *locally correct*; that is, we can be reasonably sure that if the thermometer says 37C, it is in fact 37C. A presheaf is a mapping from this collection of sensors to a world in which we can reason about the total space. For example, we might want to get an idea of what’s going on in the basement, where we have no sensors, but which is part of our house nevertheless.

And a global section over that presheaf is a globally consistent take on the system. It’s some mapping into the hypothesis space that *agrees with all of the measurements.* If we know it’s 37C in the bedroom, we’re probably not going to see snow in the front-door camera.

Okay, so what’s all this preorder stuff about? I think it’s actually just a poor man’s category. We can lift any preorder into a category by considering the `<` relationship to be a morphism:

```
module PreorderToCategory (P : Preorder) where
  open Preorder P
  open Category
  open import Data.Unit using (⊤; tt)

  cat : Category
  cat .Obj = Carrier
  cat ._~>_ = _<_
  cat ._≈_ f g = ⊤
  cat .≈-equiv = sorry
  cat .id {A = A} = <-refl A
  cat ._∘_ g f = <-trans f g
  cat .∘-cong = λ _ _ → tt
  cat .id-r f = tt
  cat .id-l f = tt
  cat .∘-assoc h g f = tt
```

and now that we have a Category, we can avoid the whole Sheaf / GlobalSection machinery by giving a functor into SET. Well, almost, because restrict goes the opposite direction. So instead, we can build an opposite category:

```
module Op (C : Category) where
  open Category

  data OpArr : Obj C → Obj C → Set where
    reverse : {X Y : Obj C} → C [ X , Y ] → OpArr Y X

  op : Category
  op .Obj = C .Obj
  op ._~>_ = OpArr
  op ._≈_ (reverse f) (reverse g) = C ._≈_ f g
  op .≈-equiv {A} {B} = sorry
  op .id = reverse (C .id)
  op ._∘_ (reverse g) (reverse f) = reverse (C ._∘_ f g)
  op .∘-cong = sorry
  op .id-r (reverse f) = C .id-l f
  op .id-l (reverse f) = C .id-r f
  op .∘-assoc (reverse h) (reverse g) (reverse f) =
    setoid C .isEquivalence .S.IsEquivalence.sym (C .∘-assoc f g h)
    where
      open import Relation.Binary.Bundles using (Setoid)
      open Setoid using (isEquivalence)
      import Relation.Binary.Structures as S
```

Now, we can express a presheaf as a functor:

```
module _ where
  open import Category.MyFunctor
  open Op

  Presheaf : Category → Set
  Presheaf C = op C => SET
```

or our specific example from earlier:

```
module _ where
  open PreorderToCategory ex-preorder
  open _=>_
  open import Data.Nat using (ℕ)
  open Op

  Z : ℕ → Set
  Z = Vec ℤ

  ex' : Presheaf cat
  ex' .F-Obj v1 = Z 2
  ex' .F-Obj v2 = Z 3
  ex' .F-Obj e12 = Z 2
  ex' .F-map (reverse e12<v1) = e12~>v1 .linmap
  ex' .F-map (reverse e12<v2) = e12~>v2 .linmap
  ex' .F-map (reverse (ex<-refl _)) a = a
  ex' .F-map-id A a = refl
  ex' .F-map-∘ (reverse e12<v1) (reverse (ex<-refl _)) a = refl
  ex' .F-map-∘ (reverse e12<v2) (reverse (ex<-refl _)) a = refl
  ex' .F-map-∘ (reverse (ex<-refl _)) (reverse e12<v1) a = refl
  ex' .F-map-∘ (reverse (ex<-refl _)) (reverse e12<v2) a = refl
  ex' .F-map-∘ (reverse (ex<-refl _)) (reverse (ex<-refl _)) a = refl
```

which leaves only the question of what a `GlobalSection` is under this representation.

I got stumped on this one for a while too, but again, Reed to the rescue, who points out that in our preorder, `<` corresponds to a “smaller” space. Thus, we want to find a mapping out of the biggest space, which corresponds to a top element in the order, or a terminal object in the category. The terminal object is going to be the “total space” in consideration (in our sensor example, eg.) and the functor laws will ensure consistency.

```
GlobalSection : {C : Category} → (pre : Presheaf C) → (t : HasTerminal C) → Set
GlobalSection pre t = pre ._=>_.F-Obj (t .HasTerminal.terminal)
```

Unfortunately, this is a problem for our worked example — we don’t *have* a terminal object! But that’s OK, it’s easy to trivially construct one by just adding a top:

and by picking an object in SET to map it to for our presheaf. There are some interesting choices here; we could just pick ⊤, which is interesting in how boring a choice it is. Such a thing trivially satisfies all of the requirements, but it doesn’t tell us much about the world. This is the metaphorical equivalent of explaining our sensors’ readings as “anything is possible!”

More interestingly, we could pick `F-Obj terminal` to be $\mathbb{Z}^2 \times \mathbb{Z}^3 \times \mathbb{Z}^2$, corresponding to the product of `F-Obj v1`, `F-Obj v2`, and `F-Obj e12`. We can satisfy the functor laws by projecting from the `F-Obj terminal` down to one of its components. And, best of all, it gives us a place to stick the values from our worked example.

I’d love to code this up in more detail, but unfortunately I’m out of time. That’s the flaw of trying to get through one paper a week: the deadline is strict whether you’re ready for it or not.

This whole post is a literate Agda file.

`(a + b) * (a + b) = a^2 + 2*a*b + b^2`, which is rather amazing if you think about it. I got curious about how this is possible, and came across AaEIPEiA, quickly skimmed it for the rough approach, and then decided to write my own ring solver. As a result, this post is certainly inspired by AaEIPEiA, but my implementation is extremely naive compared to the one presented in the paper. Kidney’s paper is very good, and I apologize for not doing it justice here.

So, some background. Agda lets you write types that correspond to equalities, and values of those types are proofs of those equalities. For example, we can write the following type:

`(x : ℕ) → (x + 1) * (x + 1) ≡ (x * x) + (1 + 1) * x + 1`

You probably wouldn’t write this for its own sake, but it might come up as a lemma of something else you’re trying to prove. However, actually proving this equality is a huge amount of busywork that takes forever, and isn’t actually interesting, because we all know that this equality holds. For example, the proof might look something like this:

```
begin
  (x + 1) * (x + 1)
≡⟨ *-+-distrib (x + 1) x 1 ⟩
  (x + 1) * x + (x + 1) * 1
≡⟨ cong (\φ -> (x + 1) * x + φ) $ *-1-id-r (x + 1) ⟩
  (x + 1) * x + (x + 1)
≡⟨ cong (\φ -> φ + (x + 1)) $ *-comm (x + 1) x ⟩
  x * (x + 1) + (x + 1)
≡⟨ cong (\φ -> φ + (x + 1)) $ *-+-distrib x x 1 ⟩
  (x * x + x * 1) + (x + 1)
≡⟨ ? ⟩  -- kill me
  ?
≡⟨ ? ⟩
  (x * x) + (1 + 1) * x + 1
∎
```

It’s SO MUCH WORK to do *nothing!* This is not an interesting proof! A ring solver lets us reduce the above proof to:

```
begin
  (x + 1) * (x + 1)
≡⟨ solve ⟩
  (x * x) + (1 + 1) * x + 1
∎
```

or, even more tersely:

`solve`

So that’s the goal here. Automate stupid, boring proofs so that we as humans can focus on the interesting bits of the problem.

Why is this called a ring solver? I don’t exactly know, but a ring is some math thing. My guess is that it’s the abstract version of an algebra containing addition and multiplication, with all the usual rules.

And looking at it, sure enough! A ring is a set with two monoids on it, one corresponding to addition, and the other to multiplication. Importantly, we require that multiplication distributes over addition.

Rings technically have additive inverses, but I didn’t end up implementing (or needing) them. However, I did require commutativity of both addition and multiplication — more on this later.

The ring laws mean that algebra works in the way we expect arithmetic to work. We can shuffle things around, and probably all have enough experience solving these sorts of problems with pen and paper. But what’s the actual algorithm here?

At first blush, this sounds like a hard problem! It feels like we need to see if there’s a way to turn some arbitrary expression into some other arbitrary expression. And that is indeed true, but it’s made easier when you realize that polynomials have a normal form as a sum of products of descending powers. For example, this is in normal form:

`5*x^2 - 3*x + 0`

The problem thus simplifies to determining if two expressions have the same normal form. Thus, we can construct a proof that each expression is equal to its normal form, and then compose those proofs together to show the unnormalized forms are equal.
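A minimal sketch of this normalization idea in Python, with coefficient lists (lowest power first) standing in for the normal form; all names here are made up for illustration. Both sides of `(x + 1) * (x + 1) = x*x + 2*x + 1` normalize to the same coefficients, so the expressions agree everywhere:

```python
def p_add(p, q):
    # Add two polynomials given as coefficient lists, lowest power first.
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def p_mul(p, q):
    # Schoolbook polynomial multiplication.
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

x = [0, 1]   # the polynomial "x"
one = [1]

# (x + 1) * (x + 1) versus x*x + 2*x + 1, both as normal forms:
lhs = p_mul(p_add(x, one), p_add(x, one))
rhs = p_add(p_add(p_mul(x, x), p_mul([2], x)), one)
assert lhs == rhs == [1, 2, 1]
```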

My implementation is naive, and only works for expressions with a single variable, but I think the approach generalizes if you can find a suitable normal form for multiple variables.

All of this sounds like a good tack, but the hard part is convincing ourselves (and perhaps more importantly, Agda) that the stated relationship holds. As it happens, we require three equivalent types:

- `A`, the ring we’re actually trying to solve
- `Poly`, a syntactic representation of the ring operations
- `Horner`, the type of `A`-normal forms

`Poly` and `Horner` are indexed by `A`, but I’ve left that out for presentation purposes. Furthermore, they’re also both indexed by the degree of the polynomial, that is, the biggest power they contain. I’m not sure this was necessary, but it helped me make sure my math was right when I was figuring out how to multiply `Horner`s.

At a high level, solving a ring equality is really a statement about how `A` is related to `Poly` and `Horner`. We can construct an `A`-expression by substituting an `A` for all the variables in a `Poly`:

`construct : {n : ℕ} → Poly n → A → A`

and we can normalize any syntactic expression:

`normalize : {n : ℕ} → Poly n → Horner n`

thus we can solve a ring equation by hoisting a proof of equality of its normal forms into a proof of equality of its construction:

```
solve
  : {n : ℕ}
  → (x y : Poly n)
  → normalize x ≡ normalize y
  → (a : A)
  → construct x a ≡ construct y a
```

This approach is a bit underwhelming, since we need to explicitly construct syntactic objects (in `Poly`) corresponding to the expressions we’re trying to solve (in `A`). But this is something we can solve with Agda’s macro system, by creating the `Poly`s by inspecting the actual AST, so we’ll consider the approach good enough. Today’s post is about understanding how to do ring solving, not about how to engineer a nice user-facing interface.

The actual implementation of `solve` is entirely straightforward:

```
solve x y eq a =
  begin
    construct x a
  ≡⟨ construct-is-normal x a ⟩
    evaluate (normalize x) a
  ≡⟨ cong (\φ → evaluate φ a) eq ⟩
    evaluate (normalize y) a
  ≡⟨ sym $ construct-is-normal y a ⟩
    construct y a
  ∎
```

given a lemma that `construct` is equal to evaluating the normal form:

```
construct-is-normal
  : {N : ℕ}
  → (x : Poly N)
  → (a : A)
  → construct x a ≡ evaluate (normalize x) a
```

The implementation of this is pretty straightforward too, requiring only that we have `+` and `*` homomorphisms between `Horner` and `A`:

```
+A-+H-homo
  : ∀ {m n} j k a
  → evaluate {m} j a +A evaluate {n} k a ≡ evaluate (j +H k) a

*A-*H-homo
  : ∀ {m n} j k a
  → evaluate {m} j a *A evaluate {n} k a ≡ evaluate (j *H k) a
```

These two lemmas turn out to be the hard part.

Before we get into all of that, let’s first discuss what each of the types looks like. We have `Poly`, which again, is an initial encoding of the ring algebra:

```
data Poly : ℕ → Set where
  con : A → Poly 0
  var : Poly 1
  _:+_ : {m n : ℕ} → Poly m → Poly n → Poly (m ⊔ n)
  _:*_ : {m n : ℕ} → Poly m → Poly n → Poly (m + n)
```

We can reify the meaning of `Poly` by giving a transformation into `A`:

```
construct : {N : ℕ} → Poly N → A → A
construct (con x) a = x
construct var a = a
construct (p :+ p2) a = construct p a +A construct p2 a
construct (p :* p2) a = construct p a *A construct p2 a
```

Our other core type is `Horner`, which is an encoding of the Horner normal form of a polynomial:

```
data Horner : ℕ → Set where
  PC : A → Horner 0
  PX : {n : ℕ} → A → Horner n → Horner (suc n)
```

`Horner` requires some discussion. Horner normal form isn’t the same normal form presented earlier; instead, it’s a chain of linear multiplications. For example, we earlier saw this:

`5*x^2 - 3*x + 0`

in Horner normal form, this would be written as

`0 + x * (-3 + x * 5)`

The idea is we can write any polynomial inductively by nesting the bigger terms as sums inside of multiplications against `x`. We can encode the above as a `Horner` like this:

`PX 0 (PX -3 (PC 5))`

and then reify the meaning of `Horner` with respect to `A` via `evaluate`:

```
evaluate : {n : ℕ} → Horner n → A → A
evaluate (PC x) v = x
evaluate (PX x xs) v = x +A (v *A evaluate xs v)
```
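To make the evaluation scheme concrete, here’s `evaluate` transliterated into throwaway Python, with tuples `("PC", x)` / `("PX", x, xs)` as my own stand-ins for the constructors; we can check it against `5*x^2 - 3*x + 0` directly:

```python
def evaluate(h, v):
    # Mirrors the Agda: PC is a constant, PX x xs is x + v * evaluate xs v.
    if h[0] == "PC":
        return h[1]
    _, x, xs = h
    return x + v * evaluate(xs, v)

# 0 + x * (-3 + x * 5), i.e. 5*x^2 - 3*x + 0 in Horner form:
horner = ("PX", 0, ("PX", -3, ("PC", 5)))

for x in range(-5, 6):
    assert evaluate(horner, x) == 5 * x**2 - 3 * x
```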

We can define addition over `Horner` terms, which is essentially `zipWith (+A)`:

```
_+H_ : {m n : ℕ} → Horner m → Horner n → Horner (m ⊔ n)
_+H_ (PC x) (PC y) = PC (x +A y)
_+H_ (PC x) (PX y ys) = PX (x +A y) ys
_+H_ (PX x xs) (PC y) = PX (x +A y) xs
_+H_ (PX x xs) (PX y ys) = PX (x +A y) (xs +H ys)
```
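The `+H`/`+A` homomorphism is easy to spot-check numerically. A throwaway Python sketch, mirroring the four clauses above (the tuple constructors are my own stand-ins):

```python
def evaluate(h, v):
    # PC is a constant; PX x xs is x + v * evaluate xs v.
    if h[0] == "PC":
        return h[1]
    _, x, xs = h
    return x + v * evaluate(xs, v)

def h_add(j, k):
    # Pointwise addition of Horner forms, essentially zipWith (+).
    if j[0] == "PC" and k[0] == "PC":
        return ("PC", j[1] + k[1])
    if j[0] == "PC":
        return ("PX", j[1] + k[1], k[2])
    if k[0] == "PC":
        return ("PX", j[1] + k[1], j[2])
    return ("PX", j[1] + k[1], h_add(j[2], k[2]))

j = ("PX", 1, ("PC", 2))             # 1 + 2x
k = ("PX", 3, ("PX", 0, ("PC", 4)))  # 3 + 4x^2

# evaluate j a + evaluate k a == evaluate (j +H k) a, at a few points:
for a in range(-3, 4):
    assert evaluate(j, a) + evaluate(k, a) == evaluate(h_add(j, k), a)
```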

We can also implement scalar transformations over `Horner`, which is exactly a monomorphic `fmap`:

```
scalMapHorner : {m : ℕ} → (A → A) → Horner m → Horner m
scalMapHorner f (PC x) = PC (f x)
scalMapHorner f (PX x xs) = PX (f x) (scalMapHorner f xs)
```

and finally, we can define multiplication over `Horner` terms:

```
_*H_ : {m n : ℕ} → Horner m → Horner n → Horner (m + n)
_*H_ (PC x) y = scalMapHorner (x *A_) y
_*H_ (PX {m} x xs) (PC y) = scalMapHorner (_*A y) (PX x xs)
_*H_ (PX {m} x xs) yy =
  scalMapHorner (x *A_) yy +H PX #0 (xs *H yy)
```
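Multiplication can be spot-checked the same way; the `PX`-`PX` branch below follows the same distributivity trick, scalar-multiplying in the head and shifting the tail up one power (again, a throwaway Python sketch with made-up names):

```python
def evaluate(h, v):
    if h[0] == "PC":
        return h[1]
    _, x, xs = h
    return x + v * evaluate(xs, v)

def h_add(j, k):
    if j[0] == "PC" and k[0] == "PC":
        return ("PC", j[1] + k[1])
    if j[0] == "PC":
        return ("PX", j[1] + k[1], k[2])
    if k[0] == "PC":
        return ("PX", j[1] + k[1], j[2])
    return ("PX", j[1] + k[1], h_add(j[2], k[2]))

def h_scale(c, h):
    # scalMapHorner (c *): multiply every coefficient by c.
    if h[0] == "PC":
        return ("PC", c * h[1])
    return ("PX", c * h[1], h_scale(c, h[2]))

def h_mul(j, k):
    if j[0] == "PC":
        return h_scale(j[1], k)
    if k[0] == "PC":
        return h_scale(k[1], j)
    _, x, xs = j
    # (x + X*xs) * k = x*k + X*(xs*k): scale in the head, shift the tail.
    return h_add(h_scale(x, k), ("PX", 0, h_mul(xs, k)))

j = ("PX", 1, ("PC", 2))             # 1 + 2x
k = ("PX", 3, ("PX", 0, ("PC", 4)))  # 3 + 4x^2

for a in range(-3, 4):
    assert evaluate(j, a) * evaluate(k, a) == evaluate(h_mul(j, k), a)
```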

The first two cases here are straightforward: just `scalMapHorner`-multiply in the constant value and go on your way. The `PX`-`PX` case is rather complicated, however, corresponding to the `*-+-distrib` law:

`*-+-distrib : ∀ x xs yy → (x + xs) * yy ≡ x * yy +A xs * yy`

We take advantage of the fact that we know `x` is a scalar by immediately multiplying it in via `scalMapHorner`.

As alluded to earlier, all that’s left is to show `evaluate` homomorphisms for `+H`/`+A` and `*H`/`*A`:

```
+A-+H-homo
  : ∀ {m n} j k a
  → evaluate {m} j a +A evaluate {n} k a ≡ evaluate (j +H k) a

*A-*H-homo
  : ∀ {m n} j k a
  → evaluate {m} j a *A evaluate {n} k a ≡ evaluate (j *H k) a
```

There’s nothing interesting in these proofs; it’s just three hundred ironic lines of tedious, boring proofs, of the sort that we are trying to automate away.

Given these, we can implement `construct-is-normal`:

```
construct-is-normal
  : {N : ℕ}
  → (x : Poly N)
  → (a : A)
  → construct x a ≡ evaluate (normalize x) a
construct-is-normal (con x) a = refl
construct-is-normal var a = refl
construct-is-normal (x :+ y) a
  rewrite construct-is-normal x a
        | construct-is-normal y a
        | +A-+H-homo (normalize x) (normalize y) a
        = refl
construct-is-normal (x :* y) a
  rewrite construct-is-normal x a
        | construct-is-normal y a
        | *A-*H-homo (normalize x) (normalize y) a
        = refl
```

Nice!

The homomorphism proofs are left as an exercise to the reader, or you can go look at the code if you want to skip doing it.

My implementation isn’t 100% complete; I still need to prove that `*H` is commutative:

`*H-comm : ∀ j k → j *H k ≡ k *H j`

which shouldn’t be hard, because it *is* commutative. Unfortunately, Agda has gone into hysterics, and won’t even typecheck the type of `*H-comm`, because it can’t figure out that `m + n = n + m` (the implicit indices on the result of `*H`). As far as I can tell, there is no easy fix here; there’s some weird `cong`-like thing for types called `subst`, but it seems to infect a program and push these weird-ass constraints everywhere.

This is extremely frustrating, because it’s literally the last thing to prove after 300 grueling lines of proof. And it’s also true and isn’t even hard to show. It’s just that I can’t get Agda to accept the type of the proof, because it’s an idiot that doesn’t know about additive commutativity. After a few hours of fighting with getting this thing to typecheck, I just said fuck it and postulated `*H-comm`.

Stupid Agda.

If you know what I’ve done wrong to deserve this sort of hell, please let me know. It would be nice to be able to avoid problems like this in the future, or resolve them with great ease.

So, that’s it! Modulo a postulate, we’ve managed to implement a ring-solver by showing the equivalence of three different representations of the same data. Just to convince ourselves that it works:

```
test-a : Poly 2
test-a = (var :+ con #1) :* (var :+ con #1)

test-b : Poly 2
test-b = var :* var :+ two :* var :+ con #1
  where two = con #1 :+ con #1

success
  : (x : A)
  → (x +A #1) *A (x +A #1) ≡ (x *A x) +A (#1 +A #1) *A x +A #1
success x = solve test-a test-b refl x
```

which Agda happily accepts!

I don’t exactly know offhand how to generalize this to multivariate polynomials, but I think the trick is to just find a normal form for them.

As usual, the code for this post is available on Github.

So anyway, today we’re looking at codata. What’s that? Essentially, lazy records. By virtue of being lazy, Haskell makes the differentiation between data and codata rather hard to spot. The claim is that functional languages are big on data, object-oriented languages really like codata, and that everything you can do with one can be emulated by the other, which is useful if you’d like to compile FP to OOP, or vice versa.

Codata, like the name implies, have a lot of duals with regular ol’ data. The paper introduces a bunch of parallels between the two:

| Data | Codata |
|---|---|
| Concerned with construction | Concerned with destruction |
| Define the types of constructors | Define the types of destructors |
| Directly observable | Observable only via their interface |
| Common in FP | Common in OOP |
| Initial algebras | Terminal coalgebras |
| Algebraic data structures | Abstract data structures |
| `data` | `class` |

The paper’s claim is that codata is a very useful tool for doing real-world work, and that we are doing ourselves a disservice by not making it first-class:

While codata types can be seen in the shadows behind many examples of programming—often hand-compiled away by the programmer—not many functional languages have support for them.

That’s a particularly interesting claim; that we’re all already using codata, but it’s hidden away inside of an idiom rather than being a first-class citizen. I’m always excited to see the ghosts behind the patterns I am already using.

The paper gives a big list of codata that we’re all already using without knowing it:

Instead of writing

```
data Bool where
  True :: Bool
  False :: Bool
```

I can instead do the usual Church encoding:

```
codata Bool where
  if :: Bool -> a -> a -> a
```

which I might express more naturally in Haskell via:

```
ifThenElse :: Bool -> a -> a -> a
ifThenElse True  t _ = t
ifThenElse False _ f = f
```

(I suspect this is that “hand-compiling away” that the authors were talking about)

However, in the codata presentation, I can recover `True` and `False` by building specific objects that fiddle with their arguments just right (using copatterns from a few weeks ago):

```
True : Bool
if True t _ = t

False : Bool
if False _ f = f
```
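The same trick works in any language with first-class functions; here’s a throwaway Python rendering (names are mine), where a boolean simply *is* its `if`:

```python
# Church-style booleans: a boolean is a function that chooses
# between its two arguments.
def true(t, f):
    return t

def false(t, f):
    return f

def if_then_else(b, t, f):
    # "if" is just application of the boolean itself.
    return b(t, f)

assert if_then_else(true, "yes", "no") == "yes"
assert if_then_else(false, "yes", "no") == "no"
```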

That’s neat, I guess!

As a follow-up, we can try talking about `Tree`s. Rather than the usual `data` definition:

```
data Tree t where
  Leaf :: t -> Tree t
  Branch :: Tree t -> Tree t -> Tree t

walk :: (t -> a) -> (a -> a -> a) -> Tree t -> a
```

we can do it in codata:

```
codata Tree t where
  walk :: Tree t -> (t -> a) -> (a -> a -> a) -> a
```

and reconstruct the “constructors:”

```
Leaf :: t -> Tree t
walk (Leaf t) mk _ = mk t

Branch :: Tree t -> Tree t -> Tree t
walk (Branch l r) mk comb = comb (walk l mk comb) (walk r mk comb)
```

The presentation in the paper hand-compiles the `data` version of `Tree` into two declarations:

```
codata TreeVisitor t a where
  { visitLeaf :: TreeVisitor t a -> t -> a
  , visitBranch :: TreeVisitor t a -> a -> a -> a
  }

codata Tree t where
  walk :: Tree t -> TreeVisitor t a -> a
```

which is the same thing, but with better named destructors.

You know the problem. You’re programming some search, and want to have a stopping depth. Maybe you’re writing a chess AI and don’t want to wait until the ends of time for the search to finish. Easy enough, right? Just add an integer that counts down whenever you recurse:

```
search :: Int -> Position -> [Position]
search 0 _ = []
search n as = -- do lots of work
```

So you set `n` to something that seems reasonable, and get your moves back. But then you realize you had more time to kill, so you’d like to resume the search where you left off. But there’s no good way to do this, and starting back from the beginning would involve wasting a lot of effort. You can certainly program around it, but again, it’s hand-compiling away codata.

Instead, we can express the problem differently:

```
codata Rose a where
  { node :: Rose a -> a
  , children :: Rose a -> [Rose a]
  }
```

Recall that codata is built-in lazy, so by repeatedly following `children` we can further explore the tree state. In OOP I guess we’d call this a generator or an iterator or something. Probably a factory of some sort.

But once we have `Rose` we can implement pruning:

```
prune :: Int -> Rose Position -> Rose Position
node (prune n t) = node t
children (prune 0 t) = []
children (prune n t) = fmap (prune (n - 1)) $ children t
```

I *really* like copattern matching.

You know how we have extensional and intensional definitions for sets? Like, compare:

```
newtype Set a = Set { unSet :: [a] }

lookup :: Set a -> a -> Bool
lookup s t = elem t $ unSet s
```

vs

`newtype Set a = Set { lookup :: a -> Bool }`

That latter version is the Church-encoded version. Instead we can give an interface for both sorts of sets as codata, defined by their *interface* as sets. This is everyday OOP stuff, but a little weird in FP land:

```
codata Set a where
  { isEmpty :: Set a -> Bool
  , lookup :: Set a -> a -> Bool
  , insert :: Set a -> a -> Set a
  , union :: Set a -> Set a -> Set a
  }
```

My dudes, this is just an interface for how you might want to interact with a Set. We can implement the listy version from above:

```
listySet :: [a] -> Set a
isEmpty (listySet ls) = null ls
lookup (listySet ls) a = elem a ls
insert (listySet ls) a = listySet (a : ls)
union (listySet ls) s = foldr (flip insert) s ls
```

but we can also implement an infinitely big set akin to our functiony-version:

```
evensUnion :: Set Int -> Set Int
isEmpty (evensUnion s) = False
lookup (evensUnion s) a = mod a 2 == 0 || lookup s a
insert (evensUnion s) a = evensUnion $ insert s a
union (evensUnion s) s' = evensUnion $ union s s'
```
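As an OOP-flavored rendering of the same interface, here’s a Python sketch (class and method names are mine): the listy set is the extensional version, and `EvensUnion` decorates any other set with all the even numbers.

```python
# The codata Set interface, OOP-style: a set is defined by its destructors.
class Set:
    def is_empty(self): ...
    def lookup(self, a): ...
    def insert(self, a): ...
    def union(self, other): ...

class ListySet(Set):
    # The extensional version: backed by an actual list.
    def __init__(self, ls):
        self.ls = list(ls)
    def is_empty(self):
        return not self.ls
    def lookup(self, a):
        return a in self.ls
    def insert(self, a):
        return ListySet([a] + self.ls)
    def union(self, other):
        for a in self.ls:
            other = other.insert(a)
        return other

class EvensUnion(Set):
    # The evens, unioned with some other set: never representable as a list.
    def __init__(self, s):
        self.s = s
    def is_empty(self):
        return False
    def lookup(self, a):
        return a % 2 == 0 or self.s.lookup(a)
    def insert(self, a):
        return EvensUnion(self.s.insert(a))
    def union(self, other):
        return EvensUnion(self.s.union(other))

s = EvensUnion(ListySet([1, 3]))
assert s.lookup(4) and s.lookup(1)
assert not s.lookup(5)
assert s.insert(5).lookup(5)
```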

This thing is a little odd, but `evensUnion` is the set of the even numbers unioned with some other set. The built-in unioning is necessary to be able to extend this thing. Maybe we might call it a decorator pattern in OOP land?

One last example, using type indices to represent the state of something. The paper gives sockets:

```
data State = Raw | Bound | Live

type Socket :: State -> Type
codata Socket i where
  { bind :: Socket 'Raw -> String -> Socket 'Bound
  , connect :: Socket 'Bound -> Socket 'Live
  , send :: Socket 'Live -> String -> ()
  , recv :: Socket 'Live -> String
  , close :: Socket 'Live -> ()
  }
```

The type indices here ensure that we’ve bound the socket before connecting to it, and connected to it before we can send or receive.

Contrast this against what we can do with GADTs, which tell us how something was built, not how it can be used.

Unsurprisingly, data and codata are two sides of the same coin: we can compile one to the other and vice versa.

Going from data to codata is giving a final encoding for the thing; as we’ve seen, this corresponds to the Boehm-Berarducci encoding. The trick is to replace the type with a function. Each data constructor corresponds to an argument of the function, the type of which is another function that returns `a`, and as arguments takes each argument to the data constructor. To tie the knot, replace the recursive bits with `a`.

Let’s take a look at a common type:

`data List a = Nil | Cons a (List a)`

We will encode this as a function that returns some new type variable. Let’s call it `x`:

`... -> x`

and then we need to give eliminators for each case:

`elim_nil -> elim_cons -> x`

and then replace each eliminator with a function that takes its arguments and returns `x`. For `Nil`, there are no arguments, so it’s just:

`x -> elim_cons -> x`

and then we do the same thing for `Cons :: a -> List a -> List a`:

`x -> (a -> List a -> x) -> x`

of course, there is no `List a` type anymore, so we replace that with `x` too:

`x -> (a -> x -> x) -> x`

And thus we have our codata-encoded list. For bonus points, we can do a little shuffling and renaming:

`(a -> b -> b) -> b -> b`

which looks very similar to our old friend `foldr`:

`foldr :: (a -> b -> b) -> b -> [a] -> b`

In fact, a little more reshuffling shows us that `foldr` is exactly the codata transformation we’ve been looking for:

`foldr :: [a] -> ((a -> b -> b) -> b -> b)`

Cool. The paper calls this transformation the “visitor pattern” which I guess makes sense; in order to call this thing we need to give instructions for what to do in every possible case.

This is an encoding of the type itself! But we also need codata encodings for the data constructors. The trick is to just ignore the “handlers” in the type that don’t correspond to your constructor. For example:

```
Nil :: (a -> b -> b) -> b -> b
Nil _ nil = nil

Cons
  :: a
  -> ((a -> b -> b) -> b -> b)
  -> (a -> b -> b)
  -> b
  -> b
Cons head tail cons nil = cons head (tail cons nil)
```

Really, these write themselves once you have an eye for them. One way to think about it is that the handlers are “continuations” describing how the computation should proceed. This is the dreaded CPS transformation!
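To see that nothing is lost in the encoding, here’s a self-contained sketch that round-trips a list through it. The `ListC`, `toChurch`, and `fromChurch` names are my own, not from the paper:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Boehm-Berarducci-encoded list: exactly the type we derived above.
type ListC a = forall b. (a -> b -> b) -> b -> b

nilC :: ListC a
nilC _ nil = nil

consC :: a -> ListC a -> ListC a
consC x xs cons nil = cons x (xs cons nil)

-- `foldr` really is the encoding: converting is just folding.
toChurch :: [a] -> ListC a
toChurch xs = \cons nil -> foldr cons nil xs

-- Decoding instantiates the handlers at the real constructors.
fromChurch :: ListC a -> [a]
fromChurch xs = xs (:) []
```

Running `fromChurch . toChurch` is the identity on lists, which is the sanity check that the encoding is faithful.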

Let’s go the other way too. Appropriately, we can use codata streams:

```
codata Stream a where
  { head :: Stream a -> a
  , tail :: Stream a -> Stream a
  }
```

I’m winging it here, but it’s more fun to figure out how to transform this than to get the information from the paper.

The obvious approach here is to just turn this thing directly into a record by dropping the `Stream a ->` part of each field:

```
data Stream a = Stream
  { head :: a
  , tail :: Stream a
  }
```

While this works in Haskell, it doesn’t play nicely with strict languages. So, we can just lazify it by sticking each one behind a function:

```
data Stream a = Stream
  { head :: () -> a
  , tail :: () -> Stream a
  }
```
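As a quick sanity check, here’s that thunked record in action. The `countFrom` and `takeS` helpers are my own, and I’ve renamed the fields `shead`/`stail` to dodge the Prelude clash:

```haskell
-- Codata-as-record: each field is delayed behind a unit function,
-- which is the lazification trick strict languages need.
data Stream a = Stream
  { shead :: () -> a
  , stail :: () -> Stream a
  }

-- An infinite stream of consecutive integers; only observed parts
-- are ever forced.
countFrom :: Int -> Stream Int
countFrom n = Stream { shead = \() -> n, stail = \() -> countFrom (n + 1) }

-- Observe finitely much of the (conceptually infinite) stream.
takeS :: Int -> Stream a -> [a]
takeS n s
  | n <= 0    = []
  | otherwise = shead s () : takeS (n - 1) (stail s ())
```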

Looks good to me. But is this what the paper does? It mentions that we can `tabulate` a function, e.g., represent `Bool -> String` as `(String, String)`. It doesn’t say much more than this, but we can do our own research. Peep the `Representable` class from adjunctions:

```
class Distributive f => Representable f where
  type Rep f :: *
  tabulate :: (Rep f -> a) -> f a
  index    :: f a -> Rep f -> a
```

This thing is exactly the transformation we’re looking for; we can “represent” some structure `f a` as a function `Rep f -> a`, and tabulating gets us back the thing we had in the first place.

So the trick here is to determine the `f` whose `Rep f` corresponds to our `codata` structure. Presumably that thing is exactly the record we worked out above.

What’s interesting about this approach is that it’s exactly scrap-your-typeclasses. And it’s exactly how typeclasses are implemented in Haskell. And last I looked, it’s the approach that Elm recommends instead of having typeclasses. Which explains why it’s so annoying in Elm: the language designers are forcing us to hand-compile our code! But I don’t need to beat that dead horse any further.
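For anyone who hasn’t seen scrap-your-typeclasses, here’s a hedged sketch of the idea; the `EqDict` and `elemBy` names are my own:

```haskell
-- A class becomes a record of methods, an instance becomes a value,
-- and a constraint becomes an ordinary argument.
newtype EqDict a = EqDict { eqD :: a -> a -> Bool }

-- The "instance" for Int, written out by hand.
eqInt :: EqDict Int
eqInt = EqDict (==)

-- `elem`, but with the dictionary passed explicitly, Elm-style.
elemBy :: EqDict a -> a -> [a] -> Bool
elemBy d x = any (eqD d x)
```

GHC does essentially this for you: an `Eq a =>` constraint compiles to a hidden dictionary argument just like `EqDict a`.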

Something that piqued my interest is a quote from the paper:

> Functional languages are typically rich in data types … but [have] a paucity of codata types (usually just function types).

This is interesting, because functions are the only non-trivial source of contravariance in Haskell. Contravariance is the co- version of (the poorly named, IMO) covariance. Which is a strong suggestion that functions are a source of contravariance *because they are codata,* rather than contravariance being a special property of functions themselves.
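To make the contravariance point concrete, here’s a hand-rolled `Predicate` (it mirrors the one in `Data.Functor.Contravariant`, but is inlined here to stay self-contained):

```haskell
-- A predicate consumes an `a`; the `a` sits to the left of an arrow,
-- which is what makes it contravariant.
newtype Predicate a = Predicate { runPredicate :: a -> Bool }

-- contramap precomposes: note the (b -> a), flipped from fmap's (a -> b).
contramapP :: (b -> a) -> Predicate a -> Predicate b
contramapP f (Predicate p) = Predicate (p . f)

isLong :: Predicate String
isLong = Predicate ((> 3) . length)

-- Reusing the String predicate on Ints by mapping *into* String.
isLongNumber :: Predicate Int
isLongNumber = contramapP show isLong
```

The flipped arrow in `contramapP` is the whole story: consumers (codata, eliminators) compose backwards relative to producers.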

I asked my super smart friend Reed Mullanix (who also has a great podcast episode), and he said something I didn’t understand about presheaves and functors. Maybe presheaves would make a good next paper.

This was a helpful paper to wrap my head around all this codata stuff that smart people in my circles keep talking about. None of it is *new,* but as a concept it helps solidify a lot of disparate facts I had rattling around in my brain. Doing this final tagless encoding of data types gives us a fast CPS thing that is quick as hell to run, because it gets tail-call-optimized, doesn’t need to build any intermediary data structures, and gets driven by its consumer. The trade-off is that CPS stuff is a damn mind-melter.

At Zurihac 2018, I met some guy (whose name I can’t remember, sorry!) who was working on a new language that supported this automatic transformation between data and codata. I don’t remember anything about it, except he would just casually convert between data and codata whenever was convenient, and the compiler would do the heavy lifting of making everything work out. It was cool. I wish I knew what I was talking about.

I started by really trying to wrap my head around how exactly the `ana . cata` pattern works. So I wrote out a truly massive number of trace statements, and stared at them until they made some amount of sense. Here’s what’s going on:

`ana` takes an `a` and unfolds it into an `F a`, recursively repeating until it terminates by producing a non-inductive `F`-term. So here `F` is a `Sorted`. And then we need to give a folding function for `cata`. This fold happens in `Unsorted`, and thus has type `Unsorted (Sorted (Mu Unsorted)) -> Sorted (Mu Unsorted)`. The idea here is that the `cata` uses its resulting `Sorted` to pull forward the smallest element it’s seen so far. Once the `cata` is finished, the `ana` gets a term `Sorted (Mu Unsorted)`, where the `Sorted` term is the head of the synthesized list, and the `Mu Unsorted` is the next “seed” to recurse on. This `Mu Unsorted` is one element smaller than it was last time around, so the recursion eventually terminates.

OK, so that’s all well and good. But what does `ana . para` do here? Same idea, except that the fold also gets a `Mu Unsorted` term, corresponding to the unsorted tail of the list — aka, before it’s been folded by `para`.

The paper doesn’t have much to say about `para`:

> in a paramorphism, the algebra also gets the remainder of the list. This extra parameter can be seen as a form of an as-pattern and is typically used to match on more than one element at a time or to detect that we have reached the final element.

That’s all well and good, but it’s unclear how this can help us. The difference between `naiveIns` and `ins` is:

```
naiveIns :: Ord a
  => Unsorted a (Sorted a x)
  -> Sorted a (Unsorted a x)
naiveIns UNil = SNil
naiveIns (a :> SNil) = a :! UNil
naiveIns (a :> b :! x)
  | a <= b    = a :! b :> x
  | otherwise = b :! a :> x

ins :: Ord a
  => Unsorted a (c, Sorted a x)
  -> Sorted a (Either c (Unsorted a x))
ins UNil = SNil
ins (a :> (x, SNil)) = a :! Left x
ins (a :> (x, b :! x'))
  | a <= b    = a :! Left x
  | otherwise = b :! Right (a :> x')
```

Ignore the `Left`/`Right` stuff. The only difference here is whether we use `x` or `x'` in the last clause, where `x` is the original, unsorted tail, and `x'` is the somewhat-sorted tail. It’s unclear to me how this can possibly help improve performance; we still need to have traversed the entire tail in order to find the smallest element. Maybe there’s something about laziness here, in that we shouldn’t need to rebuild the tail, but we’re going to be sharing the tail-of-tail regardless, so I don’t think this buys us anything.

And this squares with my confusion last week; this “caching” just doesn’t seem to do anything. In fact, the paper doesn’t even say it’s caching. All it has to say about our original `naiveIns` is:

> Why have we labelled our insertion sort as naïve? This is because we are not making use of the fact that the incoming list is ordered — compare the types of `bub` and `naiveIns`. We will see how to capitalise on the type of `naiveIns` in Section 5.

and then in section 5:

> The sole difference between `sel` and `bub` (Section 3) is in the case where a ≤ b: `sel` uses the remainder of the list, supplied by the paramorphism, rather than the result computed so far. This is why `para sel` is the true selection function, and `fold bub` is the naïve variant, if you will.

OK, fair, that checks out with what came out of my analysis. The `ana . para` version does use the tail of the original list, while `ana . cata` uses the version that might have already done some shuffling. But this is work we needed to do anyway, and moves us closer to a sorted list, so it seems insane to throw it away!

The best argument I can come up with here is that the `ana . para` version is dual to `cata . apo`, which signals whether the recursion should stop early. That one sounds genuinely useful to me, so maybe the paper does the `ana . para` thing just out of elegance.

Unfortunately, `cata . apo` doesn’t seem to be a performance boost in practice. In fact, both `cata . ana` and `ana . cata` perform significantly better than `cata . apo` and `ana . para`. Even more damningly, the latter two perform better when they ignore the unique abilities that `apo` and `para` provide.

Some graphs are worth a million words:

These are performance benchmarks for `-O0`, using `Data.List.sort` as a control (“sort”). The big numbers on the left are the size of the input. “bubble” is the naive version of “selection.” Additionally, the graphs show the given implementations of `quickSort` and `treeSort`, as well as the two variations I was wondering about in the last post (here called `quickTree` and `treeQuick`).

The results are pretty damning. In *all* cases, bubble sort is the fastest of the algorithms presented in the paper. That’s, uh, not a good sign.

Furthermore, the “no caching” versions of “insertion” and “selection” both perform better than their caching variants. They are implemented by just ignoring the arguments that we get from `apo` and `para`, and simulating being `ana` and `cata` respectively. That means: whatever it is that `apo` and `para` are doing is *strictly worse* than not doing it.

Not a good sign.

But maybe this is all just a result of being run on `-O0`. Let’s try turning on optimizations and seeing what happens:

About the same. Uh oh.

I don’t know what to blame this on. Maybe the constant factors are bad, or it’s a runtime thing, or I fucked up something in the implementation, or maybe the paper just doesn’t do what it claims. It’s unclear. But here’s my code, in case you want to take a look and tell me if I screwed something up. The criterion reports are available for `-O0` and `-O2` (slightly different than in the above photos, since I had to rerun them).

Something that’s stymied me while working through *Sorting with Bialgebras* is that whatever it is we’re doing here, it’s not observable. All sorting functions are extensionally equal — so the work being done here is necessarily below the level of equality. This doesn’t jibe well with how I usually think about programming, and has made it very hard for me to see exactly what the purpose of all of this is. But I digress.

Hinze et al. begin by showing us that insertion sort and selection sort have terse implementations:

```
{-# LANGUAGE ScopedTypeVariables #-}
import Data.List (delete, insert, unfoldr)

insertSort :: Ord a => [a] -> [a]
insertSort = foldr insert []

selectSort :: forall a. Ord a => [a] -> [a]
selectSort = unfoldr select
  where
    select :: [a] -> Maybe (a, [a])
    select [] = Nothing
    select as =
      let x  = minimum as
          xs = delete x as
       in Just (x, xs)
```

and that there are two dualities here: `foldr` is dual to `unfoldr`, and `insert :: Ord a => a -> [a] -> [a]` is dual to `select :: Ord a => [a] -> Maybe (a, [a])`.

The rest of the paper is pulling on this thread to see where it goes. As a first step, it’s noted that `foldr` and `unfoldr` are hiding a lot of interesting details, so instead we will divide the sorting problem into two halves: a catamorphism to tear down the unsorted list, and an anamorphism to build up the sorted version.

Begin by defining `Mu` and `Nu`, which are identical in Haskell. The intention here is that we can tear down `Mu`s, and build up `Nu`s:

```
newtype Mu f = Mu { unMu :: f (Mu f) }
newtype Nu f = Nu { unNu :: f (Nu f) }
```

as witnessed by `cata` and `ana`:

```
cata :: Functor f => (f a -> a) -> Mu f -> a
cata f = f . fmap (cata f) . unMu

ana :: Functor f => (a -> f a) -> a -> Nu f
ana f = Nu . fmap (ana f) . f
```
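A tiny worked example of these two, using an untagged list pattern functor of my own for brevity (the `total`, `countdown`, and conversion helpers are not from the paper):

```haskell
{-# LANGUAGE DeriveFunctor #-}

newtype Mu f = Mu { unMu :: f (Mu f) }
newtype Nu f = Nu { unNu :: f (Nu f) }

cata :: Functor f => (f a -> a) -> Mu f -> a
cata f = f . fmap (cata f) . unMu

ana :: Functor f => (a -> f a) -> a -> Nu f
ana f = Nu . fmap (ana f) . f

-- An untagged list pattern functor, just for this example.
data ListF a k = Nil | Cons a k
  deriving Functor

-- cata tears a Mu-list down to a summary value.
total :: Mu (ListF Int) -> Int
total = cata alg
  where
    alg Nil        = 0
    alg (Cons x s) = x + s

-- ana builds a Nu-list up from a seed.
countdown :: Int -> Nu (ListF Int)
countdown = ana coalg
  where
    coalg n
      | n <= 0    = Nil
      | otherwise = Cons n (n - 1)

-- Conversions so we can inspect the results.
fromList :: [a] -> Mu (ListF a)
fromList = foldr (\x k -> Mu (Cons x k)) (Mu Nil)

toList :: Nu (ListF a) -> [a]
toList (Nu Nil)        = []
toList (Nu (Cons x k)) = x : toList k
```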

We’ll also need a pattern functor to talk about lists:

```
data ListF (t :: Tag) a k = Nil | a :> k
  deriving (Eq, Ord, Show, Functor)
infixr 5 :>
```

This `Tag` thing is of my own devising; it’s a phantom type to track whether or not our list is sorted:

```
data Tag = UnsortedTag | SortedTag
type Unsorted = ListF 'UnsortedTag
type Sorted = ListF 'SortedTag
```

Note that in Haskell, nothing ensures that `Sorted` values are actually sorted! This is just some extra machinery to get more informative types.

With everything in place, we can now write the type of a sorting function:

`type SortingFunc a = Ord a => Mu (Unsorted a) -> Nu (Sorted a)`

that is, a sorting function is something that tears down an unsorted list, and builds up a sorted list in its place. Makes sense, and the extra typing helps us keep track of which bits are doing what.

Most of the paper stems from the fact that we can implement a `SortingFunc` in two ways. We can either:

- write a `cata` that tears down the `Mu` by building up a `Nu` via `ana`, or
- write an `ana` that builds up the `Nu` by tearing down the `Mu` via `cata`
Let’s look at the first case:

```
naiveInsertSort :: SortingFunc a
naiveInsertSort = cata $ ana _
```

this hole has type

```
Unsorted a (Nu (Sorted a))
-> Sorted a (Unsorted a (Nu (Sorted a)))
```

which we can think of as having stuck an element on the front of an otherwise sorted list, and then needing to push that unsortedness one layer deeper. That does indeed sound like insertion sort: take a sorted list, and then traverse through it, sticking the unsorted element in the right place. It’s “naive” because the recursion doesn’t stop once it’s in the right place — since the remainder of the list is already sorted, it’s OK to stop.

The paper deals with this issue later.

Let’s write a function with this type:

```
naiveIns :: Ord a
  => Unsorted a (Nu (Sorted a))
  -> Sorted a (Unsorted a (Nu (Sorted a)))
naiveIns Nil = Nil
naiveIns (a :> Nu Nil) = a :> Nil
naiveIns (a :> Nu (b :> x))
  | a <= b    = a :> b :> x
  | otherwise = b :> a :> x
```

The first two cases are uninteresting. But the cons-cons case is — we need to pick whichever of the two elements is smaller, and stick it in front. In doing so, we have sorted the first element in the list, and pushed the unsortedness deeper.

This all makes sense to me. But I find the dual harder to think about. Instead of making a `cata . ana`, let’s go the other way with an `ana . cata`:

```
bubbleSort :: SortingFunc a
bubbleSort = ana $ cata _
```

this hole now has type:

```
Unsorted a (Sorted a (Mu (Unsorted a)))
-> Sorted a (Mu (Unsorted a))
```

which is now an unsorted element in front of a sorted element, in front of the remainder of an unsorted list. What does it mean to be a single sorted element? Well, it must be the smallest element in the otherwise unsorted list. Thus, the smallest element in a list bubbles its way to the front.

On my first reading of this, I thought to myself “that sure sounds a lot like selection sort!” But upon close reading later, it’s not. Insertion sort knows where to put the smallest element it’s found, and does that in constant time. Bubble sort instead swaps adjacent elements, slowly getting the smallest element closer and closer to the front.

Let’s implement a function with this type:

```
bub :: Ord a
  => Unsorted a (Sorted a (Mu (Unsorted a)))
  -> Sorted a (Mu (Unsorted a))
bub Nil = Nil
bub (a :> Nil) = a :> Mu Nil
bub (a :> b :> x)
  | a <= b    = a :> Mu (b :> x)
  | otherwise = b :> Mu (a :> x)
```
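Assembling the pieces so far gives something we can actually run; the `fromList`/`toList` conversions are my own glue, not from the paper:

```haskell
{-# LANGUAGE DataKinds, DeriveFunctor, KindSignatures #-}

newtype Mu f = Mu { unMu :: f (Mu f) }
newtype Nu f = Nu { unNu :: f (Nu f) }

cata :: Functor f => (f a -> a) -> Mu f -> a
cata f = f . fmap (cata f) . unMu

ana :: Functor f => (a -> f a) -> a -> Nu f
ana f = Nu . fmap (ana f) . f

data Tag = UnsortedTag | SortedTag

data ListF (t :: Tag) a k = Nil | a :> k
  deriving Functor
infixr 5 :>

type Unsorted = ListF 'UnsortedTag
type Sorted   = ListF 'SortedTag

-- The bubble step: pull the smallest element outward one layer.
bub :: Ord a
    => Unsorted a (Sorted a (Mu (Unsorted a)))
    -> Sorted a (Mu (Unsorted a))
bub Nil = Nil
bub (a :> Nil) = a :> Mu Nil
bub (a :> b :> x)
  | a <= b    = a :> Mu (b :> x)
  | otherwise = b :> Mu (a :> x)

-- The ana . cata composition from the text.
bubbleSort :: Ord a => Mu (Unsorted a) -> Nu (Sorted a)
bubbleSort = ana (cata bub)

-- Conversions so we can observe the result.
fromList :: [a] -> Mu (Unsorted a)
fromList = foldr (\x k -> Mu (x :> k)) (Mu Nil)

toList :: Nu (Sorted a) -> [a]
toList (Nu Nil)      = []
toList (Nu (x :> k)) = x : toList k
```

Each round of the `cata` bubbles the minimum to the front; the `ana` then peels it off and recurses on the one-smaller remainder.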

While `naiveIns` pushes unsorted elements inwards, `bub` pulls sorted elements outwards. But, when you look at the implementations of `bub` and `naiveIns`, they’re awfully similar! This is the main thrust of the paper — we can factor out a common core of `naiveIns` and `bub`:

```
swap :: Ord a
  => Unsorted a (Sorted a x)
  -> Sorted a (Unsorted a x)
swap Nil = Nil
swap (a :> Nil) = a :> Nil
swap (a :> b :> x)
  | a <= b    = a :> b :> x
  | otherwise = b :> a :> x
```

It wasn’t immediately clear to me why this works, since the types of `bub` and `naiveIns` seem to be more different than this. But when we compare them, the difference is mostly an artifact of the clunky fixed-point encodings:

```
-- type of bub
Unsorted a (Sorted a (Mu (Unsorted a)))
-> Sorted a (Mu (Unsorted a))
-- unroll a Mu:
Unsorted a (Sorted a (Mu (Unsorted a)))
-> Sorted a (Unsorted a (Mu (Unsorted a)))
-- let x ~ Mu (Unsorted a)
Unsorted a (Sorted a x)
-> Sorted a (Unsorted a x)
-- let x ~ Nu (Sorted a)
Unsorted a (Sorted a (Nu (Sorted a)))
-> Sorted a (Unsorted a (Nu (Sorted a)))
-- unroll a Nu
Unsorted a (Sorted a (Nu (Sorted a)))
-> Sorted a (Unsorted a (Nu (Sorted a)))
-- type of naiveIns
```

The only difference here is we are no longer packing `Mu`s and unpacking `Nu`s. We can pull that stuff out:

```
bubbleSort'' :: SortingFunc a
bubbleSort'' = ana $ cata $ fmap Mu . swap

naiveInsertSort'' :: SortingFunc a
naiveInsertSort'' = cata $ ana $ swap . fmap unNu
```

and thus have shown that `bubbleSort''` and `naiveInsertSort''` are duals of one another.

Allegedly, this stuff is all “just a bialgebra.” So, uh, what’s that? The authors draw a bunch of cool-looking commutative diagrams that I would love to try to prove, but my attempts to do this paper in Agda were stymied by `Mu` and `Nu` being too recursive. So we’ll have to puzzle through it like peasants instead.

The universal mapping property of initial algebras (here, `Mu`) is the following:

`cata f . Mu = f . fmap (cata f)`

and dually, for terminal coalgebras (`Nu`):

`unNu . ana f = fmap (ana f) . f`

Let’s work on the `cata` diagram, WLOG. This UMP gives us:

```
                         fmap (cata bub)
Unsorted (Mu Unsorted) ----------------> Unsorted (Sorted (Mu Unsorted))
          |                                          |
       Mu |                                          | bub
          v                                          v
     Mu Unsorted ----------------------------> Sorted (Mu Unsorted)
                         cata bub
```

but as we saw in `bubbleSort''`, `bub = fmap Mu . swap`, thus:

```
                         fmap (cata bub)
Unsorted (Mu Unsorted) ----------------> Unsorted (Sorted (Mu Unsorted))
          |                                          |
          |                                          | swap
          |                                          v
       Mu |                              Sorted (Unsorted (Mu Unsorted))
          |                                          |
          |                                          | fmap Mu
          v                                          v
     Mu Unsorted ----------------------------> Sorted (Mu Unsorted)
                         cata bub
```

If we let `c = cata bub` and `a = Mu`, this diagram becomes:

```
                             fmap c
Unsorted (Mu Unsorted) ----------------> Unsorted (Sorted (Mu Unsorted))
          |                                          |
          |                                          | swap
          |                                          v
        a |                              Sorted (Unsorted (Mu Unsorted))
          |                                          |
          |                                          | fmap a
          v                                          v
     Mu Unsorted ----------------------------> Sorted (Mu Unsorted)
                               c
```

and allegedly, this is the general shape of an `f`-*bialgebra*:

`c . a = fmap a . f . fmap c`

where `a : forall x. F x -> x` and `c : forall x. x -> G x`, thus `f : forall x. F (G x) -> G (F x)`. In Agda:

```
record Bialgebra
    {F G : Set → Set}
    {F-functor : Functor F}
    {G-functor : Functor G}
    (f : {X : Set} → F (G X) → G (F X)) : Set where
  field
    a : {X : Set} → F X → X
    c : {X : Set} → X → G X
    bialgebra-proof : {X : Set}
      → c {X} ∘ a ≡ map G-functor a ∘ f ∘ map F-functor c
```

where we can build two separate `Bialgebra swap`s:

```
bubbleSort : Bialgebra swap
Bialgebra.a bubbleSort = cata bub
Bialgebra.c bubbleSort = Mu
Bialgebra.bialgebra-proof bubbleSort = -- left as homework
```

and

```
naiveInsertSort : Bialgebra swap
Bialgebra.a naiveInsertSort = unNu
Bialgebra.c naiveInsertSort = ana bub
Bialgebra.bialgebra-proof naiveInsertSort = -- left as homework
```

I’m not entirely confident about this since, as said earlier, I don’t have this formalized in Agda. It’s a shame, because this looks like it would be a lot of fun to do. We’re left with a final diagram, equating `cata (ana naiveIns)` and `ana (cata bub)`:

```
               ?fmap (cata (ana naiveIns))?
Unsorted (Mu Unsorted) - - - - - - -> Unsorted (Nu Sorted)
          |                                     |
       Mu |                                     | ana naiveIns
          v        cata (ana naiveIns)          v
     Mu Unsorted - - - - - -|| - - - - - -> Nu Sorted
          |         ana (cata bub)              |
 cata bub |                                     | unNu
          v                                     v
Sorted (Mu Unsorted) - - - - - - -> Sorted (Nu Sorted)
               ?fmap (ana (cata bub))?
```

The morphisms surrounded by question marks aren’t given in the paper, but I’ve attempted to fill them in. The ones I’ve given complete the square, but they’re the opposite of what I’d expect from the initial algebra / terminal coalgebra UMPs. This is something to come back to, I think, but is rather annoying since Agda would just tell me the damn answer.

Standard recursion scheme machinery:

```
import Control.Arrow ((&&&))

para :: Functor f => (f (Mu f, a) -> a) -> Mu f -> a
para f = f . fmap (id &&& para f) . unMu

apo :: Functor f => (a -> f (Either (Nu f) a)) -> a -> Nu f
apo f = Nu . fmap (either id (apo f)) . f
```

The idea is that `para`s can look at all the structure that hasn’t yet been folded, while `apo`s can exit early by giving a `Left`.

The paper brings us back to insertion sort. Instead of writing the naive version as a `cata . ana`, we will now try writing it as a `cata . apo`. Under this new phrasing, we get the type:

```
Unsorted a (Nu (Sorted a))
  -> Sorted a (Either (Nu (Sorted a))
                      (Unsorted a (Nu (Sorted a))))
```

which is quite a meaningful type. Now, our type can signal that the resulting list is already sorted all the way through, or that we had to push an unsorted value inwards. As a result, `ins` looks exactly like `bub`, except that we can stop early in most cases, safe in the knowledge that we haven’t changed the sortedness of the rest of the list. The `b < a` case is the only one which requires further recursion.

```
ins :: Ord a
  => Unsorted a (Nu (Sorted a))
  -> Sorted a (Either (Nu (Sorted a))
                      (Unsorted a (Nu (Sorted a))))
ins Nil = Nil
ins (a :> Nu Nil) = a :> Left (Nu Nil)
ins (a :> Nu (b :> x))
  | a <= b    = a :> Left (Nu (b :> x)) -- already sorted!
  | otherwise = b :> Right (a :> x)
```

Let’s think now about selection sort, which should be an `ana . para` by duality, with the resulting type:
```
Unsorted a ( Mu (Unsorted a)
           , Sorted a (Mu (Unsorted a))
           )
  -> Sorted a (Mu (Unsorted a))
```

It’s much harder for me to parse any sort of meaning out of this type. Now our input has both all the unsorted remaining input, as well as a single term bubbling up. I actually can’t figure out how this helps us; presumably it’s something about laziness and not needing to do something with the sorted side’s unsorted tail? But I don’t know. Maybe a reader can drop a helpful comment.

Anyway, the paper gives us `sel`, which implements the type:

```
sel
:: Ord a
=> Unsorted a ( Mu (Unsorted a)
, Sorted a (Mu (Unsorted a))
)
-> Sorted a (Mu (Unsorted a))
sel Nil = Nil
sel (a :> (x, Nil)) = a :> x
sel (a :> (x, b :> x'))
| a <= b = a :> x
| otherwise = b :> Mu (a :> x')
```

Getting an intuition here as to why the `otherwise` case uses `x'` instead of `x` is an exercise left to the reader, who can hopefully let me in on the secret.

As before, we can pull a bialgebra out of `ins` and `sel`. This time, the input side uses the `(,)` and the output uses `Either`, and I suppose we get the best of both worlds: early stopping, and presumably whatever caching comes from `(,)`:

```
swop :: Ord a
  => Unsorted a (x, Sorted a x)
  -> Sorted a (Either x (Unsorted a x))
swop Nil = Nil
swop (a :> (x, Nil)) = a :> Left x
swop (a :> (x, b :> x'))
  | a <= b    = a :> Left x
  | otherwise = b :> Right (a :> x')
```

This time our bialgebras are:

```
insertSort : Bialgebra swop
Bialgebra.a insertSort = apo ins
Bialgebra.c insertSort = id &&& unNu
Bialgebra.bialgebra-proof insertSort = -- left as homework
```

and

```
selectSort : Bialgebra swop
Bialgebra.a selectSort = para sel
Bialgebra.c selectSort = either id Mu
Bialgebra.bialgebra-proof selectSort = -- left as homework
```

Lots of the same techniques here, and I’m running out of time, so we’ll go quickly. The key insight thus far is that selection sort and insertion sort both suck. How do we go faster than $O(n^2)$? Quicksort and Treesort!

What’s interesting to me is that I never considered Quicksort to be a tree-sorting algorithm. But of course it is; it’s recursively partitioning an array around a pivot, sorting each part, and then putting them back together. That fact is just obscured by all of the “pivoting” nonsense; it’s a tree algorithm projected onto arrays.
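Stripped of the recursion-scheme machinery, the tree-sort reading of Quicksort is just this (a direct-style sketch of my own, not the paper’s version):

```haskell
-- A plain binary search tree: the pivot lives at each node.
data BST a = Leaf | Node (BST a) a (BST a)

-- Growing: each insertion compares against pivots on the way down,
-- exactly the comparisons Quicksort's partition step performs.
insertBST :: Ord a => a -> BST a -> BST a
insertBST x Leaf = Node Leaf x Leaf
insertBST x (Node l p r)
  | x <= p    = Node (insertBST x l) p r
  | otherwise = Node l p (insertBST x r)

-- Flattening: an in-order traversal emits the elements sorted.
flattenBST :: BST a -> [a]
flattenBST Leaf         = []
flattenBST (Node l p r) = flattenBST l ++ [p] ++ flattenBST r

treeSort :: Ord a => [a] -> [a]
treeSort = flattenBST . foldr insertBST Leaf
```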

Hinze et al. present specialized versions of Quicksort and Treesort, but we’re just going to skip to the bialgebra bits:

```
data Tree a k = Empty | Node k a k
  deriving (Eq, Ord, Show, Functor)

sprout :: Ord a
  => Unsorted a (x, Tree a x)
  -> Tree a (Either x (Unsorted a x))
sprout Nil = Empty
sprout (a :> (x, Empty)) = Node (Left x) a (Left x)
sprout (a :> (x, Node l b r))
  | a <= b    = Node (Right (a :> l)) b (Left r)
  | otherwise = Node (Left l) b (Right (a :> r))
```

This is the creation of a binary search tree. `Left` trees don’t need to be manipulated, and `Right` ones need to have the new unsorted element pushed down. The other half of the problem is to extract elements from the BST:

```
wither :: Tree a (x, Sorted a x)
  -> Sorted a (Either x (Tree a x))
wither Empty = Nil
wither (Node (_, Nil) a (r, _)) = a :> Left r
wither (Node (_, b :> l') a (r, _)) = b :> Right (Node l' a r)
```

I think I understand what’s going on here. We have a tree with nodes `a` and “subtrees” `Sorted a x`, where, remember, `x` ties the knot. Thus, in the first level of the tree, our root node is the pivot, and the left “subtree” is the subtree itself, plus a view on it corresponding to the smallest element in it. That is, in `(x, Sorted a x)`, the `fst` is the tree, and the `snd` is the smallest element that has already been pulled out.

So, if we have a left cons, we want to return that, since it’s necessarily smaller than our root. But we continue (via `Right`) with a new tree, using the same root and right sides, letting the recursion scheme machinery reduce that into its smallest term.

But I must admit that I’m hand-waving on this one. I suspect better understanding would come from getting better intuitions behind `para` and `apo`.

Let’s tie things off then, since I’ve clearly hit my limit of understanding on this paper for this week. While having a deadline is a nice forcing function to actually go through papers, it’s not always the best for deeply understanding them! Alas, something to think about for the future.

We’re given two implementations of `grow`:

```
grow, grow' :: Ord a => Mu (Unsorted a) -> Nu (Tree a)
grow  = ana . para $ fmap (either id Mu) . sprout
grow' = cata . apo $ sprout . fmap (id &&& unNu)
```

as well as two for `flatten`:

```
flatten, flatten' :: Mu (Tree a) -> Nu (Sorted a)
flatten  = cata . apo $ wither . fmap (id &&& unNu)
flatten' = ana . para $ fmap (either id Mu) . wither
```

and then finally, give us `quickSort` and `treeSort`:

```
quickSort, treeSort :: SortingFunc a
quickSort = flatten  . downcast . grow
treeSort  = flatten' . downcast . grow'
```

where `downcast` was given earlier as:

```
downcast :: Functor f => Nu f -> Mu f
downcast = Mu . fmap downcast . unNu
```

This is interesting, but comes with an obvious question: what if we intermix `flatten` with `grow'`, and vice versa? Rather unexcitingly, they still sort the list, and don’t seem to have different asymptotics. As a corollary, we must thus be excited, and assume that these are two “new” sorting functions, at least, ones without names. I guess that’s not too surprising; there are probably infinite families of sorting functions.

What a fun paper! I did a bad thing by jumping into Agda too quickly, hoping it would let me formalize the “this is a sorted list” stuff. But that turned out to be premature, since the `Sorted` wrapper is only ever a pair, and exists only to signal some information to the reader. Thus, I spent six hours working through the Agda stuff before realizing my deadline was coming up sooner rather than later.

Implicit in that paragraph is that I started implementing before I had read through the entire paper, which was unwise, as it meant I spent a lot of time on things that turned out to be completely unrelated. Note to self to not do this next time.

Also, it turns out I’m not as firm on recursion schemes as I thought! It’d be valuable for me to go through `para`s in much more depth than I have now, and to work harder at following the stuff in this paper. How do the authors keep everything straight? Do they just have more experience, or are they using better tools than I am?