Week three, and this update is coming in hot, a whole day early! This week I worked on the ring solving chapter, realizing that I can make a very much non-toy solver, and pack it into a chapter. We now build a multivariate semiring solver, discuss how and why it works, and then do some dependent-type shenanigans to put a delightful user interface in front of the whole thing.

In addition, it came with some excellent opportunities to discuss where semantics come from, and let me talk about homomorphisms earlier than I was otherwise hoping to.

My plan for the week was to tackle the remainder of the setoids chapter, but setoids are awful and it’s hard to motivate myself to do that, since I avoid using them in my day-to-day life whenever I can. Which is always. We’ll see what happens with this chapter, but maybe it’ll get melted down into something else. Nevertheless, understanding setoids *is* important for actually doing anything with the stdlib, so I dunno.

On the typesetting front, I spent an hour today fighting with LaTeX, trying to ensure that it has glyphs for every Unicode character in the book. I’ve got all but one of them sorted out now, and in the process learned way more about LaTeX than any human should need to know.

The plan for next week is to clean up the extremely-WIP backmatter chapters. There’s a bunch of crap in there from me trying to do math math and failing, because math math doesn’t give two sniffs about constructibility, and so none of it works out. If I’m feeling particularly plucky, I might try my hand at defining the reals, just because it might be fun.

As of today’s update, the book is now 360 pages long! I estimate it’ll be about 450 when it’s done, so we’re clearly making progress.

Anyway, that’s all for today. If you’ve already bought the book, you can get the updates for free on Leanpub. If you haven’t, might I suggest doing so? Your early support and feedback helps inspire me and ensure the book is as good as it can possibly be.

---

It’s week two of regular updates on Certainty by Construction, baby! This week I made 17 commits to the repository, half of which were towards the goal of improving the book’s typesetting. Spurred on by a bug report asking “what the hell does `AgdaCmd:MakeCase` mean?”, I decided to upgrade the book’s build system. Now, whenever the book asks you to run a command, you should also see the explicit keystrokes to press alongside it.

You’ll also notice intra-prose syntax highlighting, meaning that if the book mentions a type, it will now be presented in a beautiful blue, among other things in other colors. Agda has some janky support for this, but I couldn’t get it working, which means I annotated each and every piece of syntax highlighting by hand. Please file a bug if you notice I’ve missed any.

Content-wise, the old chapter on “structured sets” has become “relations”, and it has several new sections fleshing out the idea and giving several more examples. I’m now in the middle of rewriting the setoids chapter, but it too has three new sections, and thus the whole thing is no longer *all* about modular arithmetic.

Next week I’m going to continue powering on with the setoids chapter—including a big digression on what congruence entails under a setoid—and then I think I’ll tackle the ring solving chapter.

For the first time, this book seems like something I might not be working on for the rest of my life. It’s nowhere near done, but the topic and style are finally hashed out, and the content is mostly in an alpha state. From here it’s really just a matter of grinding, rewriting all the crap bits over and over again until they’re no longer crap.

Anyway, that’s all for today. If you’ve already bought the book, you can get the updates for free on Leanpub. If you haven’t, might I suggest doing so? Your early support and feedback helps inspire me and ensure the book is as good as it can possibly be.

---

As part of a new ~quarterly goal, I’m going to be publishing updates to Certainty by Construction every Friday. This is for a few reasons: one, things get done much more quickly when you’re not doing them in private; two, relatedly, it’s good to get some exposure here and keep myself accountable.

Anyway, there are 26 new pages since last week, although a good deal of that is code without any prose around it yet. I’m in the process of cannibalizing the sections on relations and setoids into a single chapter. It’s a discussion of mathematical relations, their properties, and several examples. We explore different pre-orders, partial orders and total orders, and have a lengthy digression about effectively designing indices for `data` types.

This last point arose from me spending a few hours trying to work out under which circumstances exactly Agda gets confused about whether or not a computing index will give rise to a constructor. My findings are that it’s not really about computing indices, so much as it is about Agda running out of variables in which it can pack constraints. I suspect this knowledge can be exploited to make more interesting constructors than I thought possible, but I haven’t worked out how to do it yet.

I’ve also been working on how to simplify some bigger setoid proofs, where you have a bunch of equational reasoning you’d like to do under congruence. The folklore on this is generally to introduce a lemma somewhere else, but this has always struck me as a disappointing solution. Modulo the concrete syntax, this seems to work pretty well:

```
_≈nested_[_]_
    : A
    → {f : A → A}
    → (cong : {x y : A} → x ≈ y → f x ≈ f y)
    → {x y z : A}
    → x IsRelatedTo y
    → f y IsRelatedTo z
    → f x IsRelatedTo z
_ ≈nested cong [ relTo x=y ] (relTo fy=z)
  = relTo (trans (cong x=y) fy=z)

infixr 2 _≈nested_[_]_
```

which lets you focus in on a particular sub-expression, and use a new equational reasoning block to rewrite that, before popping your results back to the full expression. As an example:

```
    (⌊ ((a *H c) *x+ 0#) +H b *S c +H d *S a ⌋) * x + b * d
  ≈nested (+-congʳ ∘ *-congʳ) [  -- focus on subexpr
      ⌊ ((a *H c) *x+ 0#) +H b *S c +H d *S a ⌋
    ≈⟨ +H-+-hom (((a *H c) *x+ 0#) +H b *S c) (d *S a) x ⟩
      ⌊ ((a *H c) *x+ 0#) +H b *S c ⌋ + ⌊ d *S a ⌋
    ≈⟨ +-congʳ (+H-+-hom ((a *H c) *x+ 0#) (b *S c) x) ⟩
      ⌊ a *H c ⌋ * x + 0# + ⌊ b *S c ⌋ + ⌊ d *S a ⌋
    ≈⟨ …via… *S-*-hom ⟩
      ⌊ a *H c ⌋ * x + (b * ⌊ c ⌋) + (d * ⌊ a ⌋)
    ≈⟨ +-congʳ (*-congʳ (*H-*-hom a c x)) ⟩
      ⌊ a ⌋ * ⌊ c ⌋ * x + b * ⌊ c ⌋ + d * ⌊ a ⌋  -- pop back
    ∎ ]
    (⌊ a ⌋ * ⌊ c ⌋ * x + b * ⌊ c ⌋ + d * ⌊ a ⌋) * x + (b * d)
```

The attentive reader here will notice that I have also clearly been improving the chapter on ring solving. Maybe I’m just better at proofs these days, but the whole thing feels much less challenging than my first few times looking at it.

Anyway, that’s all for today. If you’ve already bought the book, you can get the updates for free on Leanpub. If you haven’t, might I suggest doing so? Your early support and feedback helps inspire me and ensure the book is as good as it can possibly be.

---

It is widely acknowledged that the languages you speak shape the thoughts you can think; while this is true for natural language, it is doubly so in the case of programming languages. And it’s not hard to see why; while humans have dedicated neural circuitry for natural language, it would be absurd to suggest there is dedicated neural circuitry for fiddling around with the semantics of pushing around arcane symbols abstractly encoded as electrical potentials over a conductive metal.

Because programming—and mathematics more generally—does not come easily to us humans, it can be hard to see the forest for the trees. We have no built-in intuition as to what should be possible, and thus, this intuition is built by observing the artifacts created by more established practitioners. In these more “artificial” of human endeavors, newcomers to the field are truly constructivists—their methods for practicing the art are shaped only by their previously-observed patterns. Because different programming languages support different features and idioms, the imaginable shape of what programming *is* must be shaped by the languages we understand.

In a famous essay, “Beating the Averages,” Paul Graham points out the so-called *Blub paradox.* This, Graham says, is the ordering of programming languages by power; a programmer who thinks in a middle-of-the-road language along this ordering (call it Blub) can identify less powerful languages, but not those which are more powerful. The idea rings true; one can arrange languages in power by the features they support, and subsequently check to see if a language supports all the features felt to be important. If it doesn’t, it must be less powerful. However, this technique doesn’t work to identify more powerful languages—at best, you will see that the compared language supports all the features you’re looking for, but you don’t know enough to ask for more.

More formally, we can describe the Blub paradox as a semi-decision procedure. That is, given an ordering over programming languages (here, by “power”), we can determine whether a language is less powerful than our comparison language, but not whether it is more powerful. We can determine when the answer is definitely “yes,” but not when it is “no.”

Over two decades of climbing this lattice of powerful languages, I have come to understand a lesser-known corollary of the Blub paradox, which I call the *Co-Blub paradox*^{1}. This is the observation that knowledge of lesser languages is *actively harmful* in the context of a more powerful language. The hoops you unwittingly jumped through in Blub due to lacking feature X are *anti-patterns* in the presence of feature X. This is obviously true when stated abstractly, but insidious when one is in the middle of it.

Let’s look at a few examples over the ages, to help motivate the problem before we get into our introspection proper. In the beginning, people programmed directly in machine code. Not assembly, mind you, but in raw binary-encoded op-codes. They had a book somewhere showing them what bits needed to be set in order to cajole the machine into performing any given instruction. Presumably if this were your job, you’d come to memorize the bit patterns for common operations, and it wouldn’t be nearly as tedious as it seems today.

Then came assembly languages, which provided human-meaningful mnemonics for the computer’s opcodes. No longer did we need to encode a jump as `11111000110000001100` — now it was `jl 16`. Still mysterious, to be sure, but a significant gain in legibility. When encoded directly in machine code, programs were, for the most part, write-only. But assembly languages don’t come for free; first you need to write an assembler: a program that reads the mnemonics and outputs the raw machine code. If you were already proficient at writing machine code directly, you can imagine the task of implementing an assembler feeling like make work—a tool to automate a problem you don’t have. In the context of the Co-Blub paradox, knowing the direct encodings of your opcodes is an anti-pattern when you have an assembly language, as it makes your contributions inscrutable to your peers.

Programming directly in assembly eventually hit its limits. Every computer had a different assembly language, which meant if you wanted to run the same program on a different computer you’d have to completely rewrite the whole thing; often needing to translate between extremely different concepts and limitations. Ignoring a lot of history, C came around with the big innovation that software should be portable between different computers: the same C program should work regardless of the underlying machine architecture. If you were an assembly programmer, you ran into the anti-pattern that while you could squeeze more performance and perform clever optimizations if you were aware of the underlying architecture, this fundamentally limited you *to that platform.*

By virtue of being, in many ways, a unifying assembly language, C runs very close to what we think of as “the metal.” Although different computer architectures have minor differences in registers and ways of doing things, they are all extremely similar variations on a theme. They all expose storable memory indexed by a number, operations for performing basic logic and arithmetic tasks, and means of jumping around to what the computer should consider to be the next instruction. As a result, C exposes this abstraction of what a computer *is* to its programmers, who are thus required to think about mutable memory and about how to encode complicated objects as sequences of bytes in that memory. But then came Java, whose contribution to mainstream programming was to popularize the idea that memory is cheap and abundant, and thus OK to waste some in order to alleviate the headache of needing to track it all yourself. As a C programmer coming to Java, you must unlearn the idea that memory is sacred and scarce, that you can do a better job of keeping track of it than the compiler can, and, hardest of all, that it is an important thing to track in the first place.

There is a clear line of progression here; as we move up the lattice of powerful languages, we notice that more and more details of what we thought were integral parts of programming turn out to be not particularly relevant to the actual task at hand. However, the examples thus discussed are already known to the modern programmer. Let’s take a few steps further, into languages deemed esoteric in the present day. It’s easy to see and internalize examples from the past, but those staring us in the face are much more difficult to spot.

Compare Java then to Lisp, which—among many things—makes the argument that functions, and even *programs themselves,* are just as meaningful objects as are numbers and records. Where Java requires the executable pieces to be packaged up and moved around with explicit dependencies on the data it requires, Lisp just lets you write and pass around functions, which automatically carry around all the data they reference. Java has a *design pattern* for this called the “command pattern,” which requires much ado and ink to be spilled, while in Lisp it just works in a way that is hard to understand if you are used to thinking about computer programs as static sequences of instructions. Indeed, the command pattern is bloated and ultimately unnecessary in Lisp, and practitioners must first unlearn it before they can begin to see the beauty of Lisp.

Haskell takes a step further than Lisp, in that it restricts when and where side-effects are allowed to occur in a program. This sounds like heresy (and feels like it for the first six months of programming in Haskell) until you come to appreciate that *almost none* of a program needs to perform side-effects. As it happens, side-effects are the only salient observation of the computer’s execution model, and by restricting their use, Haskell frees its programmers from needing to think about how the computer will execute their code—promising only that it will. As a result, Haskell code looks much more like mathematics than it looks like a traditional computer program. Furthermore, by abstracting away the execution model, the runtime is free to parallelize and reorder code, often even eliding unnecessary execution altogether. The programmer who refuses to acknowledge this reality and insists on coding with side-effects pays a great price, in the amount of code they need to write, in its long-term reusability, and, most importantly, in the correctness of their computations.

All of this brings us to Agda, which is as far as I’ve gotten along the power lattice of programming languages. While Agda looks a great deal like Haskell, its powerful type system allows us to articulate many invariants that are impossible to write down in other languages. It’s tempting to think about Agda as Haskell-but-with-better-types, but this is missing the point. Agda’s type system is so precise we can *prove* that our solutions are correct, which alleviates the need to actually *run* the subsequent programs. In essence, programming in Agda abstracts away the notion of execution entirely. Following our argument about co-Blub programmers, they will come to Agda thinking that their hard-earned, battle-proven programming techniques for wrangling runtime performance will come in handy. But this is not the case; most of the techniques we have learned and consider “computer science” are in fact *implementation ideas:* that is, specific realizations from infinite classes of solutions, chosen not for their simplicity or clarity, but for their *efficiency.*

Thus, the process of learning Agda, in many ways, is learning to separate the beautiful aspects of problem solving from the multitude of clever hacks we have accumulated over the years. Much like the fish who is unable to recognize the ubiquitous water around him, as classically-trained programmers, it is nigh-impossible to differentiate the salient points from the implementation details until we find ourselves in a domain where they do not overlap. Indeed, in Agda, you will often feel the pain of having accidentally conflated the two, when your proofs end up being much more difficult than you feel they deserve. Despite the pain and the frustration, this is in fact a feature, and not a bug. It is a necessary struggle, akin to the type-checker informing you that your program is wrong. While it can be tempting to blame the tool, the real fault is in the workmanship.

1. Although precisely speaking, the name should be the co-(Blub paradox), as the corollary applies to the paradox as a whole, not only the Blub piece. Alas, such is an awkward construction in English, and thus we will not use it.

---

At work I was recently tasked with figuring out what API calls our program makes, and more interestingly, which code-paths lead to those API calls. Determining this by hand is tedious and error-prone, and worse, doesn’t stay up to date with code changes. Instead, let’s see how we can use the type system to eliminate the pain.

The existing code was organized around a class `HasAPI` that looks something like this:

```
type HasAPI :: Service -> Symbol -> Constraint
class HasAPI srv name where
  type APICall srv name
  callAPI :: APICall srv name
```

Here, `HasAPI` is a type class with an associated type family `APICall`, which gives the type for making the call. For example, there might be an instance:

```
instance HasAPI ShoutService "shout" where
  type APICall ShoutService "shout" = String -> IO String
  callAPI str = pure $ fmap toUpper str
```

This is a silly example — the real codebase makes actual API calls — but it serves for demonstration.

Our goal is to document every codepath that makes any use of `callAPI`, in some sense “infecting” every path with some marker of that fact. This is a common experience for Haskell programmers; in fact, `IO` has this same pattern of infectiousness. Whenever you make a function perform IO, every type in the call stack needs to document the fact that it performs `IO`. This is the inspiration we will take, except that changing types is extremely expensive. What if we pushed a constraint around instead?

The trick is to define a new class, of the same shape as `HasAPI`:

```
type CallsAPI :: Service -> Symbol -> Constraint
class CallsAPI srv name
```

but crucially, we give `CallsAPI` *no instances.* On first blush, this seems insane: why introduce a class with no methods and no instances? Having no methods means it can’t do anything useful. Having no instances means GHC can never eliminate the constraint, and thus must propagate it upwards. This is the infectiousness we want; any function which makes an API call must document that fact in its type — failure to do so will result in GHC failing to compile with the message `No instance for (CallsAPI srv name)`.

The trick now is to ensure that `callAPI` produces a `CallsAPI` constraint. The easy way to do this is a little renaming, to ensure existing polymorphic code continues to work:

```
type UnsafeHasAPI :: Service -> Symbol -> Constraint
class UnsafeHasAPI srv name where
  type APICall srv name
  unsafeCallAPI :: APICall srv name

type HasAPI :: Service -> Symbol -> Constraint
type HasAPI srv name = (UnsafeHasAPI srv name, CallsAPI srv name)

callAPI
    :: forall srv name
     . HasAPI srv name
    => APICall srv name
callAPI = unsafeCallAPI
```

Any code written against the old `HasAPI` constraint will continue to work (modulo the instance definitions), but concrete calls to `callAPI` now result in a dangling, unsatisfiable `CallsAPI` constraint. You’ll need to go through the codebase now, and document every transitive call to the API with a matching `CallsAPI` constraint. Thankfully, HLS can help with this task: it will underline the missing cases, and suggest a code action that will automatically add these constraints to the type. Rinse and repeat, until every code path is documented.

Great success! We have automatically found every codepath that makes an API call, and forced them to document that fact. Better yet, we have solved the problem once and for all; our coworkers also must document any new API calls they make, lest their code not compile. It seems like we’re done!

Except for one fact: GHC will rudely refuse to compile our project, even if we correctly track all of our API calls. The problem, of course, is that all we have managed to do is force `main` to collect every `CallsAPI` constraint, and GHC will still complain `No instance for (CallsAPI srv name)`. Of course, you could just give an orphan instance in the same module that defines `main`, which would work, but this doesn’t give you any sort of *external documentation.* It’s nice when you read the code, but it doesn’t help the business people.

A better approach here is to selectively solve the `CallsAPI` constraints, which we can do with some Haskell dark magic. The `Dict` type captures a constraint, giving us a convenient way to manipulate constraints:

```
type Dict :: Constraint -> Type
data Dict c where
  Dict :: c => Dict c
```

We can write an eliminator to bring the `c` from a `Dict c` into scope, which, importantly, allows us to solve otherwise-unsolved constraints:

```
(\\) :: (c => r) -> Dict c -> r
f \\ Dict = f
```

If we can get our hands on a `Dict (CallsAPI Srv Name)`, we can use `(\\)` to convince GHC to compile our program.

GHC is happy to give us dictionaries for constraints it knows about:

```
showIntDict :: Dict (Show Int)
showIntDict = Dict
```

but unfortunately, refuses to give us dictionaries for unsolved constraints:

```
callsAPIDict :: forall srv name. Dict (CallsAPI srv name)
callsAPIDict = Dict
-- Error: No instance for (CallsAPI srv name)
```

It seems like we’re just as stuck, but we have a trick up our sleeve. The first step is to define another class with an instance in scope. GHC will happily give us a dictionary for such a thing:

```
class Trivial
instance Trivial

trivialDict :: Dict Trivial
trivialDict = Dict
```

and now for something naughty:

```
callsAPIDict :: forall srv name. Dict (CallsAPI srv name)
callsAPIDict = unsafeCoerce trivialDict
```

Behind the scenes, GHC compiles classes into records, instances into values of those records, and wanted constraints into function arguments taking those records. By ensuring that `Trivial` and `CallsAPI` are both empty classes, with no methods or superclasses, we can be certain the generated records for these classes will be identical, and thus that it is OK to coerce one into the other.
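To see that the coercion really does discharge an otherwise-unsatisfiable constraint, the whole trick can be distilled into one small, self-contained file. The names here (`Tracked`, `needsTracking`, `trackedDict`, `answer`) are made up for illustration; only the shape of the trick matches the code above:

```haskell
{-# LANGUAGE ConstraintKinds          #-}
{-# LANGUAGE GADTs                    #-}
{-# LANGUAGE RankNTypes               #-}
{-# LANGUAGE StandaloneKindSignatures #-}

import Data.Kind (Constraint, Type)
import Unsafe.Coerce (unsafeCoerce)

-- an empty marker class with no instances: GHC can never solve it,
-- so it propagates upwards through every caller's type
class Tracked

-- every (transitive) caller must now mention Tracked in its type
needsTracking :: Tracked => Int -> Int
needsTracking = (+ 1)

type Dict :: Constraint -> Type
data Dict c where
  Dict :: c => Dict c

-- eliminator: pattern-matching on Dict brings c into scope
(\\) :: (c => r) -> Dict c -> r
f \\ Dict = f

-- an empty class *with* an instance, whose dictionary GHC will hand us
class Trivial
instance Trivial

-- both classes are empty, so their dictionary representations coincide
trackedDict :: Dict Tracked
trackedDict = unsafeCoerce (Dict :: Dict Trivial)

-- manually discharge the constraint at the boundary
answer :: Int
answer = needsTracking 41 \\ trackedDict  -- 42
```

Removing the `\\ trackedDict` makes `answer` fail to compile with `No instance for Tracked`, which is exactly the infection-then-discharge behavior we rely on.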

Armed with `(\\)` and `callsAPIDict`, we can play the part of the constraint solver and satisfy constraints ourselves. GHC will happily compile the following example:

```
ex :: HasAPI ShoutService "shout" => IO String
ex = callAPI @ShoutService @"shout" "hello world"

-- Look ma, no HasAPI constraint!
test :: IO String
test = ex \\ callsAPIDict @ShoutService @"shout"
```

So that’s the rough technique. But how do we actually use it in anger?

Our actual use case at work is to add these API calls to our Swagger documentation. Swagger is an automatically-generated manifest of an API surface; we want to document the fact that some API calls might call other ones. Our server is one big servant application, and servant is extensible. So the real technique is to build a servant combinator that eliminates `CallsAPI` constraints when you document them in the API definition.

Getting into the nitty-gritty bits of servant is beyond the scope of this post, but we can sketch the idea. Servant APIs use the type-level `(:>)` operator to combine information about an endpoint. For example, we might expose another service:

```
type ServantAPI = "api" :> "echo"
  :> ReqBody '[JSON] String
  :> Get '[JSON] String
```

This definition states that we have a REST server with a single route, `api/echo`, which takes a JSON-encoded string as the request body and responds with a JSON-encoded string.

A servant server for `ServantAPI` would have type `Server ServantAPI`, where `Server` is a type family given by `HasServer`. Evaluating the type family results in `String -> Handler String`, so in order to implement this server, we would need to provide a function of that type.

Let’s implement our server endpoint:

```
echo
    :: CallsAPI ShoutService "shout"
    => String
    -> Handler String
echo str = liftIO $ callAPI @ShoutService @"shout" str
```

Unfortunately, due to our earlier work, we can’t eliminate the `CallsAPI` constraint, and thus we can’t actually use `echo` as the handler for our endpoint.

It’s important to note that servant’s DSL is extensible, and we can add our own machinery here. The first step is to build a type that we can use in servant:

```
type MakesAPICall :: Service -> Symbol -> Type
data MakesAPICall srv name
```

We can now build a second version of `ServantAPI`:

```
type ServantAPI = "api" :> "echo"
  :> MakesAPICall ShoutService "shout"
  :> ReqBody '[JSON] String
  :> Get '[JSON] String
```

In order to actually run our endpoint, we need to give an instance of `HasServer` for our new `MakesAPICall` combinator:

```
instance HasServer api ctx
      => HasServer (MakesAPICall srv name :> api) ctx
  where
    type ServerT (MakesAPICall srv name :> api) m =
      Dict (CallsAPI srv name) -> ServerT api m
    route _ ctx f =
      route (Proxy @api) ctx $ fmap ($ callsAPIDict @srv @name) f
```

The `ServerT` instance here adds a `Dict (CallsAPI srv name)` to the type of the handler required to satisfy this endpoint, while `route` automatically fills in the dictionary whenever the handler needs to be run. In an ideal world, we could give our `ServerT` instance as:

```
type ServerT (MakesAPICall srv name :> api) m =
  CallsAPI srv name => ServerT api m
```

but GHC doesn’t let us use quantified constraints on the right-hand sides of type families, so this is unfortunately a no-go. Playing games with `Dict` instead is the best approach I’ve found here, but I’d love to hear if anyone has a better idea.

We still can’t use `echo` as a handler, but we can use `makesCall echo` as one, where `makesCall` is given as:

```
makesCall :: (c => r) -> Dict c -> r
makesCall = (\\)
```

Servers that document their API calls via `MakesAPICall`, and which wrap their handlers in `makesCall`, can now eliminate `CallsAPI` constraints. Since this is the only way of eliminating `CallsAPI` constraints, we can be sure that every API call is correctly documented in the servant DSL!

The final step here is to add an instance of `HasSwagger (MakesAPICall srv name :> api)`, but the details are gory and devoid of educational value. Suffice it to say that this instance was written, and now we have automatically-generated JSON documentation describing which server endpoints make which other API calls. This documentation is guaranteed to be correct, because updating it is the only way to convince GHC to compile your code.

---

Contrast this to the internet of yore. By virtue of being hard to access, the internet filtered away the mass appeal it has today. It was hard and expensive to get on, and in the absence of authoring tools, you were only creating internet content if you *had something to say.* Which meant that, as a consumer, if you found something, you had good reason to believe it was well-informed. Why would someone go through the hassle of making a website about something they weren’t interested in?

In 2022, we have a resoundingly sad answer to that question: advertising. The primary purpose of the web today is “engagement,” which is Silicon Valley jargon for “how many ads can we push through someone’s optical nerve?” Under the purview of engagement, it makes sense to publish webpages on every topic imaginable, regardless of whether or not you know what you’re talking about. In fact, engagement goes up if you *don’t* know what you’re talking about; your poor reader might mistakenly believe that they’ll find the answer they’re looking for elsewhere on your site. That’s twice the advertising revenue, baby!

But the spirit of the early web isn’t gone: the bookmarks I’ve kept these long decades mostly still work, and many of them still receive new content. There’s still weird, amateur, passion-project stuff out there. It’s just hard to find. Which brings us to our main topic: search.

Google is inarguably the front page of the internet. Maybe you already know where your next destination is, in which case you probably search for the website on Google and click on the first link, rather than typing in the address yourself. Or maybe you don’t already know your destination, and you search for it. Either way, you hit Google first.

When I say the internet is getting worse, what I really mean is that the Google search results are significantly less helpful than they used to be. This requires some qualification. Google has gotten exceedingly good at organizing everyday life. It reliably gets me news, recipes, bus schedules, tickets for local events, sports scores, simple facts, popular culture, official regulations, and access to businesses. It’s essentially the yellow pages and the newspaper put together. For queries like this, which are probably 95% of Google’s traffic, Google does an excellent job.

The difficulties come in for that other 5%, the so-called “long tail.” The long tail is all those other things we want to know about. Things without well-established, factual answers. Opinions. Abstract ideas. Technical information. If you’re cynical, perhaps it’s all the stuff that doesn’t have wide-enough appeal to drive engagement. Whatever the reason, the long tail is the stuff that’s hard to find on the modern internet.

Notice that the long tail is exactly the stuff we need search for. Mass-appeal queries are, almost by definition, not particularly hard to find. If I need a bus schedule, I know to talk to my local transit authority. If I’m looking to keep up with the Kardashians, I’m not going to have any problems (at least, no *search* problems). On the other hand, it’s much less clear where to get information on why my phone starts overheating when I open the chess app.

So what happens if you search for the long tail on Google? If you’re like me, you flail around for ten minutes wasting your time reading crap articles before you remember that Google is awful for the long tail, and you come away significantly more frustrated, not having found what you were looking for in the first place.

Let’s look at some examples. One of my favorite places in the world is Koh Lanta, Thailand. When traveling, I’m always on the lookout for places that give off the Koh Lanta vibe. What does that mean? Hard to say, exactly, but having tourist amenities without being touristy. Charming, slow, cheap. I don’t know exactly; if I did, it’d be easier to find. Anyway, forgetting that Google is bad at long tails, I search for `what is the koh lanta of croatia?` and get:

- Koh-Lanta - Wikipedia [note: not the island, the game show]
- Top 15 Unique Things to Do in Koh Lanta
- Visit Koh Lanta on a trip to Thailand
- Beautiful places to travel, Koh lanta, Sunset
- Holiday Vacation to Koh Lanta: Our favourite beaches and …
- Koh Lanta Activities: 20 Best Things to Do
- etc

With the exception of “find a flight from Dubrovnik to Koh Lanta” on page two, you need to get to page five before you see any results that even acknowledge I *also* searched for `croatia`. Not very impressive.

When you start paying attention, you’ll notice it on almost every search — Google isn’t actually giving you answers to things you searched for. Now, maybe the reason here is that there *aren’t* any good results for the query, but that’s a valuable thing to know as well. Don’t just hit me with garbage, it’s an insult to my intelligence and time.

I wanted to figure out why exactly the internet is getting worse. What’s going on with Google’s algorithm that leads to such a monotonous, boring, corporate internet landscape? I thought I’d dig into search engine optimization (SEO) — essentially, techniques that improve a website’s ranking in Google searches. I’d always thought SEO was better at selling itself than it was at improving search results, but my god was I wrong.

SEO techniques are extremely potent, and their widespread adoption is what’s wrong with the modern web.

For example, have you ever noticed that the main content of most websites is something like 70% down the page? Every recipe site I’ve ever seen is like this — nobody cares about how this recipe was originally your great-grandmother’s. Just tell us what’s in it. Why is this so prevalent on the web?

Google rewards a website for how long a user stays on it, with the reasoning being that a bad website has the user immediately hit the back button. Seems reasonable, until you notice the problem of incentives here. Websites aren’t being rewarded for having good content under this scheme, they’re rewarded for wasting your time and making information hard to find. Outcome: websites that answer questions, but hide the information somewhere on a giant (ad-filled) page.

Relatedly, have you noticed how every website begins with a stupid paragraph overviewing the thing you’re searching for? It’s always followed by a stupid paragraph describing why you should care about the thing. For example, I just searched for `garden irrigation`, and the first result is:

Water is vital to plant health, but watering by hand can be a hassle. You have to drag hoses between gardens, move sprinklers around, or take the time to water each plant. Our innovative watering systems take the hassle out of watering. They’re the easiest way to give plants the consistent moisture they need for your biggest harvest and most beautiful blooms.

*Water is vital to plant health.* Wow, who knew! Why in god’s name would I be searching for garden irrigation if I didn’t know that water was vital to plant health. Why is copy like this so prevalent on the web?

Things become clearer when you look at some of the context of this page:

Url: https://[redacted]/how-to/how-to-choose-a-watering-system/8747.html

Title: How to Choose a Garden Irrigation System

Heading: Soak, Drip or Spray: Which is right for you?

Subheading: Choose the best of our easy, customizable, irrigation systems to help your plants thrive and save water

As it happens, Google rewards websites which use keywords in their url, title, headings, and first 100 words. Just by eyeballing, we can see that this particular website is targeting the keywords “water”, “system”, “irrigation”, and “garden”. Pages like these are hyper-optimized to come up for particular searches. The stupid expository stuff exists only to pack “important keywords” into the first 100 words.

But keyword targeting doesn’t stop there. As I was reading through this SEO stuff (that is, the first page of a Google search for `seo tricks`), every single page offered 15-25 great, technical SEO tricks. And then, without fail, the final point on each page was “but really, the best SEO strategy is having great content!” That’s weird. “Great content” isn’t something an algorithm can identify; if it were, you wouldn’t be currently reading the ravings of a madman, angry about the state of the internet.

So, why do all of these highly-optimized SEO pages ubiquitously break form, switching from concrete techniques to platitudes? You guessed it, it’s an SEO technique! Google offers a keyword dashboard, where you can see which keywords group together, and (shudder) which keywords are *trending.* Google rewards you for having other keywords in the group on your page. And it extra rewards you for having trending keywords. You will not be surprised to learn that “quality content” is a keyword that clusters with “seo,” nor that it is currently a trending keyword.

Think about that for a moment. Under this policy, Google is incentivizing pages to become *less focused,* by adding text that is only tangentially related. But, how do related keywords come about? The only possible answer here is to find keywords that often cluster on other pages. But this is a classic death spiral, pulling every page in a topic to have the same content.

Another way of looking at it is that if you are being incentivized, you are being *disincentivized.* Webpages are being penalized for including original information, because original information can’t possibly be in the keyword cluster.

There are a multitude of perverse incentives from Google, but I’ll mention only two more. The first is that websites are penalized for having low-ranking pages. The conventional advice here is to delete “underperforming” pages, which only makes the search problem worse — sites are being rewarded for deleting pages that don’t align with the current search algorithm.

My last point: websites are penalized for even *linking* to low-ranking pages!

It’s not hard to put all of the pieces together and see why the modern web is so bland and monotonous. Not only is the front-page of the internet aggressively penalizing websites which *aren’t* bland and monotonous, it’s also punishing any site which has the audacity to link to more interesting parts of the web.

So the discoverable part of the web sucks. But is that really Google’s fault? I’d argue no. By virtue of being the front page, Google’s search results are under extreme scrutiny. In the eyes of the non-technical population, especially the older generations, the internet and Google are synonymous. The fact is that Google gets unfairly targeted by legislation because it’s a big, powerful tech company, and we as a society are uncomfortable with that.

Worse, the guys doing the regulation don’t exactly have a grasp on how internet things work.

Society at large has been getting very worried about disinformation. Whose problem is that? Google’s — duh. Google is how we get information on the internet, so it’s up to them to defend us from disinformation.

Unfortunately it’s really hard to spot disinformation. Sometimes even the *government* lies to us (gasp!). I can think of two ways of avoiding getting in trouble with respect to disinformation. One: link only to *official sites,* thus changing the problem of trustworthiness to one of authority. If there is no authority, just give back the consensus. Two: don’t return any information whatsoever.

Google’s current strategy seems to be somewhere between one and two. For example, we can try a controversialish search like `long covid doesn't exist`. The top results at time of writing are:

- The search for Long Covid (science.org)
- Small Study Finds No Obvious Physical Causes for Long COVID (medscape.com)
- Fact Check-‘Long COVID’ is not fake, quoted French study did … (reuters.com)
- Harvard Medical School expert explains ‘long COVID’ (harvard.edu)
- Claim that French study showed long COVID doesn’t exist … (healthfeedback.org)
- What doctors wish patients knew about long COVID (ama-assn.org)

I’m not particularly in the know, but I recognize most of these organizations. Science.org sounds official. Not only is one of the pages from Harvard, but also it’s from a Harvard Medical School *expert.* I especially like the fifth one, the metadata says:

Claim: Long COVID is “mostly a mental disease”; the condition long COVID is solely due to a person’s belief, not actual disease; long COVID doesn’t exist

Fact check by Health Feedback: Inaccurate

Every one of these websites comes off as *authoritative* — not in the sense of “knowing what they’re talking about,” because that’s hard to verify — but in the sense of being the sort of organization we’d trust to answer this question for us. Or, in the case of number five, at least telling us that they fact checked it.

Let’s try a search for something requiring less authority, like “best books.” In the past I would get a list of books considered the best. But now I get:

- The Greatest Books: The Best Books of All Time - 1 to 50
- The Best Books of All Time | chapters.indigo.ca
- 100 Best Books of All Time - Reader’s Digest
- Best Book Lists - Goodreads
- Best Books 2022: Books We Love : NPR

You’ll notice there are no actual books here. There are only *lists* of best books. Cynical me notes that if you were to actually list a book, someone could find it controversial. Instead, you can link to institutional websites, and let them take the controversy for their picks.

This isn’t the way the web needs to be. Google could just as well have given me personal blogs of people talking about long covid and their favorite books, except (says cynical me) that these aren’t authoritative sources, and thus, linking to them could be considered endorsement. And the web is too big and too fast-moving to risk linking to anything that hasn’t been vetted in advance. It’s just too easy to accidentally give a *good* result to a controversial topic, and have the lawmakers pounce on you. Instead, punt the problem back to authorities.

The web promised us a democratic, decentralized public forum, and all we got was the stinking yellow pages in digital format. I hope the crypto people can learn a lesson here.

Anyway, all of this is to say that I think lawmakers and liability concerns are the real reason the web sucks. All things being equal, Google would like to give us good results, but it prefers making boatloads of money, and that would be hard to do if it got regulated into nothingness.

Google isn’t the only search engine around. There are others, but it’s fascinating that none of them compete on the basis of providing better results. DDG claims to have better privacy. Ecosia claims to plant trees. Bing exists to keep Microsoft relevant post-2010, and for some reason, ranks websites for being highly-shared on social media (again, things that are, by definition, not hard to find.)

Why don’t other search engines compete on search results? It can’t be hard to do better than Google for the long tail.

It’s interesting to note that the problems of regulatory-fear and SEO-capture are functions of Google’s cultural significance. If Google were smaller or less important, there’d be significantly less negative-optimization pressure on it. Google is a victim of its own success.

That is to say, I don’t think all search engines are doomed to fail in the same way that Google has. A small search engine doesn’t need to be authoritative, because nobody is paying attention to it. And it doesn’t have to worry about SEO for the same reason — there’s no money to be made in manipulating its results.

What I dream of is Google circa 2006. A time where a search engine searched what you asked for. A time before aggressive SEO. A time before social media, when the only people on the internet had a reason to be there. A time before sticky headers and full-screen modal pop-ups asking you to subscribe to a newsletter before reading the article. A time before click-bait and subscription-only websites which tease you with a paragraph before blurring out the rest of the content.

These problems are all solvable by a search engine. But that search engine isn’t going to be Google. Let’s de-rank awful sites, and boost personal blogs of people with interesting things to say. Let’s de-rank any website that contains ads. Let’s not index any click-bait websites, which unfortunately in 2022 includes most of the news.

What we need is a search engine, by the people, and for the people. Fuck the corporate interests and the regulatory bullshit. None of this is hard to do. It just requires someone to get started.

]]>A few months ago, the excellent David Rusu gave me an impromptu lecture on ring signatures, which are a way of signing something as an anonymous member of a group. That is, you can show someone in the signing pool was actually responsible for signing the thing, but can’t determine *which member of the pool actually signed it.* David walked me through all the math as to how that actually happens, but I was unable to follow it, because the math was hard and, perhaps more importantly, it felt like hand-compiling a proof.

What do I mean by “hand-compiling” a proof? Well, we have some mathematical object, something like

```
postulate
  Identity : Set
  Message : Set
  SignedBy : Message → Identity → Set
  use-your-imagination : {A : Set} → A

record SignedMessage {n : ℕ} (pool : Vec Identity n) : Set where
  field
    message : Message
    @erased signer : Fin n
    signature : SignedBy message (lookup pool signer)
```

where `@erased` is Agda’s runtime irrelevance annotation, meaning the `signer` field won’t exist at runtime. In fact, attempting to write a function that would extract it results in the following error:

```
Identifier signer is declared erased, so it cannot be used here
when checking that the expression signer x has type Fin n
```

Nice one Agda!

Hand-compiling this thing is thus constructing some object that has the desired properties, but doing it in a way that requires BEING VERY SMART, and throwing away any chance at composability in the process. For example, it’d be nice to have the following:

```
open SignedMessage

weakenL
    : ∀ {n pool new-id}
    → SignedMessage {n} pool
    → SignedMessage (new-id ∷ pool)
weakenL x = use-your-imagination

weakenR
    : ∀ {n pool new-id}
    → SignedMessage {n} pool
    → SignedMessage (pool ++ [ new-id ])
weakenR x = use-your-imagination
```

which would allow us to arbitrarily extend the pool of a signed message. Then, we could trivially construct one:

```
sign : Message → (who : Identity) → SignedMessage [ who ]
message   (sign msg who) = msg
signer    (sign msg who) = zero
signature (sign msg who) = use-your-imagination
```

and then obfuscate who signed it by some random choice of subsequent `weakenL`s and `weakenR`s.

Unfortunately, this is not the case with ring signatures. Ring signatures require you to “bake in” the signing pool when you construct your signature, and you can never again change that pool, short of doing all the work again. This behavior is non-composable, and thus, in my reckoning, unlikely to be a true solution to the problem.

The paper I chose to review this week is Proof-Carrying Code by George Necula, in an attempt to understand if the PL literature has anything to say about this problem.

PCC is an old paper (from 1997, egads!) but it was the first thing I found on the subject. I should really get better at vetting my literature before I put in the effort of going through it, but hey, what are you going to do?

The idea behind PCC is that we want to execute some untrusted machine code. But we don’t want to sacrifice our system security to do it. And we don’t want to evaluate some safe language into machine code, because that would be too slow. Instead, we’ll send the machine code, as well as a safety proof that verifies it’s safe to execute this code. The safety proof is tied to the machine code, such that you can’t just generate a safety proof for an unrelated problem, and then attach it to some malicious code. But the safety proof isn’t obfuscated or anything; the claim is that if you can construct a safety proof for a given program, that program is necessarily safe to run.

On the runtime side, there is a simple algorithm for checking the safety proof, and it is independent of the arguments that the program is run with; therefore, we can get away with checking code once and evaluating it many times. It’s important that the algorithm be simple, because it’s a necessarily trusted piece of code, and it would be bad news if it were to have bugs.

PCC’s approach is a bit… unimaginative. For every opcode we’d like to allow in the programs, we attach a safety precondition and a postcondition. Then, we map the vector of opcodes we’d like to run into its pre/post conditions, and make sure they are confluent. If they are, we’re good to go. This vector of conditions is called the verification condition (VC) in the paper.

So, the compiler computes the VC and attaches it to the code. Think of the VC as a proposition of safety (that is, a type), and a proof of that proposition (the VC itself.) In order to validate this, the runtime does a safety typecheck, figuring out what the proposition of safety would have to be. It compares this against the attached proof, and if they match, it typechecks the VC to ensure it has the type it says. If it does, our code is safe.
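To get a feel for the shape of this check (though emphatically not Necula’s actual machinery), here’s a toy sketch in Python. Each opcode carries a precondition and postcondition over an abstract machine state, and “checking the VC” amounts to verifying that the facts established so far always satisfy the next opcode’s precondition. The opcode table and fact names here are entirely made up for illustration:

```python
# Hypothetical opcode table: name -> (precondition, postcondition).
# Conditions are predicates over a set of facts about the machine state.
OPCODES = {
    "load":  (lambda s: "addr_valid" in s, lambda s: s | {"reg_set"}),
    "add":   (lambda s: "reg_set" in s,    lambda s: s | {"reg_set"}),
    "store": (lambda s: "reg_set" in s and "addr_valid" in s, lambda s: s),
}

def check(program, initial_facts):
    """Return True iff each opcode's precondition is established by the
    postconditions of everything before it -- a vastly simplified
    stand-in for checking the vector of verification conditions."""
    facts = set(initial_facts)
    for op in program:
        pre, post = OPCODES[op]
        if not pre(facts):
            return False
        facts = post(facts)
    return True
```

So `check(["load", "add", "store"], {"addr_valid"})` succeeds, while `check(["store"], {"addr_valid"})` fails, because nothing has established `reg_set` before the store. Crucially, this check is cheap and runs once, no matter how many times the code is later executed.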

The PCC paper is a bit light on details here, so it’s worth thinking about exactly what’s going on here. Presumably determining the safety preconditions is an easy problem if we can do it at runtime, but proving some code satisfies it is hard, *or else we could just do that at runtime too.*

I’m a bit hesitant to dive into the details here, because I don’t really care about determining whether some blob of machine code is safe to run. It’s a big ball of poorly typed typing judgments about memory usage. Why do I say poorly typed? Well consider one of the rules from the paper:

$\frac{m \vdash e : \tau \text{list} \quad \quad e \neq 0} {m \vdash e : \text{addr} \wedge \ldots}$

Here we have that from `e : List τ` (and that `e` isn’t 0) we can derive `e : addr`. At best, if we are charitable in assuming $e \neq 0$ means that `e` isn’t `nil`, there is a type preservation error here. If we are less charitable, there is also some awful type error here involving 0, which might be a null check or something? This seems sufficiently messy that I don’t care enough to decipher it.

How applicable is any of this to our original question around ring signatures? Not very, I think, unfortunately. We already have the ring signature math if we’d like to encode a proof, and the verification of it is easy enough. But it’s still not very composable, and I doubt this paper will add much there. Some more promising approaches would be to draw the mystery commutative diagrams ala Adders and Arrows, starting from a specification and deriving a chain of proofs that the eventual implementation satisfies the specification. The value there is in all the intermediary nodes of the commutative diagram, and whether we can prove weakening lemmas there.

But PCC isn’t entirely a loss; I learned about `@erased` in Agda.

I was describing my idea from last week to automatically optimize programs to Colin, who pointed me towards Syntax-Guided Synthesis by Alur et al.

Syntax-Guided Synthesis is the idea that free-range program synthesis is really hard, so instead, let’s constrain the search space with a grammar of allowable programs. We can then enumerate those possible programs, attempting to find one that satisfies some constraints. The idea is quite straightforward when you see it, but that’s not to say it’s unimpressive; the paper has lots of quantitative results about exactly how well this approach does.

The idea is we want to find programs with type `I → O` that satisfy some specification. We’ll do that by picking some Language of syntax, and trying to build our programs there.

All of this is sorta moot, because we assume we have some oracle which can tell us if our program satisfies the spec. But the oracle is probably some SMT solver, and is thus expensive to call, so we’d like to try hard not to call it if possible.

Let’s take an example, and say that we’d like to synthesize the `max` of two `Nat`s. There are lots of ways of doing that! But we’d like to find a function that satisfies the following:

```
data MaxSpec (f : ℕ × ℕ → ℕ) : ℕ × ℕ → Set where
  is-max
      : {x y : ℕ}
      → x ≤ f (x , y)
      → y ≤ f (x , y)
      → ((f (x , y) ≡ x) ⊎ (f (x , y) ≡ y))
      → MaxSpec f (x , y)
```

If we can successfully produce an element of `MaxSpec f`, we have a proof that `f` is an implementation of `max`. Of course, actually producing such a thing is rather tricky; it’s equivalent to determining if `MaxSpec f` is `Decidable` for the given input.

In the first three cases, we have some conflicting piece of information, so we are unable to produce a MaxSpec:

```
decideMax : (f : ℕ × ℕ → ℕ) → (i : ℕ × ℕ) → Dec (MaxSpec f i)
decideMax f i@(x , y) with f i | inspect f i
... | o | [ fi≡o ] with x ≤? o | y ≤? o
... | no ¬x≤o | _ =
  no λ { (is-max x≤o _ _) →
           contradiction (≤-trans x≤o (≤-reflexive fi≡o)) ¬x≤o }
... | yes _ | no ¬y≤o =
  no λ { (is-max x y≤o x₂) →
           contradiction (≤-trans y≤o (≤-reflexive fi≡o)) ¬y≤o }
... | yes x≤o | yes y≤o with o ≟ x | o ≟ y
... | no x≠o | no y≠o =
  no λ { (is-max x x₁ (inj₁ x₂)) → contradiction (trans (sym fi≡o) x₂) x≠o
       ; (is-max x x₁ (inj₂ y))  → contradiction (trans (sym fi≡o) y) y≠o }
```

Otherwise, we have a proof that `o` is equal to either `y` or `x`:

```
... | no proof | yes o≡y =
  yes (is-max (≤-trans x≤o (≤-reflexive (sym fi≡o)))
              (≤-trans y≤o (≤-reflexive (sym fi≡o)))
              (inj₂ (trans fi≡o o≡y)))
... | yes o≡x | _ =
  yes (is-max (≤-trans x≤o (≤-reflexive (sym fi≡o)))
              (≤-trans y≤o (≤-reflexive (sym fi≡o)))
              (inj₁ (trans fi≡o o≡x)))
```

`MaxSpec` is a proof that our function is an implementation of `max`, and `decideMax` is a proof that “we’d know one if we saw one.” So that’s the specification taken care of. The next step is to define the syntax with which we’d like to guard our search.

The paper presents this syntax as a BNF grammar, but my thought is why use a grammar when we could instead use a type system? Our syntax is a tiny little branching calculus, capable of representing Terms and branching Conditionals:

```
mutual
  data Term : Set where
    var-x : Term
    var-y : Term
    const : ℕ → Term
    if-then-else : Cond → Term → Term → Term

  data Cond : Set where
    leq    : Term → Term → Cond
    and    : Cond → Cond → Cond
    invert : Cond → Cond
```

All that’s left for our example is the ability to “compile” a Term down to a candidate function. Just pattern match on the constructors and push the inputs around until we’re done:

```
mutual
  eval : Term → ℕ × ℕ → ℕ
  eval var-x     (x , y) = x
  eval var-y     (x , y) = y
  eval (const c) (x , y) = c
  eval (if-then-else c t f) i =
    if evalCond c i then eval t i else eval f i

  evalCond : Cond → ℕ × ℕ → Bool
  evalCond (leq m n)   i = Dec.does (eval m i ≤? eval n i)
  evalCond (and c1 c2) i = evalCond c1 i ∧ evalCond c2 i
  evalCond (invert c)  i = not (evalCond c i)
```

So that’s most of the idea; we’ve specified what we’re looking for, via MaxSpec, what our syntax is, via Term, and a way of compiling our syntax into functions, via eval. This is the gist of the technique; the rest is just algorithms.
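Before going through the Agda, the whole counterexample-guided loop can be sketched in plain Python, with the SMT oracle swapped out for brute-force testing over a small grid (my stand-in, not what the paper does). The term encoding here is a made-up miniature of the `Term` language above:

```python
from itertools import product

# Enumerate a finite slice of the Term language: the two variables, plus
# one level of if-then-else over "leq" conditions.
TERMS = ["x", "y"]
CANDIDATES = TERMS + [("ite", ("leq", a, b), t, f)
                      for a, b, t, f in product(TERMS, repeat=4)]

def run(term, x, y):
    """Compile-and-evaluate a term (the analogue of eval above)."""
    if term == "x": return x
    if term == "y": return y
    _, (_, a, b), t, f = term
    return run(t, x, y) if run(a, x, y) <= run(b, x, y) else run(f, x, y)

def spec_holds(term, x, y):
    """MaxSpec: the output bounds both inputs, and equals one of them."""
    o = run(term, x, y)
    return x <= o and y <= o and (o == x or o == y)

def oracle(term):
    """Stand-in for the SMT oracle: exhaustively search a small grid for
    a counterexample; None means the spec holds (on the grid)."""
    for x, y in product(range(8), repeat=2):
        if not spec_holds(term, x, y):
            return (x, y)
    return None

def synthesize():
    cases = []  # counterexamples seen so far
    for term in CANDIDATES:
        # Cheap pre-check: the candidate must pass every recorded
        # counterexample before we pay for an oracle call.
        if all(spec_holds(term, x, y) for x, y in cases):
            cex = oracle(term)
            if cex is None:
                return term
            cases.append(cex)
    return None
```

Running `synthesize()` returns `("ite", ("leq", "x", "y"), "y", "x")`, i.e. `if x ≤ y then y else x`, after only three oracle calls: the counterexamples learned from the candidates `x` and `y` cheaply reject most of the later candidates.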

The paper presents several algorithms and evaluates their performances. But one is clearly better than the others in the included benchmarks, so we’ll just go through that one.

Our algorithm to synthesize code corresponding to the specification takes a few parameters. We’ve seen the first few:

```
module Solver {Lang I O : Set}
    (spec : (I → O) → I → Set)
    (decide : (f : I → O) → (i : I) → Dec (spec f i))
    (compile : Lang → I → O)
```

However, we also need a way of synthesizing terms in our Language. For that, we’ll use enumerate, which maps a natural number to a term:

```
    (enumerate : ℕ → Lang)
```

Although it’s not necessary for the algorithm, we should be able to implement exhaustive over enumerate, which states every Lang is eventually produced by enumerate:

```
    (exhaustive : (x : Lang) → Σ[ n ∈ ℕ ] (enumerate n ≡ x))
```

Finally, we need an oracle capable of telling us if our solution is correct. This might sound a bit like cheating, but behind the scenes it’s just a magic SMT solver. The idea is that SMT can either confirm that our program is correct, or produce a counterexample that violates the spec. The type here is a bit crazy, so we’ll take it one step at a time.

An oracle is a function that takes a Lang…

```
    (oracle : (exp : Lang)
```

and either gives back a function that can produce a `spec (compile exp)`

for every input:

```
            → ((i : I) → spec (compile exp) i)
```

or gives back some input which is not a `spec (compile exp)`

:

```
            ⊎ Σ[ i ∈ I ] ¬ spec (compile exp) i) where
```

The algorithm here is actually quite clever. The idea is to try each enumerated value in order, attempting to minimize the number of calls we make to the oracle, because they’re expensive. So instead, we’ll keep a list of every counterexample we’ve seen so far, and ensure that our synthesized function passes all of them before sending it off to the oracle. First, we’ll need a data structure to store our search progress:

```
record SearchState : Set where
  field
    iteration : ℕ
    cases : List I
open SearchState
```

The initial search state is one in which we start at the beginning, and have no counterexamples:

```
start : SearchState
iteration start = 0
cases start = []
```

We can try a function by testing every counterexample:

```
try : (I → O) → List I → Bool
try f = all (Dec.does ∘ decide f)
```

and finally, we can now attempt to synthesize some code. Our function `check` takes a `SearchState`, and either gives back the next step of the search, or some program and a proof that it’s what we’re looking for.

```
check : SearchState
      → SearchState ⊎ (Σ[ exp ∈ Lang ] ((i : I) → spec (compile exp) i))
check ss
```

We begin by getting and compiling the next enumerated term:

```
  with enumerate (iteration ss)
... | exp with compile exp
```

check if it passes all the previous counterexamples:

```
... | f with try f (cases ss)
```

if it doesn’t, just fail with the next iteration:

```
... | false = inj₁ (record { iteration = suc (iteration ss)
                           ; cases = cases ss
                           })
```

Otherwise, our proposed function might just be the thing we’re looking for, so it’s time to consult the oracle:

```
... | true with oracle exp
```

which either gives a counterexample that we need to record:

```
... | inj₂ (y , _) = inj₁ (record { iteration = suc (iteration ss)
                                  ; cases = y ∷ cases ss
                                  })
```

or it confirms that our function satisfies the specification, and thus that we’re done:

```
... | inj₁ x = inj₂ (exp , x)
```

Pretty cool! The paper gives an optimization that caches the result of every counterexample on every synthesized program, and reuses these whenever that program appears as a subprogram of a larger one. The idea is that we can trade storage so we only ever need to evaluate each subprogram once — important for expensive computations.
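As a sketch of that optimization (my own illustration, not the paper’s presentation): memoize evaluation keyed on the pair of subprogram and input, so a subterm shared between many candidate programs is only ever evaluated once per input. `evaluate` here is a hypothetical stand-in for compiling-and-running a term:

```python
# Cache of (subprogram, input) -> result, shared across all candidates.
CACHE = {}

def cached_eval(term, inp, evaluate):
    """Evaluate `term` on `inp`, but only ever once per (term, inp) pair.
    `evaluate` is whatever expensive compile-and-run step we'd otherwise
    repeat for every enclosing candidate program."""
    key = (term, inp)
    if key not in CACHE:
        CACHE[key] = evaluate(term, inp)
    return CACHE[key]
```

The trade is storage for time, which pays off precisely because the enumeration keeps re-visiting the same small subprograms inside ever-larger candidates.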

Of course, pumping check by hand is annoying, so we can instead package it up as solve which takes a search depth, and iterates check until it runs out of gas or gets the right answer:

```
solve : ℕ → Maybe (Σ[ exp ∈ Lang ] ((i : I) → spec (compile exp) i))
solve = go start
  where
    go : SearchState → ℕ → Maybe (Σ Lang (λ exp → (i : I) → spec (compile exp) i))
    go ss zero = nothing
    go ss (suc n) with check ss
    ... | inj₁ ss' = go ss' n
    ... | inj₂ y = just y
```
]]>

Today we’re heading back into the Elliottverse — a beautiful world where programming is principled and makes sense. The paper of the week is Conal Elliott’s Generic Parallel Functional Programming, which productively addresses the duality between “easy to reason about” and “fast to run.”

Consider the case of a right-associated list; we can give a scan of it in linear time and constant space:

```
module ExR where
  data RList (A : Set) : Set where
    RNil : RList A
    _◁_ : A → RList A → RList A
  infixr 5 _◁_

  scanR : ⦃ Monoid A ⦄ → RList A → RList A
  scanR = go mempty
    where
      go : ⦃ Monoid A ⦄ → A → RList A → RList A
      go acc RNil = RNil
      go acc (x ◁ xs) = acc ◁ go (acc <> x) xs
```

This is a nice functional algorithm that runs in $O(n)$ time, and requires $O(1)$ space. However, consider the equivalent algorithm over left-associative lists:

```
module ExL where
  data LList (A : Set) : Set where
    LNil : LList A
    _▷_ : LList A → A → LList A
  infixl 5 _▷_

  scanL : ⦃ Monoid A ⦄ → LList A → LList A
  scanL = proj₁ ∘ go
    where
      go : ⦃ Monoid A ⦄ → LList A → LList A × A
      go LNil = LNil , mempty
      go (xs ▷ x) =
        let xs' , acc = go xs
         in xs' ▷ acc , x <> acc
```

While scanL is also $O(n)$ in its runtime, it is not amenable to tail call optimization, and thus also requires $O(n)$ *space.* Egads!
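A loose Python analogue (over the addition monoid; an illustration of the space asymmetry, not a transliteration of the Agda) makes the difference concrete:

```python
def scan_right_nested(xs):
    # The accumulator travels forward with the traversal (the tail call
    # in scanR), so a plain loop suffices: O(1) space beyond the output.
    out, acc = [], 0
    for x in xs:
        out.append(acc)
        acc += x
    return out

def scan_left_nested(xs):
    # Mirrors recursing under the left-nested spine: nothing can be
    # produced until we reach the innermost LNil, so every element costs
    # a live stack frame: O(n) space.
    def go(i):
        if i == 0:
            return [], 0
        prefix, acc = go(i - 1)
        prefix.append(acc)
        return prefix, acc + xs[i - 1]
    return go(len(xs))[0]
```

Both compute the same exclusive prefix sums, but only the first can run in constant working space.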

You are probably not amazed to learn that different ways of structuring data lead to different runtime and space complexities. But it’s a more interesting puzzle than it sounds, because RList and LList are isomorphic! So what gives?

Reed’s pithy description here is

> Computation time doesn’t respect isos

Exploring that question with him has been very illuminating. Math is deeply about extensionality; two mathematical objects are equivalent if their abstract interfaces are indistinguishable. Computation… doesn’t have this property. When computing, we care a great deal about runtime performance, which depends on fiddly implementation details, even if those aren’t externally observable.

In fact, as he goes on to state, this is the whole idea of denotational design. Figure out the extensional behavior first, and then figure out how to implement it.

This all harkens back to my review of another of Elliott’s papers, Adders and Arrows, which starts from the extensional behavior of natural addition (encoded as the Peano naturals), and then derives a chain of proofs showing that our everyday binary adders preserve this behavior.

Anyway, let’s switch topics and consider a weird fact of the world. Why do so many parallel algorithms require gnarly array indexing? Here’s an example I found by googling for “parallel c algorithms cuda”:

```
__global__ void stencil_1d(int *in, int *out) {
  __shared__ int temp[BLOCK_SIZE + 2 * RADIUS];
  int gindex = threadIdx.x + blockIdx.x * blockDim.x;
  int lindex = threadIdx.x + RADIUS;
  temp[lindex] = in[gindex];
  if (threadIdx.x < RADIUS) {
    temp[lindex - RADIUS] = in[gindex - RADIUS];
    temp[lindex + BLOCK_SIZE] = in[gindex + BLOCK_SIZE];
  }
  __syncthreads();
  int result = 0;
  for (int offset = -RADIUS; offset <= RADIUS; offset++)
    result += temp[lindex + offset];
  out[gindex] = result;
}
```

and here’s another, expressed as an “easy induction” recurrence relation, from Richard E. Ladner and Michael J. Fischer’s *Parallel Prefix Computation*:

Sweet lord. No wonder we’re all stuck pretending our computer machines are single threaded behemoths from the 1960s. Taking full advantage of parallelism on modern CPUs must require a research team and five years!

But it’s worth taking a moment and thinking about what all of this janky indexing must be doing. Whatever algorithm is telling the programmer which indices to write where necessarily must be providing a view on the data. That is, the programmer has some sort of “shape” in mind for how the problem should be subdivided, and the indexing is an implementation of accessing the raw array elements in the desired shape.

At risk of beating you on the head with it, this array indexing is *a bad implementation of a type system.* Bad because it’s something the implementer needed to invent by hand, and is not in any form that the compiler can help ensure the correctness of.

That returns us to the big contribution of *Generic Parallel Functional Programming,* which is a technique for decoupling the main thrust of an algorithm from extensionally-inconsequential encodings of things. The idea is to implement the algorithm on lots of trivial data structures, and then compose those small pieces together to get a *class* of algorithms.

The first step is to determine which trivial data structures we need to support. Following the steps of Haskell’s `GHC.Generics` module, we can decompose any Haskell98 data type as compositions of the following pieces:

```
data Rep : Set₁ where
  V     : Rep
  U     : Rep
  K     : Set → Rep
  Par   : Rep
  Rec   : (Set → Set) → Rep
  _:+:_ : Rep → Rep → Rep
  _:*:_ : Rep → Rep → Rep
  _:∘:_ : Rep → Rep → Rep
```

which we can embed in Set via Represent:

```
open import Data.Empty
open import Data.Sum
open import Data.Unit hiding (_≤_)

record Compose (F G : Set → Set) (A : Set) : Set where
  constructor compose
  field
    composed : F (G A)
open Compose

Represent : Rep → Set → Set
Represent V         a = ⊥
Represent U         a = ⊤
Represent (K x)     a = x
Represent Par       a = a
Represent (Rec f)   a = f a
Represent (x :+: y) a = Represent x a ⊎ Represent y a
Represent (x :*: y) a = Represent x a × Represent y a
Represent (x :∘: y) a = Compose (Represent x) (Represent y) a
```

If you’ve ever worked with `GHC.Generics`, none of this should be very exciting. We can bundle everything together, plus an iso to transform to and from the Represented type:

```
record Generic (F : Set → Set) : Set₁ where
  field
    RepOf : Rep
    from  : F A → Represent RepOf A
    to    : Represent RepOf A → F A
open Generic ⦃ ... ⦄

GenericRep : (F : Set → Set) → ⦃ Generic F ⦄ → Set → Set
GenericRep _ = Represent RepOf
```

Agda doesn’t have any out-of-the-box notion of `-XDeriveGeneric`, which seems like a headache at first blush. It means we need to explicitly write out a RepOf and a from/to pair by hand, *like peasants.* Surprisingly however, needing to implement by hand is beneficial, as it reminds us that RepOf *is not uniquely determined!*

A good metaphor here is the number 16, which stands for some type we’d like to generify. A RepOf for 16 is an equivalent representation for 16. Here are a few:

- $2+(2+(2+(2+(2+(2+(2+2))))))$
- $((2+2)*2)+(((2+2)+2)+2)$
- $2 \times 8$
- $8 \times 2$
- $(4 \times 2) \times 2$
- $(2 \times 4) \times 2$
- $4 \times 4$
- $2^4$
- $2^{2^2}$

And there are lots more! Each of $+$, $\times$ and exponentiation corresponds to a different way of building a type, so every one of these expressions is a distinct (if isomorphic) type with 16 values. Every single possible factoring of 16 corresponds to a different way of dividing-and-conquering, which is to say, a different (but related) algorithm.

The trick is to define our algorithm inductively over each Set that can result from Represent. We can then pick different algorithms from the class by changing the specific way of factoring our type.

Let’s consider the case of left scans. I happen to know it’s going to require Functor capabilities, so we’ll also define that:

```
record Functor (F : Set 𝓁 → Set 𝓁) : Set (lsuc 𝓁) where
  field
    fmap : {A B : Set 𝓁} → (A → B) → F A → F B

record LScan (F : Set → Set) : Set₁ where
  field
    overlap ⦃ func ⦄ : Functor F
    lscan : ⦃ Monoid A ⦄ → F A → F A × A

open Functor ⦃ ... ⦄
open LScan ⦃ ... ⦄
```

What’s with the type of lscan? This thing is an exclusive scan, so the first element is always mempty, and thus the last element is always returned as proj₂ of lscan.
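To make the exclusive-scan convention concrete, here is a minimal sketch in Python rather than Agda (my own illustration; the name `exclusive_scan` and the choice of (ℕ, +, 0) as the monoid are assumptions, not from the paper):

```python
# Exclusive left scan: each output slot holds the fold of everything
# *before* it, and the grand total comes back separately, mirroring
# the (F A × A) result type of lscan.
def exclusive_scan(xs, mempty=0, mappend=lambda a, b: a + b):
    acc, out = mempty, []
    for x in xs:
        out.append(acc)        # the prefix before this element
        acc = mappend(acc, x)  # fold this element in
    return out, acc

print(exclusive_scan([1, 2, 3, 4]))  # → ([0, 1, 3, 6], 10)
```

Note that the first output element is mempty and the total comes back as the second component, exactly as the type of lscan promises.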

We need to implement LScan for each Representation, and because there is no global coherence requirement in Agda, we can define our Functor instances at the same time.

The simplest case is void, which we can scan because we have a ⊥ in a negative position:

```
instance
  lV : LScan (\a → ⊥)
  lV .func .fmap f x = ⊥-elim x
  lV .lscan ()
```

⊤ is also trivial. Notice that there isn’t any `a` inside of it, so our final accumulated value must be mempty:

```
lU : LScan (\a → ⊤)
lU .func .fmap f x = x
lU .lscan x = x , mempty
```

The identity functor is also trivial. Except this time, we *do* have a result, so it becomes the accumulated value, and we replace it with how much we’ve scanned thus far (nothing):

```
lP : LScan (\a → a)
lP .func .fmap f = f
lP .lscan x = mempty , x
```

Coproducts are uninteresting; we merely lift the tag:

```
l+ : ⦃ LScan F ⦄ → ⦃ LScan G ⦄ → LScan (\a → F a ⊎ G a)
l+ .func .fmap f (inj₁ y) = inj₁ (fmap f y)
l+ .func .fmap f (inj₂ y) = inj₂ (fmap f y)
l+ .lscan (inj₁ x) = let x' , y = lscan x in inj₁ x' , y
l+ .lscan (inj₂ x) = let x' , y = lscan x in inj₂ x' , y
```

And then we come to the interesting cases. To scan the product of `F` and `G`, we notice that every left scan of `F` is a prefix of `F × G` (because `F` is on the left). Thus, we can use `lscan F` directly in the result, and need only adjust the results of `lscan G` with the accumulated value from `F`:

```
l* : ⦃ LScan F ⦄ → ⦃ LScan G ⦄ → LScan (\a → F a × G a)
l* .func .fmap f x .proj₁ = fmap f (x .proj₁)
l* .func .fmap f x .proj₂ = fmap f (x .proj₂)
l* .lscan (f-in , g-in) =
  let f-out , f-acc = lscan f-in
      g-out , g-acc = lscan g-in
   in (f-out , fmap (f-acc <>_) g-out) , f-acc <> g-acc
```

l* is what makes the whole algorithm parallel. It says we can scan `F` and `G` in parallel, and need only a single join node at the end to stick `f-acc <>_` on. This parallelism is visible in the `let` expression, where there is no data dependency between the two bindings.
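As a hypothetical rendering in Python (not the paper’s code; `scan_product` and the additive monoid are my own choices), the product case looks like this:

```python
def exclusive_scan(xs):
    acc, out = 0, []
    for x in xs:
        out.append(acc)
        acc += x
    return out, acc

def scan_product(f_in, g_in):
    f_out, f_acc = exclusive_scan(f_in)  # these two scans share no data,
    g_out, g_acc = exclusive_scan(g_in)  # so they could run in parallel
    # the single join node: shift G's prefixes by F's total
    return f_out + [f_acc + y for y in g_out], f_acc + g_acc

# agrees with scanning the concatenation in one go:
print(scan_product([1, 2], [3, 4]))  # → ([0, 1, 3, 6], 10)
```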

Our final generic instance of LScan is over composition. However, we can’t implement LScan for every composition of functors, since we require the ability to “zip” two functors together. The paper is pretty cagey about exactly what `Zip` is, but after some sleuthing, I think it’s this:

```
record Zip (F : Set → Set) : Set₁ where
  field
    overlap ⦃ func ⦄ : Functor F
    zip : {A B : Set} → F A → F B → F (A × B)
open Zip ⦃ ... ⦄
```

That looks a lot like being an applicative, but it’s missing `pure` and has some weird idempotent laws that are not particularly relevant today. We can define some helper functions as well:

```
zipWith : ∀ {A B C} → ⦃ Zip F ⦄ → (A → B → C) → F A → F B → F C
zipWith f fa fb = fmap (uncurry f) (zip fa fb)

unzip : ⦃ Functor F ⦄ → {A B : Set} → F (A × B) → F A × F B
unzip x = fmap proj₁ x , fmap proj₂ x
```

Armed with all of this, we can give an implementation of lscan over functor composition. The idea is to lscan each inner functor, which gives us a `G (F A × A)`. We can then unzip that, whose second projection is the totals of each inner scan. If we scan these *totals*, we’ll get a running scan for the whole thing; and all that’s left is to adjust each inner result.

```
instance
  l∘ : ⦃ LScan F ⦄ → ⦃ LScan G ⦄ → ⦃ Zip G ⦄ → LScan (Compose G F)
  l∘ .func .fmap f = fmap f
  l∘ .lscan (compose gfa) =
    let gfa' , tots = unzip (fmap lscan gfa)
        tots' , tot = lscan tots
        adjustl t = fmap (t <>_)
     in compose (zipWith adjustl tots' gfa') , tot
```
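Under the same assumptions as before (a Python sketch with the additive monoid and invented names, not the paper’s code), the composition case can be rendered as:

```python
def exclusive_scan(xs):
    acc, out = 0, []
    for x in xs:
        out.append(acc)
        acc += x
    return out, acc

def scan_compose(chunks):
    # lscan each inner functor, giving (inner scan, inner total) pairs
    scanned = [exclusive_scan(chunk) for chunk in chunks]
    inners, totals = zip(*scanned)                        # unzip
    totals_scanned, total = exclusive_scan(list(totals))  # scan the totals
    # adjust each inner scan by the running total of the chunks before it
    adjusted = [[t + x for x in inner]
                for t, inner in zip(totals_scanned, inners)]
    return adjusted, total

print(scan_compose([[1, 2], [3, 4]]))  # → ([[0, 1], [3, 6]], 10)
```

Flattening the result gives exactly the scan of the concatenated chunks, which is what makes the block-wise divide-and-conquer correct.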

And we’re done! We now have an algorithm defined piece-wise over the fundamental ADT building blocks. Let’s put it to use.

Let’s pretend that Vecs are random access arrays. We’d like to be able to build array algorithms out of our algorithmic building blocks. To that end, we can make a typeclass corresponding to types that are isomorphic to arrays:

```
open import Data.Nat
open import Data.Vec hiding (zip; unzip; zipWith)

record ArrayIso (F : Set → Set) : Set₁ where
  field
    Size : ℕ
    deserialize : Vec A Size → F A
    serialize : F A → Vec A Size
    -- also prove it's an iso
open ArrayIso ⦃ ... ⦄
```

There are instances of ArrayIso for the functor building blocks (though none for :+: since arrays are big records.) We can now use an ArrayIso and an LScan to get our desired parallel array algorithms:

```
genericScan
    : ⦃ Monoid A ⦄
    → (rep : Rep)
    → ⦃ d : ArrayIso (Represent rep) ⦄
    → ⦃ LScan (Represent rep) ⦄
    → Vec A (Size ⦃ d ⦄)
    → Vec A (Size ⦃ d ⦄) × A
genericScan _ ⦃ d = d ⦄ x =
  let res , a = lscan (deserialize x)
   in serialize ⦃ d ⦄ res , a
```

I think this is the first truly dependent type I’ve ever written. We take a Rep corresponding to how we’d like to divvy up the problem, and then see if the Represent of that has ArrayIso and LScan instances, and then give back an algorithm that scans over arrays of the correct Size.

Finally we’re ready to try this out. We can give the RList implementation from earlier:

```
▷_ : Rep → Rep
▷_ a = Par :*: a

_ : ⦃ Monoid A ⦄ → Vec A 4 → Vec A 4 × A
_ = genericScan (▷ ▷ ▷ Par)
```

or the LList instance:

```
_◁ : Rep → Rep
_◁ a = a :*: Par

_ : ⦃ Monoid A ⦄ → Vec A 4 → Vec A 4 × A
_ = genericScan (Par ◁ ◁ ◁)
```

But we can also come up with more interesting strategies as well. For example, we can divvy up the problem by left-associating the first half, and right-associating the second:

```
_ : ⦃ Monoid A ⦄ → Vec A 8 → Vec A 8 × A
_ = genericScan ((Par ◁ ◁ ◁) :*: (▷ ▷ ▷ Par))
```

This one probably isn’t an *efficient* algorithm, but it’s cool that we can express such a thing so succinctly. Probably of more interest is a balanced tree over our array:

```
_ : ⦃ Monoid A ⦄ → Vec A 16 → Vec A 16 × A
_ = let ⌊_⌋ a = a :*: a
     in genericScan (⌊ ⌊ ⌊ ⌊ Par ⌋ ⌋ ⌋ ⌋)
```

The balanced tree over products is interesting, but what if we make a balanced tree over *composition?* In essence, we can split the problem into chunks of $2^{2^n}$ amounts of work via Bush:

```
{-# NO_POSITIVITY_CHECK #-}
data Bush : ℕ → Set → Set where
  twig : A × A → Bush 0 A
  bush : {n : ℕ} → Bush n (Bush n A) → Bush (suc n) A
```

We won’t use Bush directly, but we can use its Rep:

```
_ : ⦃ Monoid A ⦄ → Vec A 16 → Vec A 16 × A
_ = let pair = Par :*: Par
     in genericScan ((pair :∘: pair) :∘: (pair :∘: pair))
```

The paper compares several of these strategies for dividing-and-conquering. In particular, it shows that we can minimize total work via a left-associated ⌊_⌋ strategy, but maximize parallelism with a *right*-associated ⌊_⌋. And using the `Bush` from earlier, we can get a nice middle ground.

The paper follows up, applying this approach to implementations of the fast Fourier transform. There, the Bush approach gives constant-factor improvements for both *work* and *parallelism,* compared to all previously known algorithms.

Results like these are strong evidence that Elliott is *actually onto something* with his seemingly crazy ideas that computation should be elegant and well principled. Giving significant constant factor improvements to well-known, extremely important algorithms *mostly for free* is a true superpower, and is worth taking extremely seriously.

Andrew McKnight and I tried to use this same approach to get a nice algorithm for sorting, hoping that we could get well-known sorting algorithms to fall out as special cases of our more general functor building blocks. We completely failed on this front, namely because we couldn’t figure out how to give an instance for product types. Rather alarmingly, we’re not entirely sure *why* the approach failed there; maybe we just didn’t think hard enough.

Another plausible idea is that sorting requires branching, and that this approach only works for statically-known codepaths.

Andrew and I spent a good chunk of the week thinking about this problem, and we figure there are solid odds that you could *automatically* discover these generic algorithmic building blocks from a well-known algorithm. Here’s the sketch:

Use the well-known algorithm as a specification, instantiate all parameters at small types and see if you can find instances of the algorithm for the functor building blocks that agree with the spec. It seems like you should be able to use factorization of the input to target which instances you’re looking for.

Of course, once you have the algorithmic building blocks, conventional search techniques can be used to optimize any particular goal you might have.

]]>We might as well dive in. Since all of this complexity analysis stuff shouldn’t *change* anything at runtime, we really only need to stick the analysis in the types, and can erase it all at runtime.

The paper thus presents its main tools in an `abstract` block, which is a new Agda feature for me. And wow, does Agda ever feel like it’s Haskell but from the future. An `abstract` block lets us give some definitions, which *inside* the `abstract` block can be normalized. But outside the block, they are opaque symbols that are just what they are. This is a delightful contrast to Haskell, where we need to play a game of making a new module and carefully not exporting things in order to get the same behavior. And even then, in Haskell, we can’t give opaque `type` synonyms or anything like that.

Anyway, the main type in the paper is Thunk, which tracks how many computation steps are necessary to produce an eventual value:

```
abstract
  Thunk : ℕ → Set → Set
  Thunk n a = a
```

Because none of this exists at runtime, we can just ignore the `n` argument, and use the `abstract`ion barrier to ensure nobody can use this fact in anger. Thunk is a *graded* monad, that is, a monad parameterized by a monoid, which uses `mempty` for `pure`, and `mappend` for binding. We can show that Thunk does form a graded monad:

```
pure : a → Thunk 0 a
pure x = x

infixl 1 _>>=_
_>>=_ : Thunk m a → (a → Thunk n b) → Thunk (m + n) b
x >>= f = f x

infixr 1 _=<<_
_=<<_ : (a → Thunk n b) → Thunk m a → Thunk (m + n) b
f =<< x = f x
```

We’ll omit the proofs that Thunk really is a monad, but it’s not hard to see; Thunk is truly just the identity monad.
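A runtime picture of the grading, sketched in Python with my own names (the real Thunk erases the count at the type level; here we just carry it as data):

```python
# A value paired with a step count. pure costs 0 steps and bind adds
# the costs of its two sides, so the grading monoid is (ℕ, +, 0).
def pure(x):
    return (x, 0)

def bind(thunk, f):
    x, m = thunk
    y, n = f(x)
    return (y, m + n)

def tick(thunk):  # analogue of !_: record one computation step
    x, n = thunk
    return (x, n + 1)

print(tick(tick(tick(pure(0)))))  # → (0, 3), like `! ! ! pure 0 : Thunk 3 ℕ`
```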

Thunk is also equipped with two further operations; the ability to mark a computation cycle, and the ability to extract the underlying value by throwing away the complexity analysis:

```
infixr 0 !_
!_ : Thunk n a → Thunk (suc n) a
!_ a = a

force : {a : Set} → Thunk n a → a
force x = x
```

Here, !_ is given a low, right-spanning precedence, which means it’s relatively painless to annotate with:

```
_ : Thunk 3 ℕ
_ = ! ! ! pure 0
```

Our definitions are “opt-in,” in the sense that the compiler won’t yell at you if you forget to call !_ somewhere a computational step happens. Thus, we require users to follow these conventions:

- Every function body must begin with a call to !_.
- force may not be used in a function body.
- None of pure, _>>=_ nor !_ may be called partially applied.

The first convention ensures we count everything that should be counted. The second ensures we don’t cheat by discarding complexity information before it’s been counted. And the third ensures we don’t accidentally introduce uncounted computation steps.

The first two are pretty obvious, but the third is a little subtler. Under the hood, partial application gets turned into a lambda, which introduces a computation step to evaluate. But that step won’t be ticked via !_, so we will have lost the bijection between our programs and their analyses.

The paper shows us how to define a lazy vector. `VecL a c n` is a vector of `n` elements of type `a`, where the cost of forcing each subsequent tail is `c`:

```
{-# NO_POSITIVITY_CHECK #-}
data VecL (a : Set) (c : ℕ) : ℕ → Set where
  []  : VecL a c 0
  _∷_ : a → Thunk c (VecL a c n) → VecL a c (suc n)
infixr 5 _∷_
```

Let’s try to write fmap for VecL. We’re going to need a helper function, which delays a computation by artificially inflating its number of steps:

```
abstract
  wait : {n : ℕ} → Thunk m a → Thunk (n + m) a
  wait m = m
```

(the paper follows its own rules and ensures that we call !_ every time we wait, thus it comes with an extra suc in the type of wait. It gets confusing, so we’ll use this version instead.)

Unfortunately, the paper also plays fast and loose with its math. It’s fine, because the math is right, but the code presented in the paper doesn’t typecheck in Agda. As a workaround, we need to enable rewriting:

```
open import Agda.Builtin.Equality.Rewrite

{-# REWRITE +-suc +-identityʳ #-}
```

We’ll also need to be able to lift equalities over the `Thunk` time bounds:

```
cast : m ≡ n → Thunk m a → Thunk n a
cast eq x rewrite eq = x
```

Finally, we can write fmap:

```
fmap : {c fc : ℕ}
     → (a → Thunk fc b)
     → VecL a c n
     → Thunk (2 + fc) (VecL b (2 + fc + c) n)
fmap f [] = wait (pure [])
fmap {c = c} f (x ∷ xs) =
  ! f x >>= \x' →
  ! pure (x' ∷ cast (+-comm c _) (xs >>= fmap f))
```

This took me about an hour to write; I’m not convinced the approach here is as “lightweight” as claimed. Of particular challenge was figuring out the actual time bounds on this thing. The problem is that we usually reason about asymptotics via Big-O notation, which ignores all of these constant factors. What would be nicer is the hypothetical type:

```
fmap : {c fc : ℕ}
     → (a → Thunk (O fc) b)
     → VecL a c n
     → Thunk (O c) (VecL b (O (fc + c)) n)
```

where every thunk is now parameterized by `O x`, saying our asymptotics are bounded by `x`. We’ll see about fleshing this idea out later. For now, we can power through the paper and write vector insertion. Let’s assume we have a constant-time comparison function for a:

```
postulate
  _<=_ : a → a → Thunk 1 Bool
```

First things first, we need another waiting function to inflate the times on every tail:

```
waitL : {c' : ℕ} {c : ℕ}
      → VecL a c' n
      → Thunk 1 (VecL a (2 + c + c') n)
waitL [] = ! pure []
waitL (x ∷ xs) = ! pure (x ∷ wait (waitL =<< xs))
```

and a helper version of if_then_else_ that does its accounting in Thunk:

```
if_then_else_ : Bool → a → a → Thunk 1 a
if false then t else f = ! pure f
if true  then t else f = ! pure t
infixr 2 if_then_else_
```

we can thus write vector insertion:

```
insert : {c : ℕ} → a → VecL a c n → Thunk 4 (VecL a (4 + c) (suc n))
insert x [] = wait (pure (x ∷ wait (pure [])))
insert x (y ∷ ys) =
  ! x <= y >>= \b →
  ! if b then x ∷ wait (waitL (y ∷ ys))
         else y ∷ (insert x =<< ys)
```

The obvious followup to insert is insertion sort:

```
open import Data.Vec using (Vec; []; _∷_; tail)

sort : Vec a n → Thunk (1 + 5 * n) (VecL a (4 * n) n)
sort [] = ! pure []
sort (x ∷ xs) = ! insert x =<< sort xs
```

This thing looks linear, but insertion sort is $O(n^2)$, so what gives? The thing to notice is that the cost of each *tail* is linear, but we have $O(n)$ tails, so forcing the whole thing indeed works out to $O(n^2)$. We can now show head runs in constant time:
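A quick back-of-the-envelope check of that claim, in Python (my own arithmetic; the constants come from the types above):

```python
# sort itself costs 1 + 5n steps, and its result is a VecL a (4 * n) n,
# so forcing each of the n tails costs a further 4n steps apiece.
def force_all_cost(n):
    return (1 + 5 * n) + n * (4 * n)

print(force_all_cost(10))   # → 451
print(force_all_cost(100))  # → 40501, growing quadratically
```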

```
head : {c : ℕ} → VecL a c (suc n) → Thunk 1 a
head (x ∷ _) = ! pure x
```

and that we can find the minimum element in linear time:

```
minimum : Vec a (suc n) → Thunk (8 + 5 * n) a
minimum xs = ! head =<< sort xs
```

Interestingly, Agda can figure out the bounds on minimum by itself, but not any of our other functions.

The paper goes on to show that we can define last, and then get a quadratic-time `maximum` using it:

```
last : {c : ℕ} → VecL a c (suc n) → Thunk (1 + suc n * suc c) a
last (x ∷ xs) = ! last' x =<< xs
  where
    last' : {c : ℕ} → a → VecL a c n → Thunk (1 + n * suc c) a
    last' a [] = ! pure a
    last' _ (x ∷ xs) = ! last' x =<< xs
```

Trying to define `maximum` makes Agda spin, probably because of one of my rewrite rules. But here’s what it should be:

```
maximum : Vec a (suc n) → Thunk (13 + 14 * n + 4 * n ^ 2) a
maximum xs = ! last =<< sort xs
```

The paper goes on to say some things about partially evaluating thunks, and then shows its use to measure some popular libraries. But I’m more interested in making the experience better.

Clearly this is all too much work. When we do complexity analysis by hand, we are primarily concerned with *complexity classes,* not exact numbers of steps. How hard would it be to generalize all of this so that `Thunk` takes a function bounding the runtime necessary to produce its value?

First, a quick refresher on what big-O means. A function $f : \mathbb{N} \to \mathbb{N}$ is said to be in $O(g)$ for some $g : \mathbb{N} \to \mathbb{N}$ iff:

$\exists (C k : \mathbb{N}). \forall (n : \mathbb{N}, k \leq n). f(n) \leq C \cdot g(n)$

That is, there is some point $k$ past which $C \cdot g(n)$ stays above $f(n)$. This is the formal definition, but in practice we usually play rather fast and loose with our notation. For example, we say “quicksort is $O(n\cdot\log{n})$ in the length of the list”, or “$O(n\cdot\log{m})$, where $m$ is the size of the first argument.”

We need to do a bit of elaboration here to turn these informal statements into a formal claim. In both cases, there are implicit binders inside the $O(-)$, binding $n$ in the first, and $m, n$ in the second. These functions then get instantiated with the actual sizes of the lists. It’s a subtle point, but it needs to be kept in mind.

The other question is how the hell do we generalize that definition to multiple variables? Easy! We replace $n : \mathbb{N}, k \leq n$ with a vector of natural numbers, subject to the constraint that they’re *all* bigger than $k$.
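Before formalizing this in Agda, a quick numeric sanity check of the one-variable definition, in Python (illustrative only; `bounded_by` is my own name, and checking finitely many n is evidence, not a proof):

```python
# f ∈ O(g) via witnesses C and k means f(n) ≤ C·g(n) for every n ≥ k.
def bounded_by(f, g, C, k, upto=1000):
    return all(f(n) <= C * g(n) for n in range(k, upto))

# 4 + 5n is O(n), witnessed by C = 9, k = 1:
print(bounded_by(lambda n: 4 + 5 * n, lambda n: n, C=9, k=1))  # → True
# n² is not O(n); any fixed C eventually fails:
print(bounded_by(lambda n: n * n, lambda n: n, C=50, k=1))     # → False
```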

OK, let’s write some code. We can give the definition of O:

```
open import Data.Vec.Relation.Unary.All
  using (All; _∷_; []) renaming (tail to tailAll)

record O {vars : ℕ} (g : Vec ℕ vars → ℕ) : Set where
  field
    f : Vec ℕ vars → ℕ
    C : ℕ
    k : ℕ
    def : (n : Vec ℕ vars) → All (k ≤_) n → f n ≤ C * g n
```

The generality of O is a bit annoying for the common case of being a function over one variable, so we can introduce a helper function O':

```
hoist : {a b : Set} → (a → b) → Vec a 1 → b
hoist f (x ∷ []) = f x

O' : (ℕ → ℕ) → Set
O' f = O (hoist f)
```

We can trivially lift any function `f` into O `f`:

```
O-build : {vars : ℕ} → (f : Vec ℕ vars → ℕ) → O f
O-build f .O.f = f
O-build f .O.C = 1
O-build f .O.k = 0
O-build f .O.def n x = ≤-refl
```

and also trivially weaken an O into using more variables:

```
O-weaken : ∀ {vars} {f : Vec ℕ vars → ℕ} → O f → O (f ∘ tail)
O-weaken o .O.f = o .O.f ∘ tail
O-weaken o .O.C = o .O.C
O-weaken o .O.k = o .O.k
O-weaken o .O.def (_ ∷ x) (_ ∷ eq) = o .O.def x eq
```

More interestingly, we can lift a given O' into a higher power, witnessing the fact that, e.g., something of $O(n^2)$ is also $O(n^3)$:

```
O-^-suc : {n : ℕ} → O' (_^ n) → O' (_^ suc n)
O-^-suc o .O.f = o .O.f
O-^-suc o .O.C = o .O.C
O-^-suc o .O.k = suc (o .O.k)
O-^-suc {n} o .O.def xs@(x ∷ []) ps@(s≤s px ∷ []) = begin
  f xs               ≤⟨ def xs (≤-step px ∷ []) ⟩
  C * (x ^ n)        ≤⟨ *-monoˡ-≤ (x ^ n) (m≤m*n C (s≤s z≤n)) ⟩
  (C * x) * (x ^ n)  ≡⟨ *-assoc C x (x ^ n) ⟩
  C * (x * (x ^ n))  ∎
  where
    open O o
    open ≤-Reasoning
```

However, the challenge is and has always been to simplify the construction of Thunk bounds. Thus, we’d like the ability to remove low-order terms from Os. We can do this by eliminating $n^k$ whenever there is a $n^{k'}$ term around with $k \leq k'$:

```
postulate
  O-drop-low
    : {z x y k k' : ℕ}
    → k ≤ k'
    → O' (\n → z + x * n ^ k + y * n ^ k')
    → O' (\n → z + n ^ k')
```

The `z` variable here lets us compose O-drop-low terms, by subsequently instantiating it at whatever lower-order terms remain to be dropped.

As a special case, we can eliminate constant terms via O-drop-low by first expanding constant terms to be coefficients of $n^0$:

```
O-drop-1 : {x y k : ℕ} → O' (\n → x + y * n ^ k) → O' (\n → n ^ k)
O-drop-1 {x} {y} {k} o rewrite sym (*-identityʳ x) =
  O-drop-low {0} {x} {y} {k = 0} {k} z≤n o
```

With these functions, we can now easily construct O' values for arbitrary one-variable functions:

```
_ : O' (_^ 1)
_ = O-drop-1 {4} {5} {1}
  $ O-build
  $ hoist \n → 4 + 5 * n ^ 1

_ : O' (_^ 2)
_ = O-drop-1 {4} {1} {2}
  $ O-drop-low {4} {5} {3} {1} {2} (s≤s z≤n)
  $ O-build
  $ hoist \n → 4 + 5 * n ^ 1 + 3 * n ^ 2
```

Finally, we just need to build a version of Thunk that is adequately lifted over the same functions we use for O:

```
abstract
  OThunk : {vars : ℕ} → (Vec ℕ vars → ℕ) → Set → Set
  OThunk _ a = a

OThunk' : (ℕ → ℕ) → Set → Set
OThunk' f = OThunk (hoist f)
```

The limit function can be used to lift a Thunk into an OThunk:

```
limit : {vars : ℕ} {f : Vec ℕ vars → ℕ} {a : Set}
      → (v : Vec ℕ vars)
      → (o : O f)
      → Thunk (o .O.f v) a
      → OThunk f a
limit _ _ x = x
```

and we can now give an asymptotic bound over sort:

```
o2 : O' (_^ 1)
o2 = O-drop-1 {1} {5} {1} $ O-build $ hoist \n → 1 + 5 * n

linearHeadSort : Vec a n → OThunk' (_^ 1) (VecL a (4 * n) n)
linearHeadSort {n} v = limit (n ∷ []) o2 $ sort v
```

I’m traveling right now, and ran out of internet on publication day, which means I don’t have a copy of the paper in front of me as I write this (foolish!) Overall, the paper is slightly interesting, though I don’t think there’s anything especially novel here. Sticking the runtime behavior into the type is pretty much babby’s first example of graded monads, and we don’t even get asymptotics out of it! Instead we need to push big polynomials around, and explicitly call wait to make different branches work out.

The O stuff I’ve presented here alleviates a few of those problems, as it allows us to relatively easily throw away the polynomials and just work with the highest-order terms. A probably better approach would be to throw away the functions, and use a canonical normalizing form to express the asymptotics. Then we could define a ⊔ operator over OThunks, and define:

`_>>=_ : OThunk f a → (a → OThunk g b) → OThunk (f ⊔ g) b`

to let us work compositionally in the land of big O.

My biggest takeaway here is that the techniques described in this paper are probably not powerful enough to be used in anger. Or, at least, not if you actually want to get any work done. Between the monads, polynomials, and waiting, the experience could use a lot of TLC.

]]>