Why Is the Web So Monotonous? Google.

Does it ever feel like the internet is getting worse? That’s been my impression for the last decade. The internet now feels like it consists of ten big sites, plus fifty auxiliary sites that come up whenever you search for something outside of the everyday ten. It feels like it’s harder to find amateur opinions on matters, except if you look on social media, where amateur opinions are shared, unsolicited, with much more enthusiasm than they deserve. The accessibility of the top ten seems like it collapses the internet into a monoculture of extremism, and, perhaps even more disappointingly, a monoculture that echoes the offline world.

Contrast this to the internet of yore. By virtue of being hard to access, the internet filtered away the mass appeal it has today. It was hard and expensive to get on, and in the absence of authoring tools, you were only creating internet content if you had something to say. Which meant that, as a consumer, if you found something, you had good reason to believe it was well-informed. Why would someone go through the hassle of making a website about something they weren’t interested in?

In 2022, we have a resoundingly sad answer to that question: advertising. The primary purpose of the web today is “engagement,” which is Silicon Valley jargon for “how many ads can we push through someone’s optical nerve?” Under the purview of engagement, it makes sense to publish webpages on every topic imaginable, regardless of whether or not you know what you’re talking about. In fact, engagement goes up if you don’t know what you’re talking about; your poor reader might mistakenly believe that they’ll find the answer they’re looking for elsewhere on your site. That’s twice the advertising revenue, baby!

But the spirit of the early web isn’t gone: the bookmarks I’ve kept these long decades mostly still work, and many of them still receive new content. There’s still weird, amateur, passion-project stuff out there. It’s just hard to find. Which brings us to our main topic: search.

Google is inarguably the front page of the internet. Maybe you already know where your next destination is, in which case you probably search for the website on Google and click on the first link, rather than typing in the address yourself. Or maybe you don’t already know your destination, and you search for it. Either way, you hit Google first.

When I say the internet is getting worse, what I really mean is that the Google search results are significantly less helpful than they used to be. This requires some qualification. Google has gotten exceedingly good at organizing everyday life. It reliably gets me news, recipes, bus schedules, tickets for local events, sports scores, simple facts, popular culture, official regulations, and access to businesses. It’s essentially the yellow pages and the newspaper put together. For queries like these, which are probably 95% of Google’s traffic, Google does an excellent job.

The difficulties come in for that other 5%, the so-called “long tail.” The long tail is all those other things we want to know about. Things without well-established, factual answers. Opinions. Abstract ideas. Technical information. If you’re cynical, perhaps it’s all the stuff that doesn’t have wide-enough appeal to drive engagement. Whatever the reason, the long tail is the stuff that’s hard to find on the modern internet.

Notice that the long tail is exactly the stuff we need search for. Answers to mass-appeal queries are, almost by definition, not particularly hard to find. If I need a bus schedule, I know to talk to my local transit authority. If I’m looking to keep up with the Kardashians, I’m not going to have any problems (at least, no search problems). On the other hand, it’s much less clear where to get information on why my phone starts overheating when I open the chess app.

So what happens if you search for the long tail on Google? If you’re like me, you flail around for ten minutes wasting your time reading crap articles before you remember that Google is awful for the long tail, and you come away significantly more frustrated, not having found what you were looking for in the first place.

Let’s look at some examples. One of my favorite places in the world is Koh Lanta, Thailand. When traveling, I’m always on the lookout for places that give off the Koh Lanta vibe. What does that mean? Hard to say, exactly, but having tourist amenities without being touristy. Charming, slow, cheap. I don’t know exactly; if I did, it’d be easier to find. Anyway, forgetting that Google is bad at long tails, I search for “what is the koh lanta of croatia?” and get:

  • Koh-Lanta - Wikipedia [note: not the island, the game show]
  • Top 15 Unique Things to Do in Koh Lanta
  • Visit Koh Lanta on a trip to Thailand
  • Beautiful places to travel, Koh lanta, Sunset
  • Holiday Vacation to Koh Lanta: Our favourite beaches and …
  • Koh Lanta Activities: 20 Best Things to Do
  • etc

With the exception of “find a flight from Dubrovnik to Koh Lanta” on page two, you need to get to page five before you see any results that even acknowledge I also searched for croatia. Not very impressive.

When you start paying attention, you’ll notice it on almost every search — Google isn’t actually giving you answers to things you searched for. Now, maybe the reason here is that there aren’t any good results for the query, but that’s a valuable thing to know as well. Don’t just hit me with garbage, it’s an insult to my intelligence and time.

Where Things Go Wrong

I wanted to figure out why exactly the internet is getting worse. What’s going on with Google’s algorithm that leads to such a monotonous, boring, corporate internet landscape? I thought I’d dig into search engine optimization (SEO) — essentially, techniques that improve a website’s ranking in Google searches. I’d always thought SEO was better at selling itself than it was at improving search results, but my god was I wrong.

SEO techniques are extremely potent, and their widespread adoption is what’s wrong with the modern web.

For example, have you ever noticed that the main content of most websites is something like 70% down the page? Every recipe site I’ve ever seen is like this — nobody cares about how this recipe was originally your great-grandmother’s. Just tell us what’s in it. Why is this so prevalent on the web?

Google rewards a website for how long a user stays on it, with the reasoning being that a bad website has the user immediately hit the back button. Seems reasonable, until you notice the problem of incentives here. Websites aren’t being rewarded for having good content under this scheme, they’re rewarded for wasting your time and making information hard to find. Outcome: websites that answer questions, but hide the information somewhere on a giant (ad-filled) page.

Relatedly, have you noticed how every website begins with a stupid paragraph overviewing the thing you’re searching for? It’s always followed by a stupid paragraph describing why you should care about the thing. For example, I just searched for “garden irrigation,” and the first result is:

Water is vital to plant health, but watering by hand can be a hassle. You have to drag hoses between gardens, move sprinklers around, or take the time to water each plant. Our innovative watering systems take the hassle out of watering. They’re the easiest way to give plants the consistent moisture they need for your biggest harvest and most beautiful blooms.

Water is vital to plant health. Wow, who knew! Why in god’s name would I be searching for garden irrigation if I didn’t know that water was vital to plant health? Why is copy like this so prevalent on the web?

Things become clearer when you look at some of the context of this page:

Url: https://[redacted]/how-to/how-to-choose-a-watering-system/8747.html

Title: How to Choose a Garden Irrigation System

Heading: Soak, Drip or Spray: Which is right for you?

Subheading: Choose the best of our easy, customizable, irrigation systems to help your plants thrive and save water

As it happens, Google rewards websites which use keywords in their url, title, headings, and first 100 words. Just by eyeballing, we can see that this particular website is targeting the keywords “water”, “system”, “irrigation”, and “garden”. Pages like these are hyper-optimized to come up for particular searches. The stupid expository stuff exists only to pack “important keywords” into the first 100 words.
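The eyeballing above can be roughed out in code. What follows is only a sketch of the counting exercise, not Google’s ranking logic: the function names, the tiny stopword list, and the crude suffix-stripping are all assumptions made for illustration. It counts how many of the four places (url, title, headings, first 100 words) each word shows up in, using the text quoted from the irrigation page.

```python
import re
from collections import Counter

# Small hand-picked stopword list (an assumption; real tools use far larger ones).
STOPWORDS = {"a", "and", "for", "how", "is", "of", "or", "our",
             "the", "to", "you", "your"}


def stem(word):
    """Crude suffix stripping so 'watering' and 'systems' match 'water' and 'system'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keywords(text, limit=None):
    """Stemmed, stopword-free set of words from the first `limit` words of `text`."""
    found = re.findall(r"[a-z]+", text.lower())[:limit]
    return {stem(w) for w in found if w not in STOPWORDS}


def keyword_placement(url, title, headings, body):
    """Map each word to the number of fields (out of four) it appears in.

    Only words that appear in at least two fields are kept.
    """
    fields = [keywords(url), keywords(title), keywords(headings),
              keywords(body, limit=100)]
    counts = Counter(word for field in fields for word in field)
    return {word: n for word, n in counts.items() if n >= 2}


# Text copied from the page discussed above.
placement = keyword_placement(
    url="https://[redacted]/how-to/how-to-choose-a-watering-system/8747.html",
    title="How to Choose a Garden Irrigation System",
    headings=("Soak, Drip or Spray: Which is right for you? "
              "Choose the best of our easy, customizable, irrigation systems "
              "to help your plants thrive and save water"),
    body=("Water is vital to plant health, but watering by hand can be a hassle. "
          "You have to drag hoses between gardens, move sprinklers around, or "
          "take the time to water each plant. Our innovative watering systems "
          "take the hassle out of watering. They're the easiest way to give "
          "plants the consistent moisture they need for your biggest harvest "
          "and most beautiful blooms."),
)

for word, n in sorted(placement.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{word}: appears in {n} of 4 places")
```

In this sketch “system” lands in all four places, and “water”, “irrigation”, and “garden” each appear in at least two, which matches the eyeballed list. A couple of other words surface as well.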

But keyword targeting doesn’t stop there. As I was reading through this SEO stuff (that is, the first page of a Google search for “seo tricks”), every single page offered 15-25 great, technical SEO tricks. And then, without fail, the final point on each page was “but really, the best SEO strategy is having great content!” That’s weird. “Great content” isn’t something an algorithm can identify; if it were, you wouldn’t be currently reading the ravings of a madman, angry about the state of the internet.

So, why do all of these highly-optimized SEO pages ubiquitously break form, switching from concrete techniques to platitudes? You guessed it, it’s an SEO technique! Google offers a keyword dashboard, where you can see which keywords group together, and (shudder) which keywords are trending. Google rewards you for having other keywords in the group on your page. And it extra rewards you for having trending keywords. You will not be surprised to learn that “quality content” is a keyword that clusters with “seo,” nor that it is currently a trending keyword.

Think about that for a moment. Under this policy, Google is incentivizing pages to become less focused, by adding text that is only tangentially related. But, how do related keywords come about? The only possible answer here is to find keywords that often cluster on other pages. But this is a classic death spiral, pulling every page in a topic to have the same content.
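The death spiral is easy to see in a toy simulation. The sketch below is a deliberately simplified model, not a description of how Google computes keyword clusters: the pages, the keywords, and the rule “each round, every page adds the two most common keywords it lacks” are all invented for illustration. It tracks the mean Jaccard similarity between pages, where 1.0 means every page has exactly the same keywords.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Overlap between two keyword sets: 0.0 is disjoint, 1.0 is identical."""
    return len(a & b) / len(a | b)


def mean_similarity(pages):
    """Average Jaccard similarity over every pair of pages."""
    pairs = list(combinations(pages, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def add_cluster_keywords(pages, k=2):
    """Each page adds the k most common keywords (across all pages) that it lacks."""
    counts = Counter(word for page in pages for word in page)
    # Most common first; ties broken alphabetically so the run is deterministic.
    ranked = [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    return [page | set([w for w in ranked if w not in page][:k]) for page in pages]


# Four hypothetical pages on the same topic, each with something of its own.
pages = [
    {"seo", "backlinks", "sitemaps"},
    {"seo", "keywords", "page speed"},
    {"seo", "quality content", "headings"},
    {"seo", "quality content", "my weird experiment"},
]

history = [mean_similarity(pages)]
for _ in range(3):
    pages = add_cluster_keywords(pages)
    history.append(mean_similarity(pages))

for round_number, similarity in enumerate(history):
    print(f"round {round_number}: mean similarity {similarity:.2f}")
```

In this toy setup the four pages end up with identical keyword sets after three rounds. Nothing in the update rule ever rewards a page for saying something the others don’t.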

Another way of looking at it is that if you are being incentivized to do one thing, you are being disincentivized from doing everything else. Webpages are being penalized for including original information, because original information can’t possibly be in the keyword cluster.

There are a multitude of perverse incentives from Google, but I’ll mention only two more. The first is that websites are penalized for having low-ranking pages. The conventional advice here is to delete “underperforming” pages, which only makes the search problem worse — sites are being rewarded for deleting pages that don’t align with the current search algorithm.

My last point: websites are penalized for even linking to low-ranking pages!

It’s not hard to put all of the pieces together and see why the modern web is so bland and monotonous. Not only is the front page of the internet aggressively penalizing websites which aren’t bland and monotonous, it’s also punishing any site which has the audacity to link to more interesting parts of the web.

How Culpable is Google?

So the discoverable part of the web sucks. But is that really Google’s fault? I’d argue no. By virtue of being the front page, Google’s search results are under extreme scrutiny. In the eyes of the non-technical population, especially the older generations, the internet and Google are synonymous. The fact is that Google gets unfairly targeted by legislation because it’s a big, powerful tech company, and we as a society are uncomfortable with that.

Worse, the guys doing the regulation don’t exactly have a grasp on how internet things work.

Society at large has been getting very worried about disinformation. Whose problem is that? Google’s, duh. Google is how we get information on the internet, so it’s up to them to defend us from disinformation.

Unfortunately it’s really hard to spot disinformation. Sometimes even the government lies to us (gasp!). I can think of two ways of avoiding getting in trouble with respect to disinformation. One: link only to official sites, thus changing the problem of trustworthiness to one of authority. If there is no authority, just give back the consensus. Two: don’t return any information whatsoever.

Google’s current strategy seems to be somewhere between one and two. For example, we can try a controversialish search like “long covid doesn't exist.” The top results at time of writing are:

  1. The search for Long Covid (science.org)
  2. Small Study Finds No Obvious Physical Causes for Long COVID (medscape.com)
  3. Fact Check-‘Long COVID’ is not fake, quoted French study did … (reuters.com)
  4. Harvard Medical School expert explains ‘long COVID’ (harvard.edu)
  5. Claim that French study showed long COVID doesn’t exist … (healthfeedback.org)
  6. What doctors wish patients knew about long COVID (ama-assn.org)

I’m not particularly in the know, but I recognize most of these organizations. Science.org sounds official. Not only is one of the pages from Harvard, but it’s also from a Harvard Medical School expert. I especially like the fifth one, whose metadata says:

Claim: Long COVID is “mostly a mental disease”; the condition long COVID is solely due to a person’s belief, not actual disease; long COVID doesn’t exist

Fact check by Health Feedback: Inaccurate

Every one of these websites comes off as authoritative, not in the sense of “knowing what they’re talking about,” because that’s hard to verify, but in the sense of being the sort of organization we’d trust to answer this question for us. Or, in the case of number five, at least telling us that they fact checked it.

Let’s try a search for something requiring less authority, like “best books.” In the past I would get a list of books considered the best. But now I get:

  1. The Greatest Books: The Best Books of All Time - 1 to 50
  2. The Best Books of All Time | chapters.indigo.ca
  3. 100 Best Books of All Time - Reader’s Digest
  4. Best Book Lists - Goodreads
  5. Best Books 2022: Books We Love : NPR

You’ll notice there are no actual books here. There are only lists of best books. Cynical me notes that if you were to actually list a book, someone could find it controversial. Instead, you can link to institutional websites, and let them take the controversy for their picks.

This isn’t the way the web needs to be. Google could just as well have given me personal blogs of people talking about long covid and their favorite books, except (says cynical me) that these aren’t authoritative sources, and thus, linking to them could be considered endorsement. And the web is too big and too fast moving to risk linking to anything that hasn’t been vetted in advance. It’s just too easy to accidentally give a good result to a controversial topic, and have the lawmakers pounce on you. Instead, punt the problem back to authorities.

The web promised us a democratic, decentralized public forum, and all we got was the stinking yellow pages in digital format. I hope the crypto people can learn a lesson here.

Anyway, all of this is to say that I think lawmakers and liability concerns are the real reason the web sucks. All things being equal, Google would like to give us good results, but it prefers making boatloads of money, and that would be hard to do if it got regulated into nothingness.

A Note on Other Search Engines

Google isn’t the only search engine around. There are others, but it’s fascinating that none of them compete on the basis of providing better results. DDG claims to have better privacy. Ecosia claims to plant trees. Bing exists to keep Microsoft relevant post-2010, and for some reason, ranks websites for being highly-shared on social media (again, things that are, by definition, not hard to find).

Why don’t other search engines compete on search results? It can’t be hard to do better than Google for the long tail.

What Can We Do?

It’s interesting to note that the problems of regulatory-fear and SEO-capture are functions of Google’s cultural significance. If Google were smaller or less important, there’d be significantly less negative-optimization pressure on it. Google is a victim of its own success.

That is to say, I don’t think all search engines are doomed to fail in the same way that Google has. A small search engine doesn’t need to be authoritative, because nobody is paying attention to it. And it doesn’t have to worry about SEO for the same reason — there’s no money to be made in manipulating its results.

What I dream of is Google circa 2006. A time when a search engine searched what you asked for. A time before aggressive SEO. A time before social media, when the only people on the internet had a reason to be there. A time before sticky headers and full-screen modal pop-ups asking you to subscribe to a newsletter before reading the article. A time before click-bait and subscription-only websites which tease you with a paragraph before blurring out the rest of the content.

These problems are all solvable by a search engine. But that search engine isn’t going to be Google. Let’s de-rank awful sites, and boost personal blogs of people with interesting things to say. Let’s de-rank any website that contains ads. Let’s not index any click-bait websites, which unfortunately in 2022 includes most of the news.
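As a rough sketch of what those rules could look like, here is a toy re-ranker. Everything in it is hypothetical: the `Page` fields, the penalty and boost values, and the example URLs are invented for illustration. It also assumes the hard part is already done, namely deciding whether a page has ads, is click-bait, or is a personal blog.

```python
from dataclasses import dataclass

AD_PENALTY = 0.5   # assumed multiplier for pages that contain ads
BLOG_BOOST = 1.5   # assumed multiplier for personal blogs


@dataclass
class Page:
    url: str
    relevance: float              # how well the page matches the query, 0.0 to 1.0
    has_ads: bool = False
    is_clickbait: bool = False
    is_personal_blog: bool = False


def score(page):
    """Relevance, adjusted by the rules above."""
    value = page.relevance
    if page.has_ads:
        value *= AD_PENALTY
    if page.is_personal_blog:
        value *= BLOG_BOOST
    return value


def rank(pages):
    """Drop click-bait from the index entirely, then sort the rest by score."""
    indexed = [page for page in pages if not page.is_clickbait]
    return sorted(indexed, key=score, reverse=True)


# Hypothetical results for a single query.
results = rank([
    Page("https://example.com/top-15-things-to-do", relevance=0.9, has_ads=True),
    Page("https://example.org/my-trip-notes", relevance=0.6, is_personal_blog=True),
    Page("https://example.net/you-wont-believe-this", relevance=0.95, is_clickbait=True),
])

for page in results:
    print(f"{score(page):.2f}  {page.url}")
```

With these made-up numbers the personal blog outranks the ad-supported listicle despite a lower raw relevance, and the click-bait page never appears at all.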

What we need is a search engine, by the people, and for the people. Fuck the corporate interests and the regulatory bullshit. None of this is hard to do. It just requires someone to get started.