April 29, 2024

The mountain of shit theory

Uriel Fanelli's blog in English

Fediverse

Fediverse and Threads, here we go again.

I've already said what I think about the story of large social networks federating over ActivityPub, and I've already explained why the model wouldn't scale. The irritating thing is that, in response, people like Eugen Rochko (the programmer who writes Mastodon) don't try to prove that it scales, but just stick to details, like "but the interaction between instances is mainly based on the follow mechanism".

Which contains some (but not all) of the truth. The problem is that this is ONLY true of existing software, which Rochko believes to be only his own.

But things aren't like that, because Rochko pretends to forget other things.

1) No, on Threads and others the traffic does NOT depend only on the follow. Because the other systems are NOT Mastodon.

Let's take a large social network and a well-known person: Ferragni has around 20 million accounts that follow her. In fediverse terms: followers. Let's imagine I reply to Ferragni with an image, nice or unpleasant, and that someone replies in turn by embedding my image in theirs.

In fediverse terms, I, sarawhatever, posted a URL pointing to an image of me eating ice cream, and someone posted my image rendered inside another one as a response. Let's forget that they had to download my image to use it in theirs. The problem is that, since we are talking about Ferragni, even in a moment of absolute low tide it will be seen "only" fifty thousand times.

But it gets worse: if my image sparks a controversy, we easily go up to one million image downloads. Why? Because Meta's algorithm wants to maximize views, and therefore tends to emphasize controversy.

A single post submitted using ActivityPub (which sends images as links) is likely to give Rochko's instance the same traffic as if all its users had posted the same image and at least one person had viewed each of them.

Just one post.

If we consider that there are users who are famous singers or entire television stations, with MILLIONS of people who follow and subscribe to them and who then read the comments, we can say one thing: it is NOT true that traffic depends on the follow.

Or rather, we can say that it can scale in a more than linear way with the number of follows, depending on the algorithm on the other side, which decides how much prominence to give to a post, or to a controversy.

If the algorithm decides that your post with your image will be seen 25 million times today, that's what your server will have to handle. Period.
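To put a rough number on that, here is a back-of-the-envelope sketch. The 25 million views come from the paragraph above; the 500 kB image size is purely my assumption, and the function name is made up for the example.

```python
def daily_load(views_per_day: int, image_bytes: int) -> dict:
    """Total data served in a day and the average rates needed to serve it."""
    seconds_per_day = 24 * 60 * 60
    total_bytes = views_per_day * image_bytes
    avg_bits_per_second = total_bytes * 8 / seconds_per_day
    return {
        "total_terabytes": total_bytes / 1e12,
        "avg_megabits_per_second": avg_bits_per_second / 1e6,
        "avg_requests_per_second": views_per_day / seconds_per_day,
    }


# 25 million views (figure from the text) of a hypothetical 500 kB image.
load = daily_load(views_per_day=25_000_000, image_bytes=500_000)
print(load)
```

Under these assumptions that is 12.5 TB in a day, an average of a bit more than 1.1 Gbit/s, and the average hides the peaks: views of a controversial post do not arrive evenly spread over 24 hours.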


The naive answer is that at this point your instance can always leave the images on S3, or another cloud. Sure. We did all this to decentralize and free ourselves from internet oligarchs, and now we're hosting content with an internet oligarch.

2) But the CDN discussion opens a second front. This blog, for example, is sometimes mentioned on Facebook and other social networks.

When it is mentioned, it receives many visits. What do I see, on my modest Grafana? I see this:

Because large systems are not ONE server or ONE IP address associated with a host, as happens in the fediverse. They are a million-headed hydra called a CDN. If I'm in Düsseldorf, for example, and I'm on Facebook, I see a post as coming from

https://scontent-dus1-1.xx.fbcdn.net/v/t39.2365-6/whatever

but if I observe the same thing from, I don't know, Rome, I see the post as coming from https://scontent-rome1-1.xx.fbcdn.net/v/t39.2365-6/whatever and in fact, to get the total, I have to aggregate the logs for "fbcdn.net".
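A minimal sketch of that aggregation, assuming the referring URLs have already been extracted from the access log. The helper names are mine, and reducing a hostname to its last two labels is a deliberately naive shortcut: it is enough for fbcdn.net, but it would be wrong for suffixes like co.uk, where a real tool would consult the public suffix list.

```python
from collections import Counter
from urllib.parse import urlsplit


def parent_domain(url: str) -> str:
    """Naively reduce a URL to the last two DNS labels of its hostname."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def hits_by_parent_domain(referers):
    """Count hits per parent domain, so that all CDN nodes collapse into one row."""
    return Counter(parent_domain(r) for r in referers)


# Two CDN nodes from the text plus one unrelated, made-up referrer.
sample = [
    "https://scontent-dus1-1.xx.fbcdn.net/v/t39.2365-6/whatever",
    "https://scontent-rome1-1.xx.fbcdn.net/v/t39.2365-6/whatever",
    "https://example.org/some/page",
]
totals = hits_by_parent_domain(sample)
print(totals)
```

The two Facebook nodes end up in the same bucket, which is the only way to see how much traffic "Facebook" as a whole is sending.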

There are now two cases: the one in which a copy of my image is uploaded to the Facebook CDN (and then, in terms of privacy and profiling, it is like having a Facebook account), or the one in which it is left where it is (for example, because it is old, or because nobody uploads or reads it in Rome).

If it's an image, we can choose between the death of our server under load, or letting Facebook take possession of it and then do whatever it damn well pleases, profiling our fediverse profile (which is remote, from its point of view). But since we are not its users, it does NOT have certain privacy obligations, which only apply if a contract exists, i.e. if we are users.

If, however, the CDN carries our post, or toot, which contains the image in the form of a link, we must prepare to be bombarded by GET requests whose Referer is the particular CDN node handling the request.

Either you let Facebook take over the content and take care of delivery, allowing profiling and losing some data protections, or your server dies from the load as soon as someone follows Taylor Swift.


3) Ah, yes. Did you say “links”?

Exactly. Because when we go to a large social network and click on an "external" link, we are not pointing at the external link. We are pointing at a system that "does some checks" (in short, it takes information from the browser, profiles us carefully, grabs every possible piece of data) and then redirects us to the linked content.

This requires that, in large systems, the "intermediate link" makes at least a HEAD request to the final content, if only to load the so-called SEO poster. This means that your server, depending on how many people read the thread, receives at least one call regardless of whether the link is followed.

So, if you post a reply to Taylor Swift which contains the SEO of the message, or just your avatar, everyone who scrolls through that thread will load the avatar in parallel. Even if your link is NOT followed, a single view will load at least the avatar, causing a GET, or the avatar will be stored in some cache and saved for… "good time".

In this case the links themselves, including the link to your post, produce traffic, and they do not produce it in proportion to the number of followers that Taylor has on YOUR instance, but in proportion to the followers that Taylor has on HER "instance", since your link, or at least the link to your post, is either read by every Meta client or moved to Meta (which is like having an account there in terms of privacy and profiling).

4) And here's the problem with reports.

Like dick on macaroni, comes the problem of moderation. If I post unwelcome content, it can be reported to the moderator. In the fediverse, a report made on a given instance is also forwarded to the moderator of the instance hosting the content.

So far, on the fediverse, things are moving at the rate of one report every two months. So it is manageable at a hobby level. But what happens if I post a flashy photo and receive 23,000 reports from a big player? It means I have to deal with closing 23,000 reports.

On large systems I imagine that reports are aggregated by content. On Mastodon, however, the report form is free text: I can file a report for "Offends the God of the Buttered Rabbit". So if I want to be a good moderator I am forced to read them all, because someone might have a good legal reason to bring up.
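Just to illustrate what "aggregated by content" could mean, here is a minimal sketch. I have no idea how the big platforms actually do it, and the `post_url` and `comment` field names are invented for the example: they are not Mastodon's report schema.

```python
from collections import defaultdict


def group_reports(reports):
    """Group reports by reported post, keeping a count and the distinct reasons."""
    grouped = defaultdict(lambda: {"count": 0, "reasons": set()})
    for report in reports:
        entry = grouped[report["post_url"]]
        entry["count"] += 1
        # Normalise the free-text comment so identical reasons collapse together.
        entry["reasons"].add(report["comment"].strip().lower())
    return dict(grouped)


# Hypothetical reports against the same post.
reports = [
    {"post_url": "https://example.social/posts/1", "comment": "Spam"},
    {"post_url": "https://example.social/posts/1", "comment": " spam "},
    {"post_url": "https://example.social/posts/1",
     "comment": "Offends the God of the Buttered Rabbit"},
]
grouped = group_reports(reports)
print(grouped)
```

Grouping cuts down the number of tickets to close, but every distinct free-text reason still has to be read by a human, which is exactly the part that does not scale.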

Moral of the story: sysadmin doesn't scale either.


You will ask yourself: but doesn't Eugen Rochko know these things? Of course he knows, but for some reason he prefers to downplay the problem or pretend it doesn't exist. He knows very well that if a Mastodon user decides to follow Taylor Swift, then for each post there is also a thread of replies to follow, and that if the user clicks to read the thread, the traffic of the Mastodon server depends on Taylor's success and not on the number of follows on that instance.

But he pretends not to know.

Pleroma has created a rate limiting system that I personally like, so I will be able to mitigate the problem. If Pleroma fails, I'll put rate limiting on the ingress reverse proxy, and if that still doesn't work, the firewall will take care of it. And if this doesn't work either, the maximum throughput of my DOCSIS 3.1 connection will limit the traffic.
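For those who have never seen one, this is the general idea behind a rate limiter, sketched as a token bucket. It is NOT Pleroma's implementation, which I am not reproducing here: the class, its parameters and the injectable clock are all assumptions made for the sake of the example.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Return True if the request may pass, False if it should be rejected."""
        now = self.clock()
        # Refill according to elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per remote host would be typical; here a single bucket, 5 req/s, burst of 10.
bucket = TokenBucket(rate=5.0, capacity=10)
print(bucket.allow())
```

The same logic can live in the application, in the reverse proxy or in the firewall. What changes is how early the excess request gets dropped, and therefore how much of your machine it has already consumed.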

But the problem remains: if big players enter, and don't limit traffic, many small instances will have problems.

As to why Rochko pretends not to understand, I don't know.
