April 29, 2024

The mountain of shit theory

Uriel Fanelli's blog in English


Fediverse and single user instances.

I am not sure I can say that an instance with 7 users like mine (including my cousin: don't tell me I hate women) is not a personal instance. However, the fediverse has a number of instances with fewer than ten users and a lot with “just” one user. This situation is definitely the fediverse's killer app, but it runs into what is (in my opinion) a defect in software design.

The problem here is not the protocol, but the design of the software running the instance. Imagine that you are an instance with a single user, and you have just started on the fediverse. Before you can interact with others, you have to federate.

To federate means, in current implementations, the following: once the user of my personal instance follows a user from another instance, say a big one with a million users, the instance with a million users will start sending the personal instance all its message traffic marked as “public.” These messages pile up in the personal instance to build the “federated timeline.”

This is clearly an absurd design: many instances designed for a few users use a simple sqlite3 file as their database. It scales as best it can, but if we federate with instances of a million users, or even a hundred thousand, we will end up with a database that, however great sqlite3 is, will lose performance. And we will probably never read any of those messages.


Why is this design so prevalent?

In the pioneering period of the fediverse, going back to the days of pump.io, OStatus, Diaspora and others, the problem was to “create a stream.” Users expected to spend time reading a “stream” of information, which in the early years of the fediverse could only happen at the cost of duplicating everything and sending a copy to as many servers as possible.

On the other hand, it was very difficult to find one's real friends on a little-used thing such as pump.io: as a result, everyone wanted to find new friends, otherwise the social network would lose its social component. The federated timeline, therefore, was mainly for finding new people to follow.

There were, I remember, some communities: they were generally squatters and “People's Front of Judea”-type associations, which few wanted to have anything to do with. As a result it was perfectly fine to have a continuous stream of posts, in which to spot people who seemed interesting and pick them as “followed.”

This catastrophic design error, which initially covered a real need, has been preserved in today's instances, and the original mistake has been almost forgotten: the intent was never to distribute messages. The need to be met was to distribute accounts, that is, people to meet.

If there had been a stream made up of new accounts, which each instance propagated along with the "bios" of new users, it would probably have solved the user problem (the problem of meeting new people) with much less resource deployment.

Nowadays, it would be much better if the “federated timeline” were a stream of the new, deleted, or moved users that instances share with each other. The idea would be to read their “bio” and decide whether to follow them, so a new instance would be “federated” without being DDoS-ed by useless messages from random people all around the world.


This kind of goal analysis, though, is not very common. The modern programmer's motto is “cool before important,” and so they don't ask what is important to implement: they ask what is “cool” to implement.

Surely a federated timeline where the discussions of random people flowed by was cool: in the social networks of a few years ago, it was called a “livestream.” It was cool. It is a useless (and often boring) stream of nonsense written by stupid people, but years ago it was cool. And because “cool before important,” programmers jumped on it.

Actually, a good programmer might even refine the requirement: the person who starts an instance doesn't need a livestream, or even the full list of users of other instances. They need a way to inform other instances of their existence, and to know about the active users of other instances. That way, the amount of data to move would be even smaller, with just a little delta to align from time to time.
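To make the idea concrete, here is a minimal sketch of such an account stream with delta alignment. Everything in it is hypothetical: `AccountEvent`, `AccountFeed`, `SmallInstance` and the sequence-number scheme are names I made up for illustration, not ActivityPub types and not something any existing server implements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountEvent:
    """One entry in a hypothetical account stream (not a real ActivityPub type)."""
    seq: int          # per-instance sequence number, starting at 1
    kind: str         # "new" or "deleted"
    handle: str       # e.g. "alice@big.example"
    bio: str = ""


class AccountFeed:
    """What a big instance would publish instead of its public posts."""

    def __init__(self) -> None:
        self._events: list[AccountEvent] = []

    def publish(self, kind: str, handle: str, bio: str = "") -> None:
        self._events.append(AccountEvent(len(self._events) + 1, kind, handle, bio))

    def delta_since(self, seq: int) -> list[AccountEvent]:
        """Return only the events a peer has not seen yet."""
        return [e for e in self._events if e.seq > seq]


class SmallInstance:
    """Keeps a directory of remote accounts, never their messages."""

    def __init__(self) -> None:
        self.directory: dict[str, str] = {}   # handle -> bio
        self.last_seq = 0

    def align(self, feed: AccountFeed) -> int:
        """Apply the pending delta; return how many events were transferred."""
        delta = feed.delta_since(self.last_seq)
        for event in delta:
            if event.kind == "deleted":
                self.directory.pop(event.handle, None)
            else:
                self.directory[event.handle] = event.bio
            self.last_seq = event.seq
        return len(delta)


feed = AccountFeed()
feed.publish("new", "alice@big.example", "Bird photographer")
feed.publish("new", "bob@big.example", "Retrocomputing")

mine = SmallInstance()
first = mine.align(feed)      # first alignment transfers both accounts

feed.publish("deleted", "bob@big.example")
second = mine.align(feed)     # later alignment transfers only the new event
```

The small instance stores a handle and a bio per remote account, and each alignment moves only what changed since the last one. A moved account could be modelled as a "deleted" event followed by a "new" one; I left that out to keep the sketch short.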


But why is this important?

As I said, single-user instances, or those for a few users, are designed to have a small impact, that is, to consume few resources. Some, such as Pleroma, Akkoma or Misskey, use a PostgreSQL database (again, because “cool before important”) and can hold up (with carefully tuned auto-vacuum and auto-analyze settings), but software running on sqlite3, for example, will degrade after a few GB of nonsense received from large instances.

In the design of modern instances, especially Mastodon, there are still plenty of these “skeletons,” i.e., nonsensical habits and inventions from ancient times that continue to ruin a user experience that could be much better, if only we didn't insist on keeping these prehistoric features alive.

The ActivityPub protocol, after all, allows users and their bios to be exchanged. There is no reason, then, not to send only active users (who want to be seen) to the federated servers. And there's no reason, on the other hand, to kill smaller instances with the workloads of the big servers.

In some software there is a solution. On Akkoma and Pleroma there is an option to put a huge instance in “followers only,” meaning you can limit exchanges to users who have subscribed to each other. On Mastodon this is equivalent to the “Silence” option, which achieves the same effect of limiting traffic while still allowing “user follow” between instances. Misskey can also achieve this effect, though it is not documented in detail, by using the federation page in the control panel.
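The logic behind such a “followers only” setting can be sketched in a few lines. This is not the code of Akkoma, Pleroma or Mastodon: `should_accept`, `followed_actors` and `limited_hosts` are names invented for this example, and I am assuming that the `actor` field of each incoming activity is a plain URL string.

```python
from urllib.parse import urlparse

# Activity types needed for the follow handshake itself (ActivityStreams vocabulary).
HANDSHAKE_TYPES = {"Follow", "Accept", "Reject", "Undo"}


def should_accept(activity: dict, followed_actors: set[str], limited_hosts: set[str]) -> bool:
    """Decide whether an incoming activity gets stored on the small instance."""
    actor = activity.get("actor", "")
    host = urlparse(actor).hostname or ""
    if host not in limited_hosts:
        return True                      # unrestricted peer: accept as usual
    if activity.get("type") in HANDSHAKE_TYPES:
        return True                      # keep "user follow" working
    return actor in followed_actors      # otherwise only people we follow


followed = {"https://big.example/users/alice"}
limited = {"big.example"}

inbox = [
    {"type": "Create", "actor": "https://big.example/users/alice"},
    {"type": "Create", "actor": "https://big.example/users/random1"},
    {"type": "Follow", "actor": "https://big.example/users/carol"},
    {"type": "Create", "actor": "https://tiny.example/users/dave"},
]

kept = [a for a in inbox if should_accept(a, followed, limited)]
```

Only the post from the random stranger on the big instance is dropped: followed users, follow requests and unrestricted peers still get through.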

Simpler software made for small instances, such as Honk, GoToSocial or Ktistec, has no such defense tools, with the result that when it federates with a user of a huge instance, it begins to receive a flood of crap that the user, often a single one, is not interested in.


Another question that might come to mind is “but why are small instances good?”

The benefits are many. First, the possibility of much more granular moderation, and a better trust relationship with the moderator. (Of course if the instance is a single-user one, the user is also the moderator.)

The second point is scraping: collecting data from thousands of instances is not as easy as scraping a few huge ones. And in this, the “federated timeline” seems purpose-built to allow the NSA to accumulate data. It shouldn't even exist, in my opinion.

The third point is data security. If we imagine that the fediverse's five million users were divided into instances of 10 users on average, to get the data from five million users (an all-too-microscopic data breach) the hacker would have to attack five hundred thousand different servers, each with different defences, different architectures, different software, and different vulnerabilities. It would be a huge job to get an all-too-small amount of data.
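The arithmetic is trivial, but it is worth spelling out. The sketch below uses the figures from this post (five million users, ten per instance) and, for comparison, the million-user instance from the earlier example; the function name is made up.

```python
def servers_to_breach(total_users: int, users_per_instance: int) -> int:
    """Number of separate servers that hold the data of total_users accounts."""
    # Ceiling division: a partially filled instance is still a server to attack.
    return -(-total_users // users_per_instance)


small = servers_to_breach(5_000_000, 10)          # fediverse of tiny instances
big = servers_to_breach(5_000_000, 1_000_000)     # fediverse of huge instances
print(small, big)
```

With tiny instances the attacker faces five hundred thousand targets; with million-user instances, five.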

In this sense, the “ideal fediverse” would have to be made up of small instances run by a geek who then hosts friends, or by a small company running a local BBS. It used to happen in the past.

Small instances, then, would create, in a sense, a better global situation. The reason why several instances are “designed to scale” is, again, the same “cool before important” that drives (and plagues) today's software design.

On mainstream social networks, everything is measured by the number of users. The number of users, followers, and likes is the new SUV: size doesn't matter, sure, but if YOUR size doesn't matter you can either buy a big SUV or have lots of followers.

Following this thinking, all instance software is written with coding time spent on scalability, instead of on giving a few users a better experience, and especially a much easier setup of the whole thing.

If there were an easy way to host a node, say a little box to attach to one's home router and set up with a few clicks, IMHO the fediverse would expand more easily than it does today. Of course, since the little box would be small, it would have to be designed for few users and low data retention. Which, from a mainstream point of view, is not cool enough.

So, again: self-hosting small or personal instances is the best way to build the fediverse. The issue is bad development, by bad developers, coding under the motto “cool before important!”


Will this change in the future? Well, to be honest IT NEEDS to change.

We can see it each time there is a wave of migrants from Twitter: the big instances can't hold the new load, and start begging users to go somewhere else. The struggle is there already: system architecture is a harsh mistress, and there is no way to cheat.

So far, we have seen only a small percentage of Twitter users moving to the fediverse, and yet the problem of past architectural mistakes is already glaring. Now, imagine 50% of Facebook migrating.

What would happen?

Well, the same thing that has happened with every architecture mistake I've seen in the last 28 years of IT: people working around the clock to pay off past architectural debt.
