April 16, 2024

The mountain of shit theory

Uriel Fanelli's blog in English

The famous app that tracks your virus. With graphene. Quantum. Of the AI.

Since the beginning of this pandemic of bad health, the issue of tracing people as a way to prevent the epidemic has been used in a manner that is, to say the least, self-serving. And everyone, absolutely everyone, has spoken about it, except the technologists involved. It is normal for technologists to be silenced in a post-fascist world where people still believe the lies of Gentile and Croce, but since keeping quiet is not easy, I will now have my say.

Good. Suppose we do it. Tomorrow they call me and tell me: "make the app that tracks users 24/7, so that when someone gets sick we can quarantine the others". Aha. Said like that, it sounds simple. Let's break down this "specification".

So, let's talk about capacity, otherwise it looks like we work for INPS. How many installations of this app do we want to have? The South Koreans, the good ones, say they did it on the spot, because they don't worry about privacy and are good citizens who let themselves be spied on.

Ok. But not everyone has a cell phone. Or rather: Italy was the record-setting country, the first to have more SIMs than inhabitants. But I want to help those who claim this is feasible, so we remove the elderly, we remove the stoners like Serra who boast of not having a cell phone, and let's say there are 30 million. Half the population. (Going below that makes little sense: if A infects B, either we know about A, or about B, or we know nothing at all.)

Good.

It is a matter of collecting a logpoint with the location of the mobile phone. How often should we take the position? Well, if we send people to work and people take the metro in Milan, let's say once a minute (obviously we have a perfect GPS that also works indoors). Otherwise someone rides up the escalator, and I lose everyone who was close to him.

So if we send each logpoint immediately, that's 30 million logpoints per minute, which makes 500,000 transactions per second. Our backend API gateway had better be sized a whole lot better than the INPS server. But let's say the data is sent EVERY HOUR: we accumulate it on the phone and upload it once an hour. Aha. We are at about 8,334 TPS. More sustainable. But the number of logpoints and the I/O hardly change. Next: storage.
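The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 30 million phones and the one-logpoint-per-minute rate come from the text, while the function and variable names are made up for illustration.

```python
# Figures taken from the text: 30 million phones, one logpoint per phone per minute.
PHONES = 30_000_000
LOGPOINTS_PER_PHONE_PER_MINUTE = 1


def requests_per_second(phones: int, seconds_between_uploads: int) -> float:
    """Average upload requests per second if every phone uploads once per interval."""
    return phones / seconds_between_uploads


immediate_tps = requests_per_second(PHONES, 60)   # one upload per minute
hourly_tps = requests_per_second(PHONES, 3600)    # one batched upload per hour

# Batching reduces the number of requests, not the number of logpoints to ingest.
logpoints_per_second = PHONES * LOGPOINTS_PER_PHONE_PER_MINUTE / 60

print(f"immediate uploads: {immediate_tps:,.0f} requests/s")
print(f"hourly batches:    {hourly_tps:,.0f} requests/s")
print(f"logpoints to ingest either way: {logpoints_per_second:,.0f} per second")
```

The hourly figure comes out at roughly 8,333 requests per second, which the text rounds up to 8,334.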

How long do we have to keep the data, that is, how long do we have to monitor a person? We said, if I'm not mistaken, 12 days of "intact asymptomaticity". We begin to have a lot of logpoints: 518,400,000,000. It is not impossible, because we are lovers of Teradata, Cloudera & co. If we follow the instructions of the Tuscan ARS the days are 14, but the amount of data changes little.
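The retention figure follows directly from the numbers already given. In the sketch below, the phone count, sampling rate and day counts are from the text; the 32 bytes per logpoint is purely a hypothetical record size, included only to give a feel for the raw volume.

```python
def logpoints_retained(phones: int, days: int, per_minute: int = 1) -> int:
    """Total logpoints held when each phone logs `per_minute` points for `days` days."""
    return phones * per_minute * 60 * 24 * days


twelve_days = logpoints_retained(30_000_000, 12)
fourteen_days = logpoints_retained(30_000_000, 14)

# Hypothetical record size (identifier + timestamp + coordinates), NOT from the text.
BYTES_PER_LOGPOINT = 32

print(f"12 days: {twelve_days:,} logpoints")
print(f"14 days: {fourteen_days:,} logpoints")
print(f"12 days at {BYTES_PER_LOGPOINT} B each: "
      f"{twelve_days * BYTES_PER_LOGPOINT / 1e12:.1f} TB before replication and indexes")
```

Going from 12 to 14 days adds about 17% more rows, which is why the text says the amount of data changes little.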

Buying the hardware and building something from scratch is a dream, because getting the hardware, installing it, and arranging power, UPS, network, cooling and everything else takes weeks, if not months. Having abandoned the greenfield option, we must move on to using something that already exists.

So, we have to ask who we sell the data to. Let's say that a local telco has all the hardware we need and we rent it from them. Good. Otherwise we will have to become a tenant of Azure, Amazon, Google, etc.

Very well. Now that we know where to put the data, we need to write the logic. So our job should:

  • Take a person who got sick (IMSI, MSISDN, whatever).
  • Go back and look for everyone he has been in contact with.

First question: what is a "contact"?

We will use the definition of the Tuscan ARS throughout.

If we take everyone who has been within two meters of the sick person for 15 minutes (suppose that GPS also works in the metro, indoors at work, and practically everywhere, and that it has one-meter precision. What a satellite!), and this person has taken the subway 12 days in a row (let's say 10, because there is a weekend in between), we are screwed. And if we find that he went to work, will his company be shut down? His colleagues count as "contacts": according to the definition they were in a closed environment with him, says the Tuscan ARS. Starting to get complicated, huh?

Or rather, no: if we are at the beginning of the infection and we have about ten cases, it is done quickly. If we are in Italy and we already have more than one hundred thousand, and we send them to work, the public administration would have millions and millions of people to put in preventive quarantine.

It would receive a list of 2 or 3 million people and then have to reach them all and tell them to quarantine. Alternatively, it should swab them. Aha. Too bad that 2 or 3 million swabs do not exist: quarantine and that's it.

So ok, let's add an alarm to our app, which tells you: "you have been selected as a suspected infection, because you crossed paths with someone who was infected. Go home and don't you dare go out." And since there are not enough swabs, that person must stay home and hope.

So we are writing an app that, in case a guy X is sick, closes the company he worked for, alerts everyone who lives with him and everyone who has traveled with him.

I'm not a sociologist or psychologist, so I don't want to know what these millions of people who get this warning would do. So let's focus on this algorithm:

  • For each infected person (IMSI, MSISDN, whatever)
  • Look for all those who have been in contact with him in the past 12 days, at less than two meters and for more than 15 minutes, OR those who have been in a closed environment with him for a longer time (even if more than two meters away).
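The proximity half of this rule can be sketched as follows. Several things here are assumptions rather than anything stated in the text: positions are taken once a minute and already projected to x/y coordinates in meters, "15 minutes" is read as at least 15 consecutive one-minute samples, and the "closed environment" clause is left out because it would need a map of buildings. The names `Track` and `is_contact` are hypothetical.

```python
from math import hypot

# Hypothetical layout: minute index -> (x, y) position in meters.
Track = dict[int, tuple[float, float]]


def is_contact(a: Track, b: Track,
               max_dist_m: float = 2.0, min_minutes: int = 15) -> bool:
    """True if a and b are closer than max_dist_m for at least
    min_minutes consecutive one-minute samples."""
    run = 0              # length of the current streak of close samples
    last_minute = None   # previous minute for which both phones have a sample
    for minute in sorted(a.keys() & b.keys()):
        (ax, ay), (bx, by) = a[minute], b[minute]
        close = hypot(ax - bx, ay - by) < max_dist_m
        if not close:
            run = 0
        elif run > 0 and minute == last_minute + 1:
            run += 1     # streak continues without a gap
        else:
            run = 1      # streak starts (or restarts after a missing sample)
        last_minute = minute
        if run >= min_minutes:
            return True
    return False


# Two phones one meter apart for 20 minutes: a contact under these assumptions.
sick = {m: (0.0, 0.0) for m in range(20)}
neighbour = {m: (1.0, 0.0) for m in range(20)}
print(is_contact(sick, neighbour))
```

Note that this compares one pair of tracks; run naively over 30 million phones it is a quadratic number of pairs, so a real job would first bucket logpoints by time and location.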

But now we have a problem: 12, 14 days of time. One guy infects another, who after a couple of days becomes infectious himself. So, if 4 days have passed, we need another step: not only everyone who met guy X, but everyone who met those who met guy X.

If we are at the beginning of the pandemic and we have dozens of cases, a few thousand voluntary quarantines are enough. If we are already at one hundred thousand, doing this gives us, on average, an exponential trend. This should not surprise us, because we have overlooked a very simple thing: WE ARE WRITING A SIMULATION OF THE PANDEMIC. And if the contagion proceeds exponentially, the algorithm spits out an exponentially increasing number of names.

Good. So let's give the specs to our Spark experts, and we have something like this.

  • For each infected person (IMSI, MSISDN, whatever)
  • Look for all those who have been in contact with him in the past 12 days, at less than two meters and for more than 15 minutes, OR those who have been in a closed environment with him for a longer time (even if more than two meters away).
  • For each suspect, calculate how long ago the contact occurred and divide it by two days, obtaining a recursion factor n.
  • Iterate the algorithm n times for each level-n suspect.
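One possible reading of these steps, as a small in-memory sketch rather than a Spark job. It assumes contacts have already been extracted as `(person_a, person_b, days_ago)` tuples, and it interprets the "recursion factor" as follows: a suspect contacted d days ago could have become infectious two days later, so only their contacts from the last d - 2 days are followed. The function name, data layout and constants' names are hypothetical.

```python
from collections import defaultdict, deque

INCUBATION_DAYS = 2   # from the text: infectious "after a couple of days"
WINDOW_DAYS = 12      # look-back window from the text


def trace(contacts, infected):
    """contacts: iterable of (a, b, days_ago); infected: set of known cases.
    Returns {suspect: level}, level 1 being a direct contact of a case."""
    by_person = defaultdict(list)
    for a, b, days_ago in contacts:
        if days_ago <= WINDOW_DAYS:
            by_person[a].append((b, days_ago))
            by_person[b].append((a, days_ago))

    levels = {}
    # Queue entries: (person, days ago they could first infect others, level).
    queue = deque((person, WINDOW_DAYS, 0) for person in infected)
    while queue:
        person, infectious_since, level = queue.popleft()
        for other, days_ago in by_person[person]:
            # Skip contacts that happened before `person` could be infectious.
            if days_ago > infectious_since:
                continue
            # Simplification: keep only the first (shallowest) level found.
            if other in infected or other in levels:
                continue
            levels[other] = level + 1
            remaining = days_ago - INCUBATION_DAYS
            if remaining >= 0:
                queue.append((other, remaining, level + 1))
    return levels


contacts = [("A", "B", 6), ("B", "C", 3), ("C", "D", 0), ("B", "E", 8)]
print(trace(contacts, {"A"}))
```

In this example E is not flagged, because E met B before B met A; B, C and D are flagged at levels 1, 2 and 3.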

Now, how does this stuff grow? What is its computational complexity? Well, it will depend on the average number of people met and the average time that has passed, and blah blah blah. But why calculate it at all? We already have the formula, and it is the usual exponential with an R0 of 2.5. We are writing a simulation of the epidemic, right? So we already know how this number will grow. We don't need evidence where you just have to think.
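To put rough numbers on "the usual exponential", here is a sketch that sums the generations reachable from one case. The R0 of 2.5 and the 12-day window are from the text; treating two days as one generation follows the "divide by two days" step, and ignoring overlaps between contact lists and saturation of the population is a simplification, so the result is an upper bound rather than a forecast.

```python
def flagged_per_case(r0: float, generations: int) -> float:
    """Names flagged per index case: r0 + r0^2 + ... + r0^generations."""
    return sum(r0 ** k for k in range(1, generations + 1))


R0 = 2.5
GENERATIONS = 12 // 2   # 12-day window, one generation every 2 days

per_case = flagged_per_case(R0, GENERATIONS)
print(f"flagged per index case: {per_case:.1f}")

for index_cases in (10, 1_000, 100_000):
    print(f"{index_cases:>7,} cases -> up to {index_cases * per_case:,.0f} names")
```

With ten cases the list stays in the thousands; with one hundred thousand it runs into the tens of millions, which is the whole point of the argument.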

So, the numbers of our simulation grow AS the numbers of the epidemic do. So, if we are like Korea and we are at the beginning, there is no problem: we stop the epidemic when we are at a few hundred, perhaps a few thousand cases.

But if we are already in the current situation, with hundreds of thousands of infected people, then even if we entrust the app with the task of telling suspected cases to stay home in quarantine (and therefore NOT to go out even to shop or go to work), so as not to flood the public administration, over the course of 12 or 14 days we would alert millions of people.

And they are not just anybody: it would be mainly those who go to work, and those who use public transport on a journey longer than 15 minutes. So basically we would hit urban means of transport, the homes of very large families and workplaces.

And this is where our algorithm becomes unnecessarily complex, expensive, and makes disproportionate use of personal data.

Why am I saying unnecessarily burdensome? Because with so many infected, the same result could be achieved by calculating, without retaining user data, the places where infections occur, and closing all places of the same type.

But we wanted to mitigate the damage, right? Sure. And it is possible, for a few sick people, to do it. But when you have a hundred thousand sick people around, tracing for large numbers does not make sense: it pays more to find the places where these contacts take place, and to close them in advance.

Only in the case of very few cases does such an app make sense.

And even in this case, there is a small problem. Because if there are a few dozen cases, we have a low prevalence. And to stop a low prevalence, 30 million mobile phones are NOT enough. Just use the sample size estimation formula to see that if we want to intercept 10 or 100 cases out of 60 million, the "sample" of our estimate becomes catastrophically large, and in practice we have to track EVERYONE, ALL the time.
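A related way to see why half the population is not enough (this is not the sample estimate formula mentioned above, just a simpler illustration): a contact is only visible if both people carry the app. Assuming, as a simplification, that people install it independently of one another, the fraction of visible contacts is the square of the coverage, and a whole chain of infections is visible only if every person in it is tracked. The function names are hypothetical.

```python
def visible_pair_fraction(coverage: float) -> float:
    """Chance that both people in a contact are tracked (independence assumed)."""
    return coverage ** 2


def visible_chain_fraction(coverage: float, links: int) -> float:
    """Chance that every person in a chain of `links` infections is tracked."""
    return coverage ** (links + 1)


for coverage in (0.5, 0.8, 1.0):
    print(f"coverage {coverage:.0%}: "
          f"{visible_pair_fraction(coverage):.0%} of contacts visible, "
          f"{visible_chain_fraction(coverage, 3):.1%} of 3-link chains fully visible")
```

With 30 million phones out of 60 million people, three contacts out of four are missed, and with only a handful of cases to find, missing one is enough to restart the outbreak.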

Let's recap: for the strategy to work, so that we can keep life as it is and stop ONLY the sick, it is necessary that:

  • We get there at the start of the pandemic.
  • ALL citizens are tracked, not just half or most.
  • People also isolate themselves at home (but if they are few, taking them to the hospital makes sense).

So it works in the case of South Korea, which had probably already built the system BEFORE the infection started: to write the software, set up API gateways, etc., you need, even with heroic programmers and with scalable infrastructure and storage already in place, a couple of weeks, maybe three. The Koreans probably got to work as soon as the Chinese reported the problem.

If instead you get to it in Europe TODAY, you cannot do the above, because we are late; in that case there is a far more effective algorithm, namely a "particle" method similar to the one used in fluid calculations.

  • A smaller number of cell phones is used.
  • We observe in which places the patients had suspicious contacts.
  • The places in question are closed, along with all places of the same type (or regulated, as is done with supermarkets).
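The place-based steps above can be sketched in a few lines. The input is just a list of place types where patients had suspicious contacts, with no identifiers attached; the 10% threshold, the function name and the example categories are all hypothetical choices for illustration.

```python
from collections import Counter


def place_types_to_close(contact_place_types, min_share=0.10):
    """Return place types accounting for at least `min_share` of the
    observed suspicious contacts, most frequent first."""
    counts = Counter(contact_place_types)
    total = sum(counts.values())
    if total == 0:
        return []
    return [place for place, n in counts.most_common() if n / total >= min_share]


# Made-up observations: only the type of place is recorded, never who was there.
observations = (["metro"] * 50 + ["office"] * 30
                + ["retirement_home"] * 15 + ["park"] * 5)
print(place_types_to_close(observations))
```

The whole computation is a single counting pass over the sampled contacts, with no per-person history to store.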

This method is computationally simpler, requires fewer tracked people, does NOT require personal data (we are identifying places) and, above all, is very good at revealing infected staff in hospitals, infected retirement homes, and other outbreaks.

The moral of the story is very simple: when you arrive late, with one hundred thousand or more infected, the method of tracking people is more complex than the method of identifying the most dangerous places.

The disadvantage is that the REAL problems emerge: hospitals with nurses, rest homes used as storage, urban means of transport (used to go to work), and many, many companies.

Exactly what you DO NOT want to know. It would be useful to know, but we prefer an approach that it is now too late to use effectively.

I am not a fanatic of conspiracy theories, but proposing tracing NOW is only, in my opinion, an attempt to change people's attitude towards tracing. "Being tracked today is good," they tell us.