April 28, 2024

The mountain of shit theory

Uriel Fanelli's blog in English

Kein Pfusch

Dear Google …

As I have already written, this blog keeps little more than the classic web server logs, with no complex analytics (I removed Google Analytics after the end of the experiment with porn, as you can verify). I use these logs only to check for network and connectivity problems, since I run the blog on a home FTTH connection.

The point is that, despite this, even a superficial examination of the logs reveals interesting things.

First things first: there's a lesbian girl who visits this blog periodically. My advice: use a better browser, because you're leaving traces of your sexual preferences everywhere. I can clearly see your referers. When you browse for personal matters, use something like Brave or Tor Browser; otherwise everything is in plain sight. You know who you are. Maybe not everyone should know, for example … me.

But that wasn't what I wanted to talk about. Since last Sunday some entries have started to climb "the charts", and since I don't keep logs for more than 15 days, they stood out immediately. I am referring to this:

[Figure. I updated the pic to contain all the IPs of that subnet. Be patient: as I said, I don't have an analytics system.]

The IP addresses are those of Google's crawlers, the bots that read pages, index them and put them in the search engine. As you can see, there is little traffic: about seven thousand page requests are a small share of the three-day total (which is what the percentage shows). The amount of data transferred is also very small, which means the answer is almost always "you already have it in cache" (304 Not Modified).
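
The same kind of check can be sketched in a few lines of Python instead of eyeballing a chart. This is only a rough illustration: it assumes the logs are in the usual common/combined format, and both the sample lines and the `192.0.2.` prefix (a documentation range, not Google's real subnet) are made up. `status_breakdown` is a hypothetical helper, not something the blog actually runs.

```python
import re
from collections import Counter

# Common/combined log format: ip ident user [time] "request" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def status_breakdown(lines, ip_prefix):
    """Count status codes and sum response bytes for clients whose IP starts with ip_prefix."""
    statuses = Counter()
    total_bytes = 0
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not m.group("ip").startswith(ip_prefix):
            continue
        statuses[m.group("status")] += 1
        size = m.group("size")
        if size.isdigit():  # "-" means no body was sent, as with a 304
            total_bytes += int(size)
    return statuses, total_bytes

# Invented sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [01/Jan/2020:10:00:00 +0000] "GET /joker/ HTTP/1.1" 304 - "-" "ExampleBot"',
    '192.0.2.11 - - [01/Jan/2020:10:00:30 +0000] "GET /joker/ HTTP/1.1" 304 - "-" "ExampleBot"',
    '192.0.2.10 - - [01/Jan/2020:10:01:00 +0000] "GET /brexit/ HTTP/1.1" 200 5120 "-" "ExampleBot"',
    '198.51.100.7 - - [01/Jan/2020:10:01:05 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0"',
]

statuses, total_bytes = status_breakdown(sample, "192.0.2.")
print(dict(statuses), total_bytes)  # {'304': 2, '200': 1} 5120
```

Many requests with almost no bytes transferred is exactly the pattern described above: the crawler keeps asking, and the server keeps answering that nothing has changed.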

So you say: hey, nothing strange is happening: Google is slurping up the blog and putting it in the search engine. Really?

Yeah.

But there's a problem.

To respect your privacy I can explain the problem only by showing "aggregates", to which the GDPR does not apply. So from now on I will avoid showing any graphs of "talking data". And I will use antediluvian tools: when you know what to look for, grep is fine.

[Figure. This is not an aggregate, but it contains only the Google crawler.]

As you can see, Google has taken to obsessively downloading the same article, over and over.
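
For readers who prefer a script to grep, here is a minimal Python sketch of the same idea: keep only the lines coming from the crawler's addresses and count how many times each URL was requested. The log format, the sample lines, the `192.0.2.` prefix and the helper name `hits_per_path` are all assumptions made for the example, not the actual data or tooling behind the screenshots.

```python
import re
from collections import Counter

# Grab the client IP at the start of the line and the path of a GET/HEAD request.
REQUEST = re.compile(r'^(?P<ip>\S+) .*?"(?:GET|HEAD) (?P<path>\S+)')

def hits_per_path(lines, ip_prefix):
    """Count requests per URL path for clients whose IP starts with ip_prefix."""
    counts = Counter()
    for line in lines:
        m = REQUEST.match(line)
        if m and m.group("ip").startswith(ip_prefix):
            counts[m.group("path")] += 1
    return counts

# Invented sample lines; 192.0.2.0/24 stands in for the crawler subnet.
sample = [
    '192.0.2.10 - - [01/Jan/2020:10:00:00 +0000] "GET /joker/ HTTP/1.1" 304 -',
    '192.0.2.11 - - [01/Jan/2020:10:00:30 +0000] "GET /joker/ HTTP/1.1" 304 -',
    '192.0.2.10 - - [01/Jan/2020:10:01:00 +0000] "GET /joker/ HTTP/1.1" 304 -',
    '192.0.2.12 - - [01/Jan/2020:10:02:00 +0000] "GET /brexit/ HTTP/1.1" 200 5120',
    '198.51.100.7 - - [01/Jan/2020:10:02:05 +0000] "GET /joker/ HTTP/1.1" 200 9000',
]

counts = hits_per_path(sample, "192.0.2.")
print(counts.most_common())  # [('/joker/', 3), ('/brexit/', 1)]

# How much more attention does the top page get than everything else combined?
top_path, top_hits = counts.most_common(1)[0]
rest = sum(counts.values()) - top_hits
print(top_path, top_hits / rest)  # /joker/ 3.0
```

The last two lines compute the kind of ratio that matters here: hits on the single most requested page divided by hits on all the other pages together.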

How obsessively? Well, let's see:

[Figure. "Arrakis" is the name of the server that collects the logs and rotates them.]

In practice, it devoted about 7 times more attention to the Joker article than to the rest of the blog. I can understand that Joker is a more popular topic than Brexit (seriously?), but this attention to a single article seems absurd to me. In fact, Google is not "downloading" the article:

It's monitoring it.

Because if the goal were to download it, there would be no need to do so 7000 times in three days: once is enough, and you move on. Even admitting that the article changes, and that you also want to fetch an "accelerated version for mobile" (AMP), it seems to me that the numbers don't add up.
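
To put that figure in perspective, the arithmetic can be spelled out. The sketch below uses the round numbers quoted above (about 7000 requests over three days); `request_rate` is just a hypothetical helper for the calculation.

```python
def request_rate(requests, days):
    """Return (requests per minute, seconds between requests) for an evenly spread load."""
    minutes = days * 24 * 60
    per_minute = requests / minutes
    seconds_between = 60 / per_minute
    return per_minute, seconds_between

per_minute, seconds_between = request_rate(7000, 3)
print(f"{per_minute:.2f} requests per minute, one every {seconds_between:.0f} s")
# 1.62 requests per minute, one every 37 s
```

In other words, somewhat under two requests per minute, around the clock, for the same page.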

So I thought that maybe that article is much more sought after, and therefore Google "downloads" it more often. Interesting, but that is not how things are:

Joker comes in fifth place, after the home page, the home page again, "those that connect without specifying any URL", and the post on A Clockwork Orange, and just ahead of the Brexit post.

[Figure. Um …]

This is very interesting, because if the frequency were due to popularity I would have expected to find Joker between the post on A Clockwork Orange and the post on Brexit. On the contrary, it appears that Google downloaded the other two only a few dozen times (fair enough, since maybe I corrected something and the content changed. 27 times? Hmm.)

But as for the Joker post, I don't recall correcting it ~7000 times, and nothing explains a gap of orders of magnitude.

Now you will tell me: welcome to the beautiful world of SEO. Quite right. It all depends on the subject matter, the keywords, and all the good SEO paraphernalia.

But here comes the problem. First of all, not even SEO explains the need to check a page seven thousand times in three days. I would understand a similar proportion among the visits that arrive from Google searches, but here we are talking about the crawler.

But even if that were true, and it were an SEO matter, the problem would be even worse (also because Google has already been fined over a similar issue).

The point is that if I try to imagine which words made the Joker article more popular than the one on "Arancia Meccanica" (A Clockwork Orange), the conclusion is that, from the point of view of Google and of the American public, Joker is much more "interesting". ("Interesting" here carries all the meanings that SEOs give to the term.)

The problem is simple: if I ran advertising on the "Arancia Meccanica" page, I would expect more attention from Google than if I put it on "Joker", because MY traffic and MY audience read that page more. But when Google decides to give priority to another post, things change: on a post about the Italian left I could advertise Italian companies, while on a post about an American film I would put American ones.

But this preference in indexing distorts reality: if Google's interest is a symptom of better indexing (which in the end affects all SEO activity), then it is penalizing non-American content, with a corresponding advantage for American companies.

Now, since I don't advertise on my pages (I did it for the experiment on porn, but I already have the numbers I need, so I quit), the thing concerns me little. Very little.

But if I were a national newspaper and discovered that a page on Joker is being pushed more than a page on the new government, while Italians were reading more about the new government, and therefore seeing the Italian-oriented advertising, I would find this worrying. Very.

In a post on Brexit, for example, I could have run advertisements from British companies that want to sell in Italy, but apparently Google does not take MY traffic into account (which it also knows, because I had Google Analytics running when I put the post online) and makes its own assessments.

Let's put it this way:

if someone, using systematic methods, compiled similar statistics on a larger site (such as that of the EU Commission) and obtained unexplained discrepancies like these, Google would struggle to defend itself against the accusation of discriminating against non-American content, the related advertising, and the related companies.

For me it's not a problem: I just used ipfilter to block Google's crawlers. After the experiment with porn I don't need them anymore, and the traffic that Google sends me is very small, all in all insignificant.

But if some big newspaper were to notice such things, with discrepancies like "27/7000", well, the conversation would be different. The Bertelsmann / Axel Springer of the situation, with the same numbers, would calmly bring you to the attention of the European Commission.

But I repeat: we are not talking here about the traffic that Google sends me; we are talking about the obsessive attention with which it downloaded the same page about twice a minute for days. This monitoring of a piece of content is, in my opinion, inexplicable.

And if someone accused Google of being a tool for monitoring pages "politically uncomfortable for the US", it would be another case in which Google would not be able to defend itself very well.

Just saying. We all love you, dear Google, but BDSM is a different thing, and it requires that everyone consent.
