May 5, 2024

The mountain of shit theory

Uriel Fanelli's blog in English


Of annoying parastatistics.

I don't know if you have noticed, but science faculties don't use the word "science" in their names. Physics is the "Faculty of Physics", not "Physical Science". Chemistry is the "Faculty of Chemistry", not "of Chemical Science". Engineering may have been called "engineering science" in a few very old faculties, but the term has since been dropped.

Strangely, it is the less scientific faculties that adopt the word "science": communication sciences, social sciences, educational sciences, etc.

Perhaps this is what inspires them to use methods that seem scientific, or that come from science, with results that are starting to get embarrassing, especially when it comes to statistics.

Let me be clear: the techniques in use are correct. The problem is that a very experienced carpenter, applying perfectly correct techniques, has little chance of repairing a computer motherboard. A symptom of this is the use, if not the abuse, of normalization and correlation in multifactorial problems.

The most tiresome example is certainly the data on the "gender gap".

So, we're talking about income. What are the factors that contribute to the formation of income? According to those who talk about the gender gap, the main factor is gender, since their "measure" ONLY makes sense in this case.

The problem is that, in Italy, income depends on the following factors, in descending order of importance:

  • Schooling (the simple qualification, in short)
  • Specialization (in which sector you work)
  • Geographical location (salaries are higher in the north)
  • Family of origin (parents who pave the way, in short)
  • Gender
  • Health (disabled people have problems)
  • Physical appearance.

This sort of hierarchy ultimately does nothing more than measure how much each factor matters. Once you have it, you can build a decision tree, or a dichotomizer, or ID5, and draw interesting conclusions. This does NOT mean that gender does not matter, but it is not a strong predictor.
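To make the idea of a hierarchy of factors concrete, here is a minimal sketch that ranks factors by information gain, the criterion that ID3-style dichotomizers use to pick a split. Everything in it is an assumption for illustration: the twelve records are made up to mirror the ordering described above, they are NOT real income data, and the names `ROWS`, `FACTORS`, `entropy` and `information_gain` are hypothetical.

```python
from collections import Counter
from math import log2

# Made-up toy records: (schooling, sector, gender, income). NOT real data.
ROWS = [
    ("degree", "finance", "M", "high"),
    ("degree", "finance", "F", "high"),
    ("degree", "finance", "M", "high"),
    ("degree", "finance", "F", "high"),
    ("degree", "cleaning", "M", "high"),
    ("degree", "cleaning", "F", "high"),
    ("none", "finance", "M", "high"),
    ("none", "finance", "F", "low"),
    ("none", "cleaning", "M", "low"),
    ("none", "cleaning", "M", "low"),
    ("none", "cleaning", "F", "low"),
    ("none", "cleaning", "F", "low"),
]
FACTORS = {"schooling": 0, "sector": 1, "gender": 2}  # column index of each factor
TARGET = 3  # column index of the income class


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of income minus its weighted entropy after splitting on one factor."""
    before = entropy([row[TARGET] for row in rows])
    after = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[TARGET] for row in rows if row[column] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


gains = {name: information_gain(ROWS, col) for name, col in FACTORS.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
for name in ranking:
    print(f"{name:10s} {gains[name]:.3f}")
```

On this toy data the ranking comes out as schooling, sector, gender. Gender's gain is small but not zero, which is the point: it matters, but it sits low in the hierarchy. With real data the order could of course be different; the sketch only shows how such a ranking is computed.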

Obviously now the learned genius arrives and tells me “but we did the math with all the other factors being equal”, a bovine way of saying “we have normalized”. And let's be clear: normalizing and seasonally adjusting are legitimate, permitted procedures. But if you do it in a multifactorial problem, over factors that weigh more than yours, there is a small problem of a scientific nature. What does that mean?

It means that if I tell you:

  • X has a doctorate in economics, is involved in high finance, lives in the North, comes from a wealthy family
  • Y left school after fifth grade, works in domestic cleaning, lives in the South, comes from a poor family

you can "predict" that X's income is higher. If instead I tell you:

  • H is a woman.
  • K is a man.

with just these two data points, you can't tell me which of the two is richer.

What happens then? The heavier factors are much better predictors. So even if you normalize (bravo! Bravo!), all you get is a figure that does NOT say what you want it to say: for that, gender would have to be the FIRST among the predictors of income.
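The X/Y versus H/K comparison can be turned into a small experiment: guess each person's income class from one factor only, and count how often the guess is right. This is a sketch under loud assumptions. The records are invented to mirror the hierarchy claimed in this post, they are NOT real data, and `ROWS`, `FACTORS` and `one_factor_accuracy` are hypothetical names.

```python
from collections import Counter, defaultdict

# Made-up toy records: (schooling, sector, gender, income). NOT real data.
ROWS = [
    ("degree", "finance", "M", "high"),
    ("degree", "finance", "F", "high"),
    ("degree", "finance", "M", "high"),
    ("degree", "finance", "F", "high"),
    ("degree", "cleaning", "M", "high"),
    ("degree", "cleaning", "F", "high"),
    ("none", "finance", "M", "high"),
    ("none", "finance", "F", "low"),
    ("none", "cleaning", "M", "low"),
    ("none", "cleaning", "M", "low"),
    ("none", "cleaning", "F", "low"),
    ("none", "cleaning", "F", "low"),
]
FACTORS = {"schooling": 0, "sector": 1, "gender": 2}  # column index of each factor
TARGET = 3  # column index of the income class


def one_factor_accuracy(rows, column):
    """Accuracy of guessing income from a single factor (majority vote per value)."""
    by_value = defaultdict(list)
    for row in rows:
        by_value[row[column]].append(row[TARGET])
    # For each value of the factor, the best guess is its most common income class.
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in by_value.values())
    return correct / len(rows)


# Baseline: ignore every factor and always guess the most common income class.
baseline = Counter(row[TARGET] for row in ROWS).most_common(1)[0][1] / len(ROWS)
accuracy = {name: one_factor_accuracy(ROWS, col) for name, col in FACTORS.items()}

print(f"{'baseline':10s} {baseline:.2f}")
for name, value in accuracy.items():
    print(f"{name:10s} {value:.2f}")
```

In this invented sample, knowing only the gender does no better than the blind baseline guess, while schooling alone gets almost every record right. That outcome is built into the toy data; what the sketch shows is the kind of check a single-factor claim should be able to pass.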

And that's not all: you are also denying yourself the possibility of imagining solutions. If you do the analysis and conclude that women earn less (but you have normalized! Forge of heroes!), the only thing you can do is pass a law that raises wages. But you have not even touched the problem: that hierarchy of predictors tells me that the solution consists not only in educating women, but in specializing them: they must study, and they must study the right subjects, the ones that lead to high salaries.

Am I saying that there is no gender gap? No. I'm saying that it mainly depends on other factors, which in some way have a relationship with gender.

It is, in fact, a multifactorial question.


Another analysis of this type concerns the imaginary problem of transsexual people in sport. I call it an imaginary problem because, if we count the Olympic champions who are transsexual, the answer is ZERO. If we move on to the national champions who are transsexual, we still find ZERO on the podium.

A phenomenon that does not happen is not a phenomenon. It is a fantasy.

But let's also assume that the world is full of transsexuals who want to play against women. What are the predictors of success in sports? It depends on the sport.

In some sports, such as basketball, height is much more relevant than gender. In some sports, coming from certain areas of Africa is an advantage, because the legs are on average longer. In other Olympic sports, such as figure skating, gender is almost irrelevant to performance.

If we reduce everything to one factor and make sport something mono-factorial, can we say that transsexuals always have an advantage? Hmm...

[Photo caption: this transsexual person (Alex Tilinca) was classified as "female" at birth.]

So, would you have Alex Tilinca compete against women because he has a vagina and XX chromosomes? The answer is obvious: it depends. He would probably have an advantage in the discus throw, but from what I can see in the other photos he is too short for basketball. I can't tell you how he is on skates, but he would certainly have an advantage in Greco-Roman wrestling.

But if you think not, and you ask yourself "why not", the answer is that you consider muscles more important than sex when it comes to the chances of winning a competition. If you thought biological sex was what mattered, you wouldn't ask Alex to compete with men.

The fact that physical characteristics of this kind matter more in the Olympic world means that we are asking the wrong question. Instead of male / female, we should use a muscle strength index to categorize.

And … we already talked about the age factor, right?

I mean, in the end we separate men and women because we think men have an advantage, but we don't turn away those who show up at a hurdles race with long legs, as African athletes do. Leg length in proportion to the body weighs more than sex, or in any case it weighs. Do we want to separate Africans from everyone else? Will we hold a championship just for the Chinese? Yet the average difference in performance between Africans and Chinese in running is even greater than that between males and females.


This tendency to take multifactorial problems and reduce them to a single factor is tiresome. It simply bores me to see the same mistake made over and over.

Everything that depends on several factors CANNOT be analyzed by reducing it to just one. Not even if normalized. There are decision trees, there is clustering, and many other things.
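For those who have never seen one, this is roughly what a decision tree over several factors looks like in code. It is a bare-bones sketch of an ID3-style dichotomizer, not a production implementation, and it rests on assumptions: the records are made up (NOT real income data) and `ROWS`, `FACTORS`, `entropy`, `gain` and `build_tree` are hypothetical names.

```python
from collections import Counter
from math import log2

# Made-up toy records: (schooling, sector, gender, income). NOT real data.
ROWS = [
    ("degree", "finance", "M", "high"),
    ("degree", "finance", "F", "high"),
    ("degree", "finance", "M", "high"),
    ("degree", "finance", "F", "high"),
    ("degree", "cleaning", "M", "high"),
    ("degree", "cleaning", "F", "high"),
    ("none", "finance", "M", "high"),
    ("none", "finance", "F", "low"),
    ("none", "cleaning", "M", "low"),
    ("none", "cleaning", "M", "low"),
    ("none", "cleaning", "F", "low"),
    ("none", "cleaning", "F", "low"),
]
FACTORS = {"schooling": 0, "sector": 1, "gender": 2}  # column index of each factor
TARGET = 3  # column index of the income class


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, column):
    """Information gain obtained by splitting the rows on one factor."""
    after = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[TARGET] for row in rows if row[column] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return entropy([row[TARGET] for row in rows]) - after


def build_tree(rows, factors):
    """Recursively split on the factor with the highest information gain."""
    labels = [row[TARGET] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not factors:
        return majority  # leaf: predicted income class
    best = max(factors, key=lambda name: gain(rows, factors[name]))
    if gain(rows, factors[best]) <= 0:
        return majority  # no remaining factor separates these rows
    column = factors[best]
    remaining = {name: col for name, col in factors.items() if name != best}
    return {
        best: {
            value: build_tree([r for r in rows if r[column] == value], remaining)
            for value in sorted({r[column] for r in rows})
        }
    }


tree = build_tree(ROWS, FACTORS)
print(tree)
```

On this toy data the tree splits on schooling first, then on sector, and only at the very bottom on gender. The factor is there, it counts, but it is consulted last. A single normalized number cannot show you that structure.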

Period.
