April 29, 2024

The mountain of shit theory

Uriel Fanelli's blog in English

School, GDPR, cloud & co.

When the GDPR arrived, some colleagues and I were building Hadoop storage sized for 400 million users, so it hit us like an asteroid. Suddenly I found myself with a pool of lawyers from the client company (the legal department) who understood the regulation well but did not understand the technicalities. On the other hand, the other architects (networks, storage, fabric & co.) and I understood the technicalities but not the legalese.

My company gave me a GDPR crash course, and the circus began.

In general, as I said, the lawyers understood the legal side well but not the technicalities. They often said "in that case we will delete the record", believing it was like a database. They did not know that once you start writing partitions of 64 GB each, for example, you cannot delete a single record. Depending on the format (Avro, Parquet, etc.), you must write a job that copies everything to another block except that record. Impracticable.
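To make the "copy everything except that record" job concrete, here is a minimal sketch. It is not what we ran on Hadoop: as an assumption for illustration it uses a JSON-lines file as a stand-in for an Avro or Parquet partition, and the names `rewrite_without`, `demo` and the `user_id` field are hypothetical.

```python
import json
import os
import tempfile


def rewrite_without(src_path, dst_path, user_id):
    """Copy every record from src to dst except those of user_id.

    Returns the number of records dropped. The source file is never
    modified in place: "deleting" means writing a whole new file.
    """
    dropped = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            if record.get("user_id") == user_id:
                dropped += 1
                continue  # this is the record to forget
            dst.write(json.dumps(record) + "\n")
    return dropped


def demo():
    """Build a tiny three-record partition and remove one user from it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "part-00000.jsonl")
        dst = os.path.join(tmp, "part-00000.rewritten.jsonl")
        with open(src, "w", encoding="utf-8") as f:
            for uid in ("u1", "u2", "u3"):
                f.write(json.dumps({"user_id": uid, "event": "login"}) + "\n")
        dropped = rewrite_without(src, dst, "u2")
        with open(dst, encoding="utf-8") as f:
            remaining = [json.loads(line)["user_id"] for line in f]
        return dropped, remaining
```

The cost is the point: to forget one record the job reads and rewrites the whole partition, which is why the lawyers' "we will just delete it" did not translate into something practical at 64 GB per partition.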

The fact that, depending on the format (Avro yes, Parquet no, etc.), it is not possible to easily change the structure (the fields) was also unknown to them. The difference between pseudonymous, anonymous and encrypted data, as well as the difference between homomorphic and non-homomorphic encryption, was difficult for them to grasp.
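A small sketch may help with the first of those distinctions. It assumes a keyed hash (HMAC-SHA256 from the Python standard library) as the pseudonymisation technique, which is one common choice and not necessarily what any given project uses. The key, the field names and both functions are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key: in a real system it would live outside the dataset.
SECRET_KEY = b"hypothetical-key-kept-outside-the-dataset"


def pseudonymise(record, key=SECRET_KEY):
    """Replace the name with a stable token.

    Whoever holds the key can recompute the token for a given name,
    so the data can still be linked back to a person.
    """
    token = hmac.new(key, record["name"].encode("utf-8"),
                     hashlib.sha256).hexdigest()[:16]
    out = {k: v for k, v in record.items() if k != "name"}
    out["subject"] = token
    return out


def anonymise(record):
    """Drop the identifier entirely: nothing is left to link records."""
    return {k: v for k, v in record.items() if k != "name"}
```

With pseudonymised data the same person always gets the same token, so records can still be joined and, with the key, re-identified. With anonymised data the link is gone, although dropping the name alone is not always enough if the remaining fields identify someone. Encrypted data is a third case: the original value is fully recoverable by whoever holds the key.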

So those were weeks of "fire", if we exclude the amount of g-knuckle typical of the legal teams of any company. Well, yes: those who quote quote, those who do not share if squote. (LOL)

Now, if I try to imagine applying such a process to schools using foreign clouds, several things come to mind. For example, GDPR Article 5(1)(c) says "personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed".

Now, the problem here is understanding what "adequate" means, what "relevant" means, what "necessary" means and what "purposes" means: because legalese is a language, and not necessarily a simple one.

Let's take a step back: “Privacy by design”. This concept, central to the GDPR, says that privacy must be built into the design of the service and of data management. And it cannot be delegated to the Google of the day or to Microsoft, because the data takes a long tour that involves the "data processor", the "data owner" and the "data controller" (and on the data owner there is a certain legal ambiguity). Therefore the school must also produce its own Privacy by design documentation.

First of all, since the school also handles this data and issues the report cards, Google apparently becomes a data controller while the school is a data processor with regard to, for example, the school process. In reality, it would be more accurate to say that Google and the school are joint controllers, also because some data (for example, the IP address from which a student connects) is not entered by the school.

Having established who does what, in both cases the school must make a Privacy By Design document, which can be inspected. Google may have done it too (since it offers the business platform to schools, at advantageous economic conditions), but the two are separate and neither of them replaces the other.

What do we expect from the document(s) that the school produces?

  • what the "necessary" uses are (e.g. it serves to make report cards, it serves to identify the user by name, etc.).
  • what the "purposes" are, that is, what the data are for. For example, justifying videoconferencing from a room in the student's home could become problematic.
  • which data are “necessary” and which are “relevant”: for example, keeping the children's cameras on could be “relevant” but not necessary, and so on.
  • if the lessons are recorded, for example, who can save the recording, and for what purpose it is necessary, relevant, and so on.

Good.

At that point, you take all the data that the school accesses through the system (start time and end time of the lesson, number and names of those present, name, surname, documents sent or attached, videoconference, audio conference, state of health, etc.), make a nice table and record a whole series of details for each item:

  • where they are saved, whether the school has a hard copy, and who is responsible for it.
  • whether they are necessary, relevant or unnecessary.
  • whether they are encrypted, pseudonymised or in clear text.
  • whether they are PII (is the starting time of the lesson PII? For the teacher yes, because it shows he is present at work; for the students, who knows. The absence register, maybe).
  • how long they will be kept.
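The table described above could be kept in something as simple as the sketch below. The field names, the `needs_attention` rule and the three sample rows are all invented for illustration: the real classification is exactly the hard part, and nothing here decides it for you.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    name: str
    stored_at: str        # where it is saved (cloud, hard copy, ...)
    responsible: str      # who is responsible for it
    necessity: str        # "necessary", "relevant" or "unnecessary"
    protection: str       # "encrypted", "pseudonymised" or "clear"
    is_pii: bool
    retention_days: int   # how long it will be kept


def needs_attention(items):
    """Names of PII items that are unnecessary or stored in clear text."""
    return [
        item.name
        for item in items
        if item.is_pii
        and (item.necessity == "unnecessary" or item.protection == "clear")
    ]


# Hypothetical sample rows; the values are assumptions, not a ruling.
INVENTORY = [
    DataItem("lesson start time", "cloud platform", "school office",
             "necessary", "clear", False, 365),
    DataItem("student surname", "cloud platform", "school office",
             "necessary", "pseudonymised", True, 365),
    DataItem("video recording", "cloud platform", "teacher",
             "unnecessary", "clear", True, 30),
]
```

Calling `needs_attention(INVENTORY)` flags only the video recording here, because it is PII that is both unnecessary and unprotected.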

Filling in these fields is not as simple as it seems. To identify the user it might seem natural to use "name, surname", but in reality surname plus first initial could also do, and even the idea of using only the surname would be open to debate.
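As a sketch of that minimisation rule, assume one simple policy: use the surname alone when it is unique in the class, and add the first initial only when it is not. The function name and the sample class are hypothetical.

```python
from collections import Counter


def minimal_labels(students):
    """students: list of (first_name, surname) tuples.

    Returns one label per student, in the same order, using the
    surname alone when it is unique and surname plus first initial
    otherwise.
    """
    surname_counts = Counter(surname for _, surname in students)
    labels = []
    for first_name, surname in students:
        if surname_counts[surname] == 1:
            labels.append(surname)
        else:
            labels.append(f"{surname} {first_name[0]}.")
    return labels
```

For a class with Anna Rossi, Marco Rossi and Luca Bianchi this gives "Rossi A.", "Rossi M." and "Bianchi". Two students with the same surname and the same initial would still collide, which is one reason the choice remains debatable.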

This part is quite delicate: for example, an e-mail system can transfer anything. Let's take an example. From home, I write to my team assistant that I will be absent because I have the flu, attaching the certificate and copying the Krankenkasse (my health insurance fund) and my boss. Sounds sensible, but it's not:

  • my boss just needs to know I'm absent, not my health.
  • my Krankenkasse doesn't have to know who my boss is or who the team assistant is.
  • my team assistant does not have to know which Krankenkasse I am with, because I am a voluntary contributor (OK, it's a German thing: beyond a certain income you are not a contributor like the others, and you can even choose not to pay into a Krankenkasse, except that then you pay for everything out of your own pocket).

So I will tell the boss and colleagues that I am "absent", and I will send the absence note and the medical certificate to the team assistant and, separately, to the Krankenkasse, without ever putting everyone in the same email. Otherwise:

  • my boss will delete the email and ask the admin to remove it from Exchange.
  • the Krankenkasse will delete the email and won't tell me anything (grrrr!), and I'll find out when they don't pay me for those days.
  • my team assistant will delete the email because the Krankenkasse is in copy.
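The "one message per recipient, each with only what that recipient needs" idea can be sketched like this. The `NEED_TO_KNOW` mapping is my reading of the example above, and the recipient keys and field names are hypothetical.

```python
# Which pieces of information each recipient is allowed to receive.
# Assumption: this mapping mirrors the sick-note example in the text.
NEED_TO_KNOW = {
    "boss": {"absence"},
    "team_assistant": {"absence", "medical_certificate"},
    "krankenkasse": {"absence", "medical_certificate"},
}


def split_message(content):
    """Build one separate message per recipient.

    content: dict mapping a field name to its value.
    Each recipient gets only the fields in its need-to-know set, and
    no message reveals who the other recipients are.
    """
    return {
        recipient: {k: v for k, v in content.items() if k in allowed}
        for recipient, allowed in NEED_TO_KNOW.items()
    }
```

Given a dict with an `absence` entry and a `medical_certificate` entry, the boss's message contains only the absence, while the team assistant and the Krankenkasse each receive their own separate copy with the certificate.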

Moral: the GDPR is not easy.

Once you have established which data the school is a joint controller of, and have classified them to understand what they are for, whether they are necessary, desirable, PII, and all that, it is time to talk about retention time.

The data retention time must NOT go beyond the necessary use of the data. In the case of the school, if a hard copy exists, it means that in theory, at the end of the school year, ALL of it should be deleted from the system, and one wonders (if items such as classwork are also printed on paper) whether even a single term would not make more sense.
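A retention rule like this is easy to check mechanically once every item has a collection date and a retention period. A minimal sketch, with invented items and periods (the function name `overdue` is also hypothetical):

```python
from datetime import date, timedelta


def overdue(items, today):
    """items: list of (name, collected_on, retention_days) tuples.

    Returns the names of items whose retention period ended before
    `today`, i.e. the ones that should already have been deleted.
    """
    return [
        name
        for name, collected_on, retention_days in items
        if collected_on + timedelta(days=retention_days) < today
    ]


# Hypothetical items: a classwork scan kept for about one term,
# a report card kept for a year.
ITEMS = [
    ("classwork scan", date(2023, 9, 15), 90),
    ("report card", date(2024, 1, 31), 365),
]
```

Run against 29 April 2024, only the classwork scan is overdue: its 90 days ended in December 2023, while the report card is still within its year.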

There is also a distinction between aggregates and individual data: the marks for each oral test and each class test, for example, are individual "data". If we calculate the average of the grades per class, removing the names and surnames of the students, we have an "aggregate". Aggregates are subject to few limits under the GDPR. But if you produce aggregates you are, in addition to being a joint data controller, also a data processor. If the data processor asks someone else for help in processing the data, that someone becomes a "third party".
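The step from individual marks to a per-class aggregate looks roughly like the sketch below. The tuple layout and the sample marks are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def class_averages(marks):
    """marks: list of (student, class_name, mark) tuples.

    Returns {class_name: average mark}. The student names are read
    but never copied into the result.
    """
    by_class = defaultdict(list)
    for _student, class_name, mark in marks:
        by_class[class_name].append(mark)
    return {class_name: mean(values) for class_name, values in by_class.items()}


# Hypothetical individual data.
MARKS = [
    ("Rossi", "3A", 6),
    ("Bianchi", "3A", 8),
    ("Verdi", "3B", 7),
]
```

Note that class 3B has a single student in this sample, so its "aggregate" is just that student's mark: with very small groups an average can still point at one person.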

So ultimately, the school that uses Google, or Teams, or whatever for its teaching:

  • almost automatically becomes a joint controller under the GDPR. So it carries all the responsibility of a controller, and must also agree with the other controller on who is the “main” data controller, the one who does most of the work. It should not be taken for granted that it is Google.
  • must appoint a Data Protection Officer, that is, the person to inform in case of any problem (data leak, change of legitimate purpose, etc.).
  • must indicate who the data owner is (who has complete control of the data and can grant or revoke access to it) and who the data custodian is (the person who supervises compliance with the above).

In the absence of this, the GDPR is simply not respected, and even the "release" is completely irrelevant according to the law: if there is no privacy by design and by concept, the fact that someone signs a release does not change anything.

Last but not least, opting out is a right. It means that parents, as guardians, must have access to a link from which they can remove their children's data at any time of their choosing.

Honestly, from what I hear, I don't see any of this in Italian schools. Even the German schools have realized that this is excessively laborious and are having their own systems built by the ministry (the school ministry is federalized, one per Land), so in the end the use of corporate platforms is decreasing sharply.

But the point is simple: a release is NOT ENOUGH to become GDPR-compliant when using Google or Teams or Zoom, even if these companies are PROVABLY GDPR-compliant.

Sooner or later, maybe after the lockdown is over, we will see a whole series of lawsuits, since the Italian principals have reacted with extreme ignorance ("let's ask for a release!"), as if this exempted schools from complying with the GDPR.

But the release doesn't protect anyone, and the fines are BIG.

What you can do as a parent is:

  • demand that at the end of the school year ALL the data are removed from the foreign cloud, printed and archived as was done with paper.
  • demand that the school limit itself to the MINIMUM data, i.e. if the surnames are unique, only the surname; if the surnames are not unique, only surname and first initial; etc.
  • demand to know the name of the Data Protection Officer.

For these things there is NO release.
