OSS compliance with privacy by default and design

This talk by Cristina DeLisle can be found at https://archive.org/details/apconf-talks/Talk6_Cristina_compressed.mov. She is a data protection officer at XWiki SAS.

The talk starts with a discussion of the General Data Protection Regulation (GDPR), adopted by the European Union in 2016. She pointed out that in 2018 GDPR was searched for on Google more often than Beyoncé and Kim Kardashian, possibly because that was the year it began to be enforced, and that this is not a bad thing really. Among the effects of GDPR were:

  • Companies were forced to review their privacy strategies, because the burden of proof for accountability was shifted to them. They could no longer assume everything was OK as long as no one complained; now they had an affirmative burden to demonstrate that they were in compliance. The default also shifted in many cases from “opt out” to “opt in”, and companies would need the policies and auditing to show that they had complied.
  • Data collection and the life cycle of data were affected. You are now limited in the purposes for which you can collect data, and you cannot take that data and use it for other purposes. You also have to be completely transparent about what data you collect, for what allowable purpose you are collecting it, how long it is retained, and whether it is shared outside of the EU.
  • Breaches that expose data must be reported within 72 hours, and any company or organization that regularly collects such data must have a Data Protection Officer.
  • Although GDPR directly applies only to the EU, it has been used as a model in other countries as well.
  • All of the GDPR requirements have implications for sysadmins, including in the Fediverse.
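
For a sysadmin, the 72-hour reporting window mentioned above comes down to simple date arithmetic. Here is a minimal sketch in Python, assuming the clock starts at the moment the organization becomes aware of the breach. The function names are hypothetical and were not part of the talk.

```python
from datetime import datetime, timedelta, timezone

# Reporting window as summarized in the list above.
REPORT_WINDOW = timedelta(hours=72)


def report_deadline(detected_at: datetime) -> datetime:
    """Return the latest time a breach report can be filed.

    Assumption: the window starts when the breach is detected.
    """
    if detected_at.tzinfo is None:
        # Naive timestamps are ambiguous across servers in different zones.
        raise ValueError("detected_at must be timezone-aware")
    return detected_at + REPORT_WINDOW


def is_overdue(detected_at: datetime, now: datetime) -> bool:
    """True if the reporting window has already closed."""
    return now > report_deadline(detected_at)
```

Requiring timezone-aware timestamps is a design choice for this sketch, since logs from several hosts are easy to misread otherwise.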

So far, the biggest fines under the GDPR have been for telemarketing, spam, data breaches, and surveillance.

The GDPR creates two roles, the Data Controller, and the Data Processor. The Data Controller is responsible for determining the purpose for which the data is collected, how it will be collected, and how it will be processed. In other words, this is the policy side. The Data Processor is a contractor that actually does the processing on behalf of the Data Controller. This is specified in the Data Processor Agreement that both parties enter into. A single organization can, however, perform both roles depending on how the data is handled. But it is the Controller that bears the most responsibility for GDPR compliance.

With this background, we need to consider how the Open Source community, and the Fediverse, can operate and still be compliant. As individual members of the Open Source community, each of us is a Data Subject, and has enforceable rights under the GDPR. At the same time, organizations in the Open Source community can be Data Controllers or Data Processors. She uses the example of GitHub, which is the controller of the personal data from your free private user account, and may also be processing invoices for your account. In the Fediverse, such as Mastodon, the users are all Data Subjects, but what about the instances? Are the admins of those instances Data Controllers? Data Processors? She brings up the example of Google in the milestone case about the Right to be Forgotten. Google was ultimately ordered to remove the links from its search results, and in getting there Google tried to say it was only a Processor, but the court essentially decided it was a Data Controller. Well, that is one thing with a giant corporation with thousands of employees, but what if an admin is asked to install a Mastodon instance, and does so without necessarily knowing how it works internally? Is that admin a Processor? And if they decide to do any data analytics, does that make them a Controller?

This then leads to a discussion of the legitimate reasons for collecting data under the GDPR:

  • Compliance with the law. If there is a specific legal obligation, such as for invoicing.
  • If there is a legitimate interest. For example, a bank will collect data as part of deciding whether or not to make a loan. But this needs to be carefully assessed.
  • If you have a contract with the Data Subject. But this requires that you have Specific and Informed Consent, the Data Subject must take an affirmative action to grant this right, and it must be Freely Revocable.
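
The consent requirements in the last item (specific, granted by an affirmative action, freely revocable) map fairly directly onto a small data structure. Here is a minimal sketch in Python, assuming one record per Data Subject and purpose. The class and function names are hypothetical illustrations, not something shown in the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    subject_id: str
    purpose: str  # one specific purpose per record
    granted_at: Optional[datetime] = None  # None = never granted (opt-in default)
    revoked_at: Optional[datetime] = None

    def grant(self) -> None:
        """Record an affirmative action taken by the Data Subject."""
        self.granted_at = datetime.now(timezone.utc)
        self.revoked_at = None

    def revoke(self) -> None:
        """Consent can be withdrawn at any time."""
        self.revoked_at = datetime.now(timezone.utc)

    @property
    def active(self) -> bool:
        return self.granted_at is not None and self.revoked_at is None


def may_process(record: ConsentRecord, purpose: str) -> bool:
    """Allow processing only for the exact purpose that was consented to."""
    return record.active and record.purpose == purpose
```

A new record starts out inactive, so nothing is processed until `grant()` is called, and the timestamps give an audit trail of when consent was given or withdrawn.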

This can get pretty interesting in the details. If you have uploaded a photo to your profile, you have definitely consented to it being used. But you have to be able to delete that photo later if you wish. If you contribute code to an open source project, the project can refuse to remove that code later because it has a legitimate interest. Removing the code might be very damaging to the whole project. This is reflected in the Developer Certificate of Origin, introduced by the Linux Foundation in 2004, which says:

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

But a comment on a Fediverse post could be removed under the Right to Be Forgotten (RBF), or perhaps anonymized with any identification removed.
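
The two options for a comment, removal or anonymization, can be sketched as follows. This is a minimal Python illustration, assuming a simplified comment record. The `Comment` fields are hypothetical and do not reflect how Mastodon or any other Fediverse server actually stores posts.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Comment:
    comment_id: int
    author: Optional[str]
    author_url: Optional[str]
    body: str


def anonymize(comment: Comment) -> Comment:
    """Keep the text but strip the fields that identify the author."""
    return replace(comment, author=None, author_url=None)


def erase(comments: List[Comment], comment_id: int) -> List[Comment]:
    """Return the thread without the comment that was asked to be removed."""
    return [c for c in comments if c.comment_id != comment_id]
```

Note that stripping the author fields may not be enough if the body of the comment itself contains personal data, in which case removal would be the safer choice.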


Open Source Software does have some advantages in dealing with the GDPR:

  • Transparency – The code is freely available to be inspected, and is developed out in the open.
  • Privacy-oriented – This code can be developed by people for people, not to advance a corporate objective (though this is not guaranteed).

But there are also challenges. It is clear that the GDPR assumes a corporate environment where specific roles can be presumed. In an open source project, will there always be a Data Protection Officer? Will privacy be designed in from the beginning? Will fixing security vulnerabilities be a top priority? We know that code being open makes this possible, but a lot of projects never get the “many eyeballs”. What about auditing and reviews of the code? We know that many projects run on what each individual feels like doing, on “scratching an itch”, but compliance with the GDPR may not be anyone’s particular itch to scratch.

She then proposes that a Privacy Pledge be added to the ActivityPub protocol to cover at least some of these problems.

Listen to the audio version of this post on Hacker Public Radio!
