In the back end of privacy policies, a firm needs to ensure that its own processes, and those of its supply chain, comply with privacy regulations, so that what is presented to consumers in the front end through privacy policies is realistic and effective.
A new tool that uses artificial-intelligence methods, including machine learning, builds on a year-long effort and is currently being tested with industrial partners. It was developed by a multi-stakeholder initiative led by HEC Paris Professor David Restrepo Amariles, together with Aurore Troussel (LL.M. HEC '19) and Rajaa El Hamdani, data scientists at HEC Paris.
“By ticking this box you accept...”
Think back to the last time you signed up for a new website or online service. Did you read the terms and conditions before you clicked "accept"? If the answer is an embarrassed "no", don't worry: you are not alone. The length and vocabulary of most privacy documents make companies' data processing time-consuming and difficult to understand. Researchers at HEC Paris have developed Privatech, a new machine-learning-powered application that detects breaches of the General Data Protection Regulation (GDPR) in privacy documents.
This application could serve consumers, lawyers, data protection officers, legal departments, and managers in auditing a company's privacy documents. More importantly, it aims to generate privacy compliance in the back end of data flows: in other words, to ensure companies are informed of their data practices so they can make privacy-preserving decisions. Privatech allows managers who are not specialized in privacy protection to conduct a preliminary compliance assessment and detect potential issues requiring specialized advice.
The challenge for businesses: complying with EU (and US) law
The General Data Protection Regulation came into force in 2018, and many companies saw it as a compliance challenge: in 2017, 90% of executives considered GDPR the most difficult form of compliance to achieve. GDPR requires companies to govern their data processing while ensuring data subjects' rights. Under GDPR, companies have to set up procedures and documents that enable users to access clear information about the processing of their personal data and to control that processing.
Two aspects of GDPR are of particular importance for businesses. First, GDPR has a very broad scope of application, extending far beyond EU borders. Second, GDPR sets forth fines of up to 20 million euros or 4% of a company's annual worldwide turnover, whichever is higher, for the most serious infringements. This explains why the cost of non-compliance is estimated to be 2.71 times the cost of compliance. In addition, the recent entry into force of the California Consumer Privacy Act (CCPA) shows that companies' data processing will be increasingly scrutinized by regulators. This regulatory trend makes investment in privacy compliance technologies relevant.
An app built with a coalition of law and business
Privatech uses machine learning to automate the process of complying with legislation. With the help of several law firms and companies including Atos, one of the largest IT companies in Europe, HEC Paris researchers created a tool for automating the assessment of privacy policies.
This means the tool can read privacy policies and detect lines that might not be compliant with the law or may leave consumers' personal data open to exploitation. To develop this tool, the researchers relied on annotated privacy policies, with each clause labelled according to the data practice it describes, and connected these annotations to the corresponding GDPR articles and obligations.
Data practices are categories of data processing activities: for example, "data retention" is a data practice that refers to how long data can be stored. Each paragraph in the privacy policies was tagged with the corresponding data practice. We then trained a machine-learning algorithm to identify and label different data practices in a legal document. The app also assesses the readability of privacy policies, because a key aspect of GDPR requires privacy policies to be easily readable. The app is calibrated so that all text should be readable by any high-school student.
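The article does not disclose Privatech's actual model, but the labelling-then-training pipeline described above can be illustrated with a minimal sketch: a bag-of-words naive Bayes classifier trained on a few toy sentences labelled with data practices. The class, the training sentences, and the labels below are illustrative assumptions, not Privatech's data or code.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and strip surrounding punctuation from each word."""
    words = (w.strip(".,;:()\"'") for w in text.split())
    return [w.lower() for w in words if w]

class DataPracticeClassifier:
    """Toy naive Bayes classifier over data-practice labels."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            words = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, text):
        words = tokenize(text)
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.label_counts:
            # log prior + log likelihoods with add-one smoothing
            score = math.log(self.label_counts[label] / total)
            n = sum(self.word_counts[label].values())
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / (n + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Illustrative labelled sentences, in the spirit of the annotation effort
examples = [
    ("We retain your personal data for five years after account closure.", "data retention"),
    ("Your data is stored for as long as your account remains active.", "data retention"),
    ("We may share your information with third-party advertising partners.", "data sharing"),
    ("Personal data can be disclosed to our affiliates and service providers.", "data sharing"),
]

clf = DataPracticeClassifier()
clf.train(examples)
print(clf.predict("Data will be kept for two years."))  # -> data retention
```

A production system would of course use a far larger annotated corpus and a stronger model, but the structure — tag paragraphs with data practices, then learn to assign those tags to unseen text — is the same.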
Reshaping privacy compliance from the ground up
Privatech aims to streamline privacy compliance and consumer protection by focusing on firms (data controllers and processors) rather than on consumers (data subjects). The application may help individuals to better understand the privacy policies that they would otherwise sign blindly. However, we decided to focus on companies as they could generate privacy compliance by design and are liable under GDPR.
By focusing on firms, Privatech aims to ensure companies are able to translate privacy policies disclosed to consumers into effective corporate compliance mechanisms. We expect that Privatech will eventually encourage companies to design and monitor their data processing activities, so they are legal, comprehensive and easy to understand.
Applications

Our work will be valuable to any company or data handler that needs to comply with data protection legislation. The project will reduce the repetitive, labour-intensive elements of the legal assessment of privacy documents and improve compliance with legislation. Our work will also be valuable to consumers who may struggle to interpret privacy documents and the legislation behind them, such as GDPR. Ultimately, data protection authorities could also use the application to conduct audits and monitor compliance with GDPR.
Methodology

The project started as a class deliverable for two courses taught by Professor David Restrepo at HEC Paris: TechLaw, offered in the LL.M. program, and Data Law and Technology Compliance, offered in the MSc Data Science for Business jointly organized by HEC Paris and Ecole Polytechnique. A first beta version relied on the students' work and on collaboration with lawyers at Baker McKenzie Paris and tech entrepreneurs, including Croatian developer Vedran Grčić.

Since August 2019, the project has been developed fully in-house at HEC Paris by the Smart Law Hub, which brings together legal scholars and data scientists. The project has also changed its methodology and focus. The application detects unlawful or problematic sentences within privacy policies and evaluates the complexity of privacy documents. The algorithms were trained on data retrieved by the researchers: a data set of sentences drawn from various privacy policies and from judicial and administrative decisions, labelled and categorized by data practice, such as "data retention" or "data sharing". This preparatory work allowed for the creation of a machine-learning algorithm able to identify and label different data practices in a legal document. In addition, a readability algorithm evaluates the complexity of a privacy document to verify its compliance with transparency and explainability requirements. The main focus of the research today is compliance generation, which seeks to monitor internal documents and documents in the data supply chain.
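The readability component can likewise be sketched. The article does not specify which metric Privatech uses; the classic Flesch Reading Ease formula is a common choice (scores around 60–70 roughly correspond to a high-school reading level) and is used here purely for illustration, with a crude syllable-counting heuristic.

```python
import re

def count_syllables(word):
    """Approximate English syllables by counting vowel groups."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    if word.lower().endswith("e") and count > 1:
        count -= 1  # drop the usually silent final 'e'
    return max(count, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z'-]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

simple = "We keep your data for two years. Then we delete it."
legalese = ("Notwithstanding the aforementioned stipulations, personal information "
            "shall be retained indefinitely pursuant to applicable regulatory obligations.")

print(flesch_reading_ease(simple))    # well above the high-school threshold
print(flesch_reading_ease(legalese))  # far below it
```

A calibration like the one described in the article would then flag any privacy policy whose score falls below the chosen high-school-level threshold.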