Every time we take an Uber ride, we actually want the platform to know our geographical location so that it can match us with the closest driver. Indeed, we as consumers benefit from such an abundance of data being in the hands of companies: with these data, companies personalize their products and services to fit our preferences and needs.
At the same time, the ubiquitous availability of data increasingly presents risks. The notorious case of a consulting firm, Cambridge Analytica, exploiting the Facebook data of some 50 million Americans to sway the 2016 US presidential election provides a cautionary tale. This is an extreme example, but similar data leakage and misuse incidents, just on a smaller scale, occur on a daily basis. What measures can governments and regulators take to prevent such incidents? How should companies and digital businesses, whose business models rely to a large extent on our data, change their practices and policies so that our data are safe?
HEC Paris Assistant Professor of Operations Management Ruslan Momot explains how his recent research, conducted in collaboration with co-authors from the United States, the United Kingdom, and Canada, sheds light on digital privacy.
Why is current regulation inefficient?
To answer this question, we study the interaction between three parties concerned with our data: us, the consumers; the company we are interacting with (for example, Facebook and the advertisers using its services); and malicious third parties, who may be other businesses (for example, ill-intentioned advertisers or firms similar to Cambridge Analytica) or even governments. Our research question is: how does a company’s data policy (essentially, its decisions about how much data to collect and how well to protect them) influence the interaction between these three parties?
We find that, in general, when companies choose data policies in their own self-interest, they collect more data than would be optimal for consumers. Our findings indicate that industry leaders’ claims that companies collect exactly as much data as (or even less than) their consumers wish to share are not necessarily true. Our work thus highlights the need for regulation of such markets.
In the United States, the key data regulator is the Federal Trade Commission (FTC). After the Cambridge Analytica scandal erupted, the FTC fined Facebook $5 billion. Not surprisingly, the FTC cannot ask companies to stop collecting data altogether. After all, these data are at the core of the companies’ business models, and asking them not to collect these data is akin to asking them to commit business suicide.
Thus, the FTC’s major regulatory efforts in these kinds of markets are currently directed at asking companies to enforce their own data protection policies and at ensuring that at least a minimal level of data protection is delivered. We show in our research that this is simply not enough.
Irrespective of the data protection level a company implements, data remain such a valuable business asset that companies will keep collecting more of them than their customers wish. More creative methods are therefore needed to minimize the risks from data collection.
Two solutions to reduce data collection: taxes and fines
In our work, we propose two key types of instruments for discouraging companies from collecting more data than is strictly necessary. The first is a tax proportional to the amount of data a company collects: the more data a company gathers about its customers, the higher the financial cost of holding those data.
The second instrument is a liability fine. Whenever a data breach occurs, the company pays the regulator a fine proportional to the damage the breach inflicts on consumers. In the case of Cambridge Analytica, the breach was massive, so the fine would have to be substantial.
Both these instruments can help restore efficiency in these kinds of markets and help a regulator like the United States’ FTC push companies to collect only the amount of data that customers are actually willing to share.
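To see the intuition behind these two instruments, consider the following toy sketch. It is purely illustrative: the profit function, the breach model, and all parameter values below are simplifying assumptions made for this example, not the model from the paper. It shows how a per-unit data tax and a liability fine proportional to consumer harm both push a profit-maximizing company to collect less data.

```python
import numpy as np

# Purely illustrative toy model (assumed functional forms, not the paper's model).
# A company chooses how much data d to collect, trading off:
#   - revenue from personalization, assumed concave: a * sqrt(d)
#   - a per-unit data tax: tax * d
#   - an expected liability fine: breach probability (assumed to grow with d)
#     times a fine proportional to consumer harm (also assumed to grow with d)

def expected_profit(d, a=10.0, breach_rate=0.001, harm_per_unit=0.5,
                    tax=0.0, liability=0.0):
    revenue = a * np.sqrt(d)
    expected_fine = (breach_rate * d) * liability * (harm_per_unit * d)
    return revenue - tax * d - expected_fine

def optimal_collection(**policy):
    grid = np.linspace(1, 1000, 10_000)   # candidate amounts of data
    return grid[np.argmax(expected_profit(grid, **policy))]

print(optimal_collection())                # no regulation: collects the maximum (1000)
print(optimal_collection(tax=0.2))         # data tax: collection drops to ~625 units
print(optimal_collection(liability=1.0))   # liability fine: collection drops to ~292 units
```

In this stylized setting, both levers make the company internalize a cost of data collection that it would otherwise ignore, which is what shrinks the amount of data it chooses to collect.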
A third way to reduce the harm: rethinking revenue management
Recent years have seen the emergence of data-driven revenue management. Companies increasingly harness our personal data to sell us products and services at the right time and at the right price. Insurance companies offer personalized quotes based on intimate details of our lives, including our medical histories. The financial industry designs personalized loans that fit our spending patterns. Facebook and Google decide how to build our newsfeeds. Amazon chooses a bespoke assortment of products to offer each customer based on their past purchases.
What all these seemingly different companies have in common is the way they decide which price to set or which assortment to show each individual customer. The key ingredient is customer data: companies engaged in personalized revenue management apply sophisticated machine learning algorithms to the historical data of their previous customers in order to build models of human behavior. These models are then used to offer new clients personalized products and services: an insurance quote, tailored airline ticket prices, or targeted advertisements when browsing the web. In essence, the company can come up with the best possible price (or assortment, for example) for a new customer because he or she is likely to behave like previous customers with similar characteristics.
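As a stylized illustration of this pipeline (a hypothetical toy example with synthetic data, not any company’s actual system), the sketch below fits a simple model to “historical customer” records and then produces a personalized quote for a new customer who resembles them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy illustration only: synthetic data and an assumed linear "willingness to pay".
rng = np.random.default_rng(0)

# Hypothetical historical data: two customer features (say, age and past spending)
# and the price each of 500 previous customers ended up accepting.
X_hist = rng.normal(size=(500, 2))
accepted_price = 50 + 8 * X_hist[:, 0] + 5 * X_hist[:, 1] + rng.normal(scale=2, size=500)

# Build a model of behavior from the historical (potentially sensitive) data.
model = LinearRegression().fit(X_hist, accepted_price)

# Quote a new customer based on how similar past customers behaved.
new_customer = np.array([[0.4, -1.2]])
personalized_quote = model.predict(new_customer)[0]
print(f"Personalized quote: ${personalized_quote:.2f}")
```

The point of the sketch is simply that the quote is a function of the historical dataset, which is precisely where the privacy risk discussed next comes from.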
Because this decision-making framework, commonly used in data-driven revenue management applications, relies heavily on potentially sensitive historical data, it carries pressing privacy risks.
For instance, a hacker might simply steal historical data from the company’s database. But a hacker does not even have to break into the database: recent research in computer science shows that adversaries can reconstruct sensitive individual-level information merely by observing a company’s decisions, for example its personalized prices or assortments.
In our work, we design new “privacy-preserving” algorithms for companies engaged in data-driven decision-making. These algorithms aim to help such companies limit the harm imposed on their customers by data leakage or misuse, while still allowing the companies to profit from the data. Unfortunately, data can never be made 100% safe, so we can only attempt to reduce the harm as much as possible.
Differential privacy: a key to privacy-preserving revenue management
One possible way to design privacy-preserving algorithms for companies engaged in data-driven revenue management is to impose an additional constraint on their decision-making framework. In particular, we can require that the company’s decisions (i.e., an insurance quote or an assortment of products) not be too dependent on (or too informative about) the data of any particular customer in the historical dataset the company used to derive those decisions. An adversary should therefore not be able to work backwards from the company’s decisions and infer sensitive information about the customers in the historical dataset. Formally, such a requirement corresponds to designing “differentially private” revenue management algorithms. Differential privacy is a concept developed in the computer science literature. It has become the de facto privacy standard in industry, used by companies such as Apple, Microsoft, and Google, as well as by public agencies such as the US Census Bureau.
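For reference, the standard formal definition from the computer science literature (not specific to our study) reads as follows: a randomized algorithm $M$ is $\varepsilon$-differentially private if, for every pair of historical datasets $D$ and $D'$ that differ in the record of a single customer, and for every set $S$ of possible decisions,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S].$$

In words, adding or removing any one customer’s data changes the probability of any given decision by at most a factor $e^{\varepsilon}$, so an observer of the decision learns very little about that individual customer.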
We find that one can design such privacy-preserving (or differentially private) algorithms by adding carefully adjusted “noise” to the company’s decisions or to the sensitive data the company uses. “Noise” here is essentially meaningless randomness, akin to a flip of a coin. Consider, for instance, an insurance company designing a quote for a particular customer. The company can first calculate the true-optimal price (for instance, the price that maximizes its revenue from this particular customer), then flip a coin and add $1 for heads or subtract $1 for tails. Clearly, by adding such “noise” to the original true-optimal price, the company makes its carefully designed price “less optimal”, which potentially reduces profits. In exchange, however, adversaries have less information (or less inference power) with which to deduce anything meaningful about the sensitive data of the company’s customers.
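In the differential privacy literature, the textbook way to calibrate such noise is the Laplace mechanism. The sketch below is a minimal illustration of that standard recipe under assumed parameter values, not the exact algorithm from our paper; it assumes the true-optimal price can change by at most `sensitivity` dollars when any single customer’s record is added to or removed from the historical data.

```python
import numpy as np

# Minimal sketch of the standard Laplace mechanism (assumed parameters,
# not the exact algorithm proposed in the paper).
def private_price(true_optimal_price, sensitivity=1.0, epsilon=0.5, rng=None):
    """Return a noisy price that is epsilon-differentially private,
    assuming the stated sensitivity of the underlying pricing rule."""
    rng = np.random.default_rng() if rng is None else rng
    # Laplace noise with scale = sensitivity / epsilon yields epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_optimal_price + noise

# Example: a true-optimal quote of $120 gets randomly perturbed by a few dollars.
print(private_price(true_optimal_price=120.0))
```

Smaller values of epsilon give stronger privacy but require more noise (and thus less “optimal” prices); a higher sensitivity, meaning a single customer can sway the price a lot, likewise requires more noise.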
In our study, we show that the company does not have to add much noise to provide sufficiently strong consumer privacy guarantees. In fact, the more historical data the company has, the cheaper such privacy preservation becomes and, fortunately for the company, in some cases privacy can be achieved almost for free.