The problem with anonymous data and GDPR

GDPR is just around the corner. Depending on how far along their preparations are, CISOs and privacy officers will either be starting to relax or starting to panic.

Some may decide to just wing it. That is not the best idea given the consequences of non-compliance, but it may well stem from past experience of "getting away with it"!

But this blog is not about simple compliance, in the sense of having a process in place to become compliant. It is about what lies beyond GDPR, and some of the peculiarities GDPR has thrown up.

A recent conversation with a client highlighted one such example. One key requirement of GDPR, as organisations commonly interpret it, is that consumer data must be anonymised so that it cannot be re-identified.

Yet non-anonymised data, if held securely, is valuable for organisations to study in order to discover buying patterns and other consumer behaviour. Consider how Amazon tracks purchases to make further product recommendations.

Huge data lakes of anonymised data, by contrast, are of little use to marketers, even if they meet the requirements of GDPR.

Worse, anonymisation can make the data more vulnerable. While it meets the letter of EU law, the law does not always reflect the reality of how data behaves, or how easily the unscrupulous can exploit anonymised data.

When I say unscrupulous, I do not only mean criminals and the hackers working on their behalf. The term also covers rogue data brokers and marketers, who can make surprisingly good sense of anonymised data from nothing more than a jumble of URLs or timestamps.

An online report makes this clear: “Some make things very easy: for instance, anyone who visits their own analytics page on Twitter ends up with a URL in their browsing record which contains their Twitter username, and is only visible to them. Find that URL, and you’ve linked the anonymous data to an actual person.”
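To make the report's point concrete, here is a minimal sketch of how a single self-identifying URL unmasks an "anonymised" browsing record. It assumes, purely for illustration, that the analytics page lives at a URL of the form `https://analytics.twitter.com/user/<username>/...`; the function name `find_self_identifying_username`, the record layout and the sample username are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical URL pattern (an assumption for illustration): the analytics
# page is taken to live at https://analytics.twitter.com/user/<username>/...
# Twitter usernames are 1-15 letters, digits or underscores.
ANALYTICS_PATH = re.compile(r"^/user/([A-Za-z0-9_]{1,15})(?:/|$)")


def find_self_identifying_username(urls):
    """Return the first username leaked by an analytics-style URL, or None."""
    for url in urls:
        parsed = urlparse(url)
        # Only URLs on the (assumed) analytics host can leak a username here.
        if parsed.hostname != "analytics.twitter.com":
            continue
        match = ANALYTICS_PATH.match(parsed.path)
        if match:
            return match.group(1)
    return None


# An "anonymised" record: no name attached, just a pseudonymous ID and URLs.
record = {
    "user_id": "a91f3c",
    "urls": [
        "https://news.example.com/story/123",
        "https://analytics.twitter.com/user/jane_doe_42/home",
        "https://shop.example.com/basket",
    ],
}

print(find_self_identifying_username(record["urls"]))  # jane_doe_42
```

Removing names and replacing them with a pseudonymous ID does nothing here: the identity sits inside the URL itself.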

The experts cited in the report say that as few as 10 URLs can be enough to identify someone uniquely, because the probability of two people sharing the same combination of sites is very low. By taking the addresses of sites an individual has visited from the anonymised data, an attacker can compare them with the more public URLs that person is known to have visited, such as social media profiles or public playlists on YouTube or Spotify (the stuff we love to share). It doesn't take long to identify an individual correctly this way.
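The matching step described above can be sketched in a few lines. This is a simplified illustration, not the method used in the report: it assumes the attacker holds anonymised histories keyed by pseudonymous IDs plus a set of public URLs per known person, and it scores matches with an IDF-style weight, `log(N / count)`, so that rare URLs count for much more than popular ones. All function names, people and URLs are hypothetical.

```python
import math
from collections import Counter


def rarity_weights(anonymised):
    """Weight each URL by how rare it is across all anonymised histories.

    A URL nearly everyone visits (a search engine) says little about who you
    are; one visited by a single person says a lot. log(N / count) is a simple
    IDF-style stand-in for that probabilistic intuition.
    """
    n = len(anonymised)
    counts = Counter(url for urls in anonymised.values() for url in set(urls))
    return {url: math.log(n / c) for url, c in counts.items()}


def link_records(anonymised, public_profiles):
    """Map each pseudonymous ID to the public identity whose URLs match best."""
    weights = rarity_weights(anonymised)
    matches = {}
    for pseudo_id, urls in anonymised.items():
        history = set(urls)
        best_name, best_score = None, 0.0
        for name, public_urls in public_profiles.items():
            # Sum the rarity weights of URLs seen in both sets.
            shared = history & set(public_urls)
            score = sum(weights[u] for u in shared)
            if score > best_score:
                best_name, best_score = name, score
        matches[pseudo_id] = best_name  # None if nothing distinctive matched
    return matches


# Hypothetical "anonymised" histories: no names, only pseudonymous IDs.
anonymised = {
    "u1": ["https://www.google.com",
           "https://open.spotify.com/playlist/abc",
           "https://www.youtube.com/playlist?list=xyz"],
    "u2": ["https://www.google.com",
           "https://twitter.com/bob_example",
           "https://news.example.com"],
    "u3": ["https://www.google.com",
           "https://news.example.com",
           "https://shop.example.com"],
}

# Hypothetical public footprints: things people share openly.
public_profiles = {
    "Alice": ["https://open.spotify.com/playlist/abc",
              "https://www.youtube.com/playlist?list=xyz"],
    "Bob": ["https://twitter.com/bob_example"],
}

print(link_records(anonymised, public_profiles))
# {'u1': 'Alice', 'u2': 'Bob', 'u3': None}
```

Because every history contains the search engine, its weight is zero and it contributes nothing; the shared playlist and profile URLs do all the work. With real histories of 10 or more URLs, the rare ones quickly single out one candidate.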

So anonymised data can, in fact, be less secure than data that has not been anonymised. It also neatly demonstrates how data privacy laws, despite the best intentions, do not always reflect the day-to-day, up-to-date reality of operating conditions, nor how businesses are changing in the era of digital transformation.

Legislation takes years to draft, agree and eventually turn into law. Laws are often based on the situation when the legislation is written rather than on the situation likely to exist when it comes into force.

Laws that are rushed in to ban certain substances or respond to emergencies are fairly concrete, both in conception and in the problem they address.

Data is not like that, and neither is digital. Data is fluid, and the ways it is used change constantly; hackers and criminals, obviously, are not constrained by the workings of GDPR. Those who conceived GDPR did not foresee the explosion of data we are now experiencing.

The problem with GDPR is not that it does a poor job of protecting consumer privacy, but that it does a poorer job of helping businesses innovate, while actually creating opportunities for identity thieves to flourish.

The answer is not to ignore GDPR, but for practitioners, vendors and researchers to find data anonymisation techniques that properly hide identity while still providing organisations with analytical insight.