Mathematician and data scientist advocate Clive Humby once said: “Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value”.
As with oil, data now shapes our world. More and more companies collect massive amounts of data, often without a specific purpose, on a “good-to-have” basis. As a consequence, they end up with piles of unused data that land in so-called “data graveyards”. This practice creates GDPR compliance issues, particularly with the storage limitation and purpose limitation principles.
What exactly are data graveyards?
Many companies collect and store petabytes worth of data. What is a petabyte? It is a unit of information: one petabyte equals one million gigabytes(!), which is roughly 13.3 years of HDTV video or over 58,000 movies. There can be plenty of reasons for collecting and storing that amount of data: AI and machine learning initiatives (big data), profiling for more effective targeting, and, in general, making informed business decisions. Sometimes this data is first collected for a specific purpose; at other times it is collected on a good-to-have basis, or it comes from legacy systems. Either way, it stays in storage, forgotten and never to be seen again, while more and more data is piled on top of it. And BOOM, we have a data graveyard! A huge repository of unused data.
To begin with, collecting large amounts of data is not wrong, as long as the collection has a legal basis and meets all the GDPR requirements. In many cases, large amounts of data help companies make informed business decisions. But sometimes data graveyards can cause more harm than good, especially with regard to GDPR compliance and, in particular, the principles of the GDPR.
Storage limitation principle
According to this principle, personal data must be kept for no longer than necessary to achieve the specific purpose for which you collected it in the first place. In other words, even if you collect and use personal data fairly and lawfully, you cannot keep it for longer than you actually need it or than you are required to by law. After this point, personal data should be deleted or anonymised. By definition, this is not the case with data graveyards. Personal data remains there for an indefinite period of time, breaching the storage limitation principle.
An example of a breach of the storage limitation principle comes from Denmark: during an inspection of a hotel group, the Danish DPA became aware of a number of systems containing a lot of personal data (500,000 customer profiles) that should have been deleted in accordance with the hotel group’s own deletion deadlines. The Danish DPA reported the hotel group to the police and recommended a fine of around €148,000.
Purpose limitation principle
The purpose limitation principle requires that you process personal data for specified, explicit and legitimate purposes, meaning that you must be clear and open from the outset about why you are processing the personal data. However, the ugly truth is that data graveyards consist, most of the time, of personal data that was collected on a good-to-have basis rather than on a need-to-have basis, as required by the purpose limitation principle.
Data minimisation principle
According to this GDPR principle, the data you hold must be adequate, relevant and limited to what is necessary for the purposes of processing. In other words, you should identify the minimum amount of personal data you need to fulfil your purpose. Data graveyards involve the risk of breaching the data minimisation principle, as there is no guarantee that the personal data contained therein is still adequate, relevant and limited to what is necessary in relation to the purposes for which it was collected in the first place.
Accuracy principle
According to the GDPR, personal data shall be accurate and, where necessary, kept up to date. You should take every reasonable step to ensure that inaccurate personal data is erased or rectified without undue delay. When it comes to data graveyards, both the source of the data and whether it is updated regularly are questionable. Keep in mind that inaccurate data can lead companies to make erroneous business decisions.
Data graveyards also come with a lot of (unnecessary) costs:
- Storing large amounts of data can require an expensive set-up. To store this amount of data you will probably need a technologically advanced “location”, which may require a big investment on your side.
- You may spend unnecessary working hours trying to maintain a pool of data that is no longer of use.
- Setting up a security system is an obligation that you cannot ignore. Data security can have many layers, beginning with storage and including encryption. It is put in place to block third parties from accessing the data.
Tips to minimise the risks
Data graveyards come with a lot of compliance risks and unnecessary costs. Having a robust retention policy in place will help you address the above-mentioned risks. In brief, a retention policy defines what personal data you are allowed to retain, for how long, and what happens to the personal data once the retention period expires. Establishing the retention period can be a challenging task, but here are some tips:
- Consider whether there is a law or regulation requiring you to store the personal data for a specific period of time.
- If there is no such law or regulation, consider your purpose for processing the personal data at hand. Ask yourself, how long do I need to process the data in order to achieve this specific purpose? It is up to you to assess the appropriate retention period.
- You should consider whether you need to store the personal data for defending legal claims.
(For more information, check out our article on data retention periods)
Make sure to regularly review this policy, as legal requirements for storing personal data may change from time to time.
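To make the idea of a retention policy more concrete, here is a minimal Python sketch of how such a policy could be written down and checked automatically. Everything in it is an assumption made for illustration: the category names, the retention periods, the `Record` structure and the choice to count the period from the collection date are hypothetical and are not legal advice. Your own periods must come from the assessment described in the tips above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per data category, in days.
# Illustrative values only; real periods come from your own legal assessment.
RETENTION_DAYS = {
    "booking_history": 5 * 365,
    "newsletter_signup": 2 * 365,
    "job_application": 180,
}


@dataclass
class Record:
    record_id: int
    category: str
    collected_on: date  # assumption: the retention period starts at collection


def is_expired(record: Record, today: date) -> bool:
    """Return True when the record has outlived its retention period."""
    days = RETENTION_DAYS.get(record.category)
    if days is None:
        # No documented category or period: flag it for review
        # instead of silently keeping it.
        return True
    return today > record.collected_on + timedelta(days=days)


def records_due_for_action(records, today):
    """List the records that should now be deleted, anonymised or reviewed."""
    return [r for r in records if is_expired(r, today)]


if __name__ == "__main__":
    today = date(2024, 1, 1)
    records = [
        Record(1, "job_application", date(2023, 1, 1)),
        Record(2, "booking_history", date(2022, 1, 1)),
        Record(3, "legacy_export", date(2023, 12, 1)),
    ]
    for r in records_due_for_action(records, today):
        print(r.record_id, r.category)
```

In this sketch, a record whose category has no documented retention period is treated as due for action. That is a deliberate design choice: data without a known purpose is exactly what ends up in a data graveyard.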
Once the retention period expires, you have two options: either delete the personal data or anonymise it by removing all personally identifiable information while retaining a core of useful information. The GDPR does not apply to anonymous data, so you can keep properly anonymised data without limitation.
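As a rough illustration of the anonymisation option, the Python sketch below drops direct identifiers from a record and coarsens two other fields. The field names (`name`, `email`, `birth_year`, `postal_code` and so on) are hypothetical, and the approach is only one possible technique. Keep in mind that removing obvious identifiers is not always enough: if individuals can still be re-identified from what remains, the data is not anonymous and the GDPR continues to apply.

```python
# Hypothetical field names; adapt them to your own schema.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "customer_id"}


def anonymise(record: dict) -> dict:
    """Return a copy with direct identifiers dropped and some fields coarsened."""
    result = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_year" in result:
        # Generalise the exact birth year to a decade, e.g. 1987 -> 1980.
        result["birth_decade"] = result.pop("birth_year") // 10 * 10
    if "postal_code" in result:
        # Keep only the first two characters of the postal code.
        result["postal_area"] = str(result.pop("postal_code"))[:2]
    return result


if __name__ == "__main__":
    guest = {
        "name": "Ann Example",
        "email": "ann@example.com",
        "birth_year": 1987,
        "postal_code": "2100",
        "nights_booked": 3,
    }
    print(anonymise(guest))
```

The function returns a new dictionary and leaves the original record untouched, so the original can still be deleted in a separate, auditable step.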
All in all, a “keep everything” policy and the creation of data graveyards come with a lot of compliance issues and unnecessary costs. There is always the risk that you base business decisions on outdated data. Focus on collecting the personal data that you actually need rather than collecting on a good-to-have basis, and make sure you delete or anonymise the data you no longer need.