#2 – Identification of Personal Data & Data Discovery

The last post ended with a note on why you need to identify the personal data your company keeps before you can build a sustainable privacy program. I will, of course, not leave you hanging. This post in the series therefore elaborates on the notion of personal data: what is it, and how can we find it?
The truth is that both regulations (the EU and UK GDPR) have a notoriously broad scope of application. A crucial cut-off point, however, is that they apply solely to personal data. Luckily (at least for lawyers), the regulations contain a legal definition of personal data, which the Article 29 Working Party (WP29) has elaborated on, and that makes it somewhat easier to determine where the cut-off is. It might sound easy in theory, but in practice there are always discussions about how the definition applies to more obscure data types.

Nevertheless, the regulations define personal data as (1) any information (2) relating to (3) an identified or identifiable (4) natural person.
The regulations are broad in application simply because the words ‘any information’ are used. They cover both objective information, such as observations and measurable facts about a person, and subjective information, such as judgments and evaluations. By necessity, the GDPR even covers untrue information. You might believe that the GDPR only applies to certain formats, like sheets of paper in a filing system or a computer database. But personal data is protected by the regulations regardless of the form it takes. In how they define information, the regulations are technology-neutral.

The information needs to relate to someone, i.e., be connected to an individual. Sometimes the link is easy to identify, but sometimes it is unclear. Usually, you would evaluate whether the information relates to a person through its content, purpose, or result. Ask yourself: does the content of the information mention a person? Will the company use it to evaluate an individual, treat them in a certain way, or influence their status or behaviour? Would using the data likely affect a person’s rights and interests? If so, it might be personal data.

The information must also concern a person who is identified or identifiable. That means there is at least a chance that someone could be distinguished from other persons. Or, to put it the more legal way: a person is identified or identifiable, either directly or indirectly, by the data. Whether that is the case depends on the means someone is likely to use to identify the person in question, the resources needed to do so, the technology available at the time, and technological developments. It’s important to remember, for example, that pseudonymous information is still identifiable.
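
To make the point about pseudonymous information concrete, here is a minimal Python sketch. It is an illustration only, not a recommended pseudonymisation method: the record layout, the function names, and the idea of a separately stored key table are all assumptions made for the example. It shows that as long as someone holds the mapping between tokens and names, the people behind the tokens can be singled out again.

```python
import secrets

def pseudonymise(records, key_table):
    """Swap each name for a random token; the mapping is kept in key_table.

    Both the record layout and key_table are hypothetical, for illustration.
    """
    out = []
    for rec in records:
        name = rec["name"]
        if name not in key_table:
            key_table[name] = secrets.token_hex(8)
        out.append({"id": key_table[name], "purchase": rec["purchase"]})
    return out

def reidentify(pseudo_records, key_table):
    """Anyone holding key_table can reverse the tokens back into names."""
    reverse = {token: name for name, token in key_table.items()}
    return [
        {"name": reverse[rec["id"]], "purchase": rec["purchase"]}
        for rec in pseudo_records
    ]

records = [
    {"name": "Alice Example", "purchase": "bicycle"},
    {"name": "Bob Example", "purchase": "helmet"},
]
key_table = {}
pseudo = pseudonymise(records, key_table)   # no names in here any more
restored = reidentify(pseudo, key_table)    # but they can be brought back
```

The pseudonymised records contain no names, yet they remain linked to identifiable people for as long as the key table (or any other realistic way of linking them back) exists.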

As a general rule, deceased persons and legal persons are not covered by the regulations. Natural persons only include living people, although some EU Member States (e.g., Denmark) protect the dead for some time. Similarly, the regulations do not apply directly to legal entities. However, if information held about a legal entity relates to natural persons, it can nonetheless be personal data, like a corporate email address with someone’s name in it.

This brings us to the second question I promised to answer: how do you find personal data? It is usually done through data discovery. There are a few data discovery exercises that we recommend, like conducting departmental interviews, sending out questionnaires, scanning and examining your systems, or looking through your contracts. If you want a macro perspective, you can always include an analysis of your policies and standard operating procedures (SOPs). You might also consider using a data discovery tool or engaging DPOrganizer’s Professional Services team to help the company sniff out and analyse the data it keeps.
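
As a rough idea of what "scanning and examining your systems" can look like at its very simplest, here is a small Python sketch. It is only an assumption-laden illustration: the two patterns, the function name `scan_text`, and the sample sentence are made up for this post, and real discovery tools go far beyond pattern matching. Regular expressions like these will miss many kinds of personal data and flag some things that are not.

```python
import re

# Deliberately simple, illustrative patterns; not exhaustive or authoritative.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def scan_text(text):
    """Return (label, matched_text) pairs for likely personal data in text."""
    findings = []
    for label, pattern in (("email", EMAIL_RE), ("phone", PHONE_RE)):
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return findings

# Hypothetical free-text field, e.g. a support ticket or an order note.
sample = "Contact anna.svensson@example.com or call +46 70 123 45 67 about order 1042."
findings = scan_text(sample)
```

A scan like this can point you to where personal data might be hiding, but the findings still need a human to judge whether the information actually relates to an identifiable person.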

I asked our founder, Egil Bergenlind, and our Head of Sales, Simon Neal, about data discovery, and each shared a thoughtful comment on what they know organisations find challenging. Firstly, Egil said that a data discovery project is like opening Pandora’s box: not necessarily because the content is malicious, but because the volume of personal data is unexpected. It can therefore be difficult to estimate the resources a discovery project will demand. That depends on the size and complexity of the organisation, the nature of its data processing and business, its vendor and partner ecosystem, and its IT infrastructure. But it also depends on how prepared and experienced you and your organisation are when it comes to data and data protection.

Simon gave a real-life example of a company looking for a data discovery solution. They thought that an automated data discovery tool would be the best way forward, but it was too good to be true. Simon’s key takeaway was that manual discovery may be easier, cheaper, and faster instead, especially if contracting a software vendor has to go through a lot of departments and executives who have different opinions on and understandings of the issues and scope of the project. That could grind everything to a halt for months, for a project that could be done manually in a few weeks.

It is important to remember, once again, that data protection isn’t a set-and-forget box-ticking exercise. I’m sorry for repeating the same thing in this post, but that doesn’t make it any less true. To identify personal data, it’s imperative to have a sustainable data lifecycle approach. The lifecycle generally covers the points of data collection, usage, update or alteration, sharing, and destruction. Once you’ve identified the current inventory of personal data, ask yourself how often you need to search for new personal data, particularly where alterations happen that might create it. For example, non-personal data can become personal data: machine data that is not related to anyone could be connected to a specific person, such as an industrial machine’s data being linked to its operator. How you apply the lifecycle approach naturally depends on the processing, for example the velocity and volume of personal data used, changes to internal processes or operations, and so on.
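
One simple way to picture a recurring discovery exercise is to compare what your systems hold today with what was in the last inventory. The Python sketch below assumes, purely for illustration, that an inventory is a mapping from a system name to the set of fields it stores; the function name and the example systems are hypothetical. It only flags what is new, so someone still has to decide whether the new fields are personal data.

```python
def fields_to_review(previous, current):
    """List fields that appear in `current` but were absent from `previous`.

    Both arguments map a system name to a set of field names
    (a made-up inventory format used only for this example).
    """
    review = {}
    for system, fields in current.items():
        added = sorted(set(fields) - set(previous.get(system, ())))
        if added:
            review[system] = added
    return review

# Last inventory: the machine log holds no data related to a person.
previous_inventory = {"machine_log": {"machine_id", "temperature"}}

# Today: the log now records the operator, and a new system has appeared.
current_inventory = {
    "machine_log": {"machine_id", "temperature", "operator_id"},
    "visitor_wifi": {"mac_address"},
}

to_review = fields_to_review(previous_inventory, current_inventory)
```

In this example, the newly added operator field is exactly the kind of alteration that can turn machine data into personal data.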

In conclusion, a general definition of personal data is any information relating to an identified or identifiable natural person who is alive (with the exceptions described above). Personal data can be found through data discovery exercises, which should be carried out regularly during the lifecycle of the processing.
In the next post, we will dig deeper into aspects of accountability concerning risks and the security of processing, now that we know a bit more about personal data and how to find it.
