Disclaimer: This article does not constitute legal counseling or legal advice; it only contains the author's opinions based on her experience in the field of data products. These are not recommendations on how to act in a particular situation. If you have questions regarding a specific data product, please contact our team.
What types of data you can use for product development
You've got an amazing idea for a new data product – and you have (almost) everything figured out: what the product will offer, who the potential users are, what it will look like… and you have a team of data engineers, scientists, and designers to work on it.
What could go wrong?
Perhaps you've heard about the GDPR (General Data Protection Regulation) and the hefty fines issued to tech companies, but could it affect your data start-up? Yes, it could, if you plan to use the personal data of EU residents or to target them as customers. Below you will find the aspects you should consider and the potential legal bases you could use in the various phases of the product lifecycle.
Strict rules concerning the use of personal data
With the enactment of the GDPR, strict rules concerning the use of personal data were introduced. You need a legal basis for every processing of personal data, unless it is carried out for purely personal or household purposes, and personal data is not limited to direct identifiers such as names and identification numbers. Any information that relates to a person can qualify as personal data under certain circumstances (e.g., GPS data or IP addresses). Processing is defined very broadly and covers even mere access to data. Furthermore, the GDPR requires privacy by design, meaning that a product must be designed in a GDPR-compliant way from the start, with no exception for innovative products.
Use of data during product development and testing
As a first step, you would probably like to evaluate a certain amount of data and extract some useful insights. Afterwards, you could try to monetize those insights by developing a product. Let's look at the options for obtaining data for this endeavor in a legally compliant manner, with the pros and cons of each option:
- Anonymized data (data that contains no personal reference)
- Pros:
- not subject to the GDPR, so you can use it without restrictions.
- Cons:
- challenging anonymization methods – it is not enough to delete direct identifiers (such as personal numbers); additional measures, such as data aggregation, generalization, or swapping, must be implemented to ensure that no individual can be reidentified.
- evolving technology – a data set that was considered anonymous at a certain point in time will not necessarily remain anonymous forever, as reidentification techniques keep improving.
- reduced usefulness of data – by removing or changing data, you lose some insights.
- Synthetic data (artificial data generated from original data by a model trained to reproduce its characteristics and structure, so that it mimics the real-world data, their interdependencies and statistical value, without disclosing the identity of individuals)
- Pros:
- use without GDPR restrictions – but only if the right measures to prevent reidentification are implemented (such as the removal of extreme outliers).
- preserved integrity of the data set – enabling more insights than anonymized data.
- data multiplication – a sample of 1,000 original personal data records could be used to produce 10,000 synthetic records.
- variety of generation models – you can build your own model for generating synthetic data and train it using various methods (Monte Carlo methods, variational autoencoders, generative adversarial networks), or you can use one of the existing open-source or commercial generators, some of which are free (such as Mostly AI – no recommendation!).
- Cons:
- sample of original data needed – if this is personal data, you need a legal basis for creating the synthetic data. You could rely on your legitimate interest if the outcome of the balancing test shows that your interest overrides the rights and freedoms of data subjects.
- risk of reidentification – whether synthetic data remains anonymous is an ongoing question. It depends on whether the synthetic data deviates enough from the original data to prevent identification, and on whether anonymity can be sustained over time.
- Mock data (data that is not based on real-world data, but made up)
- Pros:
- not subject to the GDPR.
- can be generated quickly, as there is no need to collect real-world data. You could use mock data generators to create mock names, addresses, emails, or IBANs.
- Cons:
- limited use – it has no statistical value, but it could be useful for product testing.
- Personal data (including pseudonymized data)
- Pros:
- most insightful information – with personal data you can get more insights that are of better quality, as no data is lost, approximated or mimicked.
- Cons:
- consent usually required – it must be freely given and informed, the purpose of collection and use must be well defined, and it must be revocable at any time without a reason, which makes it challenging to manage.
- numerous GDPR obligations – implementation of data subjects' rights, documentation of compliance, privacy by design and by default, data processing or joint controllership agreements with other stakeholders, etc.
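To make the anonymization measures listed above more concrete, here is a minimal Python sketch of two of them, generalization and suppression of rare records, followed by a k-anonymity check (every combination of quasi-identifiers must be shared by at least k records). The records, field names, and the choice of k are hypothetical, and k-anonymity is only one commonly used heuristic: passing it does not by itself prove that data is anonymous in the GDPR sense.

```python
from collections import Counter

# Hypothetical raw records: a direct identifier (name), two quasi-identifiers
# (age, zip) and one attribute of interest (diagnosis).
records = [
    {"name": "Alice", "age": 34, "zip": "10115", "diagnosis": "flu"},
    {"name": "Bob", "age": 36, "zip": "10117", "diagnosis": "asthma"},
    {"name": "Carol", "age": 31, "zip": "10119", "diagnosis": "flu"},
    {"name": "Dan", "age": 52, "zip": "80331", "diagnosis": "diabetes"},
    {"name": "Eve", "age": 58, "zip": "80333", "diagnosis": "flu"},
    {"name": "Frank", "age": 45, "zip": "20095", "diagnosis": "asthma"},
]

QUASI_IDENTIFIERS = ("age_band", "zip_prefix")


def generalize(record):
    """Drop the direct identifier and coarsen the quasi-identifiers."""
    decade = record["age"] // 10 * 10
    return {
        "age_band": f"{decade}-{decade + 9}",    # exact age -> 10-year band
        "zip_prefix": record["zip"][:3] + "**",  # full zip -> 3-digit prefix
        "diagnosis": record["diagnosis"],
    }


def group_sizes(rows):
    """Count how many rows share each combination of quasi-identifiers."""
    return Counter(tuple(row[q] for q in QUASI_IDENTIFIERS) for row in rows)


def k_anonymity(rows):
    """Size of the smallest quasi-identifier group (0 for an empty data set)."""
    return min(group_sizes(rows).values(), default=0)


def suppress_small_groups(rows, k):
    """Remove rows whose quasi-identifier combination occurs fewer than k times."""
    sizes = group_sizes(rows)
    return [row for row in rows
            if sizes[tuple(row[q] for q in QUASI_IDENTIFIERS)] >= k]


generalized = [generalize(r) for r in records]
print(k_anonymity(generalized))  # 1: Frank's record is still unique
released = suppress_small_groups(generalized, k=2)
print(len(released), k_anonymity(released))  # 5 2
```

Generalization alone left one record unique, so it had to be suppressed; this is exactly the trade-off between anonymity and usefulness of the data described in the cons above.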
Extra tip: if you already hold some personal data legally, you could consider anonymizing it or generating synthetic data from it, relying on your legitimate interest as the legal basis (if the documented balancing of interests in the specific case shows that your interest overrides the rights and freedoms of data subjects), and then use the anonymized or synthetic data as described above.
How to use data to sell, advertise and improve your product
Use of data for the provision of services / selling of products
After you have used appropriate data to develop and test your product, it is hopefully time to launch it and start providing services or selling products to customers. Again, you will need to process your customers' data to that end, and here you can rely on contract performance as the legal basis for collecting and using it. You can also collect data before a contract with a customer has been concluded, if this is a necessary step towards concluding it (e.g., to make an offer). But be careful: you may only process the data that is strictly necessary to perform the contract (the address only if something has to be delivered, bank card details only if payment is made by card, etc.) or to take the necessary pre-contractual steps.
You may not collect data that is 'nice to have', such as customer preferences. Also, general profiling of customers cannot be based on contract performance; for that purpose you would usually need the customer's consent. On the other hand, profiling certain aspects of a customer's behavior, if strictly necessary for the provision of a service such as granting a loan or mortgage, could be allowed on the basis of legitimate interest, but such situations are very limited and would have to be properly documented.
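The rule that contract performance only covers strictly necessary data can be enforced in code by whitelisting fields per transaction and dropping everything else before storage. The sketch below is hypothetical: the field names, the `Order` type, and the assumption that name and email are always needed are illustrative choices, not a statement of what any particular contract requires.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order: the facts that decide which data is necessary."""
    needs_delivery: bool   # physical goods that have to be shipped
    payment_method: str    # e.g. "card" or "invoice"


# Assumed to be necessary for every order (e.g., to confirm the purchase).
ALWAYS_NEEDED = {"name", "email"}


def necessary_fields(order: Order) -> set[str]:
    """Return the fields that are strictly necessary to perform this order."""
    fields = set(ALWAYS_NEEDED)
    if order.needs_delivery:
        fields.add("postal_address")
    if order.payment_method == "card":
        fields.add("card_details")
    return fields


def minimize(submitted: dict, order: Order) -> dict:
    """Keep only the necessary fields; 'nice to have' data is never stored."""
    allowed = necessary_fields(order)
    return {key: value for key, value in submitted.items() if key in allowed}


form = {
    "name": "A. Customer",
    "email": "a.customer@example.com",
    "postal_address": "Example Street 1",
    "card_details": "tok_hypothetical",
    "favourite_genres": ["jazz"],   # a preference: not needed for the contract
}
stored = minimize(form, Order(needs_delivery=False, payment_method="card"))
print(sorted(stored))  # ['card_details', 'email', 'name']
```

For a digital purchase paid by card, both the postal address and the preference field are dropped before anything is stored.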
Use of data for personalized advertising
We are in the digital world, right? So we do not plan to advertise our data product on a billboard, but online. As we know, advertising campaigns are much more efficient if we target the group we believe is most interested in our products or services. Ideally, we would know exactly what people want (by profiling them) and be able to reach out to them directly (direct marketing) with our offer.
As stated in the previous section, consent is usually needed for general profiling, so this might not be a viable option due to the management challenges and potentially low consent rates. If obtaining consent is too burdensome, you could acquire information about customer preferences from third parties on the market and identify your target group using those insights.
For direct marketing to new customers you also need their consent; on the other hand, you may send direct marketing to your existing customers (e.g., to upsell or cross-sell your products) if you give them the opportunity to opt out of such communication.
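The two rules for direct marketing described above can be turned into a simple recipient filter: prospects are contacted only with consent, existing customers only as long as they have not opted out, and messages carry an opt-out link. The `Contact` fields and the unsubscribe URL are hypothetical, and the sketch encodes only the rules as summarized here; the exact conditions for marketing to existing customers vary and should be checked for the case at hand.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    email: str
    is_existing_customer: bool
    marketing_consent: bool = False   # explicit opt-in, needed for prospects
    opted_out: bool = False           # unsubscribed or withdrew consent


def may_receive_marketing(contact: Contact) -> bool:
    """Apply the consent / opt-out rules to a single contact."""
    if contact.opted_out:
        return False                  # an opt-out always wins
    if contact.marketing_consent:
        return True
    return contact.is_existing_customer   # existing customers: opt-out basis


def with_opt_out(body: str, unsubscribe_url: str) -> str:
    """Append an opt-out option to a marketing message."""
    return f"{body}\n\nNo longer interested? Unsubscribe here: {unsubscribe_url}"


contacts = [
    Contact("customer@example.com", is_existing_customer=True),
    Contact("prospect@example.com", is_existing_customer=False),
    Contact("fan@example.com", is_existing_customer=False, marketing_consent=True),
    Contact("former@example.com", is_existing_customer=True, opted_out=True),
]
recipients = [c.email for c in contacts if may_receive_marketing(c)]
print(recipients)  # ['customer@example.com', 'fan@example.com']
message = with_opt_out("Our new dashboard is live!", "https://example.com/unsubscribe")
```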
Use of data for product improvement
Once your product is on the market, you should adapt it to new customer needs and market trends. Therefore, you would probably like to use data about the current use of your product to enhance it in the future. You may use customers' consent as the legal basis for this processing, but that would require substantial effort for consent management.
Hence, you might consider using legitimate interest as your legal basis. For this, you need to carry out a balancing test showing that your interests override the rights and freedoms of data subjects. Emphasizing the benefits that data subjects would obtain from product improvement, and implementing additional measures such as pseudonymization of personal data (for example, by hashing direct identifiers), could support your arguments. Every case is different, so take all facts into consideration and explain your conclusion thoroughly.
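Pseudonymization through hashing of direct identifiers could look like the Python sketch below. It uses a keyed hash (HMAC-SHA256 from the standard library) rather than a plain hash, because an unkeyed hash of an email address can often be reversed simply by hashing a list of candidate addresses. The key name, the event format, and the lowercase normalization (sensible for emails, not necessarily for other identifiers) are assumptions. Whoever holds the key can link pseudonyms back to people, so the result is still personal data and the key should be stored separately.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key. In practice it would be loaded from a separate key
# store, not generated at start-up, so that pseudonyms stay stable across runs.
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a direct identifier with a keyed SHA-256 hash (hex string)."""
    normalized = identifier.strip().lower()   # assumes case-insensitive IDs such as emails
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_events(events: list[dict], key: bytes = PSEUDONYM_KEY) -> list[dict]:
    """Copy usage events, swapping the 'user' identifier for its pseudonym."""
    return [{**event, "user": pseudonymize(event["user"], key)} for event in events]


# Hypothetical product-usage events collected for product improvement.
events = [
    {"user": "Alice@Example.com", "feature": "export", "duration_s": 12},
    {"user": "alice@example.com", "feature": "search", "duration_s": 3},
    {"user": "bob@example.com", "feature": "export", "duration_s": 40},
]
pseudonymized = pseudonymize_events(events)
# Both of Alice's events share one pseudonym, so per-user analysis still works.
```

Keeping pseudonyms consistent per person preserves the usage patterns needed for product improvement, while analysts working with the events never see the email addresses themselves.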
Conclusion
All decisions must be made with the specifics of the case at hand in mind, and the impact of the data processing on the rights and freedoms of data subjects needs to be considered. However, since the enactment of the GDPR, best practices have emerged that offer guidance for its implementation. Following those best practices, even when this is quite burdensome, may be more convenient than risking hefty fines by experimenting with non-standard approaches.