Jan Bosch is a research center director, professor, consultant and angel investor in startups. You can contact him at email@example.com.
Digitalization is concerned with three key enabling digital technologies: software, data and artificial intelligence. These build on each other: software typically generates the data from systems in the field, and it's this data (preferably labeled) that forms the basis for machine learning and deep learning. The software-intensive systems industry, as the name suggests, has fully embraced the role and importance of software, and many companies are actively experimenting with and prototyping ML/DL models. Data, however, remains a highly controversial subject in most of the companies I work with.
The reason data is so controversial is twofold. First, most companies sell a product, and the customer then becomes the legal owner of that product. Any data generated by the product is therefore considered to be owned by the customer as well, so the product company can't use it unless the customer gives permission. Most customers are quite restrictive in granting that permission and typically only allow it if it directly benefits them and doesn't benefit others.
The second reason is that GDPR and the wider body of privacy law are, in most contexts, poorly understood. Following a “better safe than sorry” policy, most people in companies involved in generating, processing, analyzing and storing data are extremely restrictive in what data they'll work with and generally treat it as toxic material rather than a valuable asset.
Both concerns are, of course, much better addressed in other ways than simply avoiding data collection. Who owns the data from products sold by the company is simply a legal question: if the contract says that the company owns the data and can use it for its own purposes, then that's what has been agreed with the customer. This is far from a new concept; the automotive industry has used it for decades to be able to use the data from its vehicle fleets.
The GDPR question is, admittedly, complex, but there are two important principles. First, if the customer consents to data collection, you can use the data. Second, if the data is properly anonymized, for example through homomorphic encryption or aggregation, it's free to use as well. Many have used aggregation as the primary mechanism for anonymization, but with the emergence of ML/DL, it's important to have the individual data entries and associated labels for training models.
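To make the aggregation trade-off concrete, here is a minimal sketch, assuming per-product records with hypothetical `region` and `fuel_consumption` fields and an illustrative minimum group size of 5. It publishes only per-group counts and means and suppresses groups small enough that an individual product might be singled out. This is a simplification for illustration, not a legal test for anonymity.

```python
from collections import defaultdict

# Hypothetical sketch: anonymization by aggregation with small-group suppression.
# MIN_GROUP_SIZE and the field names used below are illustrative assumptions,
# not a legal threshold or a standard schema.
MIN_GROUP_SIZE = 5


def aggregate(records, key, value, min_group_size=MIN_GROUP_SIZE):
    """Return the count and mean of `value` per `key` group, dropping small groups."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])

    summary = {}
    for group, values in groups.items():
        # Suppress groups so small that a single product could stand out.
        if len(values) >= min_group_size:
            summary[group] = {"count": len(values), "mean": sum(values) / len(values)}
    return summary


if __name__ == "__main__":
    records = [{"region": "north", "fuel_consumption": v} for v in [5.0, 6.0, 7.0, 5.0, 6.0, 7.0]]
    records += [{"region": "south", "fuel_consumption": v} for v in [9.0, 10.0]]
    print(aggregate(records, "region", "fuel_consumption"))
    # prints {'north': {'count': 6, 'mean': 6.0}}  ("south" is suppressed)
```

The output contains only group summaries, so the individual entries and labels needed to train an ML/DL model are gone, which is exactly the limitation of aggregation described above.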
Please note that I’m not a legal expert and you should do your own research, but the current practice of extreme restrictiveness is counter-productive both for companies and for the people using their products. Would you rather use a product that constantly gets better by learning from you and thousands of others or be stuck with a static product that never changes? And what do you think is better for the competitive position of your company?
Even in cases where the data shouldn't, for whatever reason, be collected and stored in a central location, there are still good ways to use data-driven solutions. For instance, we have ongoing research in federated reinforcement learning, where a population of products trains individually on local data and then exchanges model parameters instead of the data used for training. This allows the population to continuously improve by combining the learnings of all its members without sharing sensitive data.
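The core idea of exchanging parameters rather than data can be illustrated with federated averaging, a simpler supervised relative of federated reinforcement learning. This is not the research setup mentioned here: the sketch assumes each hypothetical product fits a linear model `y = w*x + b` to its own synthetic data, and a coordinator averages the parameters, weighted by sample count, after each round.

```python
import random

# Hypothetical sketch of federated averaging (FedAvg), NOT the federated
# reinforcement learning research described in the text. Each simulated
# "product" fits y = w*x + b on local data and shares only (w, b).


def local_train(params, data, lr=0.01, epochs=50):
    """Run gradient descent on mean squared error using local data only."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates):
    """Average (params, sample_count) pairs, weighting by sample count."""
    total = sum(n for _, n in updates)
    w = sum(params[0] * n for params, n in updates) / total
    b = sum(params[1] * n for params, n in updates) / total
    return (w, b)


def simulate(num_products=5, rounds=20, seed=0):
    rng = random.Random(seed)
    # Each product observes its own noisy samples of the same process y = 3x + 1.
    fleet = []
    for _ in range(num_products):
        data = []
        for _ in range(rng.randint(10, 40)):
            x = rng.uniform(-1, 1)
            data.append((x, 3 * x + 1 + rng.gauss(0, 0.1)))
        fleet.append(data)

    global_params = (0.0, 0.0)
    for _ in range(rounds):
        # Raw data never leaves a product; only parameters and sample counts do.
        updates = [(local_train(global_params, data), len(data)) for data in fleet]
        global_params = federated_average(updates)
    return global_params


if __name__ == "__main__":
    w, b = simulate()
    print(f"global model: w={w:.2f}, b={b:.2f}")
```

Running it should print a global model with w close to 3 and b close to 1, even though no product ever shared a single raw sample with the coordinator.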
The highly restrictive approaches to data in many companies today are holding us back from adopting best practices in product development and evolution. In many ways, it's akin to fighting with one arm tied behind your back, because the lack of data forces you to operate on your beliefs about customers and the market rather than on facts. Data ownership can and should be managed through legal contracts, and legal and privacy concerns can be addressed by technical solutions. Western society is built on the scientific method, and the Enlightenment was a transition from operating on beliefs to basing ourselves on facts. It's time for you and your company to go through the same transition.