Spurred by the widespread adoption of artificial intelligence and machine learning, “data” is becoming a key factor of production in an increasingly digital economy, comparable in importance to capital, land or labour. Despite an ever-growing demand for third-party data in the B2B market, firms are generally reluctant to share their information. This reluctance stems from the unique characteristics of “data” as an economic good (a freely replicable, non-depletable asset whose value is highly combinatorial and context-specific). These characteristics lead digital companies to hoard and protect their “valuable” data assets. They also lead them to integrate across the whole value chain, seeking to monopolise the provision of innovative services built upon those assets. As a result, most of these valuable assets remain unexploited in corporate silos.
This situation is shaping the so-called data economy around a small number of champions, and it is hampering the benefits of large-scale global data exchange. Some analysts have estimated the potential value of the data economy at US$2.5 trillion globally by 2025. Not surprisingly, unlocking the value of data has become a central policy of the European Union, which has estimated the size of the data economy at €827 billion for the EU27 by the same year. Within the scope of the European Data Strategy, the European Commission is also steering initiatives aimed at identifying cross-industry use cases involving different verticals and at enabling sovereign data exchanges to realise them.
Among individuals, the massive collection and exploitation of personal data by digital firms in exchange for services, often with little or no consent, has raised widespread concern about privacy and data protection. Besides spurring recent legislation in this area, this concern has prompted warnings about the unsustainability of current digital economics (a few digital champions, a potential negative impact on employment, growing inequality). Some of these voices propose, as a potential solution, that people be paid for their data in a sort of worldwide data labour market.
From a technical perspective, we are far from having the technology and algorithms required to enable such a human-centric data economy. Even its scope is still blurry, and the question of the value of data is, at the very least, controversial. Research from different disciplines has studied the data value chain, different approaches to the value of data, how to price data assets, and novel data marketplace designs. At the same time, complex legal and ethical issues concerning privacy, data protection, and ethical AI practices have arisen around the data economy.
In this dissertation, we start by exploring the data value chain and how entities trade data assets over the Internet. We carry out what is, to the best of our knowledge, the most thorough survey of commercial data marketplaces. In this work, we catalogue and characterise ten different business models. These include personal information management systems: companies born in the wake of recent data protection regulations that aim to empower end users to take control of their data. We also identify the challenges faced by different types of entities and the solutions and technologies they use to provide their services.
Then we present a first-of-its-kind measurement study that uses a novel methodology to shed light on the prices of data in the market. We study how ten commercial data marketplaces categorise and classify data assets, which categories of data command higher prices, and which features drive the prices of data in the market.
Next, we turn to data marketplace design. In particular, we study two problems:

1) How buyers can select and purchase data suitable for their tasks without needing a priori access to that data to make a purchase decision.
2) How marketplaces can distribute the payoff of a transaction that combines data from different sources among the corresponding providers, be they individuals or firms.

The difficulty of both problems grows exponentially with the number of data providers involved, since the number of possible combinations of providers doubles with each provider added. The difficulty is therefore further exacerbated in a human-centric data economy. There, buyers have to choose among the data of thousands of individuals, and marketplaces have to distribute payoffs to thousands of people contributing personal data to a given transaction. Using large datasets of taxi rides from Chicago, Porto and New York, we show that the value of data differs from one individual to another and cannot be approximated by its volume. We also develop algorithms and tools that reduce the complexity of both problems and make data purchasing more profitable for buyers and more efficient for data marketplaces.
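To make the payoff problem and its exponential cost concrete, here is a minimal sketch that rests on two assumptions not stated in the text above. First, it uses the Shapley value as the payoff rule. Second, it uses a hypothetical coverage-based value function: each provider holds records tagged with a zone, and a combination of data is worth the number of distinct zones it covers.

Exact computation evaluates every coalition of the other providers, so its cost doubles with each provider added. Random permutation sampling, a standard approximation, instead spends a fixed budget of evaluations. The providers, zones and the names coverage, exact_shapley and monte_carlo_shapley are illustrative only and are not the dissertation's code.

```python
import random
from itertools import combinations
from math import factorial

# Hypothetical data: each provider's records, tagged by the zone they cover.
DATA = {
    "A": ["z1", "z1", "z1", "z2", "z2", "z2"],  # 6 records, zones z1 and z2
    "B": ["z1", "z2", "z1", "z2"],              # 4 records, same zones as A
    "C": ["z3"],                                # 1 record, a zone nobody else covers
}


def coverage(coalition):
    """Assumed value function: number of distinct zones covered by the coalition."""
    zones = set()
    for provider in coalition:
        zones.update(DATA[provider])
    return len(zones)


def exact_shapley(players, value):
    """Exact Shapley values; evaluates all 2^(n-1) coalitions of the others per player."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):  # coalition sizes 0..n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for members in combinations(others, k):
                s = frozenset(members)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def monte_carlo_shapley(players, value, num_permutations, seed=0):
    """Approximate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    order = list(players)
    phi = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            phi[p] += current - previous  # marginal contribution of p in this ordering
            previous = current
    return {p: total / num_permutations for p, total in phi.items()}


if __name__ == "__main__":
    providers = list(DATA)
    for name, payoff in exact_shapley(providers, coverage).items():
        print(f"{name}: {len(DATA[name])} records, exact payoff {payoff:.3f}")
    estimate = monte_carlo_shapley(providers, coverage, num_permutations=2000)
    print({name: round(v, 3) for name, v in estimate.items()})
```

In this toy example, each provider receives a payoff of about 1. Provider C holds a single record, yet it earns as much as provider A, which holds six, because C covers a zone nobody else does while A and B are redundant. On a purely hypothetical scale, this echoes the point that the value of data cannot be approximated by its volume.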
We conclude with a number of open issues and propose further research directions that build on the contributions and findings of this dissertation. These include monitoring data transactions to better measure data markets, and complementing market data with actual transaction prices to build a more accurate data pricing tool. A human-centric data economy would also require the contributions of thousands of individuals to machine learning tasks to be calculated daily. For that to be feasible, the efficiency of data purchasing and payoff calculation in data marketplaces must be further optimised. In that direction, we also point to some even more efficient alternatives beyond those presented in this dissertation. Finally, we discuss the challenges and potential technologies that will help build a federation of standardised data marketplaces.
The data economy will develop rapidly in the coming years, and researchers from different disciplines will work together to unlock the value of data and make the most of it. Perhaps the proposal of paying people for their data and their contribution to the data economy will finally take off. Or perhaps other proposals, such as a robot tax, will end up being used to balance power between individuals and tech firms in the digital economy. Either way, I hope this work sheds light on the value of data, helps make the price of data more transparent and, eventually, contributes to a move towards a human-centric data economy.
About Santiago Andrés
Santiago joined the Data Transparency Group at IMDEA Networks as a PhD student in 2019. His work focuses on data economics and understanding the value of data. Before joining IMDEA, Santiago worked as a Principal at Axon Consulting, as a Senior Manager at Deloitte and as a Project Manager at Telefónica I+D. He has extensive experience in business consulting in the ICT sector, with more than 110 projects in the fields of regulation and public policy, strategy and operations, network planning and techno-economic analysis. He has worked for major telecom operators, governments and regulatory bodies in more than 25 countries across Europe, Latin America and the Middle East. Santiago obtained his degree in Telecommunications Engineering from UPM in 2001 and a Master's degree in Economics from UNED in 2012.
PhD Thesis Advisor: Dr. Nikolaos Laoutaris, IMDEA Networks Institute, Madrid, Spain
University: University Carlos III of Madrid, Spain
Doctoral Program: Telematics Engineering
PhD Committee members:
- President: Georgios Smaragdakis, Full Professor, Chair and Section Head of Cybersecurity, TU Delft
- Secretary: Ángel Cuevas Rumín, Associate Professor, Universidad Carlos III de Madrid
- Panel member: Pablo Rodríguez, Director, CTO Office, Google