Data is the fuel of AI systems, which are expected to have a profound impact on society and the economy. However, bootstrapping a healthy market for data is far from straightforward. Buying a dataset without first testing it requires large amounts of faith and luck. Data marketplaces (DMs) try to circumvent this by, among other measures, deploying sandbox technologies that let a customer interact with data samples (e.g., to explore the data, test its quality, or train a model) without being able to extract the data or the trained model from the sandbox. However, training models is expensive in terms of resource consumption, and it can lead to abuse by greedy customers, much like the abuse of free return policies at online retail shops.
In this presentation, I will propose a realistic model for DMs that charge consumers for testing datasets, thus incentivizing them to minimize the complexity of their search for datasets to buy. Since consumers with a limited budget cannot test all possible combinations of eligible data assets, we develop three purchasing strategies that they can use to test and select suitable data for their models. We also introduce the concept of valuation functions and simpler puppet valuation models, which allow consumers to reduce processing costs by up to 20x while retaining 90% of the accuracy achieved when using the corresponding full ML model to decide which data to buy.
Alexandr is a Ph.D. student in Telematics at IMDEA Networks Institute (+UC3M). His main area of research revolves around data valuation and its role in the data economy.
This event will be held in English.