Data Economy: We are working towards a formal theory, and a set of methods and systems, for realising in practice the “data is the new oil” analogy, especially its human-centric version, in which individuals are compensated by the online and offline services that collect and use their data [IEEE Internet Computing]. We are looking at fundamental questions and problems such as: (1) How do you split the value of a dataset among all the individuals and sources that contribute to it? [arXiv:2002.11193] [arXiv:1909.01137]; (2) As a data buyer, how do you select which of the available datasets to buy in an open data marketplace?; (3) How do you implement in practice a safe, fair, distributed, and transparent data marketplace?
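Question (1) is often formalised via cooperative game theory. As an illustration only (not the method of the cited papers), here is a minimal sketch that computes exact Shapley values for a handful of data sources, assuming a hypothetical square-root (diminishing-returns) value function and invented source names:

```python
from itertools import permutations

# Hypothetical example: three data sources with different record counts,
# and an assumed square-root (diminishing-returns) value function.
sizes = {"A": 100, "B": 50, "C": 50}

def value(coalition):
    """Assumed utility of a coalition of sources: sqrt of total records."""
    return sum(sizes[s] for s in coalition) ** 0.5

def shapley_values(sources, value):
    """Exact Shapley values by enumerating all orderings (fine for few sources)."""
    perms = list(permutations(sources))
    contrib = {s: 0.0 for s in sources}
    for order in perms:
        coalition, prev = set(), 0.0
        for s in order:
            coalition.add(s)
            v = value(coalition)
            contrib[s] += v - prev  # marginal contribution of s in this ordering
            prev = v
    return {s: c / len(perms) for s, c in contrib.items()}

# The shares sum exactly to the value of the full dataset (efficiency).
shares = shapley_values(list(sizes), value)
```

By symmetry, the two equally sized sources receive equal shares; the exact enumeration above is exponential in the number of sources, which is why practical schemes rely on approximation.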
Sensitive Personal Data and the Web: We are working on several algorithms, methodologies, and tools for shedding more light on what happens to our personal data on the web, especially data deemed sensitive. For example, with eyeWndr we developed an algorithm and a browser add-on for detecting targeting in online advertising [ACM CoNEXT’19]. For targeting to work, trackers need to collect interests, intentions, and behaviors of users at a massive scale. In [ACM IMC’18] we showed that, contrary to popular belief, most tracking flows carrying data of European citizens start and terminate within the EU. European Data Protection Authorities (DPAs) could therefore more easily investigate matters of compliance with the GDPR and other legislation. The latter becomes particularly important when trackers collect sensitive personal data, e.g., related to health, political beliefs, sexual preference, etc., which are protected by additional clauses under the GDPR. In our most recent work, we developed automated classifiers for detecting web pages that contain such sensitive data [ACM IMC’20]. Applying our classifiers to a snapshot of the entire English-speaking web, we found that roughly 15% of it contains content of a sensitive nature.
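The classifiers in [ACM IMC’20] are trained models; purely to illustrate the task, a toy keyword-based detector with invented lexicons might look like this:

```python
# Invented lexicons for illustration only; real sensitive-content
# classifiers are trained models, not keyword lists.
SENSITIVE_TERMS = {
    "health": {"diagnosis", "symptom", "treatment", "cancer", "hiv"},
    "politics": {"party", "election", "ballot", "candidate"},
}

def sensitive_categories(text, threshold=2):
    """Flag GDPR-sensitive categories when enough category terms appear."""
    tokens = set(text.lower().split())
    hits = {cat: len(tokens & terms) for cat, terms in SENSITIVE_TERMS.items()}
    return {cat for cat, n in hits.items() if n >= threshold}
```

A page mentioning several health-related terms would be flagged under "health", while a neutral page matches no category; the threshold trades off precision against recall.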
Detection of Fake News in Social Media and the Web: As part of our ongoing research, we are developing algorithms and knowledge-extraction methods for detecting and analyzing fake news in social media and on the web at large. As more people rely on information spread within their social media circles, they also become more vulnerable to manipulation and misinformation. Whether it is part of an intentional, organized campaign or simply the result of a lack of knowledge in a given area, fake news represents one of the most important challenges of a modern digital society. Our approach relies on (1) creating efficient crawling methods that can provide large quantities of data in a readily updated and scalable manner, (2) using state-of-the-art graph analysis and prediction algorithms, such as graph neural networks, to detect possible fake-news sources and to analyze the spread of such information through the network, and (3) gaining an understanding of how false news occurs and spreads depending on network type, user activity, or factors external to the network itself. An important aspect is that the resulting solutions take user needs into consideration, as well as the technological and legal constraints involved. They are, furthermore, general, and can readily be applied to other information-spread paradigms, such as epidemic detection or cyberthreat detection, among others.
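The graph-based detection in (2) relies on graph neural networks; as a much simpler stand-in that conveys the intuition, one can propagate a suspicion score from known unreliable sources over a resharing graph (node names and scoring rule below are invented for illustration):

```python
from collections import defaultdict

def propagate_suspicion(edges, seed_scores, rounds=3, damping=0.5):
    """Toy suspicion propagation over a resharing graph.

    edges: (src, dst) pairs meaning dst reshares content from src.
    seed_scores: nodes already labeled as suspicious (pinned each round).
    Each unlabeled node blends its score with the mean of its sources'.
    """
    preds = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        preds[dst].append(src)
        nodes.update((src, dst))
    scores = {n: seed_scores.get(n, 0.0) for n in nodes}
    for _ in range(rounds):
        new = {}
        for n in nodes:
            if n in seed_scores:
                new[n] = seed_scores[n]  # keep labeled sources pinned
            elif preds[n]:
                neigh = sum(scores[p] for p in preds[n]) / len(preds[n])
                new[n] = damping * scores[n] + (1 - damping) * neigh
            else:
                new[n] = damping * scores[n]
        scores = new
    return scores
```

Accounts closer to a labeled source accumulate higher scores; a graph neural network generalizes this by learning the aggregation from node features rather than fixing it by hand.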
Early Warning Systems for Epidemic Spread: We are developing an early warning system for predicting epidemic spread and risk of contagion using mobile phone data: detecting possible hospitalizations, tracking risky connections with other users, and detecting the most likely places of contagion. The solution is based on machine learning techniques and has several innovative advantages over the state of the art: (1) the data already exist for millions of people; (2) the granularity of cell tower sectors is coarse enough to protect people's anonymity, yet fine enough to single out areas that may be more dangerous than others; (3) the solution can be computed without any data leaving the data controller; (4) it can be delivered either as web pages that users visit to learn in which areas to be more careful, or as a smartphone app that issues a warning notification whenever the user enters a risky area. Moreover, (5) the solution is user-centered, and (6) it can be generalized for adoption by different cities and adapted to future infectious diseases, to predict their early spatial evolution and to design spatio-temporal programs for disease control.
Example of a risk-map movie for London covering March and April 2020:
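The sector-level risk idea in (2) above can be sketched as a simple aggregation; the input format and the flagging of users are assumptions for illustration (the actual system learns these signals from mobility data):

```python
from collections import Counter

def sector_risk(visits, flagged_users):
    """Fraction of visits in each cell-tower sector made by flagged users.

    visits: (user_id, sector_id) records (hypothetical input format).
    flagged_users: users whose mobility pattern suggests a possible
    hospitalization (the flagging step itself is not shown here).
    """
    flagged, total = Counter(), Counter()
    for user, sector in visits:
        total[sector] += 1
        if user in flagged_users:
            flagged[sector] += 1
    return {s: flagged[s] / total[s] for s in total}

# Two visits to sector "A" (one by a flagged user) and one to "B".
risk = sector_risk([("u1", "A"), ("u2", "A"), ("u3", "B")], {"u1"})
```

Because everything is aggregated per sector, no individual trajectory needs to leave the data controller, only the per-sector risk scores that feed the map.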
Data Watermarking and Privacy by Design: In our most recent strand of work around privacy and the economics of data, we are looking at the role of digital watermarking as an important enabler for trading personal data in a safe, but also accountable, manner. Digital watermarking is only one pillar of our efforts towards establishing data-exchange systems that are accountable and private by design. More details on this soon!
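As a rough illustration of the idea (a spread-spectrum-style sketch, not our actual scheme), a seller could embed a buyer-specific, key-seeded ±ε pattern into a numeric column and later test a leaked copy for that pattern:

```python
import random

def embed(values, key, eps=0.01):
    """Add a key-seeded +/-eps pattern to each value (spread-spectrum style)."""
    rng = random.Random(key)
    return [v + eps * rng.choice((-1.0, 1.0)) for v in values]

def detect(original, suspect, key, eps=0.01):
    """Correlate residuals with the key's pattern; ~1.0 means the mark is present."""
    rng = random.Random(key)
    pattern = [rng.choice((-1.0, 1.0)) for _ in original]
    corr = sum((s - o) * p for o, s, p in zip(original, suspect, pattern))
    return corr / (eps * len(original))
```

Detecting with the correct key yields a correlation near 1.0, while an unrelated key's pattern is essentially uncorrelated with the residuals, which is what makes each buyer's copy individually traceable.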