The Data Transparency Group (DTG) employs a mix of network measurement, distributed systems building, algorithms, and machine learning to study, and propose solutions to, transparency problems related to data privacy, the economics of data, the spread of information and disinformation, and automated decision making driven by machine learning algorithms. The group's objective is to tackle important problems at the forefront of the interplay between technology, society, public policy, and economics. In all of the above we take a holistic approach that spans fundamental thinking and rethinking all the way to code running on large systems and devices, including the business challenges of turning visions and ideas into real-world services.
ACM SIGMETRICS. Stony Brook, New York, USA. June 2025
ISIT 2024 Workshop on Information-Theoretic Methods for Trustworthy Machine Learning. Athens, Greece. July 2024
ACM SIGMETRICS. Venice, Italy. June 2024
International AAAI Conference on Web and Social Media (ICWSM). Buffalo, New York, USA. June 2024
Workshop on Technology and Consumer Protection (ConPro ’24), co-located with IEEE Symposium on Security and Privacy. San Francisco, CA, USA. May 2024
International Conference on Data Engineering. Utrecht, Netherlands. May 2024
Workshop on Artificial Intelligence System with Confidential Computing (AISCC 2024), co-located with NDSS Symposium 2024. San Diego, CA, USA. February 2024
ACM Data Economy Workshop (DEC), co-located with ACM SIGMOD 2023. Seattle, WA, USA. June 2023
Data Economy: We are working towards developing a formal theory, and a set of methods and systems, for realizing in practice the “data is the new oil” analogy, especially its human-centric version, in which individuals are compensated by the online and offline services that collect and use their data [IEEE Internet Computing]. We are looking at fundamental questions and problems such as: (1) How do you split the value of a dataset among all the individuals and sources that contribute to it? [arXiv:2002.11193] [arXiv:1909.01137]; (2) As a data buyer, how do you select which of the available datasets to buy in an open data marketplace?; (3) How do you implement in practice a safe, fair, distributed, and transparent data marketplace?
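As a toy illustration of question (1), and explicitly not the method from the papers cited above, the sketch below splits a dataset's value among contributors via Monte Carlo estimation of Shapley values; the contributors, their records, and the value function are all made-up placeholders.

```python
# Minimal sketch (illustrative only): approximate each contributor's Shapley
# value by averaging their marginal contribution over random orderings.
import random

def shapley_payoffs(contributors, value_fn, n_permutations=1000):
    """value_fn maps a frozenset of contributor IDs to the value (e.g. model
    accuracy, revenue) obtained from their combined data."""
    payoffs = {c: 0.0 for c in contributors}
    for _ in range(n_permutations):
        order = list(contributors)
        random.shuffle(order)
        coalition = frozenset()
        prev_value = value_fn(coalition)
        for c in order:
            coalition = coalition | {c}
            new_value = value_fn(coalition)
            payoffs[c] += new_value - prev_value   # marginal contribution
            prev_value = new_value
    return {c: total / n_permutations for c, total in payoffs.items()}

# Toy example: the dataset's value is the number of distinct records it covers.
data = {"alice": {1, 2}, "bob": {2, 3}, "carol": {4}}
value = lambda coalition: len(set().union(*[data[c] for c in coalition]))
print(shapley_payoffs(list(data), value, n_permutations=2000))
```

In this toy setting, contributors with overlapping records (alice and bob) end up with smaller payoffs per record than the contributor with unique data (carol), which is exactly the kind of behaviour one wants from a value-splitting rule.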
Sensitive Personal Data and the Web: We are working on several algorithms, methodologies, and tools for shedding more light on what happens to our personal data, especially data deemed sensitive, on the web. For example, with eyeWndr we developed an algorithm and a browser add-on for detecting targeting in online advertising [ACM CoNEXT’19]. For targeting to work, trackers need to collect the interests, intentions, and behaviors of users at a massive scale. In [ACM IMC’18] we showed that, contrary to popular belief, most tracking flows carrying data of European citizens start and terminate within the EU. European Data Protection Authorities (DPAs) could, therefore, investigate more easily matters of compliance with GDPR and other legislation. The latter becomes particularly important in cases where trackers collect sensitive personal data, e.g., related to health, political beliefs, sexual preference, etc., which are protected by additional clauses under GDPR. In our most recent work, we developed automated classifiers for detecting web pages that contain such sensitive data [ACM IMC’20]. Applying our classifiers to a snapshot of the entire English-speaking web, we found that roughly 15% of it contains content of a sensitive nature.
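To give a flavour of what such a classifier looks like (this is not the classifier from [ACM IMC’20]; the training pages, labels, and model choice are placeholders), here is a minimal text-classification sketch that flags pages whose content falls under GDPR special categories such as health or political beliefs.

```python
# Minimal sketch: label web pages as sensitive / non-sensitive from their text
# using a TF-IDF representation and a linear model. A real study would train
# on thousands of manually labelled pages across all GDPR special categories.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pages = [
    "symptoms treatment diagnosis of a chronic illness",   # health -> sensitive
    "party manifesto election campaign voting record",     # politics -> sensitive
    "best laptops of the year reviewed and compared",      # non-sensitive
    "recipe for a quick weeknight pasta dinner",           # non-sensitive
]
labels = [1, 1, 0, 0]  # 1 = sensitive under GDPR special categories

clf = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(pages, labels)

print(clf.predict(["new clinic offers confidential testing and counselling"]))
```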
Detection of Fake News in Social Media and the Web: As part of our ongoing research, we are developing algorithms and knowledge-extraction methods for detecting and analyzing fake news in social media and on more general web platforms. As more people become reliant on information spread in their social media circles, they also become more vulnerable to manipulation and misinformation. Whether it is part of an intentional and organized campaign or simply the result of a lack of knowledge in a given area, fake news represents one of the most important challenges of a modern digital society. Our approach relies on (1) creating efficient crawling methods that can provide large quantities of readily updated data in a scalable manner, (2) using state-of-the-art graph analysis and prediction algorithms, such as graph neural networks, to detect possible fake-news sources as well as to analyze the spread of such information through the network, and (3) gaining an understanding of how false news occurs and spreads depending on network type, user activity, or factors external to the network itself. An important aspect is that the resulting solutions take into consideration user needs, as well as the technological and legal constraints involved in this process. They are, furthermore, general, and can readily be applied to other information-spread paradigms, such as epidemic detection or cyberthreat detection, among others.
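As a rough illustration of point (2), the sketch below trains a small graph neural network to flag possible fake-news sources in a toy sharing graph. It assumes PyTorch Geometric, synthetic node features, and synthetic labels; it is not our actual detection pipeline.

```python
# Minimal sketch: a two-layer GCN that classifies accounts in a propagation
# graph as suspected fake-news sources or not, using toy features and labels.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class SourceClassifier(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes=2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

# Toy propagation graph: 4 accounts, directed "share/retweet" edges.
edge_index = torch.tensor([[0, 1, 1, 2], [1, 2, 3, 3]], dtype=torch.long)
x = torch.rand(4, 8)            # per-account features (activity, profile, ...)
y = torch.tensor([1, 0, 0, 0])  # 1 = suspected fake-news source

model = SourceClassifier(in_dim=8, hidden_dim=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(100):
    optimizer.zero_grad()
    out = model(x, edge_index)
    loss = F.cross_entropy(out, y)
    loss.backward()
    optimizer.step()

print(out.argmax(dim=1))  # predicted class per account
```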
Early Warning Systems for Epidemic Spread: We are developing an early warning system for predicting epidemic spread and the risk of contagion using mobile phone data, by detecting possible hospitalizations, tracking risky connections with other users, and identifying the most likely places of contagion. The solution is based on machine learning techniques and offers several advantages over the state of the art: (1) the data already exists for millions of people, (2) the coarse granularity of cell-tower sectors is large enough to protect the anonymity of individuals but small enough to be useful when comparing areas that may be more dangerous than others, (3) the solution can be computed without any data leaving the data controller, and (4) the results can be presented either on web pages that users visit to learn in which areas to be more careful, or as a smartphone app that sends a warning notification whenever the user enters a risky area. Moreover, (5) the solution is user-centered and (6) it can be generalized to other cities and to future infectious diseases, to predict their early spatial evolution and to design spatio-temporal programs for disease control.
Example risk-map animation for London, covering March and April 2020.
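For illustration only, the sketch below shows a drastically simplified version of the per-sector aggregation step behind such a risk map, assuming that likely hospitalizations have already been flagged inside the data controller; all identifiers, visit records, and the scoring rule are made up for the example.

```python
# Minimal sketch: score each cell-tower sector by the share of time spent
# there by users flagged as likely hospitalised. In the real setting this
# computation would run entirely inside the mobile operator's infrastructure.
from collections import defaultdict

# (user_id, sector_id, hours_spent) records for one day, already pseudonymised.
visits = [
    ("u1", "sectorA", 3.0), ("u1", "sectorB", 1.0),
    ("u2", "sectorA", 2.0), ("u3", "sectorC", 5.0),
]
flagged_users = {"u1"}  # users whose traces suggest a hospitalisation

def sector_risk(visits, flagged_users):
    """Risk per sector = time spent there by flagged users, normalised by
    the total time spent there by everyone (a crude exposure proportion)."""
    flagged_time = defaultdict(float)
    total_time = defaultdict(float)
    for user, sector, hours in visits:
        total_time[sector] += hours
        if user in flagged_users:
            flagged_time[sector] += hours
    return {s: flagged_time[s] / total_time[s] for s in total_time}

print(sector_risk(visits, flagged_users))
# {'sectorA': 0.6, 'sectorB': 1.0, 'sectorC': 0.0}
```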
Data Watermarking and Privacy by Design: In our most recent strand of work on privacy and the economics of data, we are looking at the role of digital watermarking as an important enabler for trading personal data in a safe but also accountable manner. Digital watermarking is only one pillar of our efforts towards establishing data-exchange systems that are accountable and private by design. More details on this soon!
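To convey the general idea (and explicitly not our design), the sketch below embeds a buyer-specific watermark into a numeric dataset via tiny, keyed perturbations, so that a leaked copy can later be attributed to the buyer who received it; the key, step size, and detection rule are illustrative assumptions.

```python
# Minimal sketch: nudge each value up or down by a small step according to a
# keyed pseudo-random bit, then check how well a suspect copy matches a key.
import hmac, hashlib

STEP = 0.01  # perturbation magnitude, small enough to preserve data utility

def _bit(key: bytes, index: int) -> int:
    digest = hmac.new(key, str(index).encode(), hashlib.sha256).digest()
    return digest[0] & 1

def embed(values, key: bytes):
    """Embed a buyer-specific watermark by keyed +/- STEP perturbations."""
    return [v + STEP if _bit(key, i) else v - STEP for i, v in enumerate(values)]

def detect(original, suspect, key: bytes) -> float:
    """Fraction of positions whose perturbation direction matches the key."""
    matches = 0
    for i, (o, s) in enumerate(zip(original, suspect)):
        matches += (s > o) == (_bit(key, i) == 1)
    return matches / len(original)

key_alice = b"buyer-alice-secret"
data = [10.0, 12.5, 9.8, 11.1]
leaked = embed(data, key_alice)
print(detect(data, leaked, key_alice))    # 1.0 -> strong evidence it is Alice's copy
print(detect(data, leaked, b"buyer-bob")) # around 0.5 in expectation -> no evidence
```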
Previous Projects: