The Data Transparency Group (DTG) employs a mix of network measurements, distributed systems building, algorithms, and machine learning to study problems and propose solutions to transparency issues related to data privacy, the economics of data, the spread of information and disinformation, and automated decision-making by machine learning algorithms. The group's objective is to tackle important problems at the forefront of the interplay between technology, society, public policy, and economics. Across all of these topics we take a holistic approach that spans fundamental thinking and rethinking, the development of code running on large systems and devices, and the business challenges of transforming visions and ideas into real-world services.
ACM SIGMETRICS. Mumbai, India. June 2022
A PIMS Development Kit for New Personal Data Platforms (Accepted for publication)
IEEE Internet Computing. 10.1109/MIC.2022.3157356. IEEE. January 2022
ACM WebSci. Barcelona, Spain. January 2022
International AAAI Conference on Web and Social Media (ICWSM). Atlanta, Georgia, USA. January 2022
Robust adjusted discriminant analysis based on shrinkage with application to geochemical and environmental fields
Chemometrics and Intelligent Laboratory Systems. 10.1016/j.chemolab.2021.104488. Volume 221, Elsevier. ISSN: 0169-7439. December 2021
Internet Measurement Conference. Virtual. November 2021
KDD Workshop on Data-driven Humanitarian Mapping: Harnessing Human-Machine Intelligence for High-Stake Public Policy and Resiliency Planning. Virtual. August 2021
Journal of Quality Technology. 10.1080/00224065.2021.1930617. June 2021
What do Information Centric Networks, Trusted Execution Environments, and Digital Watermarking have to do with Privacy, the Data Economy, and their future?
ACM SIGCOMM Computer Communication Review. Volume 51, ACM. ISSN: 0146-4833. January 2021
3rd BYMAT Conference - Bringing Young Mathematicians Together. Valencia, Spain. December 2020
Data Economy: We are working towards developing a formal theory, and a set of methods and systems, for realising in practice the “data is the new oil” analogy, especially its human-centric version, in which individuals are compensated by the online and offline services that collect and use their data [IEEE Internet Computing]. We are looking at fundamental questions and problems such as: (1) How do you split the value of a dataset among all the individuals and sources that contribute to it? [arXiv:2002.11193] [arXiv:1909.01137]; (2) As a data buyer, how do you select which of the available datasets to buy in an open data marketplace?; (3) How do you implement in practice a safe, fair, distributed, and transparent data marketplace?
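Question (1) is a value-attribution problem. One standard answer from cooperative game theory is the Shapley value, which pays each contributor its average marginal contribution over all orders in which contributors could join. The sketch below computes it exactly for a handful of sources. It is not claimed to be the method of the cited papers, and the utility function (number of distinct records a coalition of sources covers) is a hypothetical stand-in for something like the accuracy of a model trained on that coalition's data.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values.

    players: list of contributor ids.
    value: function mapping a frozenset of players to a number (the coalition's utility).
    Cost is exponential in len(players), so this only suits small examples.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))  # marginal contribution of p
        phi[p] = total
    return phi


# Hypothetical example: three data sources; utility = number of distinct records covered.
records = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}}


def coverage(coalition):
    return len(set().union(*(records[p] for p in coalition)))


phi = shapley_values(list(records), coverage)
# phi["A"] ≈ 2.5, phi["B"] ≈ 1.5, phi["C"] ≈ 1.0 (record 3 is shared by A and B, so they split it)
```

The shares always sum to the utility of the full set of sources (here 5), which is one reason the Shapley value is a popular starting point for fair splits; in practice, sampling-based approximations replace the exhaustive loop once the number of contributors grows.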
Sensitive Personal Data and the Web: We are working on several algorithms, methodologies, and tools for shedding more light on what happens to our personal data on the web, especially data deemed sensitive. For example, with eyeWndr we developed an algorithm and a browser add-on for detecting targeting in online advertising [ACM CoNEXT’19]. For targeting to work, trackers need to collect the interests, intentions, and behaviors of users at a massive scale. In [ACM IMC’18] we showed that, contrary to popular belief, most tracking flows carrying data of European citizens start and terminate within the EU. European Data Protection Authorities (DPAs) could, therefore, more easily investigate matters of compliance with the GDPR and other legislation. This becomes particularly important when trackers collect sensitive personal data, e.g., data related to health, political beliefs, or sexual preference, which is protected by additional clauses under the GDPR. In our most recent work, we developed automated classifiers for detecting web pages that contain such sensitive data [ACM IMC’20]. Applying our classifiers to a snapshot of the entire English-language web, we found that some 15% of it includes content of a sensitive nature.
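The eyeWndr algorithm itself is described in the cited CoNEXT’19 paper. To give a feel for how targeting can be inferred from observations alone, here is a minimal sketch of a simple count-based heuristic, assuming crowdsourced impression logs of (user, ad, domain) tuples: an ad that follows one user across many distinct domains while being seen by few users overall is more likely targeted than contextual. The function name and both thresholds are hypothetical, and this is not a reproduction of eyeWndr.

```python
from collections import defaultdict


def flag_targeted_ads(impressions, min_domains=3, max_users=2):
    """Return the set of (user, ad) pairs that look targeted.

    impressions: iterable of (user, ad, domain) tuples.
    min_domains / max_users: hypothetical thresholds for this sketch.
    """
    domains_per_user_ad = defaultdict(set)  # (user, ad) -> domains where that user saw the ad
    users_per_ad = defaultdict(set)         # ad -> users who saw it anywhere
    for user, ad, domain in impressions:
        domains_per_user_ad[(user, ad)].add(domain)
        users_per_ad[ad].add(user)
    return {
        (user, ad)
        for (user, ad), domains in domains_per_user_ad.items()
        # Follows this user around, yet is rarely seen by others.
        if len(domains) >= min_domains and len(users_per_ad[ad]) <= max_users
    }


# Hypothetical log: the shoes ad follows alice across three sites; the bank ad is shown broadly.
impressions = [
    ("alice", "shoes_ad", "news.com"),
    ("alice", "shoes_ad", "blog.org"),
    ("alice", "shoes_ad", "recipes.net"),
    ("bob", "bank_ad", "news.com"),
    ("carol", "bank_ad", "news.com"),
    ("dave", "bank_ad", "blog.org"),
]
targeted = flag_targeted_ads(impressions)
# targeted == {("alice", "shoes_ad")}
```

A crowdsourced design like this needs no access to the ad network's internals, at the cost of depending on enough users reporting impressions for the "seen by few users" count to be meaningful.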
Detection of Fake News in Social Media and the Web: As part of our ongoing research, we are developing algorithms and knowledge-extraction methods for detecting and analyzing fake news in social media and on more general web platforms. As more people rely on information spread within their social media circles, they also become more vulnerable to manipulation and misinformation. Whether it is part of an intentional and organized campaign or simply the result of a lack of knowledge in a given area, fake news represents one of the most important challenges of a modern digital society. Our approach relies on (1) creating efficient crawling methods that can provide large quantities of data, kept up to date and in a scalable manner; (2) using state-of-the-art graph analysis and prediction algorithms, such as graph neural networks, to detect possible fake-news sources and to analyze the spread of such information through the network; and (3) gaining an understanding of how false news occurs and spreads depending on the network type, user activity, or factors external to the network itself. An important aspect is that the resulting solutions take into consideration user needs, as well as the technological and legal constraints involved in this process. Furthermore, they are general and can be readily applied to other information-spread paradigms, such as epidemic detection or cyberthreat detection, among others.
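Point (2) mentions graph neural networks. As a hedged illustration of the operation many of them build on, the sketch below implements a single graph convolutional network (GCN) layer in plain Python: each node's new representation is a degree-normalized sum of its own and its neighbors' features, multiplied by a weight matrix and passed through a ReLU. The three-node sharing graph, the "known fake-news source" feature, and the untrained weight are all hypothetical; an actual detector would stack layers and learn the weights from labelled data.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected adjacency matrix given as nested lists."""
    n = len(adj)
    # Self-loops let each node keep part of its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj, features, weights):
    """One GCN layer: ReLU(norm_adj @ features @ weights)."""
    h = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in h]


# Hypothetical sharing graph: account 0 - account 1 - account 2 (a path).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]  # 1.0 marks account 0 as a known fake-news source
weights = [[1.0]]                 # untrained, for illustration only
out = gcn_layer(adj, features, weights)
# out[0][0] == 0.5, out[1][0] ≈ 0.408 (1/sqrt(6)), out[2][0] == 0.0
```

After one layer the signal reaches only direct neighbors (account 1), which is why GNNs stack several layers to capture longer spreading paths.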
Early Warning Systems for Epidemic Spread: We are developing an early warning system for predicting epidemic spread and the risk of contagion, using mobile phone data to detect possible hospitalizations, track risk connections with other users, and identify the most likely places of contagion. The solution is based on machine learning techniques and offers several innovative advantages over the state of the art: (1) the data already exists for millions of people; (2) the granularity of cell tower sectors is coarse enough to protect people's anonymity but fine enough to be useful for identifying areas that may be more dangerous than others; (3) the solution can be obtained without any data leaving the data controller; (4) the results can be presented on web pages that people visit to learn in which areas to be more careful and/or in a smartphone app that sends a danger notification whenever its user enters a risky area. Moreover, (5) the solution is user-centered, and (6) it can be generalized for adoption by different cities and applied to future infectious diseases, to predict their early spatial evolution and to design spatio-temporal programs for disease control.
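The described system relies on machine learning, whose details are not given here. The deliberately simplified sketch below only illustrates the "risk connections" and "places of contagion" steps, assuming pseudonymous records of (user, cell sector, hour) and a set of users already flagged as likely hospitalizations. The same-sector, same-hour co-location rule and all names are hypothetical. Because it needs nothing beyond the operator's own records, a computation of this kind could run inside the data controller, in line with point (3).

```python
from collections import defaultdict


def exposure(records, flagged_users):
    """Co-location exposure at cell-sector granularity.

    records: iterable of (user, cell_id, hour) tuples.
    flagged_users: set of users flagged as likely hospitalizations.
    Returns (users who shared a cell and hour with a flagged user,
             dict cell_id -> number of distinct flagged users seen there).
    """
    present = defaultdict(set)  # (cell, hour) -> users observed there
    for user, cell, hour in records:
        present[(cell, hour)].add(user)

    at_risk = set()
    flagged_per_cell = defaultdict(set)
    for (cell, hour), users in present.items():
        flagged_here = users & flagged_users
        if flagged_here:
            at_risk |= users - flagged_users    # risk connections
            flagged_per_cell[cell] |= flagged_here  # candidate places of contagion
    return at_risk, {cell: len(u) for cell, u in flagged_per_cell.items()}


# Hypothetical records: u1 is flagged; u2 and u4 shared a sector and hour with u1.
records = [
    ("u1", "cellA", 9), ("u2", "cellA", 9), ("u3", "cellA", 10),
    ("u1", "cellB", 12), ("u4", "cellB", 12), ("u5", "cellC", 12),
]
at_risk, cell_risk = exposure(records, {"u1"})
# at_risk == {"u2", "u4"}, cell_risk == {"cellA": 1, "cellB": 1}
```

Note that u3 is not flagged even though it visited cellA, because it was there at a different hour; the time window is the main knob trading sensitivity against false alarms.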
Data Watermarking and Privacy by Design: In our most recent strand of work on privacy and the economics of data, we are looking at the role of digital watermarking as an important enabler for trading personal data in a manner that is both safe and accountable. Digital watermarking is only one pillar of our efforts towards establishing data exchange systems that are accountable and private by design. More details on this soon!
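The group's watermarking approach is not yet described. As a generic illustration of why watermarks help accountability, the sketch below follows the well-known keyed least-significant-bit idea for tabular data: a secret key selects a subset of rows via an HMAC of each row's primary key and dictates which low-order bit of a numeric value to set and to what. Only the key holder can later check whether a leaked copy carries the mark. The function names, the `gamma`/`xi` parameters, and the single-integer-column table are assumptions of this sketch.

```python
import hashlib
import hmac


def _keyed_hash(secret: bytes, pk: str) -> int:
    """HMAC-SHA256 of the primary key, read as a big integer."""
    return int.from_bytes(hmac.new(secret, pk.encode(), hashlib.sha256).digest(), "big")


def _mark_params(h: int, gamma: int, xi: int):
    """Derive (selected?, bit index, bit value) for one row from its keyed hash."""
    return h % gamma == 0, (h // gamma) % xi, (h // (gamma * xi)) % 2


def embed(rows, secret, gamma=2, xi=2):
    """rows: dict pk -> non-negative int. About 1/gamma of rows get one of their xi low bits set."""
    marked = {}
    for pk, value in rows.items():
        selected, bit_index, mark_bit = _mark_params(_keyed_hash(secret, pk), gamma, xi)
        if selected:
            value = (value & ~(1 << bit_index)) | (mark_bit << bit_index)
        marked[pk] = value
    return marked


def detect(rows, secret, gamma=2, xi=2):
    """Return (matching bits, rows checked) for the rows the key selects."""
    matches = total = 0
    for pk, value in rows.items():
        selected, bit_index, mark_bit = _mark_params(_keyed_hash(secret, pk), gamma, xi)
        if selected:
            total += 1
            matches += ((value >> bit_index) & 1) == mark_bit
    return matches, total


# Hypothetical table: 200 users with an integer attribute.
rows = {f"user{i}": 1000 + i for i in range(200)}
watermarked = embed(rows, b"seller-secret")
matches, total = detect(watermarked, b"seller-secret")  # every selected row matches
```

On an unmarked copy roughly half of the checked bits match by chance, so a match rate close to 100% over many rows is strong statistical evidence of origin; choosing only low-order bits keeps each value within a small distortion (at most 2 here).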
There are currently no job offers in this section.
- Group leader: Nikolaos Laoutaris
- Email: email@example.com
- Contact phone: +34 91 4 816 995
- Fax: 3491481696
Office & Postal Address
- IMDEA Networks Institute
- Avda. del Mar Mediterraneo, 22
- 28918 Leganes (Madrid)