Online web services have grown dramatically in size and diversity in recent years, becoming essential components of our daily lives and allowing us to carry out everyday tasks such as working, getting informed, or keeping in contact with relatives and friends.
However, the changes and evolution that online web services have undergone would not have been possible without a profitable economic model to sustain them. Although a sizable percentage of these services are free of charge, they represent a lucrative business that generates billions of dollars and has given rise to some of the biggest companies in the world in terms of market capitalization, such as Alphabet Inc. or Meta Platforms Inc. (previously known as Facebook Inc.). Being both free and lucrative is possible thanks to an advertising-based monetization model, in which services deliver ads to users in exchange for access (e.g., Facebook or YouTube). Although online advertising dates back to the mid-1990s, its popularity among brands and advertising agencies has grown over the last decade, mainly due to its capacity to reach precise audiences at a low cost.
Converting online web services into advertising walls is a double-edged sword for users. The ability of online advertising to segment audiences requires a massive collection of personal data, including users’ web browsing histories and even more invasive data such as age, gender, or location, in order to infer users’ online profiles. This data collection is made possible by a complex tracking ecosystem deployed by online advertising companies, in which multiple stakeholders collect, process, and exchange information. The many cases of privacy abuse attributed to this industry have motivated new regulatory efforts to protect consumers’ privacy in recent years. Notable examples are the General Data Protection Regulation (GDPR)[1] in the European Union and the California Consumer Privacy Act (CCPA)[2] in California, USA. Further, these privacy regulations typically contain specific provisions and strict requirements for websites that provide sensitive material to end users, including sexual, religious, and health services.
The introduction of new regulatory frameworks, alongside the growth of online web services, demands a continuous evolution of the techniques used to study and audit these services. The online advertising ecosystem deserves particular attention, as it is the primary economic support of a high percentage of web services. Moreover, the activities and abuses of this ecosystem drove the adoption of the current privacy regulations that control the use and collection of personal data.
This dissertation falls within the field of Internet measurement, addressing the need for new measurement techniques and methodological approaches to audit and study online web services. Specifically, it analyzes three aspects of the web. First, we implement a methodology to study sensitive websites, including their potential lack of regulatory compliance. We then put this approach into practice by analyzing the pornographic web ecosystem, opening the debate on the need to study and identify web privacy problems from a macroscopic perspective, as the web contains semi-decoupled and highly sensitive subsystems. Second, we examine in depth the suitability and adequacy of the domain classification services commonly used by the research community to conduct domain-dependent studies, including those of sensitive websites. Finally, we implement a novel methodology to audit the quality and performance of the profiles that Meta (Facebook) and Google create about users, as well as of their ad targeting algorithms. This study also includes an analysis of the transparency tools these two companies offer users regarding the process of distributing tailored ads.
In summary, this dissertation contributes new methodologies and results that extend our limited knowledge of the web.
Pelayo received his BSc in Computer Science and his MSc in Telematics Engineering from Universidad Carlos III de Madrid in 2016 and 2017, respectively. He extended his studies with a 10-month internship at NEC Lab Europe (Heidelberg, Germany). He is now a Ph.D. student at IMDEA Networks Institute, working in the Global Computing Group. His research interests are in web privacy, regulatory compliance, and social networks. He has published in international peer-reviewed conferences such as ACM IMC (2019 and 2020) and WWW (2019).
Thesis supervisors: Dr. Antonio Fernández Anta, IMDEA Networks Institute, and Dr. Rubén Cuevas Rumín
University: Universidad Carlos III de Madrid, Spain
Doctoral program: Telematics Engineering
Thesis committee members: