Online advertising has evolved into a key component of the Internet we know today. It is a highly complex ecosystem that reaches billions of users in a short period of time. It has global coverage and can target specific audiences based on demographic, geographic, and behavioral aspects. The capabilities offered by the online advertising ecosystem have opened a new era in research that has attracted the interest of the scientific community.
This thesis leverages the nature of online advertising and builds a novel methodology capable of inserting JavaScript code into an ad that runs every time it is displayed on a user’s device. This methodology opens up new measurement opportunities. Specifically, this methodology is applied for two different purposes in this thesis: (1) Performing network measurements from the end-user perspective, and (2) Auditing the transparency of the online advertising ecosystem from the advertisers’ perspective.
In the context of Internet measurements, this methodology is implemented in a solution referred to as AdTag. Its design (including technical, deployability, and economic factors) and its potential to analyze a wide range of aspects of Internet connectivity from the browser are discussed and evaluated. Several experiments demonstrate the ability of AdTag to reach millions of nodes in a short period of time. Furthermore, the possibility of selecting the measurement nodes based on their geographical location is also demonstrated. In this thesis, we showcase the utility of AdTag to conduct network measurements in two specific use cases.
First, we study the DNS infrastructure, one of the most critical Internet systems. Our analysis addresses issues such as grasping the real DNS infrastructure configured by the ISPs and understanding end users' DNS choices, that is, whether they use their ISP's private resolvers or configure third-party DNS resolvers to improve security and web performance. Harnessing the scale offered by the online advertising ecosystem, two ad campaigns have been launched, triggering more than 3M DNS lookups, which allow the identification and study of more than 76k recursive DNS resolvers supporting more than 25k eyeball ASes in 178 countries. The data analysis provides new insights into the DNS infrastructure, such as user preferences towards third parties. Our results indicate that 13% of users use third-party DNS providers (such as Google, OpenDNS, Level 3, and Cloudflare). In addition, this research finds that many ISPs that provide both mobile and fixed access networks have chosen to separate the DNS infrastructure that serves each access technology type.
The second use case consists of analyzing the browser market landscape with active measurements. We leverage AdTag to develop an active measurement platform to obtain the brand and the version of the device receiving the ad. We show that the landscape picture obtained with our methodology is very similar to that offered by state-of-the-art techniques based on passive measurements. However, our solution presents some advantages over passive solutions: the ability to conduct geographically and demographically targeted measurements and its accessibility to a larger group of scientists and practitioners. The performance, accuracy, and capabilities of this methodology are analyzed through real experiments that, in total, produced more than 6M measurements.
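As an illustration of this kind of active measurement, the sketch below derives a browser brand and version from a User-Agent string and aggregates the results into market shares. It is a minimal sketch under stated assumptions, not the AdTag implementation: the abstract does not describe how the brand and version are obtained, so the use of the User-Agent string (available in a browser as `navigator.userAgent`), the names `parseUserAgent` and `marketShare`, and the matching rules are all hypothetical.

```typescript
// Hypothetical result type; not taken from the thesis.
interface BrowserInfo {
  brand: string;
  version: string;
}

// Illustrative matching rules, checked in order. Order matters because
// Edge and Opera User-Agent strings also contain a "Chrome/" token.
const RULES: Array<[string, RegExp]> = [
  ["Edge", /Edg(?:e|A|iOS)?\/([\d.]+)/],
  ["Opera", /OPR\/([\d.]+)/],
  ["Firefox", /Firefox\/([\d.]+)/],
  ["Chrome", /Chrome\/([\d.]+)/],
  ["Safari", /Version\/([\d.]+).*Safari\//],
];

// Returns the first rule that matches, or "Unknown" when none does.
function parseUserAgent(ua: string): BrowserInfo {
  for (const [brand, pattern] of RULES) {
    const match = pattern.exec(ua);
    if (match) {
      return { brand, version: match[1] };
    }
  }
  return { brand: "Unknown", version: "" };
}

// Fraction of measurements per brand (one User-Agent string per ad impression).
function marketShare(userAgents: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const ua of userAgents) {
    const brand = parseUserAgent(ua).brand;
    counts[brand] = (counts[brand] || 0) + 1;
  }
  const shares: Record<string, number> = {};
  for (const brand of Object.keys(counts)) {
    shares[brand] = counts[brand] / userAgents.length;
  }
  return shares;
}
```

In a deployment of this kind, the script carried by the ad would report the string to a collection server, where the aggregation runs; geographic or demographic targeting would come from the ad campaign configuration rather than from the script itself.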
The lack of transparency in the online advertising ecosystem motivates the second part of this thesis. In particular, we have developed Q-Tag, a novel methodology that serves to audit reported quality metrics so that advertisers can obtain trustworthy information about the real performance of their advertising campaigns.
The first version of Q-Tag was deployed in Google AdWords. The results reveal that AdWords seems to provide incomplete information to advertisers. In particular, they show that: (i) AdWords did not report 57% of the publishers where ad impressions from our campaigns were delivered, (ii) AdWords reports a large fraction of contextually significant impressions based on (undisclosed) criteria other than the publisher's theme, (iii) higher CPM investment does not lead to impressions being delivered to more popular publishers, (iv) AdWords does not offer default control of frequency cap (limit of impressions per user), and (v) about 10% of ad impressions in two of the campaigns were delivered to IPs from data centers.
The second version of Q-Tag was developed to measure the viewability metric. This standard metric serves to assess whether an ad impression was viewed or not by a user. Q-Tag has been deployed in production by a Demand Side Platform (DSP) to measure the viewability rate of its ad campaigns. Taking advantage of the infrastructure of this DSP, the performance of Q-Tag has been compared with a commercial solution. Both techniques report a similar overall viewability rate of 50% (i.e., 50% of the ad impressions meet the viewability standard and thus are considered viewed). However, Q-Tag is able to measure the viewability metric in 93% of the ads served by the DSP, compared with 74% for the commercial solution.
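The two figures above (the viewability rate and the share of served ads that could be measured) can be illustrated with a short sketch. It is not the Q-Tag implementation, which the abstract does not detail. It assumes the commonly cited industry criterion for display ads, namely at least 50% of the ad's pixels in view for at least one continuous second, and it assumes that visibility readings are already available as timestamped samples. The type `VisibilitySample`, the constants, and the functions `isViewable` and `summarize` are hypothetical names introduced for the example.

```typescript
// Hypothetical sample type: one visibility reading for an ad impression.
interface VisibilitySample {
  timeMs: number;          // timestamp of the reading, in milliseconds
  visibleFraction: number; // share of the ad's pixels inside the viewport, 0..1
}

// Assumed thresholds (commonly cited display-ad criterion), not taken from the thesis.
const MIN_FRACTION = 0.5;
const MIN_DURATION_MS = 1000;

// True if some unbroken run of samples at or above MIN_FRACTION spans MIN_DURATION_MS.
function isViewable(samples: VisibilitySample[]): boolean {
  let runStart: number | null = null;
  for (const s of samples) {
    if (s.visibleFraction >= MIN_FRACTION) {
      if (runStart === null) {
        runStart = s.timeMs;
      }
      if (s.timeMs - runStart >= MIN_DURATION_MS) {
        return true;
      }
    } else {
      runStart = null; // visibility dropped, so the continuous run restarts
    }
  }
  return false;
}

// An impression with no samples is treated as "not measured" (an assumption).
// measuredRate: measured impressions / served impressions.
// viewabilityRate: viewable impressions / measured impressions.
function summarize(impressions: VisibilitySample[][]): { measuredRate: number; viewabilityRate: number } {
  const measured = impressions.filter((samples) => samples.length > 0);
  const viewed = measured.filter(isViewable);
  return {
    measuredRate: impressions.length === 0 ? 0 : measured.length / impressions.length,
    viewabilityRate: measured.length === 0 ? 0 : viewed.length / measured.length,
  };
}
```

Keeping the two ratios separate mirrors the comparison in the text: two solutions can agree on the viewability rate while differing in how many served ads they manage to measure at all.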
In summary, the research conducted in this thesis showcases the potential of the proposed large-scale ad-based measurement methodology, which offers a wider range of possibilities beyond those presented in this thesis. It is a methodology that can unravel different aspects of the Internet infrastructure and performance from the user perspective, as well as provide an independent tool for advertisers to measure the quality of their advertising campaigns.
About Patricia Callejo
Patricia obtained her BSc in Audiovisual Systems Engineering from University Carlos III of Madrid in October 2015. She pursued her MSc at the same university alongside her PhD, both in the field of Telematics Engineering. Before joining IMDEA Networks as a PhD student, Patricia Callejo worked as an intern at the same Madrid-based research institute. Between her studies and her work, she also carried out research in collaboration with the University Carlos III of Madrid, studying advertising behavior and data analysis in social networks.
The thesis defense will be conducted in English. Given the coronavirus crisis, the thesis defense will be online.
PhD Thesis Advisor: Dr. Rubén Cuevas, University Carlos III of Madrid, Spain
University: University Carlos III of Madrid, Spain
Doctoral Program: Telematics Engineering
PhD Committee members: