Comprehensive data-driven software analysis for the 21st century
IMDEA Networks is the beneficiary of this project
  • Financed by: Ministry of Science and Innovation (MCIN) and the European Union "NextGenerationEU"/PRTR EIN2020-112344
  • Duration: November 2020 to October 2022
  • Contact: Narseo VALLINA-RODRÍGUEZ, Principal Investigator for IMDEA Networks

Modern software development and deployment have experienced profound changes in the last decade with the massive adoption of agile development methodologies, the adoption of data-driven business models, and the rise of feature-rich mobile, web, and IoT platforms capable of accessing, storing, and processing sensitive personal and contextual data. The combination of these factors has created societal problems with far-reaching repercussions, as demonstrated by recent privacy scandals. New regulatory efforts like the GDPR, developed to protect citizens’ privacy, are unable to correct industry practices, and they have created new frictions between privacy engineering (and regulation) on the one hand, and software engineering and software monetization models on the other. In fact, it is unclear whether policy and regulation alone are a sufficient deterrent against both current and future industry malpractices without the use of technologies for law enforcement at scale.

Unfortunately, our current arsenal of software analysis methods is unable to obtain a complete, empirical, and reproducible picture of modern software behavior due to their incomplete and misaligned assumptions and targets, as well as their inability to scale to millions of software samples. First, most of the available tools focus on detecting malicious behaviors such as abuse, fraud, or well-known malware families, often excluding personal data collection because it is considered a user choice. Second, they are unable to fully capture the complexity of today’s software, which relies on techniques like computation offloading to the cloud (which impedes access to software logic and binaries), increasing integration with the physical environment, new human-interaction technologies like voice commands, third-party components that ease development and monetize software, and anti-monitoring techniques. These features render current static and dynamic analysis techniques insufficient.

In this proposal, I seek funding to support the preparation of an ERC proposal that aims to tackle the limitations of existing software analysis techniques. In this project, I aim to bridge the gap between the research, technical, and policy communities to deal with emerging and future online privacy and security problems more effectively. First, I will systematize the knowledge of current software analysis techniques, identifying their limitations and strengths. Then, I will incorporate privacy engineering, policy, regulation, and consumer protection perspectives in software analysis methods to define the principles of next-generation tools. These techniques will be showcased by analyzing software released on major platforms to obtain a better understanding of software engineering practices and their friction with privacy engineering and regulatory requirements in the wild.

This project will help to bridge the gap between the research, technical, and policy communities by incorporating their complementary perspectives and methodologies in a unifying research framework. My ambition is to establish the foundations for future cross-disciplinary privacy studies by designing, developing, and validating reproducible software analysis and research methods, moving from isolated and anecdotal observations to a holistic and fundamental knowledge of software development processes and their inherent online privacy risks.

The project EIN2020-112344 has been funded by the Ministry of Science and Innovation (MCIN/AEI/10.13039/501100011033) and the European Union “NextGenerationEU”/PRTR under the Call “Europa Investigación” 2020.