Smartphones have become integral to the everyday lives of billions of people worldwide, who use them for activities such as gaming, interacting with their peers, or working. While extremely useful, smartphone apps also have drawbacks, as they can threaten the security and privacy of their users. Android devices hold a large amount of users' personal data, including their social circles (e.g., contacts), usage patterns (e.g., app usage and visited websites), and physical location. As in most software products, Android apps often embed third-party code in the form of Software Development Kits (SDKs), which add functionality to the app without the need to develop it in-house. Android apps and the third-party components embedded in them are often interested in accessing such data, as the online ecosystem is dominated by data-driven business models and revenue streams such as advertising.
The research community has developed many methods for analyzing the privacy and security risks of mobile apps, most of which rely on two techniques: static code analysis and dynamic runtime analysis. Static analysis inspects the code and other resources of an app to detect its potential behaviors. Because it does not require executing the app, static analysis is easier to scale, but it has other drawbacks, such as missing app behaviors when developers obfuscate the app’s code to avoid scrutiny. Furthermore, since static analysis only reveals potential app behavior, its findings need to be confirmed, as it can report false positives due to dead or legacy code. Dynamic analysis examines apps at runtime to provide actual evidence of their behavior. However, these techniques are harder to scale, as they need to run on an instrumented device to collect runtime data. Moreover, the app must be stimulated with inputs that simulate real user interaction in order to exercise as many code paths as possible. While some automatic techniques exist to generate synthetic inputs, they have been shown to be insufficient.
In this thesis, we explore the benefits of combining static and dynamic analysis techniques so that they complement each other and mitigate their respective limitations. While most previous work has applied these techniques in isolation, we combine their strengths in different and novel ways that allow us to study several privacy issues in the Android ecosystem in greater depth. Namely, we demonstrate the potential of combining these complementary methods to study three inter-related issues:
- A security analysis of unauthorized access to permission-protected data without user consent. We use a novel technique that combines the strengths of static and dynamic analysis: we first compare the data sent by apps at runtime with the permissions granted to each app, in order to find instances of potential unauthorized access to permission-protected data. Once we have identified the apps that access personal data without permission, we statically analyze their code to discover the covert and side channels used by apps and SDKs to circumvent the permission system. This methodology allows us to discover apps using the MAC address as a surrogate for location data, two SDKs using the external storage as a covert channel to share unique identifiers, and an app using picture metadata to gain unauthorized access to location data.
- A novel SDK detection methodology that relies on signals observed both in the app’s code and static resources and in its runtime behavior. We then rely on a tree structure together with a confidence-based system to accurately detect the presence of SDKs without any a priori knowledge, and to discern whether a given SDK is part of legacy or dead code. We show that this methodology discovers third-party SDKs more accurately than state-of-the-art tools, both on a set of purpose-built ground-truth apps and on a dataset of 5k publicly available apps.
With these three case studies, we highlight the benefits of combining static and dynamic analysis techniques for studying the privacy and security guarantees and risks of Android apps and third-party SDKs. Using these techniques in isolation would not have allowed us to investigate these privacy issues in depth: we would lack the ability to provide real evidence of potential breaches of legislation, to pinpoint the specific ways in which apps leverage covert and side channels to circumvent Android’s permission system, or to adapt to an ever-changing ecosystem of third-party companies in Android.
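The permission-focused case study begins by cross-checking what apps transmit at runtime against the permissions they hold. A minimal Python sketch of that cross-check is shown here, assuming the personal data observed in network traffic has already been extracted into records. The `Observation` type, the `find_suspicious` function, and the data-type-to-permission mapping are hypothetical and greatly simplified (Android's actual rules differ across API levels); flagged records would then be handed to static analysis to find the channel used.

```python
from dataclasses import dataclass

# Simplified, illustrative mapping (an assumption for this sketch): each data type
# observed in traffic -> permissions, any one of which would legitimately grant access.
REQUIRED_PERMISSIONS = {
    "gps_location": {"android.permission.ACCESS_FINE_LOCATION",
                     "android.permission.ACCESS_COARSE_LOCATION"},
    "imei": {"android.permission.READ_PHONE_STATE"},
    "router_mac": {"android.permission.ACCESS_FINE_LOCATION",
                   "android.permission.ACCESS_COARSE_LOCATION"},
}


@dataclass(frozen=True)
class Observation:
    """One piece of personal data seen leaving the device at runtime (hypothetical record)."""
    app: str
    data_type: str
    destination: str


def find_suspicious(observations, granted):
    """Return observations where the app sent data it holds no permission for.

    `granted` maps an app's package name to the set of permissions it was granted.
    """
    suspicious = []
    for obs in observations:
        needed = REQUIRED_PERMISSIONS.get(obs.data_type)
        if needed is None:
            continue  # data type not protected in this simplified mapping
        if not needed & granted.get(obs.app, set()):
            suspicious.append(obs)  # no overlap: data sent without a matching permission
    return suspicious


if __name__ == "__main__":
    obs = [Observation("com.example.weather", "gps_location", "api.example.net"),
           Observation("com.example.game", "router_mac", "sdk.example.org")]
    granted = {"com.example.weather": {"android.permission.ACCESS_FINE_LOCATION"},
               "com.example.game": {"android.permission.INTERNET"}}
    for o in find_suspicious(obs, granted):
        print(o.app, o.data_type, o.destination)
    # prints: com.example.game router_mac sdk.example.org
```

An app that sends a router MAC address without any location permission is flagged, which mirrors the kind of surrogate-data case the study then investigates statically.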
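The SDK detection methodology combines a tree structure with a confidence-based system. The Python sketch below illustrates the general idea only, not the thesis's actual algorithm: package names found statically in the app's code are inserted into a tree, runtime evidence (assumed here to come from some method-tracing hook) is attached to the same nodes, and a prefix backed only by static evidence is reported with lower confidence as possibly dead or legacy code. `PackageNode`, `insert`, `detect`, the thresholds, and the 0.5/1.0 scores are all hypothetical; a real tool would also need to handle obfuscated package names and exclude the app's own package.

```python
class PackageNode:
    """A node in a package-name tree, e.g. com -> adsdk -> core."""

    def __init__(self, name):
        self.name = name
        self.children = {}
        self.static_classes = 0  # classes found in the app's code under this prefix
        self.runtime_hits = 0    # times code under this prefix was observed executing

    def child(self, part):
        if part not in self.children:
            self.children[part] = PackageNode(part)
        return self.children[part]


def insert(root, class_name, static=False, runtime=False):
    """Walk com.adsdk.core.Init -> com/adsdk/core, updating counters on every prefix."""
    node = root
    for part in class_name.split(".")[:-1]:  # drop the class name, keep the packages
        node = node.child(part)
        node.static_classes += int(static)
        node.runtime_hits += int(runtime)


def detect(root, min_classes=3, min_depth=2):
    """Return a list of (package, confidence, label) tuples for candidate SDK roots.

    Hypothetical scoring: static evidence alone gives 0.5; runtime evidence raises it
    to 1.0. A candidate is the shallowest prefix at depth >= min_depth that contains
    at least min_classes classes.
    """
    results = []

    def walk(node, path):
        if len(path) >= min_depth and node.static_classes >= min_classes:
            if node.runtime_hits > 0:
                results.append((".".join(path), 1.0, "active"))
            else:
                results.append((".".join(path), 0.5,
                                "static-only: possibly dead or legacy code"))
            return  # do not report sub-packages of an already-detected SDK
        for part, child in node.children.items():
            walk(child, path + [part])

    walk(root, [])
    return results


if __name__ == "__main__":
    root = PackageNode("")
    for cls in ["com.adsdk.core.Init", "com.adsdk.core.Net", "com.adsdk.ui.Banner",
                "com.oldlib.util.A", "com.oldlib.util.B", "com.oldlib.util.C"]:
        insert(root, cls, static=True)
    insert(root, "com.adsdk.core.Init", runtime=True)  # only this SDK runs
    for package, confidence, label in detect(root):
        print(package, confidence, label)
```

Reporting the shallowest qualifying prefix keeps an SDK's sub-packages from being listed separately, at the cost of merging distinct SDKs that share a vendor prefix; the thresholds control that trade-off.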
About Álvaro Feal
Álvaro Feal received his Bachelor’s degree in Computer Engineering from Universidade da Coruña and his Master’s degree in Software and Systems from Universidad Politécnica de Madrid. He is now a PhD student at IMDEA Networks Institute under the supervision of Prof. Narseo Vallina-Rodríguez. His work analyzes privacy threats in the mobile and web ecosystems using static and dynamic analysis techniques as well as network measurements. He has published at venues such as ConPro, CPDP, IMC, PETS, and USENIX Security, receiving a Distinguished Paper Award at the latter.
PhD Thesis Advisor: Dr. Narseo Vallina-Rodríguez, IMDEA Networks Institute, Spain
University: University Carlos III of Madrid, Spain
Doctoral Program: Telematic Engineering
PhD Committee members:
- President: Nataliia Bielova, Tenured Research Scientist, INRIA, France
- Secretary: Guillermo Suarez-Tangil, Research Assistant Professor, IMDEA Networks Institute, Madrid, Spain
- Panel member: Ben Stock, Tenured Faculty, CISPA Helmholtz Center for Information Security, Germany