This thesis addresses two critical challenges in Federated Learning (FL): enhancing Byzantine robustness to thwart poisoning attacks and advancing privacy-preserving mechanisms to mitigate data leakage risks.
To address the first challenge, we develop a robust FL aggregation scheme based on subjective logic and residual-based attack detection.
This scheme retains historical data from previous rounds to build client reputations. Employing a combination of theoretical analysis, trace-driven simulation, and experimental validation with a prototype and real users, we show that an FL classifier built on this scheme can detect sensitive content with high accuracy, learn new labels rapidly, and remain robust against poisoning attacks from malicious users as well as imperfect input from non-malicious ones.
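To make the idea concrete, the following minimal Python sketch shows one way reputation-weighted aggregation could combine residual-based attack detection with a subjective-logic reputation. The prior weight, base rate, outlier threshold, and all function names are illustrative assumptions, not the thesis' actual implementation.

```python
# Sketch: reputation-weighted FL aggregation combining residual-based
# detection with a subjective-logic reputation. All constants and names
# below (W_PRIOR, BASE_RATE, detect_outliers, ...) are assumptions.
import numpy as np

W_PRIOR = 2.0      # subjective-logic prior weight (assumed value)
BASE_RATE = 0.5    # prior expectation of client honesty (assumed value)

def detect_outliers(updates: np.ndarray, tau: float = 2.0) -> np.ndarray:
    """Flag clients whose residual w.r.t. the coordinate-wise median
    exceeds the median residual by tau normalized MADs."""
    median = np.median(updates, axis=0)
    residuals = np.linalg.norm(updates - median, axis=1)
    mad = np.median(np.abs(residuals - np.median(residuals))) + 1e-12
    return residuals > np.median(residuals) + tau * 1.4826 * mad

def reputation(pos: float, neg: float) -> float:
    """Subjective-logic expected opinion E = b + a*u, with
    b = r/(r+s+W) and u = W/(r+s+W)."""
    denom = pos + neg + W_PRIOR
    belief = pos / denom
    uncertainty = W_PRIOR / denom
    return belief + BASE_RATE * uncertainty

def aggregate(updates, pos, neg):
    """One round: update per-client evidence from residual detection,
    then average updates weighted by subjective-logic reputation."""
    flagged = detect_outliers(updates)
    pos += ~flagged   # honest-looking behaviour adds positive evidence
    neg += flagged    # flagged behaviour adds negative evidence
    rep = np.array([reputation(r, s) for r, s in zip(pos, neg)])
    weights = rep / rep.sum()
    return weights @ updates, pos, neg
```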
Additionally, we identify a major weakness of FL, namely its vulnerability to poisoning attacks, as a consequence of the one person one vote (henceforth 1p1v) principle underlying most contemporary aggregation rules. To overcome this, we introduce FedQV, a novel aggregation algorithm built upon the quadratic voting scheme, recently proposed as a better alternative to 1p1v-based elections. Our theoretical analysis establishes that FedQV is a truthful mechanism, in which bidding according to one's true valuation is a dominant strategy, and that it achieves a convergence rate matching that of state-of-the-art methods. Furthermore, our empirical analysis using multiple real-world datasets validates the superior performance of FedQV against poisoning attacks. It also shows that assigning unequal voting budgets based on a reputation score increases its performance benefits even further. FedQV can also be easily combined with Byzantine-robust privacy-preserving mechanisms to enhance its robustness against both poisoning and privacy attacks.
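As a rough illustration of the quadratic-voting principle behind FedQV, the sketch below weights each client by the square root of the credits it spends, so that casting v votes costs v^2 credits. The similarity-based bidding, the masking threshold theta, and the budget bookkeeping are simplifying assumptions, not the exact FedQV mechanism.

```python
# Sketch: quadratic-voting aggregation in the spirit of FedQV.
# Bidding rule, theta, and budget handling are illustrative assumptions.
import numpy as np

def qv_aggregate(updates, global_model, budgets, theta=0.1):
    """Weight each client's update by sqrt(credits), where credits are
    charged quadratically against the client's remaining voting budget."""
    # Bid: cosine similarity between each update and the global direction.
    g = global_model / (np.linalg.norm(global_model) + 1e-12)
    norms = np.linalg.norm(updates, axis=1) + 1e-12
    bids = np.clip(updates @ g / norms, 0.0, 1.0)
    bids[bids < theta] = 0.0                    # too dissimilar: vote masked out
    credits = np.minimum(bids ** 2, budgets)    # quadratic cost, capped by budget
    budgets -= credits                          # spend from each client's budget
    votes = np.sqrt(credits)                    # quadratic voting: power = sqrt(cost)
    weights = votes / (votes.sum() + 1e-12)
    return weights @ updates, budgets
```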
To address the second challenge, we first characterize the privacy offered by pruning. We establish information-theoretic upper bounds on the information leakage from pruned FL and validate them experimentally under state-of-the-art privacy attacks across different FL pruning schemes. Furthermore, we introduce PriPrune, a privacy-aware pruning algorithm for FL. PriPrune uses defense pruning masks, which can be applied locally after any pruning algorithm, and adapts the defense pruning rate to jointly optimize privacy and accuracy. A key idea in the design of PriPrune is pseudo-pruning: the local model undergoes defense pruning and only the pruned model is sent to the server, while the weights pruned out by the defense mask are retained locally for future local training rather than being removed.
We show that PriPrune significantly improves the privacy-accuracy tradeoff compared to state-of-the-art pruned FL schemes.
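The sketch below illustrates the pseudo-pruning idea under a simple magnitude-based defense mask. The masking policy is an assumption chosen for illustration, and the rate adaptation that jointly optimizes privacy and accuracy is omitted.

```python
# Sketch: pseudo-pruning with an assumed magnitude-based defense mask.
# The server only ever sees the masked model; masked-out weights stay
# on-device and keep participating in future local training.
import numpy as np

def defense_mask(weights: np.ndarray, defense_rate: float) -> np.ndarray:
    """Mask out the smallest-magnitude fraction of weights (assumed policy)."""
    k = int(defense_rate * weights.size)
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold

def pseudo_prune(local_weights: np.ndarray, defense_rate: float):
    """Return (shared, retained): `shared` is sent to the server with the
    masked weights zeroed out; `retained` is the full local model."""
    mask = defense_mask(local_weights, defense_rate)
    shared = np.where(mask, local_weights, 0.0)   # what the server receives
    retained = local_weights                      # full model stays local
    return shared, retained
```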
Addressing the challenges of robust aggregation and privacy-aware model training is crucial for deploying FL in practical scenarios. Through rigorous theoretical analysis and comprehensive empirical evaluations, this thesis demonstrates that the proposed methods significantly enhance FL's robustness and privacy preservation. By providing innovative tools and strategies to overcome these obstacles, this thesis establishes FL as a secure and privacy-preserving learning paradigm, paving the way for its widespread adoption across diverse domains and applications.
Tianyue Chu is a final-year PhD candidate at IMDEA Networks Institute and Universidad Carlos III de Madrid in Spain, advised by Dr. Nikolaos Laoutaris. Her research focuses on the privacy and security implications of machine learning and distributed learning. During her PhD, she has published her research in top-tier venues such as NDSS, ACM SIGMETRICS, and IEEE ISIT. She also serves on the TPC of IEEE SECON 2023 and 2024, AISCC (NDSS 2024), and ACM S3 2024.
PhD Thesis Advisor: Nikolaos Laoutaris, IMDEA Networks Institute, Madrid
University: Universidad Carlos III de Madrid
Doctoral Program: Telematic Engineering
PhD Committee members: