It is notoriously difficult to make distributed systems reliable. It becomes even harder for widely deployed systems that are heterogeneous (multiple implementations) and federated (multiple administrative entities). The set of routers responsible for the Internet’s inter-domain routing is a prime example of such a system.
We argue that a key step in making these systems reliable is to automatically explore system behavior and check for potential faults. In this talk, I will describe the design and implementation of DiCE, a system for online testing of heterogeneous and federated distributed systems. DiCE runs concurrently with the production system by leveraging distributed checkpoints and isolated communication channels. It orchestrates the exploration of relevant system states by controlling the inputs that drive system actions. While respecting privacy among the different administrative entities, DiCE detects faults by checking for violations of properties that capture the desired system behavior. We demonstrate the ease of integrating DiCE with a BGP router and a DNS server, the building blocks of two vital Internet services. Our testbed evaluation shows that DiCE quickly and successfully detects three important classes of faults: those resulting from configuration mistakes, policy conflicts, and programming errors.
Joint work with Marco Canini, Vojin Jovanovic, Daniele Venzano, Gautam Kumar, Dejan Novakovic, Boris Spasojevic, and Olivier Crameri.
Who is Dejan Kostić?
Dejan Kostić obtained his Ph.D. in Computer Science at Duke University, under Amin Vahdat. He spent the last two years of his studies, followed by a brief stay as a postdoctoral scholar, at the University of California, San Diego. He received his Master of Science degree in Computer Science from the University of Texas at Dallas, and his Bachelor of Science degree in Computer Engineering and Information Technology from the University of Belgrade (ETF), Serbia. In January 2006, he started as a tenure-track assistant professor at the School of Computer and Communication Sciences at EPFL (École Polytechnique Fédérale de Lausanne), Switzerland. In 2010, he received a European Research Council (ERC) Starting Investigator Award. His interests include distributed systems, computer networks, operating systems, and mobile computing.
The conference will be conducted in English.