Publish/subscribe (pub/sub) is a popular communication paradigm in the design of large-scale distributed systems. Pub/sub is increasingly used for a wide array of applications in both industry and academia, yet detailed studies of large-scale, real-world pub/sub systems are still lacking.
Inspired by the peer-assisted solution Spotify uses to stream music, we explore a similar approach for disseminating Spotify's pub/sub messages to its users. Distributing the workload between user peers and datacenter servers raises a fundamental problem: how should we select the subset of the pub/sub workload that datacenter servers handle, so as to maximize user satisfaction under resource constraints?
In our recent work, we provide what is, to the best of our knowledge, the first formal treatment of this problem. We introduce two metrics that capture subscriber satisfaction when resources are limited, which allows us to formulate the problem as two new variants of the maximum coverage problem.
Unfortunately, both variants turn out to be NP-hard. However, we derive formal approximation bounds and heuristics showing that efficient approximations are attainable. We validate our approach on real-world traces from Spotify and show that our solutions can be run periodically in real time to adapt to workload variations.
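For readers unfamiliar with maximum coverage, the following is a minimal Python sketch of the textbook greedy algorithm for its basic form: given a collection of sets, pick at most k of them to maximize the number of distinct elements covered. For this basic form, greedy is known to achieve a (1 - 1/e) approximation. This sketch rests on several assumptions. Pub/sub topics are treated as sets and subscribers as elements. The budget k stands in for datacenter capacity. The function name `greedy_max_coverage` and the topic data are hypothetical. The two variants discussed in this talk use satisfaction metrics and resource constraints, so they are not this exact problem, and this is not the authors' heuristic.

```python
from typing import Dict, List, Set


def greedy_max_coverage(sets: Dict[str, Set[str]], k: int) -> List[str]:
    """Textbook greedy for basic maximum coverage (hypothetical helper).

    Repeatedly picks the set that covers the most not-yet-covered
    elements, up to k sets. Ties are broken by set name so the result
    is deterministic.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(sets)
    for _ in range(k):
        if not remaining:
            break
        # max() returns the first maximal item, so sorting the names
        # makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining set adds new elements; stop early
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


# Hypothetical toy data: topic -> subscribers interested in it.
topics = {
    "news": {"alice", "bob", "carol"},
    "sports": {"carol", "dave"},
    "music": {"dave", "erin", "frank"},
    "weather": {"alice", "erin"},
}

if __name__ == "__main__":
    print(greedy_max_coverage(topics, k=2))  # prints ['music', 'news']
```

In this toy example, serving two topics from the datacenter reaches all six subscribers. Greedy is used here only because it is the standard baseline with a known guarantee for the basic problem.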
Further, we address three fundamental questions. Given a pub/sub workload, (1) what is the minimum amount of resources needed to satisfy all subscribers, (2) what is a cost-effective way to allocate resources for that workload, and (3) what would it cost to host the workload on a public Infrastructure-as-a-Service (IaaS) provider such as Amazon EC2?
About Vinay Setty
Vinay Setty is a PhD candidate in the Networks and Distributed Systems group, Department of Informatics, University of Oslo, Norway. His advisors are Prof. Roman Vitenberg (University of Oslo) and Prof. Maarten van Steen (Department of Informatics, VU University Amsterdam). His research interests include data dissemination in large-scale distributed systems, publish/subscribe systems, peer-to-peer, peer-assisted and cloud-assisted architectures, large-scale data and trace analysis, personalization, and real-time processing of Big Data.
Before starting his PhD at the University of Oslo, he received a master's degree from the Computer Science Department of Saarland University, Germany. He wrote his master's thesis in the databases group headed by Prof. Gerhard Weikum at the Max Planck Institute for Informatics, Germany.
This event will be conducted in English.