Reinforcement Learning (RL), in which an agent is trained to make decisions that are most favourable in the long run, is an established technique in artificial intelligence. Its popularity has grown in recent years, largely due to the development of deep neural networks, which spawned deep reinforcement learning algorithms such as Deep Q-Learning. These have been used to solve problems that earlier algorithms could not, such as playing the famed game of “Go”. Many such problems suffer from the curse of dimensionality: the number of possible states is so overwhelming that exploring every option is impractical.
While these recent techniques have been successful, they may not be strictly necessary or practical for some applications, such as cloud provisioning. In these settings, the action space is not as vast, and the workload data required to train such systems is rarely shared, as Application Service Providers (ASPs) consider it commercially sensitive. Since provisioning decisions evolve over time in response to incoming workloads, they fit the sequential decision process problem that legacy RL was designed to solve. However, because time series data is highly correlated, states are not independent of each other, and the legacy Markov Decision Process (MDP) formulation must be carefully adapted to build robust provisioning algorithms.
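To make the Markov view of workloads concrete, the following minimal sketch models load levels as a three-state Markov chain and computes its stationary distribution; the states and transition probabilities are assumed purely for illustration and are not taken from the thesis.

```python
import numpy as np

# Hypothetical 3-state workload model (low / medium / high load).
# Rows are current states, columns are next states; the transition
# probabilities are assumed values, as if estimated from a workload trace.
P = np.array([
    [0.70, 0.25, 0.05],
    [0.20, 0.60, 0.20],
    [0.05, 0.35, 0.60],
])

# The stationary distribution pi satisfies pi = pi @ P; obtain it as the
# left eigenvector of P associated with eigenvalue 1, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi /= pi.sum()

print("Stationary workload distribution (low, medium, high):", pi)
```

Because each row of the matrix depends on the current state, successive states are correlated, which is exactly the property that a provisioning algorithm built on an MDP must account for.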
As the first contribution of this thesis, we exploit knowledge of both the application and its configuration to create an adaptive provisioning system leveraging stationary Markov distributions. We then develop algorithms that, with neither application nor configuration knowledge, solve the underlying MDP to create provisioning systems. Our Q-Learning algorithms factor in the correlation between states and the consequent transitions between them to create provisioning systems that not only adapt to workloads, but also exploit similarities between them, thereby reducing the retraining overhead. Our algorithms also converge in fewer learning steps, since we restructure the state and action spaces to avoid the curse of dimensionality without resorting to the function approximation used by deep Q-Learning systems.
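For readers unfamiliar with tabular Q-Learning, the sketch below shows the standard update rule on a toy provisioning problem. The state space (load level), action space (number of instances), reward, and environment dynamics are all assumptions made for this example; they do not reflect the state and action restructuring developed in the thesis.

```python
import numpy as np

# Minimal tabular Q-Learning sketch for a provisioning-style problem.
# states  = observed load level (0 = low, 1 = medium, 2 = high)
# actions = number of instances to provision (1, 2 or 3)
N_STATES, N_ACTIONS = 3, 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration

Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

def reward(state, action):
    """Toy reward: penalise under-provisioning heavily, over-provisioning mildly."""
    needed = state + 1          # assume load level maps to the instances needed
    provisioned = action + 1
    return -5.0 * max(needed - provisioned, 0) - 1.0 * max(provisioned - needed, 0)

def step(state):
    """Toy environment: the load level drifts randomly (assumed dynamics)."""
    return int(np.clip(state + rng.integers(-1, 2), 0, N_STATES - 1))

state = 0
for _ in range(10_000):
    # epsilon-greedy action selection
    action = rng.integers(N_ACTIONS) if rng.random() < EPSILON else int(Q[state].argmax())
    r = reward(state, action)
    next_state = step(state)
    # standard Q-Learning update rule
    Q[state, action] += ALPHA * (r + GAMMA * Q[next_state].max() - Q[state, action])
    state = next_state

print("Greedy provisioning policy (instances per load level):", Q.argmax(axis=1) + 1)
```

With only nine state-action pairs, the table converges quickly; keeping the spaces small in this way is what removes the need for deep function approximation.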
A crucial use case of future networks will be the support of low-latency applications involving highly mobile users. With such applications in mind, the European Telecommunications Standards Institute (ETSI) has proposed the Multi-access Edge Computing (MEC) architecture, in which computing capabilities are located close to the network edge, where the data is generated. Provisioning for such applications therefore entails migrating them to the most suitable location at the network edge as the users move. In this thesis, we also tackle this type of provisioning by considering vehicle platooning, or Cooperative Adaptive Cruise Control (CACC), at the edge. We show that our Q-Learning algorithm can be adapted to minimize the number of migrations required to effectively run such an application on MEC hosts, which may also be subject to traffic from other competing applications.
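One simple way to make a Q-Learning agent migration-averse is to shape the reward so that each migration carries an explicit cost. The sketch below illustrates this idea with assumed weights and values; it is not the reward formulation used in the thesis.

```python
# Illustrative reward shaping for edge service migration (assumed values):
# the reward favours low latency to the platoon and discounts each
# migration between MEC hosts by a fixed cost.
LATENCY_WEIGHT = 1.0
MIGRATION_COST = 5.0

def migration_reward(latency_ms: float, migrated: bool) -> float:
    """Higher reward for lower latency; subtract a penalty whenever the
    CACC service is migrated to a different MEC host."""
    return -LATENCY_WEIGHT * latency_ms - (MIGRATION_COST if migrated else 0.0)

# Example: staying put at 12 ms versus migrating to a host offering 6 ms.
print(migration_reward(12.0, migrated=False))   # -12.0
print(migration_reward(6.0, migrated=True))     # -11.0, so migrating barely pays off
```

Under such shaping, the learned policy migrates only when the latency gain outweighs the migration cost, which keeps the number of migrations low even when MEC hosts are shared with competing applications.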
Constantine Ayimba received his M.Sc. in Wireless Communications from Lund University (Sweden) in 2016, where he was a Swedish Institute Scholar. He is currently pursuing his PhD at the IMDEA Networks Institute and the University Carlos III of Madrid (Spain), where his research focuses on machine learning for network management.
PhD Thesis Advisor: Dr. Vincenzo Mancuso, IMDEA Networks Institute, Spain
Co-advisor: Dr. Paolo Casari, University of Trento, Italy
University: University Carlos III of Madrid, Spain
Doctoral Program: Telematics Engineering
PhD Committee members: