With the emergence of edge computing, the problem of offloading jobs between an Edge Device (ED) and an Edge Server (ES) has received significant attention in recent years. Motivated by the fact that an increasing number of applications use Machine Learning (ML) inference on data samples collected at the EDs, we study the problem of offloading inference jobs, considering the following novel aspects: in contrast to a typical computational job, the processing time of an inference job depends on the size of the ML model, and recently proposed Deep Neural Networks (DNNs) for resource-constrained devices offer the choice of scaling the model size. Assuming multiple ML models are available at the ED and a powerful ML model at the ES, we study the problem of distributing jobs among the available ML models with the goal of maximising the total inference accuracy subject to a time constraint T.
To address this problem, we propose an approximation algorithm: Accuracy Maximisation using LP-Relaxation and Rounding (AMR2). For the case where all samples are identical in size, we propose Accuracy Maximisation using Dynamic Programming (AMDP). As a proof of concept, we present results from an implementation of AMR2 for an image classification application.
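As background for the identical-size case: when every sample takes the same time on a given model, the assignment reduces to a knapsack-style selection, where each sample is routed to one model with a per-sample accuracy and processing time, and the objective is to maximise total accuracy within the time budget T. The sketch below is a minimal illustrative dynamic program under those assumptions (integer per-sample times, hypothetical accuracy and time values); it is not the authors' AMDP algorithm.

```python
def best_total_accuracy(accs, times, n_samples, T):
    """Max total accuracy when each of n_samples is assigned to one model.

    accs[m]  -- expected accuracy of model m per sample (hypothetical values)
    times[m] -- integer processing time of model m per sample
    T        -- total time budget

    dp[k][t] = best accuracy achievable after assigning k samples
               using at most t units of time.
    """
    NEG = float("-inf")
    dp = [[NEG] * (T + 1) for _ in range(n_samples + 1)]
    for t in range(T + 1):
        dp[0][t] = 0.0  # zero samples assigned costs nothing
    for k in range(1, n_samples + 1):
        for t in range(T + 1):
            # try routing the k-th sample to each model m
            for a, tm in zip(accs, times):
                if t >= tm and dp[k - 1][t - tm] != NEG:
                    dp[k][t] = max(dp[k][t], dp[k - 1][t - tm] + a)
    return dp[n_samples][T]


# Example: a fast, less accurate on-device model vs. a slower, more
# accurate server model; 4 samples, budget of 8 time units.
print(best_total_accuracy([0.6, 0.9], [1, 3], 4, 8))
```

With these toy numbers, the optimum routes two samples to each model (time 2·1 + 2·3 = 8), for a total accuracy of 3.0. The table has O(n·T) entries and each is filled in O(M) for M models.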
About Andrea Fresa
Andrea Fresa is a Ph.D. Researcher at IMDEA Networks Institute. He received his bachelor's and master's degrees in Computer Science Engineering at the University of Naples Federico II. He conducted the research for his Master's Thesis in the IoT team at Ericsson Research in Helsinki, where he focused on developing a platform for controlling devices from heterogeneous IoT ecosystems by converting data and interaction models to a common model. Currently, he is part of the Edge Networks Group, and his main research interest is the design of algorithms for edge computing.
This event will be conducted in English.