Efficient resource management in cloud data centers: a machine learning approach

  1. Lorido Botrán, Tania
Supervised by:
  1. Sergio Huerta Lara Supervisor
  2. Borja Sanz Urquijo Supervisor

Defense university: Universidad de Deusto

Defense date: 10 January 2019

Committee:
  1. Emilio Santiago Corchado Rodríguez Chair
  2. Diego López de Ipiña González de Artaza Secretary
  3. Muhammad Khurram Bhatti Member

Type: Thesis

Abstract

This thesis analyses the resource allocation problem in cloud data centers. These data centers typically run millions of applications in a virtualized environment (Virtual Machines or containers) that serves as the middleware layer between the physical servers and the user applications. Virtualization brings many benefits for resource sharing: instances are easy to create and destroy, VMs can be migrated from one physical machine to another, and the "on-demand" computational model is well supported. However, these benefits do not come for free: they impose new challenges in sharing resources among different user applications. This dissertation studies the main problems associated with resource management in cloud data centers, namely: (1) Virtual Machine placement, (2) application auto-scaling, and (3) interference detection.

The resources offered by physical servers, organized in several data centers, are provided in the form of abstract compute units that are implemented as Virtual Machines (VMs). Each VM is assigned a pre-configured set of resources, including number of cores, amount of memory, disk, and network bandwidth. Virtualized data centers support a large variety of applications, including batch jobs (typically used for scientific applications) and web applications (e.g. an online bookshop). Each application is deployed on a set of VMs, which can be allocated to any collection of physical servers in the data center. The problem of assigning a physical location to each VM is known as VM placement, and it is performed by the manager of the cloud infrastructure.

Cloud computing environments let users run their applications in an elastic manner, using only the resources they need and paying for what they use. However, to take advantage of this flexibility, it is advisable to use an auto-scaling technique that adjusts the resources to the incoming workload, both reducing the overall cost and complying with the Service Level Objective. We propose a taxonomy of five categories to classify state-of-the-art auto-scaling techniques: static threshold-based rules, time series analysis, control theory, reinforcement learning, and queuing theory. Furthermore, we present a comparison of some auto-scaling techniques (both reactive and proactive) proposed in the literature, plus new approaches based on rules with dynamic thresholds. Results show that dynamic thresholds avoid the poor performance that results from a badly chosen threshold.

VMs or containers with different resource usage needs may be co-located on the same physical machine. Resource sharing (including CPU, memory, or cache) may cause a resource contention bottleneck, i.e., two VMs (or groups of containers) compete for the same resources, but the resource capacity is not enough for both of them. This leads to anomalies in the resource usage of the application that may penalize its performance. The noisy neighbor problem occurs when an application takes away the resources assigned to another one. We propose a lightweight unsupervised algorithm that is able to effectively detect these anomalies in different types of applications. Our detection algorithm is based on clustering comparison and uses some novel distance metrics. The final contribution of this thesis is a set of new distance metrics that target the comparison of hard clusterings.
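
To make the idea of comparing hard clusterings concrete, the following is a minimal sketch of one classical measure, the Rand distance (one minus the Rand index). It is not one of the novel metrics proposed in the thesis, which the abstract does not specify; the function name `rand_distance` and the sample label lists are assumptions made purely for illustration.

```python
from itertools import combinations


def rand_distance(labels_a, labels_b):
    """Return 1 - Rand index for two hard clusterings of the same items.

    Illustrative only: a classical measure, not the thesis's own metric.
    Each argument is a sequence of cluster labels, one label per item.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 0.0
    # A pair disagrees when one clustering groups the two items together
    # and the other clustering separates them.
    disagreements = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


# Hypothetical cluster assignments for four resource-usage samples.
baseline = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]  # same grouping, different label names
shifted = [0, 1, 1, 1]    # one sample moved to the other cluster

print(rand_distance(baseline, relabeled))
print(rand_distance(baseline, shifted))
```

Because the measure looks only at whether pairs of items are grouped together, it ignores the label names themselves. An anomaly detector built on clustering comparison could, under these assumptions, flag a monitoring window whose clustering lies far from a reference clustering.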