Here is a (not-so-short) overview of my research:

Brownout or How to Deal with Capacity Shortage

I contributed to a self-adaptive software engineering technique called brownout, which increases the resilience of cloud applications to capacity shortage. The developer only needs to mark code as optional; an external controller then decides for which requests to enable the optional code, so as to maintain a given target response time.

For example, online shops usually offer end-users recommendations of similar products they might be interested in. No doubt, recommender engines greatly increase the user experience, which translates to higher owner revenue. However, due to their sophistication, such engines are highly demanding on computing resources. By selectively activating or deactivating recommendations, the application’s capacity requirements can be controlled at the expense of end-user experience.
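The idea can be illustrated with a minimal sketch. It assumes a single "dimmer" value in [0, 1] that gives the probability of running the optional code for a request, and a periodic controller that nudges the dimmer according to the measured response time. The class name, the `gain` constant and the simple update rule are hypothetical choices made for illustration, not the controllers from the publications.

```python
import random


class BrownoutController:
    """Hypothetical periodic controller that adjusts a dimmer in [0, 1]."""

    def __init__(self, target_response_time, gain=0.5, dimmer=1.0):
        self.target = target_response_time  # target response time, in seconds
        self.gain = gain                    # assumed tuning constant
        self.dimmer = dimmer                # probability of serving optional code

    def update(self, measured_response_time):
        # Positive error means headroom (raise the dimmer);
        # negative error means overload (lower the dimmer).
        error = self.target - measured_response_time
        self.dimmer = min(1.0, max(0.0, self.dimmer + self.gain * error))
        return self.dimmer


def handle_request(controller, rng=random):
    page = ["product details"]              # mandatory content, always served
    if rng.random() < controller.dimmer:    # optional code, decided per request
        page.append("recommendations")
    return page
```

In this sketch the application only wraps the optional code in the `if` branch; all decisions about when to run it stay in the controller, which would call `update` once per control period.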

Brownout can be employed either with a periodic controller or an event-driven controller. It helps avoid capacity shortage in the following situations:

To identify such optional code, our methodology for retrofitting brownout in existing applications can be used.

Brownout, or more generally any graceful degradation scheme, can be used to increase data-center utilization by trimming peak resource demand, provided the right monetary incentives are in place.

Research in this direction has led to the following artefacts:

Post-Copy Live Migration

libvirt is a library that abstracts various virtualization technologies (e.g., Xen, qemu, VMware) and allows virtual machines on a server to be managed securely, either locally or remotely. I contributed post-copy migration support for qemu, a technique that achieves near-zero migration downtime. Post-copy ensures that live migration terminates, even if the application dirties memory faster than the network can transfer dirty pages.
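The termination argument can be made concrete with a small back-of-the-envelope model. The sketch below is not libvirt or qemu code; it assumes a constant network bandwidth and a constant dirty rate (both in pages per second), and the function names are hypothetical. In iterative pre-copy, each round re-sends the pages dirtied during the previous round, so the backlog only shrinks if the dirty rate is below the bandwidth. In post-copy, the virtual machine already runs on the destination, so each page crosses the network at most once.

```python
def precopy_remaining(total_pages, bandwidth, dirty_rate, rounds):
    """Pages left to send after each pre-copy round (simplified model)."""
    remaining = total_pages
    history = []
    for _ in range(rounds):
        round_time = remaining / bandwidth             # time to send the backlog
        # Pages dirtied meanwhile must be re-sent, capped at the VM's memory size.
        remaining = min(total_pages, dirty_rate * round_time)
        history.append(remaining)
    return history


def postcopy_transfer_time(total_pages, bandwidth):
    """Upper bound on transfer time when every page is sent at most once."""
    return total_pages / bandwidth


# Dirty rate below bandwidth: the backlog halves every round.
print(precopy_remaining(1000, bandwidth=100, dirty_rate=50, rounds=3))
# Dirty rate above bandwidth: the backlog never shrinks, so pre-copy cannot finish.
print(precopy_remaining(1000, bandwidth=100, dirty_rate=200, rounds=3))
# Post-copy: bounded regardless of the dirty rate.
print(postcopy_transfer_time(1000, bandwidth=100))
```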

Post-copy is a building block for business continuity as a service.

Vertical Elasticity

Reducing the granularity of allocating resources, both in amount and duration, increases efficiency. In cloud computing, the coarse-grained allocation of whole virtual machines to an application (called horizontal elasticity) can be replaced by the fine-grained allocation of fractions of CPU cores and megabytes of memory, for durations as short as 1 second (called vertical elasticity). The challenge consists in devising performance models and controllers that accurately predict and allocate the amount of computing and memory capacity an application requires to meet a given target performance.

I worked on a CPU elasticity controller that reaches a given average and tail response time. I also worked on coordinating this CPU elasticity controller with a memory elasticity controller by means of a fuzzy-logic controller. Steering a cloud benchmark application towards CPU-hungriness or memory-hungriness can easily be achieved using httpmon.

Such controllers can then be used for performance-based service differentiation. Applications are marked with a class (gold, silver or bronze) and a target performance (response time or throughput), and the system allocates resources so that higher-priority classes never observe performance degradation before lower-priority classes.
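One simple way to obtain this ordering property is strict-priority allocation, sketched below. This is only an illustration under stated assumptions, not the mechanism from the publications: the `allocate` function is hypothetical, and the per-application demand is assumed to be the capacity that an elasticity controller has computed as necessary to meet the application's target.

```python
PRIORITY = ["gold", "silver", "bronze"]  # highest priority first


def allocate(capacity, demands):
    """Grant capacity in class order.

    demands: list of (app_name, service_class, required_capacity) tuples,
    where required_capacity is assumed to come from an elasticity controller.
    """
    allocation = {}
    for name, _cls, demand in sorted(demands, key=lambda d: PRIORITY.index(d[1])):
        granted = min(demand, capacity)  # a shortage hits the lowest classes first
        allocation[name] = granted
        capacity -= granted
    return allocation


# With 10 units of capacity and 12 units of total demand,
# only the bronze application is degraded.
print(allocate(10, [("c", "bronze", 4), ("a", "gold", 5), ("b", "silver", 3)]))
```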

Cooperative Resource Management for HPC

I worked on designing and prototyping resource management systems that better support certain classes of HPC applications, in particular:

This research has led to the following prototype code and publications: