Brownout or How to Deal with Capacity Shortage
I contributed to a self-adaptive software engineering technique called brownout that increases cloud application resilience to capacity shortage. The developer only needs to mark some code as optional; an external controller then decides for which requests to execute the optional code, so as to maintain a given target response time.
For example, online shops usually offer end-users recommendations of similar products they might be interested in. No doubt, recommender engines greatly improve the user experience, which translates into higher revenue for the owner. However, due to their sophistication, such engines are highly demanding on computing resources. By selectively activating or deactivating recommendations, the application’s capacity requirements can be controlled at the expense of end-user experience.
Brownout can be employed either with a periodic controller or an event-driven controller. It helps avoid capacity shortage in the following situations:
Flash-crowds: An application suddenly becomes popular due to it being referenced from a popular website.
Overbooking mis-predictions: The cloud management system has admitted more applications than the data-center has capacity for. Coordination between the management system and the brownout application is necessary, to inform the system that, despite being responsive, the application has disabled optional code to deal with capacity shortage.
Cascading failures: Several replicas of an application fail at the same time. A brownout-aware load-balancer is required to optimize the amount of optional content served among replicas.
To identify such optional code, our methodology for retrofitting brownout in existing applications can be used.
Brownout, or more generally any graceful degradation scheme, can be used to increase data-center utilization by trimming peak resource demand, provided the right monetary incentives are in place.
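The core brownout mechanism can be sketched in a few lines. This is an illustrative toy, not the published controller: a "dimmer" theta is the probability of executing the optional code for a request, and a simple integral controller nudges theta based on the gap between the target and measured response times. The names (theta, GAIN) and constants are assumptions of this sketch.

```python
import random

TARGET_RT = 0.5   # target response time in seconds (assumed)
GAIN = 0.5        # controller gain (assumed)

theta = 1.0       # start with optional code fully enabled

def control_step(measured_rt):
    """Adjust theta once per control period."""
    global theta
    theta += GAIN * (TARGET_RT - measured_rt)   # slower than target -> lower theta
    theta = min(1.0, max(0.0, theta))           # keep theta a valid probability
    return theta

def handle_request(render_recommendations):
    """Serve one request, executing optional code with probability theta."""
    if random.random() < theta:
        render_recommendations()

# Under overload (measured response time above target), theta decreases:
control_step(measured_rt=1.5)   # theta drops from 1.0 to 0.5
```

Because theta is a probability rather than an on/off switch, capacity can be traded against user experience gradually, instead of abruptly disabling recommendations for everyone.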
Research in this direction has led to the following artefacts:
Post-Copy Live Migration
libvirt is a library that abstracts various virtualization technologies (e.g., Xen, qemu, VMware) and allows virtual machines to be securely managed on a server, either locally or remotely. I contributed post-copy migration support for qemu, a technique to achieve near-zero migration downtime. Post-copy ensures that live migration terminates, even if the application dirties memory faster than the network can transfer dirty pages.
Post-copy is a building block for business continuity as a service.
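For illustration, with a current libvirt installation a post-copy migration can be driven from virsh roughly as follows; the guest name and destination URI are placeholders, and exact flag availability depends on the libvirt version:

```shell
# Start a live migration with the post-copy capability enabled
virsh migrate --live --postcopy vm1 qemu+ssh://dest-host/system

# Once pre-copy is underway, switch the ongoing migration to post-copy:
# the VM resumes on the destination and remaining pages are pulled on demand
virsh migrate-postcopy vm1
```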
Vertical Elasticity
Reducing the granularity of resource allocation, both in amount and duration, increases efficiency. In cloud computing, the coarse-grained allocation of whole virtual machines to an application – called horizontal elasticity – can be replaced by the fine-grained allocation of fractions of CPU cores and megabytes of memory, for durations as short as 1 second – called vertical elasticity. The challenge consists in devising performance models and controllers that accurately predict and allocate the amount of computing and memory capacity an application requires so as to meet a given target performance.
I worked on a CPU elasticity controller that reaches a given average and tail response time. I also worked on coordinating the proposed CPU elasticity controller with a memory elasticity controller using fuzzy logic. Steering a cloud benchmark application towards CPU-hungriness or memory-hungriness can easily be achieved using httpmon.
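A minimal sketch of model-based CPU elasticity, assuming a simple M/M/1-style performance model (not the published controller): the controller inverts the model T = S / (1 - λ·S / cores) to find the fractional CPU allocation that keeps the modelled mean response time at the target. The service time and bounds are invented for the example.

```python
SERVICE_TIME = 0.02   # mean CPU time per request in seconds (assumed)

def cpu_allocation(arrival_rate, target_rt, max_cores=32.0):
    """Return the (fractional) number of CPU cores needed so that the
    modelled mean response time T = S / (1 - rho), with
    rho = arrival_rate * S / cores, does not exceed target_rt."""
    # Inverting T = S / (1 - lambda*S/cores) gives
    # cores = lambda * S / (1 - S / T).
    rho_max = 1.0 - SERVICE_TIME / target_rt
    if rho_max <= 0:
        return max_cores  # target below service time: saturate allocation
    cores = arrival_rate * SERVICE_TIME / rho_max
    return min(max_cores, cores)

# Doubling the load doubles the allocation under this model:
low = cpu_allocation(arrival_rate=100, target_rt=0.1)    # 2.5 cores
high = cpu_allocation(arrival_rate=200, target_rt=0.1)   # 5.0 cores
```

In practice the model parameters drift, which is why a feedback loop around such a feed-forward model (and fuzzy logic for the CPU/memory trade-off) is needed rather than a one-shot computation.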
Such controllers can then be used for performance-based service differentiation: applications are marked with a class – gold, silver or bronze – and a target performance – response time or throughput – and the system allocates resources so that higher-priority classes never observe performance degradation before lower-priority classes do.
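The differentiation policy itself reduces to granting capacity in strict priority order, so that a lower class is always degraded before a higher one. A toy sketch, with invented application names, demands and capacity:

```python
def differentiate(capacity, demands):
    """demands: list of (app, class_rank, demand), lower rank = higher
    priority (0 = gold, 1 = silver, 2 = bronze).
    Returns {app: allocation}, filling higher-priority classes first."""
    alloc = {}
    for app, _, demand in sorted(demands, key=lambda d: d[1]):
        grant = min(demand, capacity)   # give as much as demand and capacity allow
        alloc[app] = grant
        capacity -= grant
    return alloc

# With 10 units of capacity, only the bronze application is degraded:
alloc = differentiate(10, [("shop", 0, 6), ("blog", 1, 3), ("batch", 2, 4)])
# -> gold "shop" gets 6, silver "blog" gets 3, bronze "batch" gets the remaining 1
```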
Cooperative Resource Management for HPC
I worked on designing and prototyping resource management systems that better support certain classes of HPC applications, in particular:
Moldable applications can run on more or fewer computing resources, but once started their resource allocation can no longer be changed. Most non-embarrassingly parallel applications based on MPI are of this type: assuming their memory requirements are met, they can be executed on a wide range of CPU core counts. However, since data partitioning is only performed at initialization, the number of CPU cores needs to remain constant during execution.
Evolving applications change their resource requirements during execution. For example, a numeric simulation that uses adaptive-mesh refinement may require more or less memory depending on whether turbulence appears. Note that, in the case of evolving applications, the change in resource allocation is demanded by the application itself.
Malleable applications may change the amount of resources they use during execution. For example, an embarrassingly parallel application may run on fewer or more CPU cores depending on how many are allocated to it. In contrast to evolving applications, the change in resource allocation is imposed by the system. Malleable applications can be used to fill idle capacity, e.g., to compensate for uncertainty in the resource requirements of another, evolving application.
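The interplay between the two last classes can be sketched as follows. This toy example (all names and numbers are ours) shows how a system keeps a fixed-size machine full: whenever the evolving job demands more or fewer cores, the malleable job is shrunk or grown to absorb the difference.

```python
TOTAL_CORES = 64   # size of the machine (assumed)

def rebalance(evolving_demand, min_malleable=1):
    """Return (evolving_alloc, malleable_alloc) on a fixed-size machine:
    the evolving job gets what it demands (up to a cap), and the malleable
    job fills all remaining capacity."""
    evolving_alloc = min(evolving_demand, TOTAL_CORES - min_malleable)
    malleable_alloc = TOTAL_CORES - evolving_alloc   # no core stays idle
    return evolving_alloc, malleable_alloc

# The evolving job grows (e.g., turbulence appears); the malleable job shrinks:
before = rebalance(evolving_demand=16)   # (16, 48)
after = rebalance(evolving_demand=40)    # (40, 24)
```

The point is that the system, not the application, decides the malleable job's allocation, which is exactly what makes such jobs useful as "filler" workload.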
This research has led to the following prototype code and publications:
CooRM-e is a resource manager for evolving and malleable applications. For further details, please refer to this paper. The software artifact is deposited at the French Agency for Software Protection under reference 2013-12-18, IDDN.FR.001.520004.000.S.P.2013.000.10100
CooRM-m is a resource manager for moldable applications. For further details, please refer to this paper. The software artifact is deposited at the French Agency for Software Protection under reference 2013-12-13, IDDN.FR.001.500041.000.S.P.2013.000.10100