Planning to optimise stack settings by hand? Good luck with that...

Here at SKIPJAQ we use machine learning to automatically optimise certain settings in the stack of an application. The goal of optimisation is to improve the performance of an application, which in turn unlocks a long list of benefits - but for the purposes of this blog post we'll assume you already know what speeding up an application can do for revenue, churn, productivity and so on.

When a customer uses our platform to optimise the 'full stack' of an application - i.e. the operating system, web server and runtime underlying the application - our machine learning engine is tasked with tuning approximately 40 settings simultaneously.

It should be fairly obvious that tuning 40 interrelated settings simultaneously is very difficult, even for a machine - but what if a person wanted to tune just ONE setting using brute force (i.e. testing every possible value for that setting to discover the optimum)? How would that work out?

Let's assume you wanted to find the optimum value for the memory setting in the runtime layer of an application. Even if you are optimising the stack of a small service, this setting will have over 4,000 possible values. Setting aside the practical overhead of executing tests, each value will take a minimum of one hour to test. That means testing every value for just the memory setting will take you more than six months - and remember, that's a generous estimate.
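To put rough numbers on that claim, here is a quick back-of-the-envelope calculation. The 4,500 candidate values and one hour per test are illustrative assumptions rather than measurements of any particular runtime:

```python
# Back-of-the-envelope estimate of a brute-force sweep over one setting.
# The figures below are illustrative assumptions, not measurements.
possible_values = 4_500   # plausible count of distinct memory values to try
hours_per_test = 1        # generous: one load test per value, no reruns

total_hours = possible_values * hours_per_test
total_days = total_hours / 24        # assuming tests run around the clock
total_months = total_days / 30.4     # average month length

print(f"{total_hours} hours ≈ {total_days:.0f} days ≈ {total_months:.1f} months")
# -> 4500 hours ≈ 188 days ≈ 6.2 months
```

Note that this assumes the tests run back to back, day and night, with no failures and no reruns - hence 'generous'.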

At this point some readers may be thinking that six months of testing won't really be required, because a pattern is likely to emerge after a short while. Crucially, this isn't the case. The relationship between stack settings and latency (the load-time delays our customers need to reduce in order for their businesses to thrive) is non-intuitive. The chart below gives a good idea of what that relationship looks like: the blue dots at the bottom of the troughs indicate the positions of the best-performing values, while the red dot indicates the default value our dummy customer is currently using. More importantly, the line tracing the relationship between different values and latency plots a rollercoaster-like route across the graph.

So. It's not possible to guess what value to give the memory setting; you need to test every value. One of many problems with this approach is obvious: you don't have six months to get it right. Your application code will have changed hundreds of times while you've been optimising that one setting - which means the optimum value you've discovered will no longer be the optimum by the time you come to apply it to your stack.

If you're going to turbocharge the performance of your service/application/website and unlock an impressively long list of benefits through the process of optimising stack settings, you need a solution that works quickly - so that whenever you have a new build you can run the process and stay optimised. Performance testing and optimisation need to be continuous, which is why it's only a matter of time before optimisation becomes an accepted part of IT's fabled Continuous Delivery cycle.

To return to our problem, then: you can't guess what the optimum value for one setting is going to be, and you are more likely to win the lottery twice over than you are to guess a configuration for 40 settings which performs better than the default configuration you're already running. Configuring via brute force testing is extremely time-consuming, and therefore also extremely expensive - another reason why almost all of the companies we meet haven't attempted it to date, despite the lucrative rewards attached to successful optimisation.
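To get a feel for why guessing is hopeless, consider the size of the search space. If each of the 40 settings had just 10 candidate values (a deliberately conservative assumption - the memory setting alone has thousands), the number of distinct configurations would be:

```python
# Size of the full-stack configuration space under a conservative assumption:
# 40 settings, each with only 10 candidate values.
settings = 40
values_per_setting = 10

configurations = values_per_setting ** settings
print(f"{configurations:.2e} possible configurations")
# -> 1.00e+40 possible configurations
```

At one test per hour, enumerating those would take vastly longer than the age of the universe - so brute force is never going to be an option for the full stack.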

It is worth mentioning that all of the constraints mentioned above apply to the performance novice as much as to the incredibly rare and much-sought-after figure of the Performance Engineer - an individual who specialises in performance tuning and, more often, in fixing major performance issues. There are only perhaps one to two hundred such engineers worldwide - and as far as we can tell they all work for Netflix or Amazon. Even if you could hire one, their work is certain to take many months, cost you a small fortune, and come with no guarantee of success.

N.B. if you need to optimise your front end (i.e. the way your website renders images, chooses which content to pre-load, and so on) before you start optimising stack settings, a performance engineer should be able to make more of an immediate impact - but as we say, they're very hard to find, let alone hire!

Brute force testing is out, machine learning is in

SKIPJAQ uses advanced machine learning techniques to explore the massive parameter space (i.e. the sheer number of possible configurations) and run experiments that home in on the optimum configuration for application stack settings.
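This post doesn't spell out the engine's internals, but the general idea of a model-guided search can be sketched in a few lines: fit a surrogate model to the configurations tested so far, then let the model choose the next experiment. The code below is a toy illustration of that loop - the Gaussian-process surrogate, the acquisition rule and the synthetic latency curve are all assumptions for illustration, not SKIPJAQ's actual engine:

```python
# Toy sketch of model-guided search over a single "memory" setting.
# measure_latency is a synthetic stand-in for a real one-hour load test.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def measure_latency(memory_mb):
    """Synthetic bumpy, non-intuitive latency curve (milliseconds)."""
    return (40 * np.sin(memory_mb / 300.0)
            + 0.01 * abs(memory_mb - 2200)
            + np.random.normal(0, 2))

candidates = np.arange(256, 4352, 64)              # every value we *could* test
tested_x = [512, 3072]                             # two arbitrary starting points
tested_y = [measure_latency(x) for x in tested_x]

for _ in range(15):                                # 15 experiments, not 4,000+
    surrogate = GaussianProcessRegressor(normalize_y=True)
    surrogate.fit(np.array(tested_x).reshape(-1, 1), tested_y)
    mean, std = surrogate.predict(candidates.reshape(-1, 1), return_std=True)
    next_x = int(candidates[np.argmin(mean - std)])  # optimistic (lower-bound) pick
    tested_x.append(next_x)
    tested_y.append(measure_latency(next_x))

best = tested_x[int(np.argmin(tested_y))]
print(f"best memory value found after {len(tested_x)} tests: {best} MB")
```

The point is the shape of the loop - measure a handful of configurations, fit a model, let the model pick the next experiment - so that a good value emerges after a dozen or so tests rather than thousands; the specific surrogate and acquisition rule here are placeholders.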

Review the images below to see how our machine learning engine first approximates the relationship between different memory values and application latency, and then builds on that approximation over time - drawing on an application's performance history and on lessons learnt from optimisation cycles running across our customer base.

With SKIPJAQ's platform approach you can point our machine learning-driven optimisation engine at your application, set it going, and simply wait for it to spit out a list of settings that, when applied, will turbocharge the performance of your application. You can launch an optimisation cycle when you arrive in the morning and have your recommended settings in hand by the time you leave at the end of the day. Quite simply, with SKIPJAQ you can now optimise application performance cost-effectively at speed, at scale, and without expert knowledge. That's why we believe we've solved a problem that has dogged application owners and businesses for decades.

Get in touch with colin@skipjaq.com if you'd like to know more.