Can cloud IT engineers be expected to remember to switch off resources when they are done?

by Peter Shi

If I can’t for the life of me remember to switch off the kitchen light, how can cloud IT engineers be expected to remember to switch off resources when they’re done?

The way the home kitchen light gets turned off at the moment is via an entertained and gentle reminder from my girlfriend. She’s entertained because I just can’t seem to remember to do it, even though I genuinely want to.

Experiencing this struggle reminded me of how easy it is for cloud IT engineers to leave cloud resources running when they didn’t intend to – i.e. “forgotten spend”. The struggle also implies that after-the-fact reminders from someone else to turn off resources simply aren’t effective.

img 1

Luckily, the cost of leaving the kitchen light on is minimal – but what about cloud resources?

img 2

Ouch.

Over the last few years, I’ve come across several cases of more than $100k per month in forgotten spend that was left on for months. After discovering these resources, it took all of 10 minutes to turn them off. $100k per month in additional profit for 10 minutes of work? Absolutely stellar ROI if you ask me.

Now, if you want to avoid this scenario, the conventional answer seems to be to turn on billing alerts. Unfortunately, this doesn’t work well at scale. Allow me to explain.

Imagine a scenario where your cloud spend looks like this – a fairly consistent profile with a peak of $8k per day. Based on this trend, you may decide to set an alert that will warn you if spend goes over $8k per day.

img 3

After a few days, daily spend doubles and you get an email.

img 4

This alert would work as intended if you’re the person who is both responsible for generating the increase and the one who receives the alert. You may be reminded to turn off the test resources, or if the increase was expected then a new alert at $16k per day should be set. Pretty good solution for smaller environments.
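Under the hood, an alert like this boils down to a simple threshold check. Here is a minimal sketch of that logic, not any cloud provider's actual alerting API. The function name and the daily figures are made up for illustration; only the $8k threshold and the doubling to $16k come from the example above.

```python
# Daily spend threshold from the example above, in dollars.
DAILY_ALERT_THRESHOLD = 8_000


def days_over_threshold(daily_spend, threshold=DAILY_ALERT_THRESHOLD):
    """Return (day_index, spend) for every day whose spend exceeds the threshold."""
    return [(day, spend) for day, spend in enumerate(daily_spend) if spend > threshold]


# Hypothetical daily totals: steady spend peaking at $8k, then a doubling on the last day.
spend = [7_200, 7_900, 8_000, 7_600, 16_000]
alerts = days_over_threshold(spend)

for day, amount in alerts:
    print(f"Day {day}: spend of ${amount:,} exceeded ${DAILY_ALERT_THRESHOLD:,}")
```

Notice what the check returns: a day and a dollar figure. It says nothing about who created the resources or why, which is fine when the person reading the email already knows.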

However, if you’re instead responsible for overall IT spend across a large org but not for generating the increase, then I’m sorry to say you’re in for some pain, as it can be near impossible to do anything useful with the alert. Sure, you may be able to quash forgotten spend here and there, but not everyone wants to leave their day job to play cloud spend whack-a-mole.

Getting everyone across the org to set cloud spend alerts at an individual level might be possible, but this requires cat herding capability, authority, willingness to add engineering process overhead, and cloud cost allocation capability – an exceedingly rare combination.

In light of this and other cloud cost management challenges, some organisations lock down cloud budgets and even repatriate workloads back to on-prem in an effort to reduce cloud spend against budget.

So how can we remember to turn off the lights when done? I think the best answer is that we don’t. The solution lies in re-framing the problem and making non-compliance harder than compliance.

img 5

Notice the design difference between these faucets? The second, public faucet works without a park officer asking people to turn off the tap and without a sign reminding them to. It simply turns off by itself when you stop pressing the button.

Applying this to cloud, organisations can implement a system where a default “turn off 7 days from now” policy is applied to anything created in test accounts unless the user manually adjusts that setting. Under a trust and verify model, reporting can then reveal whether any team is abusing the ability to change the default.
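To make the idea concrete, here is a rough sketch of the default-expiry policy in plain Python. It is not tied to any cloud provider's API: the `Resource` class, its field names, and both helper functions are hypothetical, and in a real environment the expiry would more likely live in resource tags with a scheduled job doing the shutdown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# The "turn off 7 days from now" default from the policy above.
DEFAULT_TTL = timedelta(days=7)


@dataclass
class Resource:
    """A hypothetical test-account resource; all field names are illustrative."""
    name: str
    owner: str
    created_at: datetime
    # None means the owner never changed the setting, so the default applies.
    expires_at_override: Optional[datetime] = None

    @property
    def expires_at(self) -> datetime:
        return self.expires_at_override or (self.created_at + DEFAULT_TTL)


def due_for_shutdown(resources, now):
    """Resources whose expiry has passed and should be turned off."""
    return [r for r in resources if r.expires_at <= now]


def override_report(resources):
    """Count manual overrides per owner, for the 'verify' half of trust and verify."""
    counts = {}
    for r in resources:
        if r.expires_at_override is not None:
            counts[r.owner] = counts.get(r.owner, 0) + 1
    return counts


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
fleet = [
    Resource("test-vm-1", "alice", created),
    Resource("test-vm-2", "bob", created, expires_at_override=created + timedelta(days=30)),
]
now = created + timedelta(days=8)

print([r.name for r in due_for_shutdown(fleet, now)])
print(override_report(fleet))
```

The point of the design is that doing nothing is the compliant path: a resource nobody touches expires on its own, and keeping one alive takes a deliberate action that shows up in the report.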

Gradually layering on business systems and rules that mitigate risk whilst enabling users to move at speed is what good governance looks like on the cloud.

Personally, the consequence of leaving the kitchen light on isn’t large enough to warrant investment in smart-home technology that automatically turns off the lights. I’ll stick to using this desktop background instead.

img 6

