Mental models for FinOps adoption
by Peter Shi
This post is aimed at leaders who oversee large cloud IT deployments and are looking to implement effective cloud cost management practices (a.k.a. FinOps).
This is the first in a two-part series about adopting FinOps in larger organizations with significant cloud footprints.
For clarity, effective cloud cost management implies:
- embedded capability for efficient cloud consumption
- ability to spend to budget
- having accurate forecasts
- achieving expected ROI on cloud investments (e.g., improved agility)
When organizations invest resources to improve some aspect of the org. (whether it be culture, innovation, security, or cloud financial performance), I believe it is:
- Helpful to learn from the experiences of others
- Valuable to have a mental model of the domain / challenge at hand
With these beliefs in mind, below are experiences and mental models that hopefully help in your cloud cost management endeavours.
Cloud spend can be made efficient post-procurement, yet most orgs. are not structured to take advantage of this
Many orgs. I work with have procurement and finance teams whose responsibilities include improving spend efficiency (i.e., paying less for the same outcome).
These teams tend to pay for themselves multiple times over due to the savings they’re able to realize on behalf of the org.
Most of these savings are realized through means such as vendor selection and price negotiations.
Price negotiation applies to high-volume cloud spending too, with service rates locked in by procurement/finance through negotiation.
However, when it comes to cloud, total actual spend doesn’t come from the agreed cost estimate, but rather from:
💰 Total spend = [Rate] x [Consumption Volume]
Procurement can impact the Rate (i.e., price per unit) by some percentage.
For argument's sake, let's say the rate can be influenced by 5% via procurement team efforts.
Consumption Volume depends on what the broader business (e.g., business teams, developers, infra teams) decides to deploy on the cloud.
To be more precise:
💰 Total spend = [Rates given the mix of services selected] x
[Consumption volume of each selected service]
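The formula above can be sketched in a few lines of Python. This is only an illustration: the service names, rates, and volumes are made-up placeholders rather than real cloud prices, and the 5% discount is the "for argument's sake" figure used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLine:
    """One line of a cloud bill. All values used below are hypothetical."""
    name: str
    rate: float    # price per unit of consumption
    volume: float  # units consumed in the billing period


def total_spend(lines):
    """Total spend = sum over services of rate x consumption volume."""
    return sum(line.rate * line.volume for line in lines)


def with_rate_discount(lines, discount):
    """Return a copy of the bill with every rate reduced by `discount` (0.05 = 5%)."""
    return [
        ServiceLine(line.name, line.rate * (1 - discount), line.volume)
        for line in lines
    ]


# Hypothetical service mix; not real prices.
bill = [
    ServiceLine("compute-hours", rate=0.10, volume=50_000),
    ServiceLine("storage-gb-months", rate=0.02, volume=200_000),
]

baseline = total_spend(bill)
negotiated = total_spend(with_rate_discount(bill, 0.05))

print(f"Baseline spend:        {baseline:,.2f}")
print(f"After 5% rate cut:     {negotiated:,.2f}")
```

Note that the discount only touches the rate column. The volume column, and the choice of which services appear in the bill at all, sits with the teams that design and run the workloads.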
Architecture and service mix decisions are primarily driven by infra. team familiarity with cloud services, and business requirements.
These decisions have an immense impact on cost.
The diagram example above shows two different ways to set up a highly available microservice on AWS cloud.
On the left is the traditional architecture which contains a load balancer, at least 2 instances in an ASG, and a primary and backup database.
On the right is a serverless architecture with an API Gateway Endpoint, a Lambda function, and a NoSQL DynamoDB database and table.
Both architectures achieve the same business outcome but the one on the right (depending on service demand) can be 10x to 100x cheaper, and requires little to no maintenance.
In most cases, the pushback to using the serverless architecture is a lack of familiarity with the method rather than any business requirement (barring some exceptions).
In cases where serverless cannot be used, activities such as clean-up, resizing, container tuning, and reservations often yield 20% to 50% savings.
On cloud, far more efficiency can be found away from the negotiation table than at the table.
However, many orgs. are only equipped to seek efficiency through rate optimization, and lack the systems (people, tools, and processes) required to manage cost and ROI outside of the procurement function.
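To make the comparison concrete, here is a rough sketch of how the two levers combine. It assumes spend scales as rate times consumption, and it reuses the illustrative figures from this post (a 5% rate reduction, and 20% to 50% from consumption-side work). The function name and scenario labels are hypothetical.

```python
def savings_fraction(rate_reduction=0.0, volume_reduction=0.0):
    """Fraction of baseline spend saved when rate and consumption both shrink.

    Because spend = rate x consumption, the reductions multiply:
    remaining spend = (1 - rate_reduction) * (1 - volume_reduction).
    """
    for value in (rate_reduction, volume_reduction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reductions must be between 0 and 1")
    return 1 - (1 - rate_reduction) * (1 - volume_reduction)


# Illustrative scenarios only; percentages are the ones quoted in this post.
scenarios = {
    "negotiation only (5% rate)": (0.05, 0.00),
    "consumption only (20%)": (0.00, 0.20),
    "consumption only (50%)": (0.00, 0.50),
    "both (5% rate, 20% consumption)": (0.05, 0.20),
}

for label, (rate_cut, volume_cut) in scenarios.items():
    print(f"{label}: {savings_fraction(rate_cut, volume_cut):.1%} saved")
```

Under these assumptions, even the low end of the consumption-side range outweighs the negotiated discount, and the two levers stack rather than compete.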
There is a natural tendency towards higher cloud costs
Show me the incentives and I will show you the outcome. -Charlie Munger
Many KPIs of IT managers, developers, and infra. teams can be improved by accepting higher costs.
Uptime can often be improved by picking larger resources that are less likely to run out of memory.
Performance can be improved by picking larger resources that are less likely to be CPU constrained.
Product development velocity can be improved by reducing time spent on application code optimization, and instead opting for a larger resource.
Velocity is also improved by sticking to known (but less efficient) architectural methods.
Naturally, these IT KPIs align with what business teams want too - high performance, availability, and velocity.
Buying managed cloud services through a 3rd party using a cost-plus contract also creates a tendency towards higher costs.
As such, counteracting forces need to be put in place to encourage efficient practices. These 'forces' could include fast feedback tools (e.g., cost alerts), efficiency automation tools (e.g., resource on-off automation), and leadership support - to name a few.
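As one example of a fast feedback tool, a cost alert can be as simple as projecting month-end spend from the run rate so far and comparing it to the budget. The sketch below is a minimal, provider-agnostic illustration: the function names are hypothetical, the linear run-rate projection is an assumption, and in practice the spend-to-date figure would come from your cloud provider's billing data.

```python
import calendar
from datetime import date


def projected_month_end_spend(spend_to_date, today):
    """Linearly extrapolate month-to-date spend to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def budget_alert(spend_to_date, monthly_budget, today):
    """Return an alert message if projected spend exceeds budget, else None."""
    projected = projected_month_end_spend(spend_to_date, today)
    if projected > monthly_budget:
        overshoot = projected / monthly_budget - 1
        return (
            f"Projected spend {projected:,.0f} exceeds budget "
            f"{monthly_budget:,.0f} by {overshoot:.0%}"
        )
    return None


# Hypothetical figures: 4,000 spent by 10 April against a 10,000 budget.
message = budget_alert(4_000, 10_000, date(2024, 4, 10))
if message:
    print(message)
```

The value of a check like this comes from how quickly it reaches the team that can act on it, which is why it is paired with leadership support above rather than presented as a fix on its own.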
Motivation trumps other factors in achieving results
When embarking on a cost reduction project, or when trying to improve cloud cost management practices, it’s important to ask:
“How motivated are those responsible to actually do this work?”
In one case, a customer asked for a cost review of their SAP deployment that was well above budget.
After I suggested changes that would yield 82% savings and were estimated to take three days of effort to implement, the response from the infra. manager was:
“I think we’ll have time for that in 6 months”
The budget owner was not pleased.
In this case, the 6-month lead time may have been legitimate, as there was other business-critical work that had to be completed first.
However, it is also possible that the infra. manager simply felt the optimization work was not something they wanted to do.
In another case, I met with a CTO and finance leader who wanted cloud cost management advice. My suggestions included:
- clearly communicate objectives with engineers and celebrate wins
- establish a cadence between Finance and Engineering
- put up cost and efficiency dashboards
Six months later, the org. achieved a 35% reduction in its cloud costs, and the CTO shared that some managers had to ask their staff to stop optimizing costs that were too small to warrant attention.
Although tools such as dashboards helped, results were driven more by clear leadership communication and support, which in turn affected motivation, interest, and prioritization.
Orgs. that do not back up FinOps initiatives with clear leadership support typically end up with unused tools, and FinOps hires who are unable to be effective in their roles.
The value-add of a FinOps function changes over time
When beginning the FinOps journey, the best ROI is likely found through clean-up activities.
One customer realized $4M USD in annual savings from only 4 days of clean-up effort from 1 engineer.
As time passes, reactive savings-generating activity begins to run dry and focus should instead shift towards activities such as:
- setting up cost and efficiency visibility
- building a culture of efficiency and ROI awareness / accountability
- enabling edge-teams (e.g. IT managers) with methods, tools, and relevant processes
- establishing processes required to meet budgets and produce accurate forecasts
- creating new-starter onboarding material and knowledge-share forums re: cloud cost matters
The skills required for building this capability are different from the skills needed for clean-up efforts, with the former requiring greater capacity for cross-functional influence and effective communication.
As such, the team skill-mix should change over time, assuming cloud cost maturity matters to the organization.
In the fullness of time, a highly successful FinOps practice may choose to wind-down its operations and reduce its headcount once sufficient capability is embedded in the org., similar to how Patty McCord left Netflix after successfully implementing a high-performance culture.
I hope this blog has provided you with new insight into cloud adoption and FinOps.
There’s immense potential for cloud spend to be made efficient post-procurement, yet most organizations are not structured to take advantage of this aspect of cloud.
Counteracting mechanisms are required to address the tendency towards higher cloud costs.
Motivation trumps other factors (e.g., tooling) in achieving results.
And the value-add of the FinOps function changes over time, along with the optimal skills mix.
The next blog in this series will expand on why larger orgs. may benefit from having a FinOps Operating Model.