The Three Engineering Habits Driving Up Your Cloud Costs

Your cloud costs keep climbing, and you know why.

Engineers provision more than they need. Development environments run around the clock. Resources get added but never removed. Everyone agrees this needs to change, but it doesn’t.

This isn’t about bad engineers or poor planning. These patterns emerge from how engineering teams are incentivized.

I’ve fallen into each of these habits at one point or another, and understanding them is the first step to addressing them.

Performance first, cost later

When building new features or services, engineers focus on getting things working fast and reliably. Cost optimization gets deferred to “later.” That is, after the feature ships and things stabilize.

But later never comes.

I’ve seen situations where services are spun up with generous resource allocation to prove a concept. The plan was always to optimize once we validated the approach.

A year later, those services were still running on the same oversized instances because new priorities kept taking over. New features, urgent cost spikes in other areas, production incidents.

Teams have the data showing they could rightsize, but something more urgent always wins.

When caution becomes expensive

Ask an engineer how much memory their service needs, and they’ll add a buffer. If testing shows 2GB works, they’ll request 4GB. If 4GB seems sufficient, they’ll go with 8GB to be safe.

This isn’t carelessness. It’s rational behavior, and I’ve done it myself.

I’ve been an on-call SRE. I know what it’s like to get paged at 2am because a service ran out of memory. It sucks. There’s a post-mortem. You have to explain what went wrong and how you’ll prevent it from happening again.

The same fear applies to deleting resources. Engineers are afraid of removing the wrong thing. That old load balancer might still be in use. That database snapshot could be critical for compliance. That test environment someone spun up six months ago might be needed for an upcoming audit. So things stay provisioned, just in case.

Overprovisioning carries no equivalent pain. No one gets paged when a service uses half its allocated memory. There’s no post-mortem for running a database instance one size larger than necessary.

When the incentive structure punishes resource shortages but not resource excess, teams will consistently choose the expensive, safe option.

Treating cloud resources like on-premise infrastructure

Development and staging environments run 24/7. Databases stay provisioned at peak capacity through nights and weekends. Services scale up for traffic spikes but never scale back down.
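The always-on habit is expensive in a way that's easy to quantify. Here's a back-of-the-envelope sketch; the hourly rate and the business-hours schedule are hypothetical assumptions for illustration, not figures from any real bill.

```python
# Weekly cost of one dev environment: always-on vs. business hours only.
# The rate and schedule below are hypothetical examples.

HOURLY_RATE = 0.50        # assumed $/hour for one dev environment
HOURS_ALWAYS_ON = 24 * 7  # 168 hours per week, running around the clock
HOURS_BUSINESS = 10 * 5   # e.g. 8am-6pm, Monday through Friday

weekly_always_on = HOURLY_RATE * HOURS_ALWAYS_ON
weekly_business = HOURLY_RATE * HOURS_BUSINESS
savings_pct = 100 * (1 - HOURS_BUSINESS / HOURS_ALWAYS_ON)

print(f"Always on:      ${weekly_always_on:.2f}/week")
print(f"Business hours: ${weekly_business:.2f}/week")
print(f"Savings:        {savings_pct:.0f}%")
```

With these assumptions, simply stopping the environment outside working hours cuts its cost by roughly 70%, and the saving scales with every environment a team runs.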

I’ve experienced this firsthand during a cloud migration. People, myself included, carry over mental models from on-premise infrastructure, where you purchased hardware once and ran it continuously.

Even though people know cloud infrastructure operates differently, that knowledge takes time to internalize: old engineering habits have to be unlearned and new ones built, even when you understand exactly how cloud billing works.

Why these habits persist

The immediate pain of an outage is intense and visible. Systems go down, customers complain, and engineers have to explain what happened.

The pain of overspending is gradual and diffuse. Costs accumulate across many services and many months. By the time someone notices the bill has grown, it’s hard to trace back to specific decisions.

No single engineer chose to spend an extra $50,000 this quarter. It emerged from a thousand small choices about safety margins and resource allocation.
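To make that concrete, here's a rough model of how diffuse overspend accumulates. Every category and figure below is a hypothetical illustration, not real billing data.

```python
# How an extra $50,000 per quarter can emerge from many small choices.
# All figures are hypothetical examples.

excess_per_quarter = {
    "oversized instances": 400 * 60,        # 400 services, ~$60 excess each
    "always-on dev environments": 50 * 300,  # 50 environments, ~$300 each
    "unattached volumes and snapshots": 1000 * 11,  # 1,000 items, ~$11 each
}

total = sum(excess_per_quarter.values())
for source, cost in excess_per_quarter.items():
    print(f"{source:>34}: ${cost:>7,}")
print(f"{'total':>34}: ${total:>7,}")
```

No line item here looks alarming on its own; it's only in aggregate that the waste becomes visible.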

Engineering teams get measured on uptime, latency, feature velocity, and deployment frequency. These metrics rarely account for cost efficiency.

Until the incentives change, these habits will continue. People behave rationally within the system they’re in. The problem is the system itself.

FinOps Weekly
Jill Kay