GCP Cloud Cost Optimization News & Updates

Welcome to the Google Cloud FinOps & Cloud Cost Optimization updates.

Every week, we’ll update this page with all the news that can help you do cloud cost optimization in GCP.

March 5, 2026

BigQuery GA: monitor cross-region replication latency and egress bytes in Cloud Monitoring

BigQuery added GA metrics for cross-region replication latency and network egress bytes in Cloud Monitoring.

These metrics give visibility into replication performance and egress volumes so teams can find high-cost transfers and optimize data placement.

Compute Engine: managed compute.managed.* constraints for org-wide VM policy enforcement (GA)

Compute Engine introduced managed replacement constraints (compute.managed.*) for Organization Policy Service, with Policy Simulator and dry run support.

This enables centralized enforcement that limits VM types and configurations across an organization, helping control cloud spend and compliance.

Compact placement policies GA for Flex-start VMs: colocate VMs for lower cross-node cost

Google Cloud made compact placement policies generally available for Flex-start VMs (AI Hypercomputer / Compute Engine).

These policies let you colocate VMs to minimize network hops and improve latency-sensitive AI/ML workload efficiency.

You can reduce cross-node communication overhead and improve utilization, which helps lower operational and networking costs for distributed workloads.

February 19, 2026

Carbon Footprint methodology update: AI inference emissions allocated at SKU level

See Google Cloud’s Carbon Footprint methodology update (effective Jan 2026) for AI inference allocation.

Google updated Carbon Footprint calculations to allocate AI inference emissions at the SKU level following the AI energy/emissions framework, starting Jan 2026.

This change increases reported emissions for AI-powered services (for example, Vertex AI) while improving transparency for sustainability reporting.

Compute Engine instance flexibility for bulk VM creation (GA) to reduce failed launches

Read about instance flexibility becoming generally available for bulk VM creation.

Compute Engine made instance flexibility generally available for bulk VM creation, letting you provide a list of acceptable machine types and allowing the platform to provision based on capacity and quota.

This reduces failed launches and manual rework when provisioning many VMs at once, making large-scale provisioning more resilient.

Hyperdisk Exapools GA for massive pooled block storage and predictable performance

Learn about Hyperdisk Exapools GA for very large block storage needs.

Google announced Hyperdisk Exapools generally available, enabling purchase of bulk storage+performance from 500 TiB up to 5 EiB and sharing across up to 500,000 disks.

This pooled model supports massive AI/ML workloads with predictable performance and the potential for unit-cost savings for very large storage users.

February 13, 2026

Compute flexible CUDs expanded to all Cloud Billing accounts to simplify committed discounts

Google Cloud automatically migrated Cloud Billing accounts to the new spend‑based compute flexible committed use discounts (CUDs), broadening CUD coverage across Compute Engine, GKE, and Cloud Run SKUs. Because the spend‑based model applies across multiple compute SKUs, commitments are simpler to consume.

The broader coverage also makes it easier to apply committed discounts to actual usage patterns.

Capacity Planner adds Cloud Storage egress bandwidth and Spot GPU visibility

Google Cloud’s Capacity Planner preview now shows Cloud Storage egress bandwidth usage and GPUs attached to Spot VMs. The update gives teams visibility into egress bandwidth and which GPUs are attached to Spot VMs so you can forecast bandwidth and GPU capacity needs.

That helps avoid quota bottlenecks and makes it easier to plan capacity for Spot-based GPU workloads.

Cloud Monitoring MCP server and OTLP metric ingestion (Preview) to improve observability pipelines

Google Cloud added a Cloud Monitoring MCP server (preview) and OTLP metric ingestion paths via OpenTelemetry Collectors. The MCP server lets agents and AI apps interact with time series data, while OTLP ingestion via OpenTelemetry Collectors gives additional ingestion paths for metrics.

These changes improve observability pipelines and integration flexibility for telemetry sources.

February 6, 2026

Carbon Footprint corrected Cloud Run emissions

Important: Google Cloud fixed incomplete Cloud Run emissions data for Nov–Dec 2025 in Carbon Footprint and advised customers to backfill transfers to see corrected data.

Google updated Carbon Footprint to correct previously missing Cloud Run emissions for that period so reported emissions now reflect actual usage.

Consequently, customers who rely on Carbon Footprint for sustainability accounting should backfill transfers to ensure their historical reporting is accurate.

January 30, 2026

Vertex AI Search lets you change pricing models to match forecasting needs

Vertex AI Search now supports two pricing models and switching between them.

You can choose general pay‑as‑you‑go or a configurable monthly subscription for apps and data stores, and change between models as needed.

The choice trades predictable monthly cost against consumption-based spend, which matters for forecasting and chargeback.

Therefore, product owners can pick the model that best balances predictable costs and variable usage for their search workloads.
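To make the tradeoff between the two pricing models concrete, here is a minimal break-even sketch. The prices are hypothetical placeholders, not Vertex AI Search list prices, and the sketch assumes the subscription fee covers all usage in the month, which may not match the real terms.

```python
def break_even_queries(payg_price_per_1k: float, subscription_fee: float) -> float:
    """Monthly query volume at which both models cost the same."""
    return subscription_fee / payg_price_per_1k * 1000


def cheaper_model(monthly_queries: int, payg_price_per_1k: float,
                  subscription_fee: float) -> str:
    """Name the cheaper model for a given monthly volume."""
    payg_cost = monthly_queries / 1000 * payg_price_per_1k
    return "pay-as-you-go" if payg_cost < subscription_fee else "subscription"


# Hypothetical prices, for illustration only.
PAYG_PER_1K = 4.0        # USD per 1,000 queries
SUBSCRIPTION = 2000.0    # USD per month

print(break_even_queries(PAYG_PER_1K, SUBSCRIPTION))
for volume in (200_000, 800_000):
    print(volume, cheaper_model(volume, PAYG_PER_1K, SUBSCRIPTION))
```

Plugging in your own forecasted volume and contract prices shows which side of the break-even point a workload sits on.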

Cloud SQL fast clone (same zone) goes GA for MySQL and PostgreSQL

Cloud SQL added GA support for fast clone within the same zone for MySQL and PostgreSQL.

Fast clone gives near‑instant cloning for dev/test and analytics workloads instead of full volume copies, speeding environment provisioning.

Clones are cheaper and faster than full copies, which reduces storage costs and time‑to‑test for CI workflows.

N4A (Axion/Arm) machine family is generally available

Compute Engine announced GA for the N4A machine family powered by Axion Arm processors.

N4A supports 1–64 vCPUs, up to 512 GB memory, and standard/highmem/highcpu and custom types, adding another price‑performance option based on Arm architecture.

Workloads that can run on Arm may therefore see different price‑performance tradeoffs, which is worth factoring into FinOps evaluations.

January 23, 2026

Committed‑use recommendations support more machine types

Google Cloud expanded resource‑based Committed Use Discount (CUD) recommendations to support additional machine series.

Recommendations are available in the FinOps hub, Recommender API, and via BigQuery exports for programmatic analysis.

Backup & DR Service cost reports generally available

Google Cloud Backup & DR Service launched cost reports in GA to provide resource‑specific billing insights.

The reports expose backup‑related spend so teams can analyze retention, replication, and protection costs.

BigQuery: Gemini Cloud Assist surfaces job‑history analysis (preview)

BigQuery previewed Gemini Cloud Assist to analyze job history and surface slow or resource‑intensive queries.

The feature helps identify which queries are high cost or slow, speeding FinOps investigations into query-driven spend. That lets teams prioritize query optimization and tune cost‑heavy SQL patterns more quickly.

Cloud Monitoring: Application Monitoring dashboards show associated trace spans, link traces to costly ops

Cloud Monitoring dashboards for Application Monitoring now display associated trace spans for registered App Hub applications.

This improves observability by helping teams correlate trace spans to slow or costly operations shown in dashboards.

January 9, 2026

View future availability for GPU VMs, H4D VMs, or TPUs

Generally available: You can view future resource availability before you create a future reservation request in calendar mode. This action helps increase the likelihood that Google Cloud approves your request.

January 2, 2026

GKE gains In-place Pod Resize and Writable cgroups to improve resource efficiency

GKE (Kubernetes 1.35) went GA with In-place Pod Resize (change CPU/memory without restart) and Writable cgroups.

In-place resizing helps tighten right‑sizing by letting you adjust pod resources without downtime, reducing waste and disruption.

December 26, 2025

Get per-resource Pub/Sub usage in Billing exports to BigQuery for precise message cost allocation

Cloud Billing detailed export to BigQuery now includes granular Pub/Sub snapshot, subscription, and topic usage via resource.name/resource.global_name fields.

That enables accurate, per-resource cost analysis for Pub/Sub so messaging costs can be aligned to teams and workloads in downstream FinOps reporting.

With this detail in BigQuery exports, chargeback and anomaly detection for Pub/Sub spend become much more reliable.
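As a rough illustration of per-resource attribution, the sketch below totals Pub/Sub cost by resource name over rows already pulled from the export. The flattened keys (`service`, `resource_name`, `cost`) and the sample rows are assumptions for this example. The real export uses nested fields such as resource.name, so adapt the accessors to your schema.

```python
from collections import defaultdict


def pubsub_cost_by_resource(rows):
    """Sum cost per Pub/Sub resource, highest spend first."""
    totals = defaultdict(float)
    for row in rows:
        if row["service"] != "Cloud Pub/Sub":
            continue  # ignore other services in the export
        # Rows without a resource name are grouped so they stay visible.
        resource = row.get("resource_name") or "(unattributed)"
        totals[resource] += row["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical sample rows, not real billing data.
sample_rows = [
    {"service": "Cloud Pub/Sub", "resource_name": "orders-topic", "cost": 12.5},
    {"service": "Cloud Pub/Sub", "resource_name": "orders-sub", "cost": 3.0},
    {"service": "Cloud Pub/Sub", "resource_name": "orders-topic", "cost": 7.25},
    {"service": "Compute Engine", "resource_name": "vm-1", "cost": 40.0},
    {"service": "Cloud Pub/Sub", "resource_name": None, "cost": 1.5},
]

for resource, cost in pubsub_cost_by_resource(sample_rows).items():
    print(f"{resource}: ${cost:.2f}")
```

In practice the same grouping would usually be done in SQL inside BigQuery. The Python version is only meant to show the shape of the attribution.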

Vertex AI Agent Engine runtime pricing lowered and upcoming metering changes announced

Google lowered runtime pricing for Vertex AI Agent Engine and announced that Sessions, Memory Bank, and Code Execution will begin charging on January 28, 2026.

The price reduction plus the explicit metering start date affects forecasting for agentic workloads and requires updating budgets ahead of January 28, 2026.

Teams should adjust spend models and alerts now, before the newly chargeable features begin metering.

Understand Google SecOps/Chronicle billing components for better security cost attribution

Google published documentation that explains Google SecOps / Chronicle billing components and how to track usage and associated costs.

That helps FinOps teams attribute and forecast security-related spend more accurately by breaking down the billing components.

Additionally, this documentation is useful when aligning security costs to teams or applications in internal chargebacks.

December 18, 2025

Cloud SQL enhanced backups: GA centralizes backups and enforces retention

Google Cloud announced Cloud SQL enhanced backups are generally available, centralizing backups in a Backup & DR management project with enforced retention, granular scheduling, and point‑in‑time recovery after deletion.

With backups centralized and retention enforced, you can reduce surprise backup costs and ensure retention policies align with both compliance and cost targets.

Create future reservation requests in calendar mode (GA): reserve GPUs/TPUs/H4D up to 90 days

Compute Engine made future reservation requests in calendar mode generally available so you can reserve high‑demand GPUs, TPUs, or H4D resources up to 90 days in advance.

Reserving capacity for a scheduled window improves planning accuracy for time‑boxed workloads like model training or large HPC runs.

AlloyDB adds C4 machine series (GA) for extreme sizes and price/perf planning

AlloyDB for PostgreSQL now supports the C4 machine series (GA) using 6th‑gen Intel Xeon Granite Rapids with sizes up to 288 vCPUs and 2232 GiB RAM.

For FinOps teams evaluating large DB migrations (for example, high‑end OLTP or analytical DBs), these very large C4 sizes change the cost calculus for single‑node scale.

December 11, 2025

BigQuery: autonomous embedding generation (preview) to reduce engineering overhead

BigQuery introduced autonomous embedding generation on tables (preview) that maintains embeddings as data changes and enables semantic search via AI.SEARCH. This automates embedding upkeep so teams don’t need custom pipelines to keep embeddings in sync with table updates. As a result, it reduces engineering work and supports more cost‑efficient semantic workloads inside BigQuery. Using native embeddings also keeps storage and compute within BigQuery for simpler cost attribution.

Spanner Query Insights: new client & request columns for better cost attribution

Spanner added CLIENT_IP_ADDRESS, API_CLIENT_HEADER, USER_AGENT_HEADER, SERVER_REGION, PRIORITY, and TRANSACTION_TYPE columns to the oldest active queries table in Query Insights. Those columns improve observability for performance and offer richer metadata for cost and usage attribution across workloads.

Compute Engine: VM Extension Manager (preview) to manage guest agent extensions at scale

Compute Engine previewed VM Extension Manager to install and manage guest agent extensions (like Ops Agent, SAP Agent) across fleets via policies. This lowers operational overhead by automating agent deployment and ensures consistent observability coverage across VMs.

AlloyDB Query Plan Management: stabilize query plans to avoid regressions

AlloyDB introduced query plan management to monitor and capture execution plans and allow forcing approved plans to prevent regressions. That helps DBAs stabilize performance and avoid unexpected slowdowns that could spike resource usage and cost.

December 4, 2025

GKE TPU7x (Ironwood) preview — 7th‑gen TPU for large ML training

Google Kubernetes Engine announced a TPU7x (Ironwood) preview for GKE Standard/Autopilot clusters.

TPU7x is Google’s 7th‑generation TPU with 2307 BF16 TFLOPs and 192 GB HBM per chip, enabling higher ML training performance and new cost/performance tradeoffs for large models on supported GKE versions.

November 29, 2025

See reservation consumption and tag reservations in billing exports

Compute Engine added reservation consumption visibility and BigQuery billing export labels for reservation consumption.

Compute Engine now exposes a consumedReservation field in VM details and added two billing‑export labels that indicate reservation consumption and the unused reservation portion for BigQuery billing exports.

This improves cost transparency for reservation utilization and makes chargeback/showback and cross‑project billing more accurate by surfacing how much reserved capacity is actually used.

Fast‑starting GKE nodes (Autopilot) now GA to reduce over‑provisioning

GKE fast‑starting nodes are generally available for Autopilot clusters. Fast‑starting nodes let compatible workloads provision nodes quickly, reducing startup latency and enabling faster scaling for short‑lived jobs.

This capability helps lower over‑provisioning for bursty or transient workloads, improving cost‑efficiency by reducing the need to keep idle capacity warm.

Reduce transfer and ingestion costs with incremental Salesforce transfers to BigQuery (preview)

BigQuery Data Transfer Service now supports incremental transfers from Salesforce (Preview). The new incremental mode moves only changed data rather than full extracts, cutting duplicate transfer volume and lowering transfer and ingestion costs.
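The idea behind incremental mode can be sketched in a few lines: keep a watermark from the last successful sync and move only records modified after it. This is a conceptual illustration, not how the Data Transfer Service is implemented. The `modified_at` field and the sample records are hypothetical.

```python
from datetime import datetime, timezone


def changed_since(records, last_sync):
    """Return only the records modified after the previous successful sync."""
    return [r for r in records if r["modified_at"] > last_sync]


# Hypothetical watermark and source records.
last_sync = datetime(2025, 11, 20, tzinfo=timezone.utc)
records = [
    {"id": "001", "modified_at": datetime(2025, 11, 18, tzinfo=timezone.utc)},
    {"id": "002", "modified_at": datetime(2025, 11, 21, tzinfo=timezone.utc)},
    {"id": "003", "modified_at": datetime(2025, 11, 22, tzinfo=timezone.utc)},
]

delta = changed_since(records, last_sync)
print(f"full extract: {len(records)} rows, incremental: {len(delta)} rows")
```

The savings scale with how small the changed set is relative to the full table, so slowly changing objects benefit most.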

New TPU7x (Ironwood) in preview for large AI workloads

Cloud TPU announced TPU7x (Ironwood) in preview. TPU7x is a new TPU generation aimed at large‑scale AI training and inference, offering improved performance and cost‑effectiveness for LLMs and heavy workloads.

C4 Granite Rapids machine types (Preview) for tuned price‑performance

Compute Engine’s C4 general‑purpose series added Granite Rapids machine types in preview. The C4 series now supports larger machine types on Intel Xeon 6 (Granite Rapids) including local SSD and bare‑metal variants, providing higher‑performance VM options.

November 22, 2025

Cut Cloud Build costs for simple deploys — deploy source artifacts directly to Cloud Run (preview)

Cloud Run preview now supports deploying source artifacts directly to Cloud Run without invoking Cloud Build.

By bypassing Cloud Build for supported flows, teams can reduce CI/CD build time and the associated Cloud Build costs for simple deploys.

November 15, 2025

Autoclass now supports buckets with hierarchical namespace for automatic storage tiering

Google Cloud Storage Autoclass can now be enabled for buckets that use a hierarchical namespace (HNS). Enabling Autoclass on HNS buckets means more workloads can automatically tier to lower‑cost storage classes.

GKE logging agent processes logs up to 2× faster and uses fewer node resources

GKE updated its logging agent (in GKE 1.34.1+) to process logs up to twice as fast while using fewer node resources. Faster processing and lower resource usage reduces observability overhead on nodes and frees node capacity.

N4A VMs (Axion/Neoverse N3) preview and N4D GA on Compute Engine for more price/perf options

Google introduced the N4A machine series (preview) and announced generally available N4D VMs powered by 5th Gen AMD EPYC (Turin) with Titanium I/O offload, offering up to 64–96 vCPUs and DDR5 memory options. Together they add general‑purpose VM options that may improve price‑performance for compute workloads, with N4D offering better I/O characteristics.

November 8, 2025

Cost Anomaly Detection is GA

Google Cloud announced Cost Anomaly Detection is generally available and enabled by default for all customers and projects.

Alerts are auto‑enabled and sent to Billing Administrators; the Anomaly dashboard includes root‑cause analysis so teams can quickly see what caused a spike. Importantly, the GA release uses AI‑generated thresholds based on historical spend so you get relevant alerts without extra tuning.

Also, you can filter alerts by absolute dollars or by percentage deviation, and the improved algorithm supports immediate protection even for new projects with no spend history — all offered free as part of Google’s cost management tools.
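To show how the two filter styles differ, here is a small sketch that filters anomalies by absolute dollar impact or by percentage deviation. It is an illustration only and does not reproduce Google's detection algorithm. The `Anomaly` class, its fields, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    project: str
    expected_usd: float
    actual_usd: float

    @property
    def absolute_delta(self) -> float:
        return self.actual_usd - self.expected_usd

    @property
    def percent_delta(self) -> float:
        if self.expected_usd == 0:
            # No spend history: any new spend is an unbounded deviation.
            return float("inf") if self.absolute_delta > 0 else 0.0
        return 100.0 * self.absolute_delta / self.expected_usd


def filter_anomalies(anomalies, min_usd=0.0, min_percent=0.0):
    """Keep anomalies that meet both the dollar and percentage thresholds."""
    return [a for a in anomalies
            if a.absolute_delta >= min_usd and a.percent_delta >= min_percent]


# Hypothetical anomalies, not real data.
anomalies = [
    Anomaly("proj-a", expected_usd=1000.0, actual_usd=1150.0),  # +$150, +15%
    Anomaly("proj-b", expected_usd=10.0, actual_usd=30.0),      # +$20, +200%
    Anomaly("proj-c", expected_usd=0.0, actual_usd=40.0),       # new project
]

print([a.project for a in filter_anomalies(anomalies, min_usd=100)])
print([a.project for a in filter_anomalies(anomalies, min_percent=50)])
```

A dollar threshold surfaces large accounts with modest relative drift, while a percentage threshold surfaces small or new projects whose spend jumps sharply.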

Prioritize busy workloads with BigQuery reservation groups (Preview)

BigQuery introduced reservation groups in preview so you can group reservations and prioritize idle slot sharing within that group. This gives more control over slot allocation, letting high‑priority workloads borrow idle slots from grouped reservations. As a result, you can improve utilization of purchased slots and reduce the need to buy extra capacity.

See which VMs are using reservations (GA)

Compute Engine now lets you view which reservation a VM is consuming and list VMs tied to a reservation (GA). You can make better decisions around committed use, rightsizing, and whether to purchase or adjust reservations. And because you can list VMs per reservation, consolidation and reassignment opportunities become clearer.

November 1, 2025

Cloud SQL for PostgreSQL now cancels high‑memory connections to avoid OOM failures

Cloud SQL release notes: proactive cancellation of high memory usage connections (Oct 23, 2025). Cloud SQL for PostgreSQL now detects high memory usage connections and works to cancel them proactively to prevent out-of-memory (OOM) failures.

That reduces the risk of instance crashes or forced restarts that can cause downtime and unexpected operational cost (for example, emergency scaling or recovery work).

By preventing OOM events, teams can expect more stable database behavior and fewer emergency ops cycles.

October 25, 2025

Use Gemini CLI + GKE Inference Quickstart to pick cost‑effective LLM setups

GKE’s Inference Quickstart integrates with the Gemini CLI (via MCP) to generate manifests and give data‑driven cost/performance recommendations for LLM inference on GKE.

You can install the Gemini CLI and the GKE MCP extension with commands like brew install gemini-cli and gemini extensions install https://github.com/GoogleCloudPlatform/gke-mcp.git, and then ask the CLI for recommendations such as the cheapest models, performance comparisons across accelerators, or to generate a deployable manifest.

October 11, 2025

BigQuery updates: daily on‑demand limits, reservations, transfers, and job priorities

Google Cloud posted BigQuery release notes covering several cost and resource management changes (Oct 07–08, 2025). BigQuery now sets a default QueryUsagePerDay limit of 200 TiB for all new projects using on‑demand pricing; existing projects got defaults based on their last 30 days of usage. Projects with custom cost controls or reservations aren’t affected. If the new limit could block your workloads, the release notes recommend creating a custom cost control.

Reservation management got more flexibility: you can set labels on reservations for organization and billing analysis, specify which reservation to use at query runtime, and attach IAM policies directly to reservations for finer access control. All these features are GA.

BigQuery Data Transfer Service can now import reporting data (including custom reports) from Google Analytics 4, and this capability is generally available. Preview support was added for PayPal and Stripe as transfer sources. Dataform workflows can now choose job priority: interactive jobs (start immediately) or batch jobs (lower priority, cost‑efficient). Finally, note that starting March 17, 2026 the Data Transfer Service will require bigquery.datasets.setIamPolicy and bigquery.datasets.getIamPolicy permissions to create or update transfer setups.
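To check whether the 200 TiB default QueryUsagePerDay limit described above could block a project, you can compare historical daily bytes scanned against it. The sketch below assumes you have already collected per-day totals (for example from job metadata). The sample figures are hypothetical.

```python
TIB = 1024 ** 4                  # bytes in one tebibyte
DEFAULT_DAILY_LIMIT_TIB = 200    # default stated in the release notes


def days_over_limit(daily_bytes_scanned, limit_tib=DEFAULT_DAILY_LIMIT_TIB):
    """Map each day that exceeded the limit to its usage in TiB."""
    limit_bytes = limit_tib * TIB
    return {day: scanned / TIB
            for day, scanned in daily_bytes_scanned.items()
            if scanned > limit_bytes}


# Hypothetical per-day on-demand bytes scanned for one project.
usage = {
    "2025-10-01": 150 * TIB,
    "2025-10-02": 230 * TIB,
    "2025-10-03": 200 * TIB,  # exactly at the limit, not over it
}

for day, tib in days_over_limit(usage).items():
    print(f"{day}: {tib:.0f} TiB scanned, above the {DEFAULT_DAILY_LIMIT_TIB} TiB default")
```

If any day shows up here, that is a signal to define a custom cost control before the default limit interrupts queries.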

October 4, 2025

Get Finer Control Over Your BigQuery Spend

BigQuery’s workload management features are now generally available, giving you more precise control over slot consumption. The autoscaler can now scale more immediately and in smaller increments of 50 slots instead of 100.

This is a huge win for managing bursty workloads and controlling costs. You can also now set maximum slot limits and specify which reservation a query uses, enabling much tighter governance over your BigQuery spend.
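A quick worked example shows why the smaller increment matters. The sketch below rounds a slot requirement up to the next scaling step and compares 100-slot and 50-slot increments. It is a simplification that ignores baseline slots and the autoscaler's real timing behavior, and the demand figure is hypothetical.

```python
import math


def autoscaled_slots(needed_slots: int, increment: int) -> int:
    """Round a slot requirement up to the next multiple of the increment."""
    if needed_slots <= 0:
        return 0
    return math.ceil(needed_slots / increment) * increment


needed = 130  # hypothetical peak slot demand for a bursty job
for step in (100, 50):
    allocated = autoscaled_slots(needed, step)
    print(f"increment={step}: allocated={allocated}, idle={allocated - needed}")
```

For this demand, the 100-slot step allocates 200 slots (70 idle) while the 50-slot step allocates 150 (20 idle), so less paid capacity sits unused during the burst.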

Track Your GKE Costs with More Granularity

Google Kubernetes Engine (GKE) cost allocation is now generally available, allowing you to see cost breakdowns by cluster, namespace, and labels. This data is exported to BigQuery for detailed analysis.

This gives FinOps teams precise visibility into Kubernetes spend, making it much easier to attribute costs to specific teams or projects. It’s a crucial tool for showback, chargeback, and identifying optimization opportunities.

September 27, 2025

Cost-Effective Compute for Short Workloads with Flex-start VMs

Google Cloud announced that Flex-start VMs have reached General Availability. These instances, which can run for up to seven days, use a provisioning model that pulls from a secure capacity pool, increasing the chance of getting high-demand resources.

This is suitable for short-duration workloads that can start at any time, such as batch inference or model fine-tuning. For FinOps, it provides a cost-effective option for workloads that do not require the guarantees of standard or on-demand instances.

August 24, 2025

Track the Environmental Impact of Your AI

Here’s an interesting update on the “GreenOps” front. Google Cloud is making it easier to measure the environmental impact of your AI inference workloads.

You can now get data on the carbon emissions associated with running your AI models.

As sustainability becomes a bigger part of FinOps, this is a great tool for understanding and reporting on the carbon cost of your cloud usage, not just the financial cost.

August 17, 2025

New Cost Management Tools for Google Cloud

Google Cloud has just launched two new tools: Optimization Hub and Cost Explorer.

Optimization Hub gives you a central place to see all of your cost-saving recommendations, from idle resources to right-sizing suggestions. Cost Explorer is a new tool that makes it easier for developers to visualize and understand their spending.

These are fantastic new additions that bring Google’s native FinOps tooling to a new level, making it much easier for everyone to manage their GCP costs.

August 10, 2025

Google Gets Top Marks for FinOps!

Google is celebrating a big win! The analyst firm IDC has named Google Cloud a Leader in their MarketScape for Cloud FinOps.

This recognizes Google’s major investments in providing strong, user-friendly cost management tools.

For customers, it’s a good sign that Google is committed to helping you manage and optimize your cloud spend effectively.

July 25, 2025

Google CUDs Get More Flexible for Custom Pricing!

This is a big deal for customers with special pricing! Google Cloud’s Committed Use Discounts (CUDs) now support multiple prices for the same SKU.

Before, if you had different negotiated prices for the same virtual machine type, applying a CUD could be tricky.

Now, the system can handle it automatically, making it much easier to take full advantage of your commitments even with complex, custom pricing deals.

Older Updates

Here are additional FinOps updates we’ve published, including AWS Cloud Cost Optimization updates.