March 12, 2026 · 6 min read

Kubernetes Cost Optimization: The Complete Guide to Cutting K8s Bills

Practical Kubernetes cost optimization strategies: right-sizing with VPA, spot nodes, ResourceQuota, Kubecost, and how to cut your K8s cloud bills by 40%.

30-40% of Kubernetes spend typically pays for capacity that is provisioned but never used. That’s not an estimate — it’s what we see consistently when we run cost audits on clusters that haven’t been optimized. Workloads are over-provisioned, idle resources accumulate, and no one is tracking which teams are responsible for which spend.

This guide walks through every lever available for Kubernetes cost optimization, from request right-sizing to spot node strategy to FinOps tooling. We’ll go from “vague awareness that K8s is expensive” to “specific actions with dollar estimates.”


Why Kubernetes Clusters Overspend

Before attacking the problem, understand why it happens:

  1. Developers over-request resources — because under-requesting causes OOMKills and CPU throttling. Teams learn: request more than you need.
  2. No chargeback or showback — without cost visibility per team/namespace, there’s no incentive to right-size.
  3. Cluster autoscaler provisions for peaks — nodes that were needed at 2am Friday are still running at 2pm Tuesday.
  4. Reserved capacity is underutilized — reserved instances or savings plans purchased for predicted peak load often run at 20-30% utilization.
  5. No governance at namespace level — without ResourceQuota, any team can request unlimited resources.

Step 1: Get Visibility with Kubecost

You can’t optimize what you can’t measure. Kubecost is the industry-standard tool for Kubernetes cost visibility. Install it before making any other changes.

# Install Kubecost via Helm
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="<your-token>"

# Access the UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090

Kubecost allocates costs by namespace, deployment, pod, label, and team. Within 24 hours of installation, you’ll see:

  • Cost per namespace — find the top 3 cost centers
  • Idle cost — resources provisioned but not used
  • Efficiency score — ratio of requested vs actually used

A cluster efficiency score below 50% is a strong signal that right-sizing is urgently needed. We frequently see scores of 20-35% on first audit.
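The efficiency score is simple requested-versus-used arithmetic, so it is worth sanity-checking by hand. A minimal sketch with hypothetical first-audit numbers (these are not Kubecost output):

```shell
# Efficiency = resources actually used / resources requested.
# Numbers below are illustrative figures from a hypothetical first audit.
requested_cpu_cores=40   # total CPU requested across the cluster
used_cpu_cores=11        # CPU actually consumed (from kubectl top / Kubecost)
awk -v req="$requested_cpu_cores" -v used="$used_cpu_cores" \
  'BEGIN { printf "CPU efficiency: %.0f%%\n", 100 * used / req }'
# → CPU efficiency: 28%
```

A cluster requesting 40 cores but using 11 sits squarely in that 20-35% band.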


Step 2: Right-Sizing with VPA and Goldilocks

Vertical Pod Autoscaler (VPA) analyzes historical CPU and memory usage and recommends (or automatically sets) optimal resource requests and limits.

Start with Recommender mode only — never jump straight to Auto mode on production workloads.

# Install VPA (the project ships an install script rather than a single manifest)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Create a VPA object in Recommendation mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommender only — no automatic changes

After 24-48 hours, check recommendations:

kubectl describe vpa my-app-vpa

You’ll see lowerBound, target, and upperBound recommendations. Use target as your new resource requests, then gradually adjust limits.
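The Status section of the describe output contains a recommendation block shaped roughly like this (values hypothetical, trimmed for readability):

```yaml
# Trimmed VPA status: per-container recommendation (values hypothetical)
recommendation:
  containerRecommendations:
    - containerName: my-app
      lowerBound:
        cpu: 100m
        memory: 640Mi
      target:          # use this as your new requests
        cpu: 250m
        memory: 900Mi
      upperBound:
        cpu: "1"
        memory: 2Gi
```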

Goldilocks is a UI wrapper around VPA that visualizes recommendations across all namespaces — much easier to act on at scale:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace

# Label a namespace to enable VPA recommendations
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

Typical savings from right-sizing: 20-35% of compute spend. The largest wins are usually on Java-based workloads that were given 4GB memory requests “just in case” but actually use 800MB at steady state.
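For that Java example, a right-sized container spec might look like the sketch below. The numbers are illustrative; leave headroom above steady state for GC and traffic spikes rather than cutting to the measured minimum:

```yaml
# Illustrative right-sizing for a JVM service idling at ~800MB
resources:
  requests:
    cpu: 500m       # was 2 cores
    memory: 1280Mi  # was 4Gi: ~800MB steady state plus GC headroom
  limits:
    memory: 1536Mi  # keep a memory limit so leaks stay contained
```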


Step 3: Spot/Preemptible Node Strategy

Spot instances (AWS) and preemptible nodes (GCP) offer 60-90% savings over on-demand pricing — but they can be terminated with short notice (2 minutes on AWS, 30 seconds on GCP).

The key is making your workloads tolerate interruptions. Here’s a practical framework:

Tier your workloads:

Workload Type                             Spot-safe?   Notes
Batch jobs, ML training                   Yes          Design for restart
Stateless web services with replicas ≥3   Yes          Disruption is tolerable
Stateful services (databases)             No           Risk of data corruption
Single-replica deployments                No           Interruption = outage

Node pool design for EKS (simplified, eksctl-style config):

# On-demand node group for critical workloads
- name: critical-on-demand
  instanceType: m6i.xlarge
  desiredCapacity: 3

# Spot node group for tolerant workloads
- name: batch-spot
  instanceTypes:
    - m6i.2xlarge
    - m5.2xlarge      # Multiple instance types increase availability
    - m5n.2xlarge
  capacityType: SPOT
  desiredCapacity: 10

Taint spot nodes so only explicitly tolerant workloads land on them:

# Add to node group config
taints:
  - key: "spot"
    value: "true"
    effect: "NoSchedule"

# Add to workload deployments that can handle spot
tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

Use Pod Disruption Budgets (PDBs) to prevent too many replicas from being evicted at once:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

Typical savings from spot strategy: 40-60% of node compute cost for workloads that can be spot-tolerant. Combined with right-sizing, this is often the largest lever.
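One more safeguard worth pairing with PDBs: pods on spot nodes must finish shutting down inside the reclaim notice. A sketch of the pod-spec fields involved (timings illustrative and workload-dependent):

```yaml
# Pod spec fields for a clean shutdown within AWS's 2-minute spot notice
spec:
  terminationGracePeriodSeconds: 90   # must fit inside the notice window
  containers:
    - name: my-app
      lifecycle:
        preStop:
          exec:
            # brief pause so the load balancer deregisters the pod
            # before the process receives SIGTERM
            command: ["sh", "-c", "sleep 10"]
```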


Step 4: Namespace ResourceQuota and LimitRange

ResourceQuota prevents any single team or namespace from consuming unlimited cluster resources. Without it, one runaway deployment can starve other workloads.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    persistentvolumeclaims: "10"

LimitRange sets default requests/limits so containers without explicit resource specs don’t get scheduled with no limits at all (which is the default behavior — dangerous):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container

The combination of ResourceQuota + LimitRange means:

  1. Every container gets sensible defaults even if developers forget to set requests
  2. No namespace can grow unbounded and starve the cluster
  3. Kubecost can accurately attribute costs to teams

Step 5: Detect and Remove Idle Workloads

Idle workloads are invisible waste — staging environments left running, feature branch deployments never deleted, test namespaces from 6 months ago.

Tools for detecting idle workloads:

# Find deployments scaled to zero replicas (often abandoned, still holding PVCs and config)
kubectl get deployments -A | grep "0/0"

# Find pods with very low CPU usage (requires metrics-server)
kubectl top pods -A --sort-by=cpu | tail -20

# Kubecost idle cost report
# In the Kubecost UI: Cost Allocation → Filter by "Idle"

Namespace lifecycle automation — use tools like Kube-Downscaler to automatically scale down non-production namespaces outside business hours:

# Annotation on a namespace or deployment
annotations:
  downscaler/uptime: "Mon-Fri 07:00-20:00 Europe/London"

This alone can cut staging environment costs by roughly 60% (11 off-hours on each weekday plus two full weekend days ≈ 61% of the week is off-hours).
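The arithmetic behind that savings ceiling, for the Mon-Fri 07:00-20:00 window above:

```shell
# Off-hours share of the week for uptime Mon-Fri 07:00-20:00
awk 'BEGIN {
  weekday_off = (24 - 13) * 5   # 11 off-hours on each of 5 weekdays
  weekend_off = 24 * 2          # Saturday and Sunday fully off
  printf "off-hours: %.0f%% of the week\n",
    100 * (weekday_off + weekend_off) / (24 * 7)
}'
# → off-hours: 61% of the week
```

Actual savings land somewhat below that ceiling, since downscaled namespaces still pay for storage and any always-on shared services.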


Step 6: Storage and Networking Cost Reduction

Compute gets the attention, but storage and networking costs are often 15-25% of total K8s spend and frequently ignored.

Storage optimization:

  • Audit PersistentVolumeClaims: kubectl get pvc -A — look for PVCs that no running pod mounts (still Bound after their pod was deleted)
  • Use storage classes with the right performance tier — gp3 vs gp2 on EKS (gp3 is cheaper and faster)
  • Set reclaimPolicy: Delete on the StorageClass backing ephemeral workloads so released volumes are deleted instead of lingering

Networking cost reduction:

  • Cross-AZ traffic is the hidden killer — spreading replicas so every zone has local endpoints, combined with Kubernetes Topology Aware Routing, can cut data transfer costs significantly
  • Use internal load balancers where external exposure isn’t needed
  • Aggregate small services behind a single ingress controller rather than one LoadBalancer per service

# Spread replicas evenly across zones so each zone has local endpoints
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app
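Spreading replicas alone doesn't keep traffic in-zone, because kube-proxy still balances across endpoints in every zone. Topology Aware Routing makes the Service prefer same-zone endpoints; a sketch (the topology-mode annotation requires Kubernetes 1.27+; older clusters use the service.kubernetes.io/topology-aware-hints annotation instead):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    # opt this Service into same-zone endpoint routing
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```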

Putting It Together: A 90-Day Cost Optimization Plan

Week 1-2: Install Kubecost, identify top 3 cost centers, baseline efficiency score.

Week 3-4: Deploy VPA in recommendation mode across all namespaces. Identify the 10 most over-provisioned workloads.

Month 2: Implement ResourceQuota and LimitRange for all namespaces. Right-size the top 10 workloads based on VPA recommendations.

Month 3: Implement spot node pools for tolerant workloads. Set up namespace downscaling for dev/staging. Establish monthly cost review process.

Typical 90-day outcome: 35-50% reduction in K8s compute spend.


Ready to Cut Your K8s Bill?

These strategies require careful implementation — right-sizing production workloads without causing incidents takes experience with failure modes.

K8s Cost Optimization service at kubernetes.ae — our team conducts a cost audit, implements optimizations, and guarantees measurable savings.

Get Expert Kubernetes Help

Talk to a certified Kubernetes expert. Free 30-minute consultation — actionable findings within days.

Talk to an Expert