Kubernetes Cost Optimization: The Complete Guide to Cutting K8s Bills
Practical Kubernetes cost optimization strategies: right-sizing with VPA, spot nodes, ResourceQuota, Kubecost, and how to cut your K8s cloud bills by 40%.
30-40% of Kubernetes spend is typically unused reserved capacity. That’s not an estimate — it’s what we see consistently when we run cost audits on clusters that haven’t been optimized. Workloads are over-provisioned, idle resources accumulate, and no one is tracking which teams are responsible for which spend.
This guide walks through every lever available for Kubernetes cost optimization, from request right-sizing to spot node strategy to FinOps tooling. We’ll go from “vague awareness that K8s is expensive” to “specific actions with dollar estimates.”
Why Kubernetes Clusters Overspend
Before attacking the problem, understand why it happens:
- Developers over-request resources — because under-requesting causes OOMKills and CPU throttling. Teams learn: request more than you need.
- No chargeback or showback — without cost visibility per team/namespace, there’s no incentive to right-size.
- Cluster autoscaler provisions for peaks — nodes that were needed at 2am Friday are still running at 2pm Tuesday.
- Reserved capacity is underutilized — reserved instances or savings plans purchased for predicted peak load often run at 20-30% utilization.
- No governance at namespace level — without ResourceQuota, any team can request unlimited resources.
Step 1: Get Visibility with Kubecost
You can’t optimize what you can’t measure. Kubecost is the industry-standard tool for Kubernetes cost visibility. Install it before making any other changes.
# Install Kubecost via Helm
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="<your-token>"
# Access the UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
Kubecost allocates costs by namespace, deployment, pod, label, and team. Within 24 hours of installation, you’ll see:
- Cost per namespace — find the top 3 cost centers
- Idle cost — resources provisioned but not used
- Efficiency score — ratio of resources actually used to resources requested
A cluster efficiency score below 50% is a strong signal that right-sizing is urgently needed. We frequently see scores of 20-35% on first audit.
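The efficiency score is just usage divided by requests. A back-of-the-envelope check with hypothetical namespace totals:

```shell
# Hypothetical: the namespace requests 16 CPU cores but uses ~4.8 at steady state
requested_millicores=16000
used_millicores=4800

efficiency=$(( used_millicores * 100 / requested_millicores ))
echo "efficiency: ${efficiency}%"   # 30% — well below the 50% threshold
```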
Step 2: Right-Sizing with VPA and Goldilocks
Vertical Pod Autoscaler (VPA) analyzes historical CPU and memory usage and recommends (or automatically sets) optimal resource requests and limits.
Start with Recommender mode only — never jump straight to Auto mode on production workloads.
# Install VPA — the project ships an install script, not a single release manifest
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Create a VPA object in Recommendation mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off" # Recommender only — no automatic changes
After 24-48 hours, check recommendations:
kubectl describe vpa my-app-vpa
You’ll see lowerBound, target, and upperBound recommendations. Use target as your new resource requests, then gradually adjust limits.
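Applying the target back to the Deployment might look like this (the numbers are hypothetical; keep the memory limit a comfortable margin above the target to avoid OOMKills):

```yaml
# In the Deployment's container spec — requests taken from the VPA target
resources:
  requests:
    cpu: "250m"      # VPA target; was 1000m
    memory: "900Mi"  # VPA target; was 4Gi
  limits:
    memory: "1536Mi" # headroom above the target
```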
Goldilocks is a UI wrapper around VPA that visualizes recommendations across all namespaces — much easier to act on at scale:
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace
# Label a namespace to enable VPA recommendations
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
Typical savings from right-sizing: 20-35% of compute spend. The largest wins are usually on Java-based workloads that were given 4GB memory requests “just in case” but actually use 800MB at steady state.
Step 3: Spot/Preemptible Node Strategy
Spot instances (AWS) and preemptible nodes (GCP) offer 60-90% savings over on-demand pricing — but they can be terminated with short notice (2 minutes on AWS, 30 seconds on GCP).
The key is making your workloads tolerate interruptions. Here’s a practical framework:
Tier your workloads:
| Workload Type | Spot-safe? | Notes |
|---|---|---|
| Batch jobs, ML training | Yes | Design for restart |
| Stateless web services with replicas ≥3 | Yes | Disruption is tolerable |
| Stateful services (databases) | No | Risk of data corruption |
| Single-replica deployments | No | Interruption = outage |
Node pool design for EKS:
# eksctl-style config (illustrative)
managedNodeGroups:
  # On-demand node group for critical workloads
  - name: critical-on-demand
    instanceType: m6i.xlarge
    desiredCapacity: 3
  # Spot node group for tolerant workloads
  - name: batch-spot
    instanceTypes:
      - m6i.2xlarge
      - m5.2xlarge # Multiple instance types increase spot availability
      - m5n.2xlarge
    spot: true
    desiredCapacity: 10
Taint spot nodes so only explicitly tolerant workloads land on them:
# Add to node group config
taints:
  - key: "spot"
    value: "true"
    effect: "NoSchedule"

# Add to workload deployments that can handle spot
tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
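On EKS, managed spot nodes also carry the label `eks.amazonaws.com/capacityType: SPOT`, so a spot-tolerant Deployment can combine the toleration with a nodeSelector to land only on spot capacity (sketch):

```yaml
# In the Deployment's pod template
spec:
  template:
    spec:
      nodeSelector:
        eks.amazonaws.com/capacityType: SPOT
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
```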
Use Pod Disruption Budgets (PDB) to prevent too many replicas being evicted at once:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
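With `minAvailable: 2`, the number of replicas that can be voluntarily evicted at once is simply the replica count minus the budget. A sanity check with a hypothetical replica count:

```shell
# Hypothetical: 5 replicas, PDB requires 2 to stay available
replicas=5
min_available=2
echo "max concurrently disruptable: $(( replicas - min_available ))"   # 3
```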
Typical savings from spot strategy: 40-60% of node compute cost for workloads that can be spot-tolerant. Combined with right-sizing, this is often the largest lever.
Step 4: Namespace ResourceQuota and LimitRange
ResourceQuota prevents any single team or namespace from consuming unlimited cluster resources. Without it, one runaway deployment can starve other workloads.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    persistentvolumeclaims: "10"
LimitRange sets default requests/limits so containers without explicit resource specs don’t get scheduled with no limits at all (which is the default behavior — dangerous):
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container
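With that LimitRange in place, a container that omits its resources stanza is admitted with the defaults rather than running unbounded. For example, this minimal pod:

```yaml
# This pod specifies no resources...
apiVersion: v1
kind: Pod
metadata:
  name: forgetful-pod
  namespace: team-a
spec:
  containers:
    - name: app
      image: nginx
      # ...so admission injects requests of 100m CPU / 128Mi memory
      # and limits of 500m CPU / 512Mi memory from the LimitRange
```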
The combination of ResourceQuota + LimitRange means:
- Every container gets sensible defaults even if developers forget to set requests
- No namespace can grow unbounded and starve the cluster
- Kubecost can accurately attribute costs to teams
Step 5: Detect and Remove Idle Workloads
Idle workloads are invisible waste — staging environments left running, feature branch deployments never deleted, test namespaces from 6 months ago.
Tools for detecting idle workloads:
# Find deployments scaled to zero — still consuming PVCs, load balancers, and config
kubectl get deployments -A | grep "0/0"
# Find pods with very low CPU usage (requires metrics-server;
# the sort is descending, so tail shows the lowest consumers)
kubectl top pods -A --sort-by=cpu | tail -20
# Kubecost idle cost report
# In the Kubecost UI: Cost Allocation → Filter by "Idle"
Namespace lifecycle automation — use tools like Kube-Downscaler to automatically scale down non-production namespaces outside business hours:
# Annotation on a namespace or deployment
annotations:
  downscaler/uptime: "Mon-Fri 07:00-20:00 Europe/London"
This alone can cut staging environment costs by roughly 60%: with uptime Mon-Fri 07:00-20:00, workloads run 13 hours × 5 weekdays = 65 of 168 hours per week, so ~61% of the week is off-hours.
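A concrete example, assuming kube-downscaler is already running in the cluster and a hypothetical `staging-api` Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: staging-api
  namespace: staging
  annotations:
    downscaler/uptime: "Mon-Fri 07:00-20:00 Europe/London"
    downscaler/downtime-replicas: "0" # scale fully to zero off-hours
spec:
  # ... deployment spec unchanged
```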
Step 6: Storage and Networking Cost Reduction
Compute gets the attention, but storage and networking costs are often 15-25% of total K8s spend and frequently ignored.
Storage optimization:
- Audit PersistentVolumeClaims: run `kubectl get pvc -A` and look for PVCs no longer mounted by any running pod (status `Bound` but the pod was deleted)
- Use storage classes with the right performance tier: gp3 vs gp2 on EKS (gp3 is cheaper and faster)
- Set the reclaim policy on the StorageClass: `reclaimPolicy: Delete` for ephemeral workloads, so volumes don't outlive their pods

Networking cost reduction:
- Cross-AZ traffic is the hidden killer: spreading replicas evenly with `topologySpreadConstraints` and enabling topology-aware routing keeps traffic in-zone and can cut data transfer costs significantly
- Use internal load balancers where external exposure isn't needed
- Aggregate small services behind a single ingress controller rather than one LoadBalancer per service
# Spread replicas evenly across zones; paired with topology-aware
# routing, this keeps service traffic within a single zone
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app
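The PVC audit mentioned under storage optimization can be scripted. A sketch — the here-doc stands in for real `kubectl get pv` output so the filter logic is runnable anywhere; swap the function body for the actual command in a cluster:

```shell
# Stand-in for `kubectl get pv` with sample data
get_pv() {
cat <<'EOF'
NAME    CAPACITY  STATUS     CLAIM
pv-001  100Gi     Bound      team-a/data
pv-002  500Gi     Released   team-b/old-data
pv-003  200Gi     Released   staging/scratch
EOF
}

# Released volumes are unclaimed but still billed — candidates for deletion
get_pv | awk 'NR > 1 && $3 == "Released" { print $1, $2 }'
```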
Putting It Together: A 90-Day Cost Optimization Plan
Week 1-2: Install Kubecost, identify top 3 cost centers, baseline efficiency score.
Week 3-4: Deploy VPA in recommendation mode across all namespaces. Identify the 10 most over-provisioned workloads.
Month 2: Implement ResourceQuota and LimitRange for all namespaces. Right-size the top 10 workloads based on VPA recommendations.
Month 3: Implement spot node pools for tolerant workloads. Set up namespace downscaling for dev/staging. Establish monthly cost review process.
Typical 90-day outcome: 35-50% reduction in K8s compute spend.
Ready to Cut Your K8s Bill?
These strategies require careful implementation — right-sizing production workloads without causing incidents takes experience with failure modes.
→ K8s Cost Optimization service at kubernetes.ae — our team conducts a cost audit, implements optimizations, and guarantees measurable savings.
Get Expert Kubernetes Help
Talk to a certified Kubernetes expert. Free 30-minute consultation — actionable findings within days.
Talk to an Expert