Kubernetes In-Place Pod Resize (1.35 GA): Build Your Own ScaleOps in a Weekend
Kubernetes 1.35 shipped in-place pod resize as GA. Here is how kubelet rewrites cgroup values on running pods, what ScaleOps, StormForge, and Cast AI actually do on top of it, and a ~500 line Go controller that replicates the core rightsizing loop.
Kubernetes 1.35 shipped in December 2025 with one feature that quietly undercuts a pile of commercial rightsizing SaaS: in-place pod vertical scaling is now GA. You can change a running pod’s CPU and memory allocation without a restart, a reschedule, or an IP change. The kubelet writes new values directly to Linux cgroups, and the kernel applies them to the already-running process.
This is the exact mechanism ScaleOps, StormForge, and Cast AI use for their “no restart rightsizing” claim. It is not proprietary. It is a Kubernetes feature any controller can call. This post covers how the feature actually works, what commercial tools add on top, and a ~500 line Go controller that replicates the core loop in a weekend.
What In-Place Pod Resize Actually Does
Before 1.27, the resources.requests and resources.limits fields on a Pod were immutable after admission. Any change required delete-and-recreate. This was not a Linux limitation - cgroups have supported live resize for over a decade - it was a Kubernetes declarative-model limitation. The scheduler had already made a placement decision based on requests, and mutating them retroactively broke the contract.
Kubernetes 1.27 (May 2023) introduced the feature as alpha with a new /resize subresource on the Pod object. 1.33 moved it to beta (feature gate on by default). 1.35 (December 2025) graduated it to GA with stable API guarantees.
The API flow:
- A controller (VPA, ScaleOps, your own) writes new resource values to `Pod.spec.containers[*].resources` via the `/resize` subresource. This is the only API path that can mutate resources on a running pod.
- The API server accepts the request and sets a `Resize` condition on the Pod status. Possible states: `Proposed`, `InProgress`, `Deferred` (node cannot accommodate right now, retry later), `Infeasible` (will never fit on this node).
- The kubelet on the target node picks up the change and writes the new values to the container's cgroup files directly.
- The Linux kernel enforces the new limits immediately on the running process. No SIGTERM, no container restart, no new PID.
The cgroup files that matter:
/sys/fs/cgroup/kubepods.slice/.../<container>/
  cpu.max      # CPU limit - format "quota period" (e.g. "50000 100000" = 0.5 CPU)
  cpu.weight   # CPU request mapped to relative share weight
  memory.max   # Memory limit - hard cap; exceeding triggers OOM kill
  memory.high  # Memory soft throttle (cgroup v2)
On cgroup v2 systems - the default on every major Linux distribution since around 2022 - the kubelet writes directly to these files. The kernel re-evaluates throttling and memory pressure on the next scheduler tick. For the running process, the change is invisible at the userspace level - no signal, no syscall interruption, just a different ceiling.
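To make the `cpu.max` format concrete, here is a minimal sketch of decoding it into a vCPU count. The function name is mine, not part of any Kubernetes library; the "quota period" format (microseconds, or `max` for unlimited) is the documented cgroup v2 interface.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMax decodes the cgroup v2 cpu.max format ("<quota> <period>" in
// microseconds, or "max <period>" for no limit) into a vCPU count.
func parseCPUMax(s string) (cpus float64, limited bool, err error) {
	fields := strings.Fields(strings.TrimSpace(s))
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max contents: %q", s)
	}
	if fields[0] == "max" {
		return 0, false, nil // no CPU limit set
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	return quota / period, true, nil
}

func main() {
	cpus, limited, _ := parseCPUMax("50000 100000")
	fmt.Println(cpus, limited) // 0.5 true
}
```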
Requests vs Limits at the Kernel Level
The mental model of “requests are a scheduler hint, limits are a hard cap” is mostly right but hides what actually happens at the kernel level. This matters for understanding which resize operations are safe in-place.
CPU requests map to cpu.weight - a relative share value (range 1-10000, default 100). Under CPU contention, the kernel Completely Fair Scheduler allocates CPU time proportionally to the weight. Not a guarantee of any specific vCPU count, just a share during contention. Changing this in-place is always safe. The kernel adjusts scheduling on the next tick, and nothing visible to userspace changes.
CPU limits map to cpu.max - a CFS bandwidth quota (format: quota period, microseconds per period). When the container exceeds the quota in a 100ms period, it is throttled until the next period. Changing this in-place is safe for increases. Decreases are also safe but the process will experience more throttling immediately if it was already consuming more than the new limit.
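The reverse mapping - from a Kubernetes CPU limit in millicores to the string written into `cpu.max` - is simple arithmetic over the default 100ms CFS period. A sketch (the function name is mine; the default period value is the documented CFS default):

```go
package main

import "fmt"

const cfsPeriodUs = 100000 // default CFS period: 100ms in microseconds

// cpuLimitToCPUMax converts a CPU limit in millicores into the
// "<quota> <period>" string format used by cgroup v2 cpu.max.
func cpuLimitToCPUMax(millicores int64) string {
	quota := millicores * cfsPeriodUs / 1000
	return fmt.Sprintf("%d %d", quota, cfsPeriodUs)
}

func main() {
	fmt.Println(cpuLimitToCPUMax(500))  // "50000 100000" = 0.5 CPU
	fmt.Println(cpuLimitToCPUMax(1000)) // "100000 100000" = 1 CPU
}
```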
Memory requests map to nothing at the kernel level. They are purely a scheduler-visible field used for placement. In-place changes have no kernel effect - only the scheduler cares for future placement decisions.
Memory limits map to memory.max - a hard cap. Exceeding this triggers the Linux OOM killer, which kills one of the processes in the cgroup. Increases are safe. Decreases are dangerous: if the process’s resident set size (RSS) already exceeds the new limit when kubelet writes it, the kernel OOM-kills the process within milliseconds. There is no grace period.
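A rightsizing controller should therefore gate memory decreases on current usage. A minimal sketch of such a guard - the function name and the 90% margin are my own illustrative choices, not from any specific tool:

```go
package main

import "fmt"

// safeMemoryDecrease reports whether lowering memory.max to newLimit bytes
// is safe in-place: if current RSS is already at or near the proposed
// ceiling, the kernel would OOM-kill the process as soon as kubelet
// writes the new value.
func safeMemoryDecrease(currentRSS, newLimit int64) bool {
	const margin = 0.90 // require RSS to sit under 90% of the new ceiling
	return float64(currentRSS) < float64(newLimit)*margin
}

func main() {
	// 400Mi resident, proposing a 512Mi limit: ~78% of the ceiling, OK.
	fmt.Println(safeMemoryDecrease(400<<20, 512<<20)) // true
	// 500Mi resident, proposing a 512Mi limit: too close, risks an OOM kill.
	fmt.Println(safeMemoryDecrease(500<<20, 512<<20)) // false
}
```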
This is why Kubernetes ships a resizePolicy field per resource:
resources:
  requests: { cpu: "250m", memory: "256Mi" }
  limits:   { cpu: "1", memory: "512Mi" }
resizePolicy:
  - resourceName: cpu
    restartPolicy: NotRequired        # CPU resize safe in-place
  - resourceName: memory
    restartPolicy: RestartContainer   # Memory resize restarts the container
For JVM, Redis, PostgreSQL, and any application that reads memory config at boot, RestartContainer is the correct policy on memory changes - the restart is required anyway to make the new ceiling usable. For generic stateless Go / Node / Python workloads, NotRequired is safe.
What ScaleOps Actually Does On Top
Strip away the dashboard and the GPT-flavoured marketing, and ScaleOps is a control loop with a policy engine. The core loop is small:
- Scrape metrics - actual CPU and memory usage per container from metrics-server or Prometheus (typically 1-minute resolution, 14-day retention).
- Compute new values - P95 or P99 of recent usage, plus a headroom factor (typically 15-30%), bounded by floor/ceiling policy.
- Classify workload - detect JVM (JAR main class, `-Xmx` flag), Gunicorn (master process + fixed workers), PostgreSQL (shared memory segments), etc. Pick the right `resizePolicy` and cadence.
- Call `/resize` - patch the Pod subresource with the new values.
- Monitor the `Resize` condition - if `Deferred` (node oversubscribed), back off and retry later. If `Infeasible`, trigger a rolling replace via Deployment rollout instead.
- Fall back to rolling replace for workloads where `resizePolicy: RestartContainer` is required on the changed resource, or where in-place is not viable.
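The condition-monitoring step above reduces to a small decision table. A sketch of that logic as a pure function - the `Action` type and function name are mine; the condition states are the ones the Pod API exposes:

```go
package main

import "fmt"

// Possible states of the Pod's Resize condition.
const (
	ResizeProposed   = "Proposed"
	ResizeInProgress = "InProgress"
	ResizeDeferred   = "Deferred"   // node cannot accommodate right now
	ResizeInfeasible = "Infeasible" // will never fit on this node
)

// Action is what the controller does next.
type Action int

const (
	Wait           Action = iota // accepted or in flight; poll again
	Backoff                      // retry later with exponential backoff
	RollingReplace               // fall back to a Deployment rollout
)

// nextAction maps the observed Resize condition state to the
// controller's response, mirroring the loop described above.
func nextAction(resizeState string) Action {
	switch resizeState {
	case ResizeDeferred:
		return Backoff
	case ResizeInfeasible:
		return RollingReplace
	default: // Proposed, InProgress, or condition cleared
		return Wait
	}
}

func main() {
	fmt.Println(nextAction(ResizeDeferred) == Backoff)          // true
	fmt.Println(nextAction(ResizeInfeasible) == RollingReplace) // true
}
```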
Everything else is productization: multi-cluster UI, RBAC for policy changes, audit trails, bin-packing optimization (consolidating the freed capacity onto fewer nodes via Karpenter / Cluster Autoscaler tuning), workload-type libraries with hundreds of heuristics, automatic rollback on SLI degradation.
Build Your Own: ~500 Line Go Controller
The minimal PoC, sketched in Go with client-go:
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	metrics "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	mc := metrics.NewForConfigOrDie(cfg)
	ticker := time.NewTicker(10 * time.Minute)
	for range ticker.C {
		rightsizeAll(client, mc, "production")
	}
}

func rightsizeAll(c *kubernetes.Clientset, mc *metrics.Clientset, ns string) {
	pods, err := c.CoreV1().Pods(ns).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return // log and wait for the next tick
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		if !eligible(pod) {
			continue
		}
		usage, err := fetchUsage(mc, pod)
		if err != nil {
			continue
		}
		newSpec := computeNewRequests(usage, pod)
		if sameAsCurrent(newSpec, pod) {
			continue
		}
		applyResize(c, pod, newSpec)
	}
}

func computeNewRequests(u usage, p *corev1.Pod) corev1.ResourceList {
	cpu := int64(float64(u.p95CPU) * 1.25) // millicores: 25% headroom over P95
	mem := int64(float64(u.p95Mem) * 1.30) // bytes: 30% headroom over P95
	// floor at 50m CPU and 64Mi memory to avoid starvation
	if cpu < 50 {
		cpu = 50
	}
	if mem < 64*1024*1024 {
		mem = 64 * 1024 * 1024
	}
	return corev1.ResourceList{
		corev1.ResourceCPU:    *resource.NewMilliQuantity(cpu, resource.DecimalSI),
		corev1.ResourceMemory: *resource.NewQuantity(mem, resource.BinarySI),
	}
}

func applyResize(c *kubernetes.Clientset, p *corev1.Pod, r corev1.ResourceList) {
	patch := fmt.Sprintf(`{"spec":{"containers":[{"name":"%s","resources":{"requests":%s}}]}}`,
		p.Spec.Containers[0].Name, toJSON(r))
	_, err := c.CoreV1().Pods(p.Namespace).Patch(
		context.TODO(),
		p.Name,
		types.StrategicMergePatchType,
		[]byte(patch),
		metav1.PatchOptions{},
		"resize", // <-- the /resize subresource
	)
	if err != nil {
		return // log; next tick will retry
	}
	// watch for the Resize condition to confirm apply or handle Deferred/Infeasible
}

func eligible(p *corev1.Pod) bool {
	// skip DaemonSets, Jobs, workloads with annotation rightsize.io/skip=true
	// skip JVM / Gunicorn / Postgres (detected via container command/args heuristics)
	// only handle Running pods older than 10 minutes
	return p.Status.Phase == corev1.PodRunning &&
		time.Since(p.CreationTimestamp.Time) > 10*time.Minute
}
Plus a fetchUsage() that calls the metrics API or Prometheus HTTP API for recent samples, a sameAsCurrent() that does relative-change thresholding (skip resize if new value is within 10% of current), and an eligible() that encodes the workload-type exclusions. Total: somewhere between 400 and 700 lines of Go depending on how deep you go on the policy heuristics.
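Two of those helpers are small enough to sketch directly. A nearest-rank P95 and a relative-change threshold, assuming usage samples arrive as plain float slices (the function names and the 10% default are illustrative, not from any specific tool):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// p95 returns the 95th percentile of a sample window (nearest-rank method).
func p95(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	s := append([]float64(nil), samples...) // copy: don't mutate the caller's slice
	sort.Float64s(s)
	rank := int(math.Ceil(0.95*float64(len(s)))) - 1
	return s[rank]
}

// withinThreshold reports whether the proposed value is within pct of the
// current one; if so, the resize is skipped to avoid churn.
func withinThreshold(current, proposed, pct float64) bool {
	if current == 0 {
		return proposed == 0
	}
	return math.Abs(proposed-current)/current <= pct
}

func main() {
	samples := []float64{100, 120, 110, 400, 115, 105, 130, 125, 108, 112}
	fmt.Println(p95(samples))                    // 400 (nearest rank over 10 samples)
	fmt.Println(withinThreshold(250, 260, 0.10)) // true: 4% change, skip the resize
	fmt.Println(withinThreshold(250, 180, 0.10)) // false: 28% change, resize
}
```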
With Claude Code pairing, this is a weekend project, not a quarter. Drop it into a cluster as a Deployment with appropriate RBAC (pods/resize verb, metrics.k8s.io read), run it in a namespace, observe savings.
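The RBAC mentioned above can be sketched as a ClusterRole. This is a minimal illustration - the role name is mine, and you should scope it to specific namespaces in practice:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rightsizer
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/resize"]   # the /resize subresource
    verbs: ["patch"]
  - apiGroups: ["metrics.k8s.io"]
    resources: ["pods"]
    verbs: ["get", "list"]
```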
The 30% Commercial Tools Still Earn
The PoC above covers the core loop. It does not cover what turns a prototype into a production platform:
Workload type library. ScaleOps ships heuristics for hundreds of workload signatures. JVM detection via container command + JAR metadata. Python concurrency detection via process count. Database detection via image names, volume mounts, and shared-memory behaviour. Replicating this deeply takes months, not days.
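To give a sense of the gap: the simplest possible version of such a heuristic is a string match over the container command, which is roughly where a weekend PoC stops and where commercial libraries begin. A toy sketch (function name and categories are mine):

```go
package main

import (
	"fmt"
	"strings"
)

// classifyWorkload is a toy version of a workload-signature heuristic:
// it inspects the container command/args for markers of runtimes that
// read resource config at boot. Production libraries also use image
// metadata, process trees, and volume mounts.
func classifyWorkload(command []string) string {
	joined := strings.ToLower(strings.Join(command, " "))
	switch {
	case strings.Contains(joined, "java") || strings.Contains(joined, "-xmx"):
		return "jvm" // fixed heap: restart required for memory changes
	case strings.Contains(joined, "gunicorn") || strings.Contains(joined, "unicorn"):
		return "fixed-workers" // worker count capped at boot
	case strings.Contains(joined, "postgres"):
		return "database" // shared_buffers read once at startup
	default:
		return "generic" // safe for in-place resize with NotRequired
	}
}

func main() {
	fmt.Println(classifyWorkload([]string{"java", "-Xmx2g", "-jar", "app.jar"})) // jvm
	fmt.Println(classifyWorkload([]string{"/app/server", "--port=8080"}))        // generic
}
```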
Multi-cluster governance. Organization-wide policies, team-level RBAC, per-namespace cost budgets, chargeback reports. Single-cluster PoCs skip this entirely.
Bin-packing integration. Rightsizing pods is half the win. Consolidating the freed capacity onto fewer nodes via coordinated Karpenter / Cluster Autoscaler tuning is the other half. Requires tight integration with node provisioner state and careful handling of pod disruption budgets.
Safety and rollback. When a rightsize causes SLI degradation (latency spike, error rate increase), the tool must detect it and automatically roll back within minutes. This requires integration with SLO platforms, Prometheus Alertmanager receivers, and Argo Rollouts-style analysis.
Audit trail and compliance. Who changed what, when, and why. Required for SOC 2, ISO 27001, and regulated-industry audits.
Operator burden. Someone has to maintain the controller, upgrade it across Kubernetes versions, and handle the long tail of edge cases. Commercial tools amortize this across their customer base.
Build vs Buy: The 2026 Calculus
The honest 2026 decision framework for Kubernetes rightsizing:
Build if:
- Single cluster, one or two teams
- Homogeneous workloads (mostly stateless Go / Node / Python services)
- You already run a platform team with Go / controller-runtime experience
- Your current rightsizing SaaS bill is under $30k/year
- You can absorb a ~2 week build + ongoing ~10% of one engineer’s time
Buy if:
- Multi-cluster, multi-team, heterogeneous workload mix
- You run stateful workloads (databases, JVM-heavy apps, Kafka) that need sophisticated handling
- You need enterprise governance, audit, and RBAC across teams
- Your current over-provisioning exceeds $500k/year (commercial tool ROI is obvious)
- You do not have platform engineering capacity for in-cluster controllers
The shifted reality: the middle ground used to be “buy, because building is a quarter”. It is now “build, because building is a sprint”. A pile of commercial rightsizing startups built their pitch on the gap between Kubernetes primitives and usable automation. That gap has narrowed. AI-assisted development has narrowed it further.
Signal Worth Tracking
In-place pod resize going GA is a category-defining moment for Kubernetes FinOps tooling. Before 1.35, commercial rightsizing tools were genuinely easier than building - they abstracted feature gates, alpha APIs, and the absence of a stable resize subresource. After 1.35, the underlying primitive is stable, standard, and well-documented.
Expect 2026 to see:
- VPA 1.4+ with a native `InPlaceOnly` mode
- More open-source controllers (KEDA resource-scaling plugins, Fairwinds updates to Goldilocks)
- Pricing pressure on the commercial category - particularly at the low end
- Consolidation - smaller rightsizing pure-plays either get acquired (StormForge/F5 was the early signal) or pivot to bin-packing / multi-cluster governance
For platform teams: this is a good moment to rerun the build-vs-buy math. The answer may have changed.
Further Reading
- Kubernetes docs: Resize CPU and Memory Resources assigned to Containers
- KEP-1287: In-Place Update of Pod Resources
- Kubernetes 1.27 release blog: In-place Resource Resize alpha
- Related post: Kubernetes Cost Optimization Tools 2026: Cast AI vs ScaleOps vs StormForge
- Related post: Kubernetes Node Sizing: Right-Size Your Cluster and Cut Costs
Need help deciding? We run build-vs-buy assessments for Kubernetes rightsizing tooling, covering workload inventory, in-place resize readiness, and multi-cluster governance design. Get in touch.
Frequently Asked Questions
What is Kubernetes in-place pod resize?
In-place pod resize is a Kubernetes feature that lets you change a running pod's CPU and memory requests and limits without restarting the container. Introduced as alpha in Kubernetes 1.27 (May 2023), promoted to beta in 1.33, and went GA in Kubernetes 1.35 (December 2025). It exposes a new /resize subresource on the Pod object and a Resize status condition. When a resize is accepted, the kubelet writes new values directly to the container's Linux cgroup files (cpu.max, cpu.weight, memory.max) and the kernel enforces the new limits on the already-running process. No pod restart, no reschedule, no IP change.
How does ScaleOps change CPU and memory without restarting pods?
ScaleOps uses the native Kubernetes in-place pod resize feature. Its controller observes workload metrics, computes new CPU and memory values, and calls the Pod /resize subresource. The actual cgroup rewrite is done by kubelet. ScaleOps adds a policy engine on top for workload-type detection (JVM, stateful, batch, stateless), smart rollouts for workloads that cannot hot-swap resources, and bin-packing across the cluster after rightsizing. The no-restart mechanism itself is a standard Kubernetes feature available to any controller, including ones you can build in-house.
Which workloads cannot be resized in-place?
Several common workload types do not benefit from in-place resize even when kubelet succeeds at changing the cgroup. JVM applications with fixed -Xmx ignore the new cgroup memory ceiling because the heap size is set at startup. Python Gunicorn and Ruby Unicorn applications with fixed worker counts cannot use new CPU beyond their concurrency cap. PostgreSQL shared_buffers, Redis maxmemory, and similar config-file-driven limits are read once at boot. Memory decreases are risky because if the current RSS exceeds the new limit, the kernel OOM-kills the process immediately. For these cases, Kubernetes resizePolicy: RestartContainer forces a restart for the affected resource, or a rolling replace is triggered instead of in-place.
Do I still need ScaleOps or StormForge if in-place resize is GA?
Depends on scale and maturity. For small to mid-size clusters running generic stateless workloads, a home-built controller using in-place resize + Prometheus metrics covers roughly 70% of what commercial tools deliver. For large multi-cluster environments with heterogeneous workloads, commercial tools still earn their price through workload-type detection, bin-packing optimization, multi-cluster governance, audit trails, JVM heap awareness, and SRE-grade safety (automated rollback on SLI degradation). The 2026 build-vs-buy calculation has shifted materially, but not flipped entirely.
What Kubernetes versions support in-place pod resize?
Kubernetes 1.27 through 1.32 had the feature behind the InPlacePodVerticalScaling feature gate (alpha), which required cluster admin activation and was not recommended for production. Kubernetes 1.33 promoted it to beta (feature gate on by default). Kubernetes 1.35 (December 2025) graduated it to GA - no feature gate required, stable API guarantees. Clusters below 1.27 cannot use in-place resize at all and require delete-recreate for any resource change. EKS, GKE, and AKS support it on their 1.35+ releases.
What is the difference between VPA and in-place pod resize?
VPA (Vertical Pod Autoscaler) is a controller that recommends or applies resource changes; in-place pod resize is the Kubernetes mechanism VPA uses to apply those changes without restart. Before 1.27 alpha, VPA in auto mode had to evict and recreate pods to apply recommendations - disruptive and slow. VPA 1.2+ introduced the InPlaceOrRecreate update mode, which tries in-place first and falls back to eviction. So VPA is the brain; in-place resize is the hands. ScaleOps and StormForge replace VPA with more sophisticated brains using the same hands.
How do I test in-place pod resize on an existing cluster?
On Kubernetes 1.35+ clusters, use kubectl patch against the resize subresource, for example: kubectl patch pod <pod-name> --subresource resize --patch '{"spec":{"containers":[{"name":"<container-name>","resources":{"requests":{"cpu":"800m"}}}]}}'. Then check the pod's Resize status condition to confirm the change was applied (or reported Deferred/Infeasible), and verify the new values under status.containerStatuses[].resources. The container's restart count should remain unchanged.