Kubernetes Resource Limits and Requests
Avoid OOMKilled pods with correct sizing.
Understanding Resource Requests and Limits
Misconfigured resource settings are among the most common causes of Kubernetes reliability issues. Requests determine scheduling, limits cap what a container can consume so that one workload cannot starve its neighbors, and how the two are set relative to each other determines your Quality of Service class. Getting this right is critical for both stability and cost efficiency.
Requests vs Limits
- Requests — The guaranteed amount of CPU/memory. The scheduler uses requests to decide which node can accommodate a pod. If a node has 2 allocatable CPUs and 1.5 of them are already reserved by other pods' requests, a pod requesting 1 CPU won't fit, even if the node's actual usage is low.
- Limits — The maximum amount a container can use. Exceeding memory limits triggers an OOMKill. Exceeding CPU limits causes throttling.
Quality of Service (QoS) Classes
| QoS Class | Condition | Eviction Priority |
|---|---|---|
| Guaranteed | Every container sets CPU and memory requests equal to its limits | Last to be evicted |
| Burstable | Not Guaranteed, but at least one container sets a CPU or memory request or limit | Evicted after BestEffort |
| BestEffort | No requests or limits set on any container | First to be evicted |
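As a rough illustration of the table, the sketch below shows one pod per QoS class. The pod names and resource values are hypothetical; only the image tag is taken from this article's example. After applying a pod, its assigned class can be read from the `status.qosClass` field.

```yaml
# Guaranteed: every container sets CPU and memory, with requests equal to limits.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed   # hypothetical name
spec:
  containers:
    - name: api
      image: api:v1.2.0
      resources:
        requests: {cpu: 500m, memory: 256Mi}
        limits: {cpu: 500m, memory: 256Mi}
---
# Burstable: requests are set, but the limits are higher or partly missing.
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable    # hypothetical name
spec:
  containers:
    - name: api
      image: api:v1.2.0
      resources:
        requests: {cpu: 250m, memory: 256Mi}
        limits: {memory: 512Mi}
---
# BestEffort: no container sets any request or limit.
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort   # hypothetical name
spec:
  containers:
    - name: api
      image: api:v1.2.0
```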
Practical Sizing Strategy
- Start with metrics — Run your application for at least 48 hours and observe actual CPU and memory usage
- Set requests to P95 usage — Observed usage stays at or below this value 95% of the time
- Set limits to 2-3x requests — Allow burst capacity for traffic spikes
- Never set CPU limits too tight — CPU throttling causes latency spikes that are hard to debug
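The steps above can be worked through with made-up numbers. Assume, purely for illustration, that 48 hours of metrics show a P95 of 200m CPU and 300Mi memory for a container. Applying the multipliers from the list might give a block like this:

```yaml
# Hypothetical observations: P95 CPU = 200m, P95 memory = 300Mi
resources:
  requests:
    cpu: 200m      # P95 CPU usage
    memory: 300Mi  # P95 memory usage
  limits:
    cpu: 600m      # 3 x 200m, the generous end of the range to avoid throttling
    memory: 600Mi  # 2 x 300Mi, headroom for spikes before an OOMKill
```

Because requests are lower than limits, a pod sized this way lands in the Burstable QoS class.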
Example Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-service  # required by apps/v1; must match the template labels
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
        - name: api
          image: api:v1.2.0
          resources:
            requests:
              cpu: 250m      # 0.25 CPU cores
              memory: 256Mi  # 256 MiB
            limits:
              cpu: "1"       # 1 CPU core (4x request)
              memory: 512Mi  # 512 MiB (2x request)

Common Mistakes
- No requests set — Pods get BestEffort QoS and are first to be evicted under pressure
- Requests too high — Wastes cluster resources; nodes appear full but are actually idle
- Memory limits too low — Causes frequent OOMKills and pod restarts
- CPU limits too tight — Causes throttling; application appears slow but uses less than 50% CPU in metrics
- Same values for all services — A Go API server needs different resources than a Java application
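One possible guard against the first mistake is a namespace-level `LimitRange`, which fills in defaults for containers that omit requests or limits. This is a minimal sketch: the object name, namespace, and values are assumptions, and real defaults should come from measured usage.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources   # hypothetical name
  namespace: my-namespace   # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:       # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
      default:              # applied when a container sets no limits
        cpu: 500m
        memory: 256Mi
```

The defaults are applied when a pod is admitted, so pods that already exist are not changed. Services with unusual needs should still set their own values rather than rely on a shared default.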
Detecting OOMKilled Pods
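# Find pods with a container whose last termination was an OOMKill
kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(any(.status.containerStatuses[]?; .lastState.terminated.reason == "OOMKilled")) | "\(.metadata.namespace)/\(.metadata.name)"'
# Show current per-container usage to compare against the configured limits (requires metrics-server)
kubectl top pods --containers

Vertical Pod Autoscaler (VPA)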
The Vertical Pod Autoscaler, an add-on installed separately from core Kubernetes, can recommend or automatically set resource values based on actual usage:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"  # Start with recommendations only

Use updateMode: "Off" initially to get recommendations without automatic changes. Review suggestions before switching to "Auto".
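Before moving away from "Off", it may be worth bounding what the autoscaler is allowed to set. The sketch below adds a `resourcePolicy` to the same VPA object. The container name `api` matches the example Deployment, but the minimum and maximum values are assumptions and should be replaced with figures from your own metrics.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"        # still recommendations only
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:          # hypothetical floor for recommendations
          cpu: 100m
          memory: 128Mi
        maxAllowed:          # hypothetical ceiling for recommendations
          cpu: "1"
          memory: 1Gi
```

With bounds in place, a later switch to automatic updates cannot push requests below the floor or above the ceiling.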
Eazy SaaS Tip: We run a resource audit for our Kubernetes clients quarterly. In most cases, we find 30-40% of cluster resources are allocated but unused. Right-sizing these deployments directly reduces your cloud bill.