Module 3: Developer Track - The "Zero Request" Myth
Duration: 90 minutes
Learning Objectives
By the end of this module, you will be able to:
- Debug "OOMKilled" events and CPU throttling in production
- Understand Kubernetes QoS classes (Guaranteed, Burstable, BestEffort)
- Implement Horizontal Pod Autoscaler (HPA) using right-sized metrics
- Use historical Prometheus data to accurately size resource requests
Understanding Quality of Service (QoS)
Many developers feel that requiring CPU/memory requests is "hacky" or restrictive. This module shows why zero-request deployments are dangerous, especially on bare-metal or resource-constrained environments.
The Three QoS Classes
Kubernetes assigns every pod to one of three QoS classes based on its resource configuration:
| QoS Class | Definition | Eviction Priority | Use Case |
|---|---|---|---|
| Guaranteed | Requests equal limits for every container | Lowest (protected) | Critical production workloads |
| Burstable | Has requests, but requests < limits (or only some resources set) | Medium | Most production apps |
| BestEffort | No requests or limits at all | Highest (first to die) | Dev/test only |
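The classification rules in the table can be sketched as a small shell function. This is a simplified single-container model for building intuition, not the kubelet's full algorithm (which evaluates every container in the pod and handles partially set resources):

```shell
# Simplified single-container QoS classification (sketch, not the real kubelet code).
# Inputs are millicore / Mi numbers; 0 means "not set".
qos_class() {
  local req_cpu=$1 lim_cpu=$2 req_mem=$3 lim_mem=$4
  if [ "$req_cpu" -eq 0 ] && [ "$req_mem" -eq 0 ] && \
     [ "$lim_cpu" -eq 0 ] && [ "$lim_mem" -eq 0 ]; then
    echo "BestEffort"                     # nothing set at all
  elif [ "$req_cpu" -eq "$lim_cpu" ] && [ "$req_mem" -eq "$lim_mem" ] && \
       [ "$lim_cpu" -gt 0 ] && [ "$lim_mem" -gt 0 ]; then
    echo "Guaranteed"                     # requests == limits for every resource
  else
    echo "Burstable"                      # requests set, but below limits
  fi
}

qos_class 0 0 0 0          # BestEffort
qos_class 200 200 256 256  # Guaranteed
qos_class 100 500 128 256  # Burstable
```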
The Danger of BestEffort: When a node runs out of memory, Kubernetes evicts pods in this order: BestEffort first, then Burstable pods exceeding their requests, and Guaranteed pods only as a last resort.
If you deploy without requests, your pod is first in line for termination.
How to Check Your Pod’s QoS Class
oc get pod -n capacity-workshop -l app=besteffort-app -o jsonpath='{.items[0].status.qosClass}'
Burstable
Why does the besteffort-app report Burstable? This is intentional in production clusters: platform engineers use ResourceQuota to prevent true BestEffort pods from silently starving other workloads, so even this deliberately under-provisioned app carries minimal requests.
Now check a properly configured pod:
oc get pod -n capacity-workshop -l app=guaranteed-app -o jsonpath='{.items[0].status.qosClass}'
Guaranteed
Lab 3: The Throttling Simulator
All terminal commands in this lab run on your student cluster. If your SSH session ended, reconnect with your SSH password before continuing. After connecting, log in to your student cluster.
In this hands-on lab, you’ll experience the pain of zero-request deployments, then fix them using Prometheus historical data.
Part 1: The Trap - Deploy Without Requests
We’ve pre-deployed a load generator in your capacity-workshop namespace. Let’s examine its configuration:
oc get deployment load-generator -n capacity-workshop -o yaml | grep -A 10 "resources:"
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 256Mi
This is a Burstable pod (requests < limits). Now let’s look at the besteffort-app:
oc get deployment besteffort-app -n capacity-workshop -o yaml | grep -A 5 "resources:"
resources:
requests:
cpu: 10m
memory: 16Mi
limits:
cpu: 100m
memory: 64Mi
Requests and limits are set, but intentionally minimal — 10m CPU and 16Mi memory requests are far below what a real workload needs. Because requests < limits, this pod has a Burstable QoS class, not BestEffort. Under memory pressure it sits near the bottom of the eviction stack, just above a true BestEffort pod — making it one of the first Burstable pods to be evicted.
Part 2: The Squeeze - Launch a Noisy Neighbor
Now we’ll create a "noisy neighbor" pod that consumes excessive CPU, forcing the scheduler to throttle other pods.
First, check current CPU usage:
oc adm top pods -n capacity-workshop
NAME CPU(cores) MEMORY(bytes)
besteffort-app-5598f76d5b-9jtz4 1m 12Mi
load-generator-67567c576-hdvkj 102m 64Mi
Now scale up the noisy-neighbor deployment (currently at 0 replicas):
oc scale deployment noisy-neighbor -n capacity-workshop --replicas=1
Wait 30 seconds for it to start consuming CPU, then check again:
oc adm top pods -n capacity-workshop
NAME CPU(cores) MEMORY(bytes)
besteffort-app-5598f76d5b-9jtz4 1m 12Mi
load-generator-67567c576-hdvkj 45m 64Mi ← THROTTLED!
noisy-neighbor-7d9c8b6f5d-xk2p9 980m 128Mi ← Consuming everything
What Just Happened? The noisy-neighbor pod grabbed nearly a full CPU core, and with cores now contended the load-generator's share dropped from ~102m to ~45m, because the kernel scheduler divides contended CPU time in proportion to each pod's request.
Real-World Impact: If this were a production API, response times would spike from 50ms to 500ms+, causing user-facing outages.
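Why does the kernel split CPU this way? Under contention, CPU time is divided in proportion to cgroup weights, which Kubernetes derives from CPU requests. A rough sketch of the cgroup-v1 share math (cgroup v2 uses cpu.weight on a different scale, and exact node behavior varies):

```shell
# Kubernetes maps a CPU request to cgroup-v1 cpu.shares as
# milliCPU * 1024 / 1000, with a floor of 2 (sketch of the documented mapping).
shares() {
  local milli=$1
  local s=$(( milli * 1024 / 1000 ))
  [ "$s" -lt 2 ] && s=2
  echo "$s"
}

A=$(shares 100)  # pod with a 100m request
B=$(shares 0)    # pod with no request at all
# Percentage of one fully contended core each pod would receive:
echo "requested pod:  $(( A * 100 / (A + B) ))%"
echo "no-request pod: $(( B * 100 / (A + B) ))%"
```

A pod with no request is left with a near-zero weight, which is exactly why BestEffort-style workloads starve first when cores get scarce.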
Download Lab Scripts
Download the right-sizer script, kube-burner, and the stress configs before continuing.
mkdir -p ~/module-03 && cd ~/module-03
BASE=https://raw.githubusercontent.com/tosin2013/capacity-planning-lab-guide/main
curl -fsSO $BASE/content/modules/ROOT/examples/module-03/resource-right-sizer.sh
curl -fsSO $BASE/content/modules/ROOT/examples/module-03/stress-config.yaml
curl -fsSO $BASE/content/modules/ROOT/examples/module-03/stress-pod.yaml
curl -fsSO $BASE/content/modules/ROOT/examples/module-03/oom-demo.sh
chmod +x resource-right-sizer.sh oom-demo.sh
ls -lh
-rwxr-xr-x. 1 lab-user lab-user 3.2K Apr 20 oom-demo.sh
-rwxr-xr-x. 1 lab-user lab-user 4.1K Apr 20 resource-right-sizer.sh
-rw-r--r--. 1 lab-user lab-user 512 Apr 20 stress-config.yaml
-rw-r--r--. 1 lab-user lab-user 480 Apr 20 stress-pod.yaml
Now download kube-burner — a Kubernetes workload generator used to deploy the throttling stress scenario:
cd ~/module-03
curl -sL https://github.com/kube-burner/kube-burner/releases/download/v2.6.1/kube-burner-V2.6.1-linux-x86_64.tar.gz \
| tar -xz kube-burner
chmod +x kube-burner
./kube-burner version
Version: 2.6.1
Git Commit: ...
Build Date: ...
Part 3: Observe CPU Throttling Metrics
CFS (Completely Fair Scheduler) throttling only fires when a container tries to use more than its own CPU limit — it is not caused by noisy neighbors. To generate a real, Prometheus-visible throttling event, you will use kube-burner to deploy a dedicated stress pod that has a tight CPU limit (200m) and immediately saturates it.
cd ~/module-03
./kube-burner init -c stress-config.yaml
INFO[...] 📁 Creating object cpu-throttle-demo-1-1 in namespace capacity-workshop
INFO[...] ✅ Job cpu-throttle-demo completed
Verify the stress pod is running and already hitting its CPU limit:
oc get pod -n capacity-workshop -l app=cpu-throttle-demo
oc adm top pod -n capacity-workshop -l app=cpu-throttle-demo
NAME READY STATUS RESTARTS AGE
cpu-throttle-demo-1-1 1/1 Running 0 15s
NAME CPU(cores) MEMORY(bytes)
cpu-throttle-demo-1-1 200m 2Mi ← Capped at limit
Wait 90 seconds for Prometheus to scrape the throttling metric, then run the right-sizer against the stress pod:
sleep 90
POD_SELECTOR="cpu-throttle-demo.*" NAMESPACE=capacity-workshop ~/module-03/resource-right-sizer.sh
══════════════════════════════════════════════════
Module 3 — Resource Right-Sizer
══════════════════════════════════════════════════
[INFO] Namespace : capacity-workshop
[INFO] Pod selector : cpu-throttle-demo.*
[INFO] Window : 7 days
[INFO] Step 1/4 — Locating Prometheus route …
[OK] Prometheus : https://prometheus-k8s-openshift-monitoring.apps.student.example.com
[INFO] Step 2/4 — Minting Prometheus service-account token …
[OK] Token acquired (1287 chars)
[INFO] Step 3/4 — Measuring CPU throttling rate …
[OK] Throttling rate : 0.810 (81.0% of CPU time throttled — SEVERE)
[WARN] Throttling above 5% indicates the CPU limit is too low for this workload.
Kernel CFS is pausing the container to stay within its CPU limit.
[INFO] Step 4/4 — Calculating 95th-percentile resource usage (1d window) …
┌────────────────────────────────────────────────────────────┐
│ RIGHT-SIZING RECOMMENDATIONS │
├────────────────────────────────────────────────────────────┤
│ P95 CPU usage (raw) : 200.0m cores │
│ P95 Memory usage (raw) : 2.0Mi │
├────────────────────────────────────────────────────────────┤
│ Recommended CPU request : 200m (= P95, rounded up) │
│ Recommended CPU limit : 400m (= 2× request) │
│ Recommended Memory request: 8Mi (= P95 + 20% buffer) │
│ Recommended Memory limit : 8Mi (= P95 + 50% buffer) │
└────────────────────────────────────────────────────────────┘
Now run the right-sizer against your actual load-generator to get its right-sizing recommendation for Part 4:
NAMESPACE=capacity-workshop ~/module-03/resource-right-sizer.sh
Throttling Explained: When a container tries to use more CPU than its limit, the Linux CFS (Completely Fair Scheduler) throttles it by pausing the process periodically. This is a per-container mechanism — it has nothing to do with noisy neighbors.
The stress pod above is trying to use ~1000m CPU but is capped at 200m — the kernel throttles it for 80%+ of its allotted periods. This is invisible to application developers unless they're monitoring kernel metrics!
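You can also see this mechanism directly in the cgroup accounting file. The snippet below parses a sample cgroup-v2 cpu.stat snapshot (values are illustrative, chosen to mirror the 81% throttling rate the right-sizer reported); on a live pod you would read /sys/fs/cgroup/cpu.stat from inside the container:

```shell
# Sample cgroup-v2 cpu.stat snapshot (illustrative values, not real output).
cat > cpu.stat.sample <<'EOF'
usage_usec 5000000
user_usec 4000000
system_usec 1000000
nr_periods 1000
nr_throttled 810
throttled_usec 3200000
EOF

# nr_throttled / nr_periods = fraction of CFS periods in which the kernel
# paused the container to enforce its CPU limit.
awk '/^nr_periods/ {p=$2} /^nr_throttled/ {t=$2}
     END {printf "throttled %.1f%% of periods\n", 100*t/p}' cpu.stat.sample
```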
Part 4: The Fix - Right-Size Using Historical Data
The right-sizer script you ran in Part 3 queried the 95th-percentile CPU and memory usage over the past 7 days. The RIGHT-SIZING RECOMMENDATIONS table and Apply with: command at the bottom of its output are your starting point — your numbers will differ from anyone else’s, and that is the point. The right-sizer is reporting the truth about your specific workload history.
Why 95th percentile? The right-sizer bases recommendations on P95 of historical usage — not the peak, not the median. The peak over-provisions for rare spikes; the median under-provisions for routine bursts; P95 covers almost all observed load while ignoring outliers.
A cluster that has seen heavy test traffic today will show higher P95 values than one running only baseline load. Both answers are correct for their environment.
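To make peak vs. median vs. P95 concrete, here is a self-contained sketch over a hypothetical series of per-minute CPU readings (millicores) containing one outlier spike:

```shell
# 19 routine readings plus one 450m spike (illustrative data).
samples="80 85 90 82 88 95 91 84 87 86 83 89 92 81 94 93 78 79 96 450"

echo "$samples" | tr ' ' '\n' | sort -n | awk '
  { v[NR] = $1 }
  END {
    idx = int((NR * 95 + 99) / 100)        # nearest-rank P95 via integer math
    print "peak   :", v[NR]                # dominated by the one-off spike
    print "median :", v[int((NR + 1) / 2)] # ignores routine bursts entirely
    print "p95    :", v[idx]               # tracks sustained load, drops outlier
  }'
```

Sizing to the peak (450m) would waste capacity for a spike seen once; sizing to P95 (96m) covers everything but that outlier.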
Run the right-sizer with APPLY=true — it queries Prometheus, prints the recommendations, applies them via oc set resources, and waits for the rollout to complete:
NAMESPACE=capacity-workshop APPLY=true ~/module-03/resource-right-sizer.sh
The script will print the RIGHT-SIZING RECOMMENDATIONS table, apply the resources, wait for the rollout, and confirm the QoS class — all in one step. You should see Burstable at the end, confirming the pod has a request lower than its limit.
To achieve Guaranteed QoS, requests must equal limits for every resource in every container.
Use Guaranteed for latency-critical services that cannot tolerate throttling or eviction. For most applications, Burstable is the right choice — it allows CPU bursts while still giving the scheduler an accurate picture of your baseline needs.
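As a sketch, a Guaranteed resources stanza looks like this (the values are illustrative; substitute your own right-sizer numbers):

```yaml
resources:
  requests:
    cpu: 200m       # identical to the limit ...
    memory: 256Mi
  limits:
    cpu: 200m       # ... for every resource, in every container
    memory: 256Mi
```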
Experiment: What request value breaks HPA?
Before continuing to Part 5, try deliberately setting bad request values. Understanding the failure mode is more valuable than getting the right answer on the first try. You will create the HPA in Part 5 — keep these numbers in mind when you see the TARGETS percentage there.
Set a request that is far too high — HPA will never fire even under full stress:
oc set resources deployment load-generator -n capacity-workshop \
--requests=cpu=500m --limits=cpu=500m
Wait for the pod to restart with the new request, then check actual consumption:
oc rollout status deployment/load-generator -n capacity-workshop --timeout=60s
oc adm top pods -n capacity-workshop
With a 500m request, even heavy usage will look like a tiny percentage to HPA — it will never scale out.
Then set a request that is far too low — HPA will fire at idle:
oc set resources deployment load-generator -n capacity-workshop \
--requests=cpu=1m --limits=cpu=500m
Wait for the rollout, then check consumption again:
oc rollout status deployment/load-generator -n capacity-workshop --timeout=60s
oc adm top pods -n capacity-workshop
With a 1m request, even idle usage will produce an enormous percentage — HPA will scale to max immediately.
When ready to continue, reset to the right-sizer recommendation:
NAMESPACE=capacity-workshop APPLY=true ~/module-03/resource-right-sizer.sh
Part 5: The Scale - Implement HPA
With an accurate request in place, the HPA controller has a meaningful baseline to work from.
The Formula That Drives HPA:
desiredReplicas = ceil( currentReplicas × currentUtilization / targetUtilization )
HPA asks: "How many copies would I need so that each one runs at exactly the target utilization?" The request value is the denominator of currentUtilization (actual usage ÷ request) — change it and you change everything.
This is why right-sizing matters: it makes the formula meaningful.
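A minimal sketch of that formula in shell, using integer math (the real controller works with exact ratios and adds tolerance and stabilization windows; the function name is illustrative):

```shell
# desiredReplicas = ceil(currentReplicas * currentUtilization / target)
# where currentUtilization = pod CPU usage / CPU request, as a percentage.
hpa_replicas() {
  local replicas=$1 usage_m=$2 request_m=$3 target_pct=$4
  local util=$(( usage_m * 100 / request_m ))                   # utilization %
  echo $(( (replicas * util + target_pct - 1) / target_pct ))   # ceiling division
}

# 1 replica using 150m against a 100m request, 75% target: scale out to 2.
hpa_replicas 1 150 100 75
# Same 150m usage against a 500m request: 30% utilization, stays at 1.
hpa_replicas 1 150 500 75
```

Same workload, same usage; only the request (the denominator) changed, and with it the scaling decision.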
Create an HPA targeting 75% CPU utilization:
oc autoscale deployment load-generator -n capacity-workshop \
  --min=1 --max=5 --cpu-percent=75
oc get hpa -n capacity-workshop
The TARGETS column may show <unknown> for ~30 seconds while the metrics server warms up — that is normal. Re-run until you see a real percentage. Note that number: it reflects your request value and your current load.
Now apply sustained stress:
oc set env deployment/load-generator -n capacity-workshop TARGET_RPS=500
In stress mode (TARGET_RPS≥500) the load-generator runs wrk continuously with 8 threads and 500 connections. Wait 90 seconds for the pod to roll out and the HPA to observe a full averaging window, then check:
oc get hpa -n capacity-workshop load-generator
Now check actual CPU consumption alongside the HPA reading:
oc adm top pods -n capacity-workshop
Compare the CPU(cores) value against the HPA TARGETS percentage — the formula current_cpu / request is exactly what produces that number.
Interpret your own numbers using the formula: The TARGETS column shows current CPU usage divided by the CPU request, expressed as a percentage, against the 75% target.
The result depends entirely on your P95-sized request. A right-sized request tracks your actual workload closely, so the percentage may stay below 75% — which means the HPA correctly decided no scale-out was needed. That is not a failure; it is the formula working as designed.
Does the REPLICAS count you observe match what the formula predicts? If REPLICAS is still 1 and TARGETS is above 75%, wait another 30–60 seconds — HPA has a built-in stabilization window to prevent thrashing.
Experiment: Tune the request, observe the HPA response
This is the primary HPA demonstration. The most effective way to build intuition is to deliberately change the request and predict the outcome before you run oc get hpa. Try each of these in order:
# Step A: Tiny request → HPA fires immediately at extreme percentage
oc set resources deployment load-generator -n capacity-workshop \
--requests=cpu=2m --limits=cpu=500m
Wait for the rollout to complete before reading the HPA — the new pod must be running with the new request value before the HPA percentage is meaningful:
oc rollout status deployment/load-generator -n capacity-workshop --timeout=60s
oc get hpa -n capacity-workshop load-generator
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
load-generator Deployment/load-generator cpu: 333%/75% 1 5 5 ...
Now cross-check the actual live CPU consumption against the HPA percentage — this is what drives the formula:
oc adm top pods -n capacity-workshop
The CPU(cores) column shows raw usage. Divide it by the request you just set (2m) to see why the HPA percentage is so extreme — if the pod is using, say, 80m CPU with a 2m request, that is 40× the request, which appears as ~4000% in TARGETS.
# Step B: Large request → HPA goes silent, percentage near zero
oc set resources deployment load-generator -n capacity-workshop \
--requests=cpu=500m --limits=cpu=500m
Wait for the rollout, then check:
oc rollout status deployment/load-generator -n capacity-workshop --timeout=60s
oc get hpa -n capacity-workshop load-generator
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
load-generator Deployment/load-generator cpu: 0%/75% 1 5 1 ...
Check actual usage again — notice that the raw CPU(cores) number is similar to Step A, but the HPA percentage has collapsed:
oc adm top pods -n capacity-workshop
The pod is doing the same work as before, but a 500m request makes even 80m of real usage look like only ~16%. The formula is unchanged; only the denominator (the request value) changed.
For each change, write down your prediction before running oc get hpa — given this request and the current CPU usage, what utilization percentage do you expect? Then check if you were right. That prediction skill — reasoning about how request values affect scaling decisions — is what this lab is building.
When you have finished exploring, reset the deployment to the right-sizer recommendation:
NAMESPACE=capacity-workshop APPLY=true ~/module-03/resource-right-sizer.sh
Confirm the pod load returned to the expected range after the right-sized request is restored:
oc adm top pods -n capacity-workshop
The CPU(cores) value here is your actual workload baseline — it should produce an HPA TARGETS percentage close to what you observed when you first created the HPA earlier in this part.
Part 6: Debug OOMKilled Events
Run the OOM demo script — it deploys a pod with a 4Mi memory limit that immediately tries to allocate 100MB, waits for the kernel to kill it, then walks through the standard debugging workflow:
NAMESPACE=capacity-workshop ~/module-03/oom-demo.sh
══════════════════════════════════════════════════
Module 3 — OOM Kill Demo
══════════════════════════════════════════════════
[INFO] Namespace : capacity-workshop
[INFO] Pod name : oom-demo
[INFO] Step 1/4 — Deploying oom-demo pod (4Mi memory limit) …
[OK] Pod created — it will try to allocate 100MB against a 4Mi limit.
[INFO] Step 2/4 — Waiting for OOMKill (up to 30s) …
[OK] OOMKilled confirmed after ~3s.
[INFO] Step 3/4 — Checking OOMKilling events …
LAST SEEN TYPE REASON OBJECT MESSAGE
3s Warning OOMKilling Pod/oom-demo Memory limit reached, killed process ...
[INFO] Step 4/4 — Inspecting last termination state …
Exit code : 137 (128 + 9 SIGKILL = 137 means OOMKilled)
Memory limit: 4Mi (the container was killed for exceeding this)
[OK] This is the same signature you will see on real production OOMKills.
Standard debug workflow:
1. oc get events -n <namespace> --field-selector reason=OOMKilling
2. oc get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
3. NAMESPACE=<ns> POD_SELECTOR=<pod> ~/module-03/resource-right-sizer.sh ← P95 memory
4. Raise the memory limit by the P95 + 50% buffer the right-sizer recommends.
[INFO] Cleaning up oom-demo pod …
[OK] Done. oom-demo pod removed.
OOMKilled Troubleshooting Checklist: Exit code 137 means the container received SIGKILL (128 + 9). Combined with lastState.terminated.reason: OOMKilled or an OOMKilling event, it confirms an OOM kill; from there, compare the memory limit against the P95 usage the right-sizer reports.
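The exit-code arithmetic generalizes to any signal. A small helper makes it concrete (the function name is illustrative):

```shell
# Decode a container exit code: values above 128 mean "killed by signal
# (code - 128)". 137 = 128 + 9 (SIGKILL, the OOM-kill signature);
# 143 = 128 + 15 (SIGTERM, a normal shutdown).
decode_exit() {
  local code=$1
  if [ "$code" -gt 128 ]; then
    echo "killed by signal $(( code - 128 )) (SIG$(kill -l $(( code - 128 ))))"
  else
    echo "exited normally with code $code"
  fi
}

decode_exit 137   # killed by signal 9 (SIGKILL)
decode_exit 143   # killed by signal 15 (SIGTERM)
```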
Lab 3 Summary: Developer Best Practices
You’ve now experienced:
✅ QoS classes (Guaranteed, Burstable, BestEffort)
✅ CPU throttling impact on application performance
✅ Right-sizing requests using Prometheus 95th percentile
✅ HPA configuration with accurate targets
✅ OOMKilled debugging
The Developer Social Contract
Resource Requests Are Not Restrictions: Many developers resist setting requests because they feel like "quotas" or "limits." This module shows the opposite: requests are scheduling information, not caps; only limits cap usage.
Accurate requests are a contract between you and the infrastructure team. Without them, the scheduler packs nodes blindly, the HPA has no meaningful baseline, and your pods are first in line for eviction.
Key Takeaways
- BestEffort QoS (no requests) = first to die under node pressure
- CPU throttling is invisible to apps but destroys performance
- Use Prometheus 95th percentile for right-sizing, not guesswork
- HPA requires accurate requests to scale correctly
- OOMKilled = exit code 137; check memory limits vs actual usage
Next Steps
In Module 4: Infrastructure Track - Fleet Architecture & Sizing, we shift to the platform engineer perspective. You’ll learn node density math, etcd limits, and when to split clusters versus grow them.
The skills from this module (accurate requests) directly feed into Module 4’s capacity calculations. If requests are wrong, density calculations are wrong, and you’ll buy the wrong amount of infrastructure.
Further reading
- Kubernetes requests and limits: why both teams are right — Red Hat blog post expanding on QoS classes, CPU throttling, P95 right-sizing, and the HPA formula covered in this module.