Module 3: Developer Track - The "Zero Request" Myth
Duration: 90 minutes
Learning Objectives
By the end of this module, you will be able to:
- Debug "OOMKilled" events and CPU throttling in production
- Understand Kubernetes QoS classes (Guaranteed, Burstable, BestEffort)
- Implement Horizontal Pod Autoscaler (HPA) using right-sized metrics
- Use historical Prometheus data to accurately size resource requests
Understanding Quality of Service (QoS)
Many developers feel that requiring CPU/memory requests is "hacky" or restrictive. This module shows why zero-request deployments are dangerous, especially on bare-metal or resource-constrained environments.
The Three QoS Classes
Kubernetes assigns every pod to one of three QoS classes based on its resource configuration:
| QoS Class | Definition | Eviction Priority | Use Case |
|---|---|---|---|
| Guaranteed | Requests equal limits for every container | Lowest (protected) | Critical production workloads |
| Burstable | Has requests, but requests < limits (or only some resources set) | Medium | Most production apps |
| BestEffort | No requests or limits at all | Highest (first to die) | Dev/test only |
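The classification rules in the table can be sketched as a small shell function. This is a simplified single-container model for building intuition, not the kubelet's full algorithm (which evaluates every container in the pod and handles partially set resources):

```shell
# Simplified single-container QoS classification (sketch, not the real kubelet code).
# Inputs are millicore / Mi numbers; 0 means "not set".
qos_class() {
  local req_cpu=$1 lim_cpu=$2 req_mem=$3 lim_mem=$4
  if [ "$req_cpu" -eq 0 ] && [ "$req_mem" -eq 0 ] && \
     [ "$lim_cpu" -eq 0 ] && [ "$lim_mem" -eq 0 ]; then
    echo "BestEffort"                     # nothing set at all
  elif [ "$req_cpu" -eq "$lim_cpu" ] && [ "$req_mem" -eq "$lim_mem" ] && \
       [ "$lim_cpu" -gt 0 ] && [ "$lim_mem" -gt 0 ]; then
    echo "Guaranteed"                     # requests == limits for every resource
  else
    echo "Burstable"                      # requests set, but below limits
  fi
}

qos_class 0 0 0 0          # BestEffort
qos_class 200 200 256 256  # Guaranteed
qos_class 100 500 128 256  # Burstable
```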
The Danger of BestEffort: When a node runs out of memory, Kubernetes evicts pods in this order: BestEffort first, then Burstable pods exceeding their requests, and Guaranteed pods only as a last resort.
If you deploy without requests, your pod is first in line for termination.
How to Check Your Pod’s QoS Class
oc get pod -n capacity-workshop -l app=besteffort-app -o jsonpath='{.items[0].status.qosClass}'
Burstable
Why does the besteffort-app report Burstable? This is intentional in production clusters: platform engineers use ResourceQuota to prevent true BestEffort pods from silently starving other workloads, so even this deliberately under-provisioned app carries minimal requests.
Now check a properly configured pod:
oc get pod -n capacity-workshop -l app=guaranteed-app -o jsonpath='{.items[0].status.qosClass}'
Guaranteed
Lab 3: The Throttling Simulator
All terminal commands in this lab run on your student cluster. If your SSH session ended, reconnect with your SSH password before continuing. After connecting, log in to your student cluster.
In this hands-on lab, you’ll experience the pain of zero-request deployments, then fix them using Prometheus historical data.
Part 1: The Trap - Deploy Without Requests
We’ve pre-deployed a load generator in your capacity-workshop namespace. Let’s examine its configuration:
oc get deployment load-generator -n capacity-workshop -o yaml | grep -A 10 "resources:"
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 256Mi
This is a Burstable pod (requests < limits). Now let’s look at the besteffort-app:
oc get deployment besteffort-app -n capacity-workshop -o yaml | grep -A 5 "resources:"
resources:
requests:
cpu: 10m
memory: 16Mi
limits:
cpu: 100m
memory: 64Mi
Requests and limits are set, but intentionally minimal — 10m CPU and 16Mi memory requests are far below what a real workload needs. Because requests < limits, this pod has a Burstable QoS class, not BestEffort. Under memory pressure it sits near the bottom of the eviction stack, just above a true BestEffort pod — making it one of the first Burstable pods to be evicted.
Part 2: The Squeeze - Launch a Noisy Neighbor
Now we’ll create a "noisy neighbor" pod that consumes excessive CPU, forcing the scheduler to throttle other pods.
First, check current CPU usage:
oc adm top pods -n capacity-workshop
NAME CPU(cores) MEMORY(bytes)
besteffort-app-5598f76d5b-9jtz4 1m 12Mi
load-generator-67567c576-hdvkj 102m 64Mi
Now scale up the noisy-neighbor deployment (currently at 0 replicas):
oc scale deployment noisy-neighbor -n capacity-workshop --replicas=1
Wait 30 seconds for it to start consuming CPU, then check again:
oc adm top pods -n capacity-workshop
NAME CPU(cores) MEMORY(bytes)
besteffort-app-5598f76d5b-9jtz4 1m 12Mi
load-generator-67567c576-hdvkj 45m 64Mi ← THROTTLED!
noisy-neighbor-7d9c8b6f5d-xk2p9 980m 128Mi ← Consuming everything
What Just Happened? The noisy-neighbor pod grabbed nearly a full CPU core, and with cores now contended the load-generator's share dropped from ~102m to ~45m, because the kernel scheduler divides contended CPU time in proportion to each pod's request.
Real-World Impact: If this were a production API, response times would spike from 50ms to 500ms+, causing user-facing outages.
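Why does the kernel split CPU this way? Under contention, CPU time is divided in proportion to cgroup weights, which Kubernetes derives from CPU requests. A rough sketch of the cgroup-v1 share math (cgroup v2 uses cpu.weight on a different scale, and exact node behavior varies):

```shell
# Kubernetes maps a CPU request to cgroup-v1 cpu.shares as
# milliCPU * 1024 / 1000, with a floor of 2 (sketch of the documented mapping).
shares() {
  local milli=$1
  local s=$(( milli * 1024 / 1000 ))
  [ "$s" -lt 2 ] && s=2
  echo "$s"
}

A=$(shares 100)  # pod with a 100m request
B=$(shares 0)    # pod with no request at all
# Percentage of one fully contended core each pod would receive:
echo "requested pod:  $(( A * 100 / (A + B) ))%"
echo "no-request pod: $(( B * 100 / (A + B) ))%"
```

A pod with no request is left with a near-zero weight, which is exactly why BestEffort-style workloads starve first when cores get scarce.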
Download Lab Scripts
Download the right-sizer script, kube-burner, and the stress configs before continuing.
mkdir -p ~/module-03 && cd ~/module-03
BASE=https://raw.githubusercontent.com/tosin2013/capacity-planning-lab-guide/main
curl -fsSO $BASE/content/modules/ROOT/examples/module-03/resource-right-sizer.sh
curl -fsSO $BASE/content/modules/ROOT/examples/module-03/stress-config.yaml
curl -fsSO $BASE/content/modules/ROOT/examples/module-03/stress-pod.yaml
curl -fsSO $BASE/content/modules/ROOT/examples/module-03/oom-demo.sh
chmod +x resource-right-sizer.sh oom-demo.sh
ls -lh
-rwxr-xr-x. 1 lab-user lab-user 3.2K Apr 20 oom-demo.sh
-rwxr-xr-x. 1 lab-user lab-user 4.1K Apr 20 resource-right-sizer.sh
-rw-r--r--. 1 lab-user lab-user 512 Apr 20 stress-config.yaml
-rw-r--r--. 1 lab-user lab-user 480 Apr 20 stress-pod.yaml
Now download kube-burner — a Kubernetes workload generator used to deploy the throttling stress scenario:
cd ~/module-03
curl -sL https://github.com/kube-burner/kube-burner/releases/download/v2.6.1/kube-burner-V2.6.1-linux-x86_64.tar.gz \
| tar -xz kube-burner
chmod +x kube-burner
./kube-burner version
Version: 2.6.1
Git Commit: ...
Build Date: ...
Part 3: Observe CPU Throttling Metrics
CFS (Completely Fair Scheduler) throttling only fires when a container tries to use more than its own CPU limit — it is not caused by noisy neighbors. To generate a real, Prometheus-visible throttling event, you will use kube-burner to deploy a dedicated stress pod that has a tight CPU limit (200m) and immediately saturates it.
cd ~/module-03
./kube-burner init -c stress-config.yaml
INFO[...] 📁 Creating object cpu-throttle-demo-1-1 in namespace capacity-workshop
INFO[...] ✅ Job cpu-throttle-demo completed
Verify the stress pod is running and already hitting its CPU limit:
oc get pod -n capacity-workshop -l app=cpu-throttle-demo
oc adm top pod -n capacity-workshop -l app=cpu-throttle-demo
NAME READY STATUS RESTARTS AGE
cpu-throttle-demo-1-1 1/1 Running 0 15s
NAME CPU(cores) MEMORY(bytes)
cpu-throttle-demo-1-1 200m 2Mi ← Capped at limit
Wait 90 seconds for Prometheus to scrape the throttling metric, then run the right-sizer against the stress pod:
sleep 90
POD_SELECTOR="cpu-throttle-demo.*" NAMESPACE=capacity-workshop ~/module-03/resource-right-sizer.sh
══════════════════════════════════════════════════
Module 3 — Resource Right-Sizer
══════════════════════════════════════════════════
[INFO] Namespace : capacity-workshop
[INFO] Pod selector : cpu-throttle-demo.*
[INFO] Window : 7 days
[INFO] Step 1/4 — Locating Prometheus route …
[OK] Prometheus : https://prometheus-k8s-openshift-monitoring.apps.student.example.com
[INFO] Step 2/4 — Minting Prometheus service-account token …
[OK] Token acquired (1287 chars)
[INFO] Step 3/4 — Measuring CPU throttling rate …
[OK] Throttling rate : 0.810 (81.0% of CPU time throttled — SEVERE)
[WARN] Throttling above 5% indicates the CPU limit is too low for this workload.
Kernel CFS is pausing the container to stay within its CPU limit.
[INFO] Step 4/4 — Calculating 95th-percentile resource usage (1d window) …
┌────────────────────────────────────────────────────────────┐
│ RIGHT-SIZING RECOMMENDATIONS │
├────────────────────────────────────────────────────────────┤
│ P95 CPU usage (raw) : 200.0m cores │
│ P95 Memory usage (raw) : 2.0Mi │
├────────────────────────────────────────────────────────────┤
│ Recommended CPU request : 200m (= P95, rounded up) │
│ Recommended CPU limit : 400m (= 2× request) │
│ Recommended Memory request: 8Mi (= P95 + 20% buffer) │
│ Recommended Memory limit : 8Mi (= P95 + 50% buffer) │
└────────────────────────────────────────────────────────────┘
Now run the right-sizer against your actual load-generator to get its right-sizing recommendation for Part 4:
NAMESPACE=capacity-workshop ~/module-03/resource-right-sizer.sh
Throttling Explained: When a container tries to use more CPU than its limit, the Linux CFS (Completely Fair Scheduler) throttles it by pausing the process periodically. This is a per-container mechanism — it has nothing to do with noisy neighbors.
The stress pod above is trying to use ~1000m CPU but is capped at 200m — the kernel throttles it for 80%+ of its allotted periods. This is invisible to application developers unless they're monitoring kernel metrics!
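You can also see this mechanism directly in the cgroup accounting file. The snippet below parses a sample cgroup-v2 cpu.stat snapshot (values are illustrative, chosen to mirror the 81% throttling rate the right-sizer reported); on a live pod you would read /sys/fs/cgroup/cpu.stat from inside the container:

```shell
# Sample cgroup-v2 cpu.stat snapshot (illustrative values, not real output).
cat > cpu.stat.sample <<'EOF'
usage_usec 5000000
user_usec 4000000
system_usec 1000000
nr_periods 1000
nr_throttled 810
throttled_usec 3200000
EOF

# nr_throttled / nr_periods = fraction of CFS periods in which the kernel
# paused the container to enforce its CPU limit.
awk '/^nr_periods/ {p=$2} /^nr_throttled/ {t=$2}
     END {printf "throttled %.1f%% of periods\n", 100*t/p}' cpu.stat.sample
```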
Part 4: The Fix - Right-Size Using Historical Data
The right-sizer script you ran in Part 3 queried the 95th-percentile CPU and memory usage over the past 7 days. The RIGHT-SIZING RECOMMENDATIONS table and Apply with: command at the bottom of its output are your starting point — your numbers will differ from anyone else’s, and that is the point. The right-sizer is reporting the truth about your specific workload history.
Why 95th percentile? The right-sizer bases recommendations on P95 of historical usage — not the peak, not the median. The peak over-provisions for rare spikes; the median under-provisions for routine bursts; P95 covers almost all observed load while ignoring outliers.
A cluster that has seen heavy test traffic today will show higher P95 values than one running only baseline load. Both answers are correct for their environment.
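To make peak vs. median vs. P95 concrete, here is a self-contained sketch over a hypothetical series of per-minute CPU readings (millicores) containing one outlier spike:

```shell
# 19 routine readings plus one 450m spike (illustrative data).
samples="80 85 90 82 88 95 91 84 87 86 83 89 92 81 94 93 78 79 96 450"

echo "$samples" | tr ' ' '\n' | sort -n | awk '
  { v[NR] = $1 }
  END {
    idx = int((NR * 95 + 99) / 100)        # nearest-rank P95 via integer math
    print "peak   :", v[NR]                # dominated by the one-off spike
    print "median :", v[int((NR + 1) / 2)] # ignores routine bursts entirely
    print "p95    :", v[idx]               # tracks sustained load, drops outlier
  }'
```

Sizing to the peak (450m) would waste capacity for a spike seen once; sizing to P95 (96m) covers everything but that outlier.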
Run the right-sizer with APPLY=true — it queries Prometheus, prints the recommendations, applies them via oc set resources, and waits for the rollout to complete:
NAMESPACE=capacity-workshop APPLY=true ~/module-03/resource-right-sizer.sh
The script will print the RIGHT-SIZING RECOMMENDATIONS table, apply the resources, wait for the rollout, and confirm the QoS class — all in one step. You should see Burstable at the end, confirming the pod has a request lower than its limit.
To achieve Guaranteed QoS, requests must equal limits for every resource in every container.
Use Guaranteed for latency-critical services that cannot tolerate throttling or eviction. For most applications, Burstable is the right choice — it allows CPU bursts while still giving the scheduler an accurate picture of your baseline needs.
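As a sketch, a Guaranteed resources stanza looks like this (the values are illustrative; substitute your own right-sizer numbers):

```yaml
resources:
  requests:
    cpu: 200m       # identical to the limit ...
    memory: 256Mi
  limits:
    cpu: 200m       # ... for every resource, in every container
    memory: 256Mi
```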
Experiment: What request value breaks HPA?
Before continuing to Part 5, try deliberately setting bad request values. Understanding the failure mode is more valuable than getting the right answer on the first try. You will create the HPA in Part 5 — keep these numbers in mind when you see the TARGETS percentage there.
Set a request that is far too high — HPA will never fire even under full stress:
oc set resources deployment load-generator -n capacity-workshop \
--requests=cpu=500m --limits=cpu=500m
Wait for the pod to restart with the new request, then check actual consumption:
oc rollout status deployment/load-generator -n capacity-workshop --timeout=60s
oc adm top pods -n capacity-workshop
With a 500m request, even heavy usage will look like a tiny percentage to HPA — it will never scale out.
Then set a request that is far too low — HPA will fire at idle:
oc set resources deployment load-generator -n capacity-workshop \
--requests=cpu=1m --limits=cpu=500m
Wait for the rollout, then check consumption again:
oc rollout status deployment/load-generator -n capacity-workshop --timeout=60s
oc adm top pods -n capacity-workshop
With a 1m request, even idle usage will produce an enormous percentage — HPA will scale to max immediately.
When ready to continue, reset to the right-sizer recommendation:
NAMESPACE=capacity-workshop APPLY=true ~/module-03/resource-right-sizer.sh
Part 5: The Scale - Implement HPA
With an accurate request in place, the HPA controller has a meaningful baseline to work from.
The Formula That Drives HPA:
desiredReplicas = ceil( currentReplicas × currentUtilization / targetUtilization )
HPA asks: "How many copies would I need so that each one runs at exactly the target utilization?" The request value is the denominator of currentUtilization (actual usage ÷ request) — change it and you change everything.
This is why right-sizing matters: it makes the formula meaningful.
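A minimal sketch of that formula in shell, using integer math (the real controller works with exact ratios and adds tolerance and stabilization windows; the function name is illustrative):

```shell
# desiredReplicas = ceil(currentReplicas * currentUtilization / target)
# where currentUtilization = pod CPU usage / CPU request, as a percentage.
hpa_replicas() {
  local replicas=$1 usage_m=$2 request_m=$3 target_pct=$4
  local util=$(( usage_m * 100 / request_m ))                   # utilization %
  echo $(( (replicas * util + target_pct - 1) / target_pct ))   # ceiling division
}

# 1 replica using 150m against a 100m request, 75% target: scale out to 2.
hpa_replicas 1 150 100 75
# Same 150m usage against a 500m request: 30% utilization, stays at 1.
hpa_replicas 1 150 500 75
```

Same workload, same usage; only the request (the denominator) changed, and with it the scaling decision.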
Create an HPA targeting 75% CPU utilization:
oc autoscale deployment load-generator -n capacity-workshop \
  --min=1 --max=5 --cpu-percent=75
oc get hpa -n capacity-workshop
The TARGETS column may show <unknown> for ~30 seconds while the metrics server warms up — that is normal. Re-run until you see a real percentage. Note that number: it reflects your request value and your current load.
Now apply sustained stress:
oc set env deployment/load-generator -n capacity-workshop TARGET_RPS=500
In stress mode (TARGET_RPS≥500) the load-generator runs wrk continuously with 8 threads and 500 connections. Wait 90 seconds for the pod to roll out and the HPA to observe a full averaging window, then check:
oc get hpa -n capacity-workshop load-generator
Now check actual CPU consumption alongside the HPA reading:
oc adm top pods -n capacity-workshop
Compare the CPU(cores) value against the HPA TARGETS percentage — the formula current_cpu / request is exactly what produces that number.
Interpret your own numbers using the formula: The TARGETS column shows current CPU usage divided by the CPU request, expressed as a percentage, against the 75% target.
The result depends entirely on your P95-sized request. A right-sized request tracks your actual workload closely, so the percentage may stay below 75% — which means the HPA correctly decided no scale-out was needed. That is not a failure; it is the formula working as designed.
Does the REPLICAS count you observe match what the formula predicts? If REPLICAS is still 1 and TARGETS is above 75%, wait another 30–60 seconds — HPA has a built-in stabilization window to prevent thrashing.
Experiment: Tune the request, observe the HPA response
This is the primary HPA demonstration. The most effective way to build intuition is to deliberately change the request and predict the outcome before you run oc get hpa. Try each of these in order:
# Step A: Tiny request → HPA fires immediately at extreme percentage
oc set resources deployment load-generator -n capacity-workshop \
--requests=cpu=2m --limits=cpu=500m
Wait for the rollout to complete before reading the HPA — the new pod must be running with the new request value before the HPA percentage is meaningful:
oc rollout status deployment/load-generator -n capacity-workshop --timeout=60s
oc get hpa -n capacity-workshop load-generator
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
load-generator Deployment/load-generator cpu: 333%/75% 1 5 5 ...
Now cross-check the actual live CPU consumption against the HPA percentage — this is what drives the formula:
oc adm top pods -n capacity-workshop
The CPU(cores) column shows raw usage. Divide it by the request you just set (2m) to see why the HPA percentage is so extreme — if the pod is using, say, 80m CPU with a 2m request, that is 40× the request, which appears as ~4000% in TARGETS.
# Step B: Large request → HPA goes silent, percentage near zero
oc set resources deployment load-generator -n capacity-workshop \
--requests=cpu=500m --limits=cpu=500m
Wait for the rollout, then check:
oc rollout status deployment/load-generator -n capacity-workshop --timeout=60s
oc get hpa -n capacity-workshop load-generator
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
load-generator Deployment/load-generator cpu: 0%/75% 1 5 1 ...
Check actual usage again — notice that the raw CPU(cores) number is similar to Step A, but the HPA percentage has collapsed:
oc adm top pods -n capacity-workshop
The pod is doing the same work as before, but a 500m request makes even 80m of real usage look like only ~16%. The formula is unchanged; only the denominator (the request value) changed.
For each change, write down your prediction before running oc get hpa — given this request and the current CPU usage, what utilization percentage do you expect? Then check if you were right. That prediction skill — reasoning about how request values affect scaling decisions — is what this lab is building.
When you have finished exploring, reset the deployment to the right-sizer recommendation:
NAMESPACE=capacity-workshop APPLY=true ~/module-03/resource-right-sizer.sh
Confirm the pod load returned to the expected range after the right-sized request is restored:
oc adm top pods -n capacity-workshop
The CPU(cores) value here is your actual workload baseline — it should produce an HPA TARGETS percentage close to what you observed when you first created the HPA earlier in this part.
Part 6: Debug OOMKilled Events
Run the OOM demo script — it deploys a pod with a 4Mi memory limit that immediately tries to allocate 100MB, waits for the kernel to kill it, then walks through the standard debugging workflow:
NAMESPACE=capacity-workshop ~/module-03/oom-demo.sh
══════════════════════════════════════════════════
Module 3 — OOM Kill Demo
══════════════════════════════════════════════════
[INFO] Namespace : capacity-workshop
[INFO] Pod name : oom-demo
[INFO] Step 1/4 — Deploying oom-demo pod (4Mi memory limit) …
[OK] Pod created — it will try to allocate 100MB against a 4Mi limit.
[INFO] Step 2/4 — Waiting for OOMKill (up to 30s) …
[OK] OOMKilled confirmed after ~3s.
[INFO] Step 3/4 — Checking OOMKilling events …
LAST SEEN TYPE REASON OBJECT MESSAGE
3s Warning OOMKilling Pod/oom-demo Memory limit reached, killed process ...
[INFO] Step 4/4 — Inspecting last termination state …
Exit code : 137 (128 + 9 SIGKILL = 137 means OOMKilled)
Memory limit: 4Mi (the container was killed for exceeding this)
[OK] This is the same signature you will see on real production OOMKills.
Standard debug workflow:
1. oc get events -n <namespace> --field-selector reason=OOMKilling
2. oc get pod <pod> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'
3. NAMESPACE=<ns> POD_SELECTOR=<pod> ~/module-03/resource-right-sizer.sh ← P95 memory
4. Raise the memory limit by the P95 + 50% buffer the right-sizer recommends.
[INFO] Cleaning up oom-demo pod …
[OK] Done. oom-demo pod removed.
OOMKilled Troubleshooting Checklist: Exit code 137 means the container received SIGKILL (128 + 9). Combined with lastState.terminated.reason: OOMKilled or an OOMKilling event, it confirms an OOM kill; from there, compare the memory limit against the P95 usage the right-sizer reports.
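The exit-code arithmetic generalizes to any signal. A small helper makes it concrete (the function name is illustrative):

```shell
# Decode a container exit code: values above 128 mean "killed by signal
# (code - 128)". 137 = 128 + 9 (SIGKILL, the OOM-kill signature);
# 143 = 128 + 15 (SIGTERM, a normal shutdown).
decode_exit() {
  local code=$1
  if [ "$code" -gt 128 ]; then
    echo "killed by signal $(( code - 128 )) (SIG$(kill -l $(( code - 128 ))))"
  else
    echo "exited normally with code $code"
  fi
}

decode_exit 137   # killed by signal 9 (SIGKILL)
decode_exit 143   # killed by signal 15 (SIGTERM)
```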
Lab 3 Summary: Developer Best Practices
You’ve now experienced:
✅ QoS classes (Guaranteed, Burstable, BestEffort)
✅ CPU throttling impact on application performance
✅ Right-sizing requests using Prometheus 95th percentile
✅ HPA configuration with accurate targets
✅ OOMKilled debugging
The Developer Social Contract
Resource Requests Are Not Restrictions: Many developers resist setting requests because they feel like "quotas" or "limits." This module shows the opposite: requests are scheduling information, not caps; only limits cap usage.
Accurate requests are a contract between you and the infrastructure team. Without them, the scheduler packs nodes blindly, the HPA has no meaningful baseline, and your pods are first in line for eviction.
Key Takeaways
- BestEffort QoS (no requests) = first to die under node pressure
- CPU throttling is invisible to apps but destroys performance
- Use Prometheus 95th percentile for right-sizing, not guesswork
- HPA requires accurate requests to scale correctly
- OOMKilled = exit code 137; check memory limits vs actual usage
Next Steps
In Module 4: Infrastructure Track - Fleet Architecture & Sizing, we shift to the platform engineer perspective. You’ll learn node density math, etcd limits, and when to split clusters versus grow them.
The skills from this module (accurate requests) directly feed into Module 4’s capacity calculations. If requests are wrong, density calculations are wrong, and you’ll buy the wrong amount of infrastructure.
Further reading
- Kubernetes requests and limits: why both teams are right — Red Hat blog post expanding on QoS classes, CPU throttling, P95 right-sizing, and the HPA formula covered in this module.