Module 2: The Mathematics of Forecasting

Duration: 60 minutes

Learning Objectives

By the end of this module, you will be able to:

  • Understand why linear CPU/Memory trending fails for microservices architectures

  • Calculate "Pod Velocity" using Prometheus metrics

  • Build custom forecasting dashboards in Red Hat Advanced Cluster Management (RHACM)

  • Project quarterly node requirements based on deployment patterns

Why Linear Models Fail

Traditional capacity planning uses linear extrapolation:

If CPU usage grows 5% per month, in 12 months we'll need 60% more CPU.

This breaks for microservices because:

  • New services deploy in bursts (not linear growth)

  • Resource requests vary wildly per service

  • Seasonal traffic patterns create spikes

  • Deployment velocity (new pods/month) matters more than raw CPU trends

The Pod Velocity Model

Instead of tracking CPU percentage, track how many pods are being deployed over time:

Pod Velocity = Rate of pod creation over time

If your organization deploys 50 new microservices per quarter, and each requires an average of 3 replicas with 200m CPU each, you need:

50 services × 3 replicas × 200m CPU = 30,000m CPU (30 cores) per quarter

This is far more accurate than linear CPU trending because it accounts for workload growth patterns.

The Formula:

Quarterly Node Requirement = (Pod Velocity × Avg Replicas × Avg CPU Request) / Node Allocatable CPU

Example:

  • Pod Velocity: 50 new services/quarter

  • Avg Replicas: 3

  • Avg CPU Request: 200m (0.2 cores)

  • Node CPU: 8 cores allocatable

(50 × 3 × 0.2) / 8 = 30 / 8 = 3.75 nodes per quarter

Round up → You need 4 new worker nodes every quarter.
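
The same arithmetic as a small script, so you can plug in your own numbers — a sketch using the example's values above, not measured data:

# Quarterly node forecast — example values from the text; substitute your own
PODS_PER_QUARTER=50   # pod velocity: new services per quarter
AVG_REPLICAS=3        # average replicas per service
AVG_CPU_CORES=0.2     # average CPU request per replica (200m)
NODE_ALLOC_CPU=8      # allocatable cores per worker node

python3 -c "
import math
cpu = $PODS_PER_QUARTER * $AVG_REPLICAS * $AVG_CPU_CORES
nodes = cpu / $NODE_ALLOC_CPU
print(f'CPU needed : {cpu:.1f} cores')
print(f'Nodes      : {nodes:.2f} -> buy {math.ceil(nodes)}')
"

With the example values this prints 30.0 cores and 3.75 → buy 4, matching the worked example.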

Lab 2: The Pod Velocity Calculator & ACM Dashboard

All terminal commands in this lab run on your student cluster. If your SSH session ended, reconnect before continuing:

ssh lab-user@

SSH password:

In this lab, you’ll use purpose-built scripts to query Prometheus directly from the command line, run the forecasting formula automatically, and create a centralized RHACM dashboard — without having to navigate any browser UIs until the final step.

Download Lab Scripts

Download both lab scripts at the start. They are used throughout this module.

mkdir -p ~/module-02 && cd ~/module-02
BASE=https://raw.githubusercontent.com/tosin2013/capacity-planning-lab-guide/main
curl -fsSO $BASE/content/modules/ROOT/examples/module-02/pod-velocity-calculator.sh
curl -fsSO $BASE/content/modules/ROOT/examples/module-02/create-acm-dashboard.sh
curl -fsSO $BASE/content/modules/ROOT/examples/module-02/module-2---pod-velocity-forecast.yaml
chmod +x *.sh
ls -lh
Sample Output
-rwxr-xr-x. 1 lab-user lab-user  11K Apr 20 create-acm-dashboard.sh
-rw-r--r--. 1 lab-user lab-user 4.2K Apr 20 module-2---pod-velocity-forecast.yaml
-rwxr-xr-x. 1 lab-user lab-user 8.8K Apr 20 pod-velocity-calculator.sh

Part 1: Run the Pod Velocity Calculator

The pod-velocity-calculator.sh script automates everything this lab previously had you do by hand across five manual parts — it discovers Prometheus automatically, mints an auth token, runs all four PromQL queries, and prints the forecasting result.

cd ~/module-02
NAMESPACE=capacity-workshop ./pod-velocity-calculator.sh
Sample Output
══════════════════════════════════════════════════
  Module 2 — Pod Velocity Calculator
══════════════════════════════════════════════════
[INFO]  Namespace : capacity-workshop
[INFO]  Window    : 30 days
[INFO]  Node CPU  : 8 cores (allocatable)

[INFO]  Step 1/5 — Locating Prometheus route …
[OK]    Prometheus : https://prometheus-k8s-openshift-monitoring.apps.student.student-01.sandbox5388.opentlc.com

[INFO]  Step 2/5 — Minting Prometheus service-account token …
[OK]    Token acquired (1287 chars)

[INFO]  Step 3/5 — Calculating pod velocity …
[OK]    Pods started in last 30d  : 6
[OK]    Pod velocity              : 0.2 pods/day

[INFO]  Step 4/5 — Querying average resource requests …
[OK]    Active deployments              : 6
[OK]    Avg replicas per deployment     : 1.0
[OK]    Avg CPU request per container   : 167m  (0.167  cores)
[OK]    Avg Memory request per container: 192Mi (0.188  GiB)

  ┌─────────────────────────────────────────────────────┐
  │           QUARTERLY FORECASTING MODEL               │
  ├─────────────────────────────────────────────────────┤
  │ Pod velocity (last 30d)      :     0.20 pods/day    │
  │ Quarterly new pods (×90)     :     18.0             │
  │ Avg CPU request/container    :      167m            │
  │ Total CPU needed (new)       :     3.01 cores       │
  │ Node allocatable CPU         :        8 cores/node  │
  ├─────────────────────────────────────────────────────┤
  │ Nodes needed next quarter    :     0.38  → ceil = 1 │
  └─────────────────────────────────────────────────────┘

  RESULT: Add 1 worker node(s) this quarter to accommodate forecasted growth.

Understanding what the script queries:

  1. Pod velocity — pods whose kube_pod_start_time falls within the last 30 days, divided by 30 to get pods/day

  2. Deployment count — kube_deployment_spec_replicas (how many Deployments exist)

  3. Avg CPU request — kube_pod_container_resource_requests{resource="cpu"} averaged across all containers

  4. Avg memory request — same metric for memory, converted to GiB

  5. Node forecast — (velocity × 90 days × avg_cpu) / node_allocatable_cpu, rounded up

Want to see the raw PromQL? Open pod-velocity-calculator.sh in your editor.
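
If you want to run one of these queries by hand, the pattern below sketches the shape of it (the script's exact PromQL may differ, and oc create token requires OpenShift 4.11+):

# Query Prometheus from the CLI — a sketch; adapt the query string as needed
PROM=$(oc get route prometheus-k8s -n openshift-monitoring \
  -o jsonpath='https://{.spec.host}')
TOKEN=$(oc create token prometheus-k8s -n openshift-monitoring)

# Pods started in the last 30 days in the workshop namespace
QUERY='count(kube_pod_start_time{namespace="capacity-workshop"} > (time() - 30*86400))'
curl -skG -H "Authorization: Bearer $TOKEN" \
  --data-urlencode "query=$QUERY" \
  "$PROM/api/v1/query" | jq -r '.data.result[0].value[1]'

The same endpoint serves every query the script runs; only the query string changes.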

Part 2: Interpret Your Numbers

Look at the output from the calculator. Answer these questions for your own environment:

Metric                            Your Value
Pod velocity (pods/day)           __________
Quarterly new pods projected      __________
Avg CPU request per container     __________
Nodes needed next quarter         __________

Forecasting Pitfall:

This model assumes your deployment cadence stays constant. It will overestimate if the cluster is still being bootstrapped (a one-time burst of deployments inflates the velocity) and underestimate when growth is accelerating. Always combine the model's output with team-level deployment plans.

Part 3: Adjust for Your Node Size

Rerun the calculator with the actual allocatable CPU for your worker nodes:

NODE_ALLOC_RAW=$(oc get node -l node-role.kubernetes.io/worker \
  -o jsonpath='{.items[0].status.allocatable.cpu}')
# Convert millicores (e.g. 7500m) to cores (7.5); leave whole-core values unchanged
NODE_ALLOC=$(python3 -c "
v='$NODE_ALLOC_RAW'
if v.endswith('m'):
    print(round(int(v[:-1]) / 1000, 2))
else:
    print(round(float(v), 2))
")
echo "Worker node allocatable CPU: $NODE_ALLOC cores"

Then rerun with the real value:

cd ~/module-02
NAMESPACE=capacity-workshop NODE_CPU=$NODE_ALLOC ./pod-velocity-calculator.sh

Compare the "Nodes needed" figure to the default (8-core assumption). Does a larger node size reduce your projected node count significantly?
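
For a quick back-of-the-envelope check without rerunning the script, divide the quarterly CPU figure by candidate node sizes (3.01 cores comes from the sample run above; substitute your own):

# Node count at different node sizes, using the sample run's 3.01-core figure
python3 -c "
import math
cpu = 3.01
for alloc in (4, 8, 16):
    print(f'{alloc:>2}-core nodes: {cpu/alloc:.2f} -> ceil = {math.ceil(cpu/alloc)}')
"

At this cluster's low velocity every size still rounds up to one node; node size only starts to matter once the fractional figure climbs past 1.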

Part 4: Publish the Pod Velocity Dashboard to RHACM Grafana

Push the forecasting panels into the RHACM Grafana instance running on the hub cluster. You will switch your oc context to the hub, apply a pre-built dashboard ConfigMap, then switch back — a pattern you will reuse in Module 5.

Why switch context?

RHACM Grafana runs on the hub cluster. Your current oc session points at your student cluster. Logging into the hub gives you a hub token so oc apply can write to the hub’s open-cluster-management-observability namespace.

Step 4a — Save your student context and log into the hub

STUDENT_CTX=$(oc config current-context)
echo "Student context saved: $STUDENT_CTX"
oc login  \
  --username= \
  --password= \
  --insecure-skip-tls-verify=true
Sample Output
Student context saved: capacity-workshop/api-student-student-01-sandbox5388-opentlc-com:6443/system:admin
Login successful.

You don't have any projects. You can try to create a new project, by running

    oc new-project <projectname>

Step 4b — Apply the Pod Velocity Forecast dashboard ConfigMap

oc apply -f ~/module-02/module-2---pod-velocity-forecast.yaml \
  -n open-cluster-management-observability
Sample Output
configmap/module-2---pod-velocity-forecast created

The grafana-dashboard-loader sidecar detects the new ConfigMap (labelled grafana-custom-dashboard: "true") and imports it into production Grafana within a few seconds.
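
You can confirm the label the loader keys on while your context still points at the hub:

# Verify the label the dashboard-loader sidecar watches for
oc get configmap module-2---pod-velocity-forecast \
  -n open-cluster-management-observability \
  -o jsonpath='{.metadata.labels}{"\n"}'

The output should include grafana-custom-dashboard:"true".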

Step 4c — Switch back to your student cluster

oc config use-context $STUDENT_CTX
oc whoami
Sample Output
Switched to context "capacity-workshop/api-student-student-01-sandbox5388-opentlc-com:6443/system:admin".
system:admin

This three-step pattern — save context, login to hub, switch back — is used again in Module 5 when you import your student cluster into RHACM.

How RHACM Dev Grafana Works

The ConfigMap you just applied was generated using the official RHACM Development Grafana tooling from the multicluster-observability-operator repository. Understanding this workflow is useful when you want to build your own custom dashboards in a real environment.

A Dev Grafana instance is already running in this workshop hub at:

Log in with your hub credentials ( / ) and explore freely — it connects to the same Thanos datasource as production Grafana but gives you full editor access in the UI.

The interactive dashboard workflow

When you want to create or modify dashboards from scratch:

  1. Deploy Dev Grafana (hub admin runs once):

    git clone --depth 1 \
      https://github.com/stolostron/multicluster-observability-operator.git ~/acm-tools
    cd ~/acm-tools/tools
    bash setup-grafana-dev.sh --deploy
  2. Log into Dev Grafana in your browser — this creates your user record in the Dev Grafana database, which the export script requires.

  3. Promote yourself to Grafana Admin (optional — needed to create/save dashboards):

    bash switch-to-grafana-admin.sh 
  4. Design your dashboard in the Dev Grafana UI. Create a folder named after yourself, build the panels, and save. The Thanos datasource with all managed cluster metrics is already configured.

  5. Export to a production ConfigMap:

    bash generate-dashboard-configmap-yaml.sh "Your Dashboard Name"
    # To target a specific folder:
    bash generate-dashboard-configmap-yaml.sh -f "Your Folder" "Your Dashboard Name"

    The exported YAML is labelled grafana-custom-dashboard: "true" and optionally annotated with observability.open-cluster-management.io/dashboard-folder: <folder> — the loader uses this to place the dashboard in your named folder in production.

  6. Apply to production Grafana:

    oc apply -f your-dashboard-name.yaml -n open-cluster-management-observability
  7. Clean up Dev Grafana when done:

    bash setup-grafana-dev.sh --clean

generate-dashboard-configmap-yaml.sh runs kubectl exec against the Dev Grafana pod, so it requires hub admin (kubeadmin) access. Students apply the exported YAML with oc apply — only ConfigMap create access in the observability namespace is needed, and that is pre-granted in this workshop.

Part 5: Log In to Grafana and View the Dashboard

Step 5a — Open the dashboard URL

Open a new browser tab and navigate directly to the RHACM Grafana instance:

After you authenticate (Step 5b below), browse to Dashboards → Custom → Module 2 — Pod Velocity Forecast.

You can also use the direct dashboard URL: /d/module-02-pod-velocity

Step 5b — Authenticate with OpenShift

RHACM Grafana uses OpenShift OAuth — there is no separate Grafana username or password to set up. Follow these steps on the login page:

  1. Click Log in with OpenShift.

  2. On the OpenShift login page, choose the workshop-students identity provider.

  3. Enter your hub credentials:

    Field      Value
    Username   <your hub username>
    Password   <your hub password>

  4. Click Log in — you are redirected back to Grafana automatically.

The identity provider is named workshop-students because the hub cluster was configured with an htpasswd provider for all student accounts. If you see other providers (e.g. htpasswd or kube:admin) you can ignore them — only workshop-students has your student account.
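
If you're curious which providers the hub actually has configured, a hub cluster-admin can list them (students may not have permission to read this resource):

# List the hub's configured OAuth identity providers (requires cluster-admin)
oc get oauth cluster -o jsonpath='{.spec.identityProviders[*].name}{"\n"}'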

Step 5c — Explore the dashboard panels

Once logged in you will see three panels:

  1. Pod Velocity Forecast — Monthly Growth (capacity-workshop) (Stat panels, one per cluster) — pods started in capacity-workshop in the last 30 days; should match the "Pods started in last 30d" value from pod-velocity-calculator.sh

  2. Projected Nodes Needed (Next Quarter) (Gauge panels, one per cluster) — the computed node requirement with green/yellow/red thresholds; should match the "Nodes needed next quarter" from the script

  3. Pod Count Over Time (Time series) — a historical view of pod growth in capacity-workshop

What to look for:

  • If the Gauge shows green (0-5 nodes), your cluster has comfortable headroom

  • Yellow (5-10) means you should start a procurement conversation this quarter

  • Red (10+) means you need nodes sooner — initiate an emergency capacity request

Because viewers_can_edit is enabled in this Grafana instance, you can also click the panel title → Edit to inspect or modify the PromQL query behind any panel — useful for adapting dashboards to your own environment.

Pod Velocity Forecast Dashboard showing 6 pods/month and 1 node needed for the capacity-workshop namespace
Figure 1. Pod Velocity Forecast Dashboard — Cluster: student-01 / Namespace: capacity-workshop

The Cluster and Namespace dropdowns default to student-01 and capacity-workshop. With those selections the Stat and Gauge values match pod-velocity-calculator.sh directly — if the script printed "Add 1 worker node(s)", the Gauge shows 1 (green).

Now change Namespace to All — notice how the numbers jump:

Dashboard with Namespace set to All showing 293 pods/month and 6 nodes needed
Figure 2. Namespace = All — 293 pods/month across the entire cluster (includes ~287 platform pods)

This is why namespace scoping matters in capacity planning.

Nearly all of those 293 pods are OpenShift platform pods (etcd, API server, monitoring, DNS, etc.) started when the cluster was provisioned. Unscoped, the model recommends 6 new nodes for workloads that are already running and won't grow. Always scope your velocity query to the namespaces your application teams control.
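
Concretely, the scoped and unscoped figures differ by a single namespace matcher — hypothetical query forms, reusable with the curl pattern from Part 1:

# Scoped: only pods in the namespace your teams control
SCOPED='count(kube_pod_start_time{namespace="capacity-workshop"} > (time() - 30*86400))'

# Unscoped: every pod on the cluster, platform pods included
UNSCOPED='count(kube_pod_start_time > (time() - 30*86400))'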

Part 6: Validate the Forecast

Let’s cross-check the model against what’s actually running. Run this command to count current pods:

oc get pods -n capacity-workshop --no-headers | wc -l

Now print a summary to compare against the forecast:

echo "Current pod count: $(oc get pods -n capacity-workshop --no-headers | wc -l)"
echo ""
echo "Forecast predicted (based on 30-day velocity):"
echo "This would require manual calculation from historical data"

Facilitator Discussion Point:

In production, you’d store daily snapshots of these metrics in a time-series database (Thanos/Prometheus long-term storage) to validate forecast accuracy over time.
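
A minimal sketch of such a snapshot job, assuming a cron entry and a local CSV (path and schedule are illustrative):

# record-pod-count.sh — append one row per day: date, pod count
# crontab entry (hypothetical): 0 6 * * * /home/lab-user/record-pod-count.sh
NS=capacity-workshop
COUNT=$(oc get pods -n "$NS" --no-headers | wc -l)
echo "$(date -I),$COUNT" >> ~/pod-count-history.csv

Over a quarter, you can score the calculator's predictions against the CSV's actual growth.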

Accuracy improves with:

  • Longer historical data (90 days > 30 days)

  • Seasonal adjustment (Black Friday, tax season, etc.)

  • Deployment pattern clustering (some teams deploy weekly, others quarterly)

Lab 2 Summary: Your Forecasting Model

You’ve now built:

✅ Pod Velocity calculation using the Prometheus API (no browser PromQL needed)

✅ Average resource request metrics pulled automatically

✅ Quarterly node requirement projection formula

✅ Custom RHACM Grafana dashboard for fleet-wide visibility — published via ConfigMap

✅ Understanding of why Pod Velocity beats linear CPU trending

Real-World Application

Take this model back to your organization:

  1. Capture your deployment cadence - How many new services per quarter?

  2. Measure average resource requests - What’s your "standard microservice footprint"?

  3. Calculate node runway - When will you run out of capacity?

  4. Build forecasting into your change approval process - Every new service deployment should update the forecast

Reusing these scripts in your own environment:

Both scripts accept environment variables so you can adapt them without editing:

NAMESPACE=production NODE_CPU=16 WINDOW_DAYS=90 \
  ./pod-velocity-calculator.sh

GRAFANA_URL=https://my-grafana.example.com \
  NAMESPACE=production \
  ./create-acm-dashboard.sh

The scripts are designed to be run repeatedly — create-acm-dashboard.sh overwrites the dashboard if it already exists ("overwrite": true).

The Forecasting Pitfall:

Forecasting is only as good as your resource requests. If developers over-request by 10x, your forecast will tell you to buy 10x more hardware than you actually need.

This is why Module 3 (Developer Track - Right-Sizing) is critical. You must fix request accuracy BEFORE forecasting, or you’ll forecast waste.

Key Takeaways

  • Pod Velocity (deployments/time) is more predictive than linear CPU trending for microservices

  • The Prometheus API lets you query PromQL from scripts — no browser UI required

  • RHACM provides fleet-wide capacity visibility across multiple clusters

  • RHACM Dev Grafana provides a full-editor instance for interactive dashboard creation — export with generate-dashboard-configmap-yaml.sh and promote to production via ConfigMap labelled grafana-custom-dashboard: "true"

  • Forecasting accuracy depends on accurate resource requests (Module 3 fixes this)

Next Steps

In Module 3: Developer Track - The "Zero Request" Myth, we’ll tackle the root cause of forecasting inaccuracy: incorrect or missing resource requests. You’ll learn about QoS classes, debug throttling, and right-size workloads using historical Prometheus data.

Pre-Module 3 Setup:

We’ve pre-deployed sample applications in capacity-workshop with intentionally broken resource configurations. These will be your test subjects for debugging OOMKilled events and CPU throttling in the next module.