Module 8: AI-Assisted Capacity Operations with OpenShift Lightspeed

Duration: 90 minutes

Optional Module

This module is optional. It requires OpenShift Lightspeed to be installed and configured on your cluster. If Lightspeed is not available in your environment, you can skip this module without affecting any other modules.

Learning Objectives

By the end of this module, you will be able to:

  • Describe what OpenShift Lightspeed is, how it is configured, and where it lives in the OCP console

  • Query Lightspeed as a developer to debug resource sizing, QoS, and HPA problems

  • Query Lightspeed as an infrastructure engineer to reason about node density, etcd limits, and fleet architecture

  • Use Lightspeed as a forecasting co-pilot to write PromQL queries, estimate capacity runway, and translate metrics into executive language

  • Compare how IBM Granite and Qwen3 models respond differently to the same capacity planning question

  • Recognise the boundaries of what an AI assistant can and cannot do for capacity planning

What is OpenShift Lightspeed?

OpenShift Lightspeed is an AI chat assistant embedded directly into the OpenShift web console. It answers questions about OpenShift Container Platform — how things work, why something broke, and what commands to run — without leaving the browser.

Unlike a general-purpose chatbot, Lightspeed has access to a curated Retrieval-Augmented Generation (RAG) database of OCP documentation. This means it can give answers grounded in Red Hat product behaviour rather than generic Kubernetes theory.

What Lightspeed Can Help With

  • Debugging:
      "Why is my pod OOMKilled?"
      "What does exit code 137 mean?"
      "How do I check CPU throttling in Prometheus?"

  • Configuration:
      "How do I enable HPA on a custom metric?"
      "What kubeletConfig field increases maxPods?"
      "How do I write a ResourceQuota for a namespace?"

  • PromQL authoring:
      "Write a query for CPU request overcommit ratio per namespace"
      "Show me the PromQL for Pod Velocity over 90 days"
      "How do I calculate etcd database growth rate?"

  • Forecasting assistance:
      "I have 180 pods per node and grow at 12 pods/week. When do I hit maxPods?"
      "What capacity buffer should I plan for Black Friday traffic spikes?"
      "Translate this utilisation data into a quarterly budget recommendation"

  • Architecture decisions:
      "When should I split one large cluster into multiple smaller ones?"
      "What are the RHACM observability storage trade-offs?"

What Lightspeed Cannot Do

Lightspeed is a conversational assistant, not an autonomous agent. It cannot:

  • Execute oc or kubectl commands on your behalf

  • Make changes to your cluster (it only advises)

  • Guarantee 100% accurate PromQL — always test queries in Prometheus before using them in dashboards

Cluster Interaction (Technology Preview)

Lightspeed includes an optional Model Context Protocol (MCP) server that gives it read-only access to your cluster’s live Kubernetes state. When enabled, the Lightspeed Operator injects an openshift-mcp-server sidecar container into the lightspeed-app-server pod and grants it read-only in-cluster API access.

In this workshop, cluster interaction is enabled on your student cluster. The ocp4_workload_lightspeed role sets introspectionEnabled: true in OLSConfig, which causes the Operator to restart the app server with the MCP sidecar active.

With the MCP active, Lightspeed can answer questions about your actual cluster resources — pods, nodes, deployments, events, configmaps, and routes — instead of giving generic answers based only on its training data. Questions like "which of my namespaces is most over-requested?" or "are any nodes under memory pressure?" will be answered using live Kubernetes API data.

The bundled openshift-mcp-server sidecar runs in --read-only mode. It provides Kubernetes resource access via the OCP API. It does not execute arbitrary PromQL queries directly — but Lightspeed may synthesise PromQL for you based on what it knows about your cluster topology, which you can then run yourself in the Observe → Metrics console.

Lab 8E exercises this capability directly.

How Lightspeed is Provisioned

Your student cluster receives OpenShift Lightspeed automatically during provisioning. The platform team uses an AgnosticD v2 workload role called ocp4_workload_lightspeed to install and configure it. Understanding this provisioning model helps you manage Lightspeed environments as a platform engineer.

The AgnosticD Workload Pattern

AgnosticD v2 deploys workloads by listing role names in a workloads: array. For the Capacity Planning Workshop, student-compact-aws.yml includes:

workloads:
  - agnosticd.core_workloads.ocp4_workload_cert_manager
  - agnosticd.core_workloads.ocp4_workload_openshift_gitops
  - ocp4_workload_capacity_planning_workshop
  - ocp4_workload_lightspeed          # <-- Module 8

Each role receives ACTION: provision and applies its changes idempotently to the cluster. Running agd provision -c openshift-workloads re-runs all workloads safely.

What ocp4_workload_lightspeed Does

The role performs these eight steps in order:

  1. Namespace: creates openshift-lightspeed with openshift.io/cluster-monitoring: "true" so Prometheus scrapes Lightspeed metrics.

  2. OperatorGroup: creates an OperatorGroup in OwnNamespace mode. This is required — the Lightspeed Operator does not support the AllNamespaces install mode.

  3. Subscription: subscribes to the lightspeed-operator package from redhat-operators on the stable channel.

  4. Wait for operator: polls the lightspeed-operator-controller-manager pod (label: control-plane=controller-manager) until it is Running.

  5. LiteMaaS Secret: creates a Kubernetes Secret (litemaas-credentials) containing the API token that OLSConfig uses to authenticate with the LiteMaaS proxy.

  6. OLSConfig: applies the cluster-scoped OLSConfig CR with model granite-3-2-8b-instruct and introspectionEnabled: true. Setting introspection causes the Operator to inject an openshift-mcp-server sidecar into the app-server pod, giving Lightspeed read-only access to live cluster resources (see the sketch after this list).

  7. MCP RBAC: when introspectionEnabled: true, creates a ClusterRoleBinding that binds the lightspeed-app-server ServiceAccount to the cluster-reader ClusterRole. Without this binding the MCP tools can start but cannot list pods, nodes, or namespaces.

  8. Wait for service: polls the lightspeed-app-server pod (label: app.kubernetes.io/name=lightspeed-service-api) until it is Running. When introspection is enabled the pod runs three containers: lightspeed-service-api, lightspeed-to-dataverse-exporter, and openshift-mcp-server.
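Steps 5–7 are the heart of the configuration. A minimal sketch of the OLSConfig and the RBAC binding, assuming the field layout of the current Lightspeed Operator (the exact spec paths, the provider url, and the ClusterRoleBinding name below are illustrative; check them against your operator version):

apiVersion: ols.openshift.io/v1alpha1
kind: OLSConfig
metadata:
  name: cluster
spec:
  llm:
    providers:
      - name: litemaas
        type: openai                          # OpenAI-compatible endpoint (assumed)
        url: https://<litemaas-endpoint>/v1   # placeholder
        credentialsSecretRef:
          name: litemaas-credentials
        models:
          - name: granite-3-2-8b-instruct
  ols:
    defaultModel: granite-3-2-8b-instruct
    introspectionEnabled: true                # triggers the openshift-mcp-server sidecar
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: lightspeed-app-server-cluster-reader  # name is illustrative
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-reader
subjects:
  - kind: ServiceAccount
    name: lightspeed-app-server
    namespace: openshift-lightspeed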

The LiteMaaS Token Flow

rhpds.litellm_virtual_keys  →  agnosticd_user_info.litemaas_api_key
        ↓
ocp4_workload_lightspeed_litemaas_api_token  (in student-01-workloads.yml)
        ↓
Kubernetes Secret: litemaas-credentials  (in openshift-lightspeed namespace)
        ↓
OLSConfig.spec.llm.providers[0].credentialsSecretRef
        ↓
lightspeed-app-server  →  LiteMaaS API  →  granite-3-2-8b-instruct

The rhpds.litellm_virtual_keys Ansible collection creates a per-GUID virtual key with the lab-prod package (Granite + Mistral, 90-day TTL). In production, this key flows automatically through agnosticd_user_info. For local development, pass it directly with -e ocp4_workload_lightspeed_litemaas_api_token=sk-….
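The Secret itself is plain. A sketch, assuming apitoken is the key name the Operator reads through credentialsSecretRef (verify the expected key against your operator version):

apiVersion: v1
kind: Secret
metadata:
  name: litemaas-credentials
  namespace: openshift-lightspeed
type: Opaque
stringData:
  apitoken: <per-GUID LiteMaaS virtual key>   # assumed key name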

Local Development

Platform engineers can re-install or update Lightspeed on any cluster without running a full agd cycle:

# From capacity-planning-lab-guide/ansible/
export KUBECONFIG=/path/to/kubeconfig
ansible-playbook setup-lightspeed.yml \
  -e ocp4_workload_lightspeed_litemaas_api_token=<your-key> \
  -e openshift_cluster_ingress_domain=apps.<guid>.<base_domain>

This dev/test wrapper calls ocp4_workload_lightspeed with ACTION: provision — the same role the RHDP platform runs in production.

Where to Find Lightspeed in the OCP Console

Lightspeed appears as a lightbulb icon in the top-right toolbar of the OpenShift web console, next to the question-mark help icon. Click it to open the chat panel.

Before starting the labs, open your student cluster console:

Log in as kubeadmin with your cluster's kubeadmin password, then click the lightbulb icon in the top-right toolbar to open Lightspeed.

If the lightbulb icon is not visible, the Lightspeed Operator may still be starting up. Wait 2–3 minutes and refresh the page.

The Models in This Workshop

The Lightspeed instance on your cluster has been configured to use two models via the RHDP LiteMaaS AI service:

  • Primary (default): qwen3-14b — Qwen 3 14B from Alibaba. Strong general reasoning and — critically — reliable MCP tool selection. This is the default because it correctly calls Kubernetes API tools by name when cluster interaction is enabled.

  • Comparison (Lab 8D): granite-3-2-8b-instruct — IBM Granite 3.2 8B, a Red Hat and IBM enterprise model trained on Red Hat documentation. Faster and optimised for OCP knowledge questions, but inconsistent with MCP tool calling in this Technology Preview release.

Both models are available without any local GPU. They are served by the RHDP Model-as-a-Service (LiteMaaS) platform over an OpenAI-compatible API.

Why Qwen 3 as the default? During validation of this workshop, Granite 3.2 8B consistently misnamed MCP tools (generating oc_get_pods instead of pods_list_in_namespace), causing cluster queries to fail. Qwen 3 14B called all 14 available tools correctly on the first attempt. Lab 8D lets you observe this difference directly.


Lab 8A: Developer Queries — The "Smart Debugger"

20 minutes — Revisits Module 3 topics

In Module 3 you experienced OOMKilled pods, CPU throttling, QoS classes, and HPA configuration hands-on. In this lab you will ask Lightspeed about the exact apps you worked with — and compare its answers against what you observed directly.

All interactions in Labs 8A, 8B, and 8C happen inside the Lightspeed chat panel in the OCP web console. There are no terminal commands to run. Open the chat panel now: the lightbulb icon is in the top-right toolbar.

Query 1: OOMKilled diagnosis

Type the following prompt into the Lightspeed chat exactly as shown:

I have a pod called besteffort-app in the capacity-workshop namespace.
It has no memory requests or limits set. How do I check whether it has
been OOMKilled, and what Prometheus metric should I use to determine an
accurate memory request for it?
What to look for in the response

A good answer will mention:

  • Exit code 137 = 128 + SIGKILL (OOM kill)

  • oc describe pod and checking lastState.terminated.reason

  • The Prometheus metric container_memory_working_set_bytes for historical sizing

  • Setting memory limit slightly above the 95th-percentile observed usage

  • The difference between requests (scheduling baseline) and limits (hard ceiling)

Does Lightspeed’s answer match what you observed when you ran besteffort-app without limits in Lab 3?

Using this prompt in your own environment

Replace besteffort-app and capacity-workshop with your own pod name and namespace. The diagnostic approach is the same for any pod regardless of its name.

Query 2: QoS classes in plain language

I have three apps in my capacity-workshop namespace:
- guaranteed-app: CPU and memory requests equal limits (200m CPU / 256Mi)
- burstable-app: has a 100m CPU request but no limit
- besteffort-app: no resource requests or limits set at all

Explain which QoS class each app is in, and what happens to each one
when the node runs low on memory. Which gets evicted first?
Reflection

Compare this response to what you saw in Module 3 when you deliberately induced memory pressure. Notice whether Lightspeed’s eviction ordering matches the behaviour you observed — besteffort-app should be first, burstable-app second, and guaranteed-app last (or never, if limits are set correctly).
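For reference, these are the resource stanzas that produce the three QoS classes, matching the values in the prompt (illustrative container-level snippets):

# guaranteed-app (Guaranteed): every container has requests equal to limits
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 200m
    memory: 256Mi

# burstable-app (Burstable): at least one request is set, but requests != limits
resources:
  requests:
    cpu: 100m

# besteffort-app (BestEffort): no requests or limits on any container
resources: {}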

Using this prompt in your own environment

Replace the three app names and resource values with your own deployments. Any namespace with a mix of Guaranteed, Burstable, and BestEffort pods works equally well.

Query 3: HPA with a custom Prometheus metric

My load-generator app in the capacity-workshop namespace has CPU requests
of 100m and limits of 500m. I set up a basic CPU HPA on it in a previous
lab. Show me how to change it to scale based on a custom Prometheus metric
— for example, the number of HTTP requests per second — instead of CPU.
Show me the YAML.
What to look for

The answer should reference:

  • custom.metrics.k8s.io API or external.metrics.k8s.io

  • The Prometheus Adapter (prometheus-adapter) that bridges Prometheus metrics to the Kubernetes metrics API

  • An HPA manifest using spec.metrics[].type: Pods or type: External

  • The relationship between the metric value and the target threshold

If the model gives a generic answer that ignores the existing CPU HPA, follow up with: "I already have a CPU HPA on load-generator. Show me the full updated YAML that replaces the CPU metric with http_requests_per_second."
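A correct answer will land close to this shape (a sketch: http_requests_per_second assumes a Prometheus Adapter rule already exposes that metric, and the 50 req/s per-pod target is illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: load-generator
  namespace: capacity-workshop
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: load-generator
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # served via prometheus-adapter
        target:
          type: AverageValue
          averageValue: "50"               # illustrative threshold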

Using this prompt in your own environment

Replace load-generator and capacity-workshop with your own deployment and namespace. The HPA YAML pattern is the same regardless of the app name.

Query 4: CPU throttling PromQL

Write a PromQL query that shows the CPU throttling rate as a percentage
for every pod in my capacity-workshop namespace. My cpu-throttle-demo pod
has a 200m CPU limit but runs a burn loop that tries to use far more.
I want to be able to see that throttling clearly in the query output.
Sample expected output
100 * sum by (pod, namespace) (
  rate(container_cpu_cfs_throttled_seconds_total{namespace="capacity-workshop"}[5m])
) /
sum by (pod, namespace) (
  rate(container_cpu_cfs_periods_total{namespace="capacity-workshop"}[5m])
)

Test every PromQL query Lightspeed writes before using it in a dashboard. Run it in the OCP console Observe → Metrics. Lightspeed can produce syntactically correct queries that don’t match your actual metric label names — verify that container_cpu_cfs_throttled_seconds_total exists in your cluster before building a dashboard panel around it.

Using this prompt in your own environment

Replace capacity-workshop with your namespace and cpu-throttle-demo with whatever pod you want to highlight. The throttling PromQL pattern works on any namespace.


Lab 8B: Infrastructure Engineer Queries — The "Fleet Advisor"

20 minutes — Revisits Modules 4 & 5

In Modules 4 and 5 you worked through node density mathematics, etcd constraints, and RHACM observability. Now you will ask Lightspeed the operational questions an infrastructure engineer faces — using your actual cluster topology as the context.

Query 1: Increasing maxPods safely

My OpenShift cluster has 3 nodes that are all running as combined
control-plane and worker nodes. I want to support more workloads in
the capacity-workshop namespace. How do I safely raise the maxPods
limit above the default 250 on OpenShift 4.21, and what are the specific
risks of doing this on nodes that run both control-plane and workload pods?
What to look for

A complete answer will cover:

  • Editing a KubeletConfig CR (spec.kubeletConfig.maxPods) — not editing kubelet.conf directly

  • The MachineConfigPool selector to target specific node types

  • Extra risk on combined control-plane/worker nodes: pushing the scheduler onto nodes that also run etcd, the API server, and the controller manager

  • Memory overhead increase per pod (DaemonSets consume slots too)

  • Monitoring: the kubelet_running_pods metric, plus etcd database size via etcd_mvcc_db_total_size_in_bytes

Does Lightspeed flag that this triggers a node drain/reboot via MachineConfig? That is critical on a 3-node cluster — the cluster has no dedicated workers to absorb the load during a rolling restart.
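A correct answer centres on a KubeletConfig CR like this sketch (the pool label custom-kubelet: max-pods and the value 350 are illustrative; applying it rolls a drain and reboot through the matched pool):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: max-pods   # add this label to the target MachineConfigPool first
  kubeletConfig:
    maxPods: 350                 # size against per-pod memory and DaemonSet overhead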

Using this prompt in your own environment

Replace the node count and role description with your own cluster topology. The KubeletConfig approach works on any OpenShift 4.x cluster; the risk profile changes depending on whether your nodes are dedicated workers or combined roles.

Query 2: etcd sizing and monitoring

Write a PromQL query to check the current etcd database size on my
OpenShift cluster and estimate how quickly it is growing. My cluster
has 3 control-plane nodes. I want to know when I should defrag or
plan for additional control plane capacity.
Sample PromQL output to expect
# Current etcd DB size in bytes
etcd_mvcc_db_total_size_in_bytes

# Growth rate over 7 days (bytes per day)
deriv(etcd_mvcc_db_total_size_in_bytes[7d]) * 86400
Using this prompt in your own environment

The PromQL queries are cluster-agnostic — they work on any OpenShift cluster regardless of size. Update the control-plane node count to match your own topology.

Query 3: Cluster architecture trade-offs

I run a 3-node OpenShift training cluster where all nodes are combined
control-plane and workers, and I use a single capacity-workshop namespace
for all student workloads. As I scale to more students and cohorts, when
does it make more sense to keep growing this single cluster versus deploying
separate per-student clusters? What are the operational trade-offs?
Reflection

This question has no single right answer. Notice whether Lightspeed frames the decision around:

  • etcd limits: both object counts and the ~8 GB database size practical ceiling

  • The added risk of co-locating student workloads on control-plane nodes

  • Network blast radius (one shared cluster vs. isolated per-student failures)

  • RHACM management overhead per cluster

  • Namespace-level multi-tenancy vs. dedicated cluster per cohort

Compare this to the Module 4 discussion of when fleet federation makes sense. The 3-node combined topology is a real constraint that a generic "500-node cluster" scenario ignores.

Using this prompt in your own environment

Replace the node count and use-case description with your own. The federation-vs-consolidation trade-off analysis is relevant at any scale — substitute your own workload pattern.

Query 4: RHACM capacity dashboard PromQL

I connected my student cluster to an RHACM hub in Module 5 and I'm
building a Grafana capacity dashboard on the hub. Write three PromQL
queries for my capacity-workshop namespace:
1. CPU request overcommit ratio per namespace across all clusters
2. Memory utilisation versus allocated capacity per cluster
3. The number of pods per node to identify density hotspots

Use the cluster label so each cluster appears separately in the dashboard.

When Lightspeed writes multi-cluster queries for RHACM, it should reference the cluster label that Thanos Federation adds to every metric scraped from managed clusters:

# CPU overcommit ratio per namespace
sum by (namespace, cluster) (kube_pod_container_resource_requests{resource="cpu"})
/
sum by (namespace, cluster) (kube_node_status_allocatable{resource="cpu"})

If Lightspeed omits the cluster label, follow up with: "These queries will run against RHACM Thanos. Add the cluster label to the by() clause so each cluster appears separately in Grafana."
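Reasonable shapes for the other two queries, assuming the default RHACM metric allowlist forwards these series (verify each metric exists in your hub before building panels):

# Memory utilisation vs allocatable capacity, per cluster
sum by (cluster) (container_memory_working_set_bytes{container!=""})
/
sum by (cluster) (kube_node_status_allocatable{resource="memory"})

# Pods per node, to spot density hotspots
count by (cluster, node) (kube_pod_info)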

Using this prompt in your own environment

Replace capacity-workshop with your own namespace. The three PromQL patterns work for any namespace on any RHACM-managed cluster — the cluster label is injected automatically by Thanos Federation.


Lab 8C: Forecasting Assistant — The "Planning Copilot"

30 minutes — Revisits Modules 1, 2, & 7

This lab is the capstone of the workshop. You will use Lightspeed to operationalise the forecasting models from Modules 1 and 2, stress-test them against the Black Friday scenario from Module 6, and produce the executive language from Module 7 — all with AI assistance.

How this lab works

The prompts below use your capacity-workshop namespace and your cluster’s actual node count. For Queries 2 and 5, representative numbers are provided based on the workshop cluster — if your Module 2 Pod Velocity dashboard showed different values, substitute them for even more accurate results.

Query 1: Pod Velocity PromQL (Module 2 revisited)

Write the PromQL query that calculates Pod Velocity — the number of new
pod deployments per week — across all namespaces for the past 90 days.
This is the foundation of our capacity forecasting model from Module 2.
Expected output

Lightspeed should produce something close to:

# New pod creations per week (7-day rolling sum)
increase(kube_pod_created[7d])

# Pod Velocity trend over 90 days (weekly averages)
sum by (namespace) (
  increase(kube_pod_created{namespace!=""}[7d])
)

The key concept from Module 2 is that Pod Velocity (new pods/time) is a better predictor of node demand than raw CPU trending for microservices architectures. Ask Lightspeed to explain why this metric matters for capacity forecasting, to validate that it understands the concept.

Query 2: Runway calculation with the math shown

My capacity-workshop cluster has 3 nodes, each currently running around
65 pods. The Pod Velocity from my Module 2 Prometheus dashboard is
approximately 14 new pods per week. The default maxPods per node is 250.

How many weeks until I hit maxPods and need to add a worker node?
Show your calculation step by step, then write a PromQL expression that
computes this runway automatically from live cluster data.
What a complete answer looks like
Current available capacity:
  3 nodes × 250 maxPods = 750 total pod slots
  Currently used: 3 × 65 = 195 pods
  Available: 750 - 195 = 555 pod slots

At 14 new pods/week:
  Weeks until full: 555 ÷ 14 = 39.6 weeks (~40 weeks)

PromQL for live runway (in weeks):
  (
    sum(kube_node_status_allocatable{resource="pods"})
    - sum(kube_pod_info{node!=""})
  ) / <pod_velocity_per_week>
Using this prompt in your own environment

Substitute your own node count, current pod count, and Pod Velocity. If your Module 2 dashboard showed a different velocity, use that number — the step-by-step maths and the PromQL pattern are the same regardless of scale.

Capacity buffer rule of thumb

Never plan to scale at 100% capacity. Ask Lightspeed: "What safety buffer should I apply to this runway estimate for a production platform?" A well-calibrated answer will recommend 20–30% buffer (scale at 70–80% capacity), accounting for burst traffic, deployment surges, and the time required to provision new nodes (especially in cloud environments where new node provisioning takes 5–15 minutes).
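Applied to the Query 2 numbers, a 25% buffer means scaling at 75% of the 750 pod slots:

  Effective ceiling: 750 × 0.75 ≈ 562 pods
  Buffered runway:   (562 − 195) ÷ 14 ≈ 26 weeks

so the provisioning decision moves roughly 14 weeks earlier than the raw 40-week runway suggests.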

Query 3: Grafana panel JSON for capacity countdown

Generate a Grafana panel JSON snippet that shows "days until maxPods is
reached" as a single-stat panel per node. This should read from Prometheus
and update automatically. Use a green/yellow/red threshold:
- green: > 60 days
- yellow: 30-60 days
- red: < 30 days

Assume Pod Velocity of 14 pods/week and maxPods of 250.

Lightspeed will produce a JSON panel definition. Before pasting it into your RHACM Grafana dashboard, check:

  • The datasource field matches your Prometheus data source name (usually default or Thanos)

  • The PromQL uses label names that exist in your cluster

  • The thresholds are in the correct unit (days, not seconds)

Lightspeed often produces valid JSON with correct structure but wrong data source names — a quick edit is usually all that’s needed.
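For reference, a minimal stat-panel sketch showing the three things to check (datasource name, PromQL, threshold steps). The expression naively applies the full 14 pods/week cluster velocity to every node (14 ÷ 7 = 2 pods/day), so treat it as a starting point rather than a finished panel:

{
  "type": "stat",
  "title": "Days until maxPods",
  "datasource": "default",
  "targets": [
    {
      "expr": "(250 - count by (node) (kube_pod_info)) / (14 / 7)",
      "legendFormat": "{{node}}"
    }
  ],
  "fieldConfig": {
    "defaults": {
      "unit": "d",
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "color": "red", "value": null },
          { "color": "yellow", "value": 30 },
          { "color": "green", "value": 60 }
        ]
      }
    }
  }
}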

Query 4: Black Friday buffer planning (Module 6 revisited)

My production cluster has a CPU request overcommit ratio of 2.3× (meaning
applications have requested 2.3× the actual allocatable CPU). During Black
Friday, we expect a 10× traffic spike over baseline.

What are the risks of running at 2.3× overcommit during a 10× traffic event,
and how much additional node capacity should I plan to provision before the
event? Structure your answer using a capacity planning framework.
Reflection

This is the Module 6 Black Friday Chaos Game scenario expressed as an advisory question rather than a live simulation. Lightspeed should discuss:

  • CPU overcommit ratio interacts with HPA — at 2.3× overcommit, generous requests deflate the utilisation percentages in the HPA TARGETS column, meaning autoscaling fires later than expected

  • At 10× traffic, HPA scale-out multiplies replica counts, so the request total could briefly approach 23× actual CPU allocatable — a scenario where Guaranteed pods hold their CPU but Burstable pods get throttled to their limit

  • Pre-event node provisioning: add enough nodes to bring overcommit below 1.5× before the event window, not during it (the arithmetic is sketched after this list)

  • Cloud commitment strategy: on-demand vs. reserved for burst capacity
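To size the pre-event addition from the numbers in the prompt: overcommit is total requests ÷ total allocatable, so reducing 2.3× to 1.5× with requests held constant needs 2.3 ÷ 1.5 ≈ 1.53× the current allocatable CPU, i.e. roughly 53% more worker capacity online before the event window opens.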

Query 5: Executive summary (Module 7 revisited)

Based on the following capacity data from our capacity-workshop OpenShift
cluster, write a one-paragraph executive summary suitable for a quarterly
budget request:

- Current cluster: 3 combined control-plane/worker nodes on AWS (m7a.2xlarge)
- Active workloads: capacity-workshop namespace with ~195 pods running
- Pod Velocity: 14 new pods/week (growing 8% month-over-month)
- Capacity runway: ~40 weeks at current growth before hitting maxPods
- Action required: provision additional worker nodes before week 32 (safety buffer)
- Estimated cost: $3,200/month per additional m7a.2xlarge node on AWS

Write this for an audience of finance and business leadership, not engineers.
Avoid technical jargon. Focus on cost, risk, and timeline.
Compare to your Module 7 pitch

In Module 7 you wrote a capacity pitch manually. Notice how Lightspeed’s version differs:

  • Does it use the right tone (business language, not technical)?

  • Does it communicate the consequence of not acting (the risk) rather than just the technical facts?

  • Does it frame infrastructure spend as a business enabler ("supports student growth") rather than a cost centre?

If you are not satisfied with the first output, try this follow-up: "Rewrite the summary to emphasise the risk of not provisioning the nodes. What business impact occurs at week 40 if we do nothing?"

Using this prompt in your own environment

Replace the node type, pod count, velocity, and cost figure with your own cluster’s numbers. The executive summary pattern — current state, trend, runway, action, cost — is reusable for any capacity planning communication regardless of environment.

Optional: Live cluster queries (cluster interaction required)

If your instructor has enabled cluster interaction on your Lightspeed instance, the following prompts will return answers based on your actual cluster state rather than hypothetical numbers.

Look at my current cluster state. Which three namespaces have the highest
ratio of CPU requests versus actual CPU usage over the past 24 hours?
Based on current pod counts across all nodes and the deployment velocity
you can observe in cluster events, estimate how many weeks of node capacity
remain before I need to add a worker node.

Cluster interaction is a Technology Preview feature as of OpenShift Lightspeed 1.0. Answers from live cluster queries are more specific but also more dependent on model quality. Your default model, qwen3-14b, was selected specifically because it reliably selects the correct MCP tool names. You will observe in Lab 8D how granite-3-2-8b-instruct compares on the same queries.


Lab 8D: Model Comparison — Granite vs. Qwen3

20 minutes

Both models are pre-configured on your cluster. To switch: click the model name shown just above the chat input field and select from the dropdown. Your default is qwen3-14b — switch to granite-3-2-8b-instruct to run the comparison.

The comparison exercise

Pick one of the following prompts — ideally one that gave you a long or complex answer in Lab 8C. Run it against both models and compare side by side.

Suggested prompts for comparison:

  1. The runway calculation from Lab 8C Query 2 (maths + PromQL)

  2. The Black Friday buffer planning from Lab 8C Query 4

  3. The executive summary from Lab 8C Query 5

For each response, evaluate against this rubric:

Criterion                                                    Granite 3.2 8B   Qwen3 14B
Answers the question directly without unnecessary preamble         ☐               ☐
Shows calculation steps (for maths questions)                      ☐               ☐
Uses Red Hat / OCP-specific terminology correctly                  ☐               ☐
Provides working PromQL (for metric queries)                       ☐               ☐
Explains reasoning, not just the answer                            ☐               ☐
Response length is appropriate (not too short, not padded)         ☐               ☐
Executive language tone (for Query 5)                              ☐               ☐

Tick each box (☑) or leave it empty (☐) based on what you observe.

Discussion: Why does model choice matter?

Model selection is a capacity planning decision.

Consider the trade-offs:

Qwen3 14B (default in this workshop)

Larger model (14B parameters), stronger general reasoning, and — verified in this workshop — reliable MCP tool selection. It correctly calls pods_list_in_namespace, nodes_top, and namespaces_list by name on the first attempt. Better for complex forecasting maths, multi-variable analysis, free-form writing like executive summaries, and any query that requires live cluster data.

Granite 3.2 8B

Smaller model (8B parameters), faster response time, lower token cost per query, trained on Red Hat documentation. Strong for day-to-day OCP knowledge questions (how-to, debugging explanations, PromQL authoring). In this Technology Preview release, it inconsistently maps MCP tool names — useful for the 8A–8C labs but less reliable for cluster interaction.

For a production Lightspeed deployment, the right choice depends on:

  • The primary use case (quick debugging vs. strategic analysis vs. live cluster queries)

  • Available LiteMaaS token budget (larger models cost more tokens per query)

  • Response latency requirements (a 14B model is roughly 2× slower than an 8B model on the same hardware)

  • Whether MCP cluster interaction is required — Qwen3 14B is significantly more reliable at tool selection

As the platform engineer responsible for this deployment, you should evaluate both models against your team’s actual query patterns — exactly as you did in this lab.


Lab 8E: Live Cluster Queries via MCP — Asking About Your Actual Cluster

10 minutes — requires cluster interaction (pre-enabled on your student cluster)

In Labs 8A–8D, Lightspeed answered every question using its RAG knowledge base — authoritative, but hypothetical. Now that your cluster has the MCP server active, Lightspeed can call the Kubernetes API to read live resource state before forming its answer.

This lab requires the openshift-mcp-server sidecar to be active inside the lightspeed-app-server pod. The provisioning role already set introspectionEnabled: true. To verify it is running:

oc get pod -n openshift-lightspeed \
  -l app.kubernetes.io/name=lightspeed-service-api \
  -o jsonpath='{range .items[0].spec.containers[*]}{.name}{"\n"}{end}'

You should see three lines: lightspeed-service-api, lightspeed-to-dataverse-exporter, and openshift-mcp-server.

Query 1: What workloads are running in the capacity-workshop namespace?

Type the following into the Lightspeed chat panel:

What deployments and pods are running in the capacity-workshop namespace?
Are any of them in a non-Running state?
What to look for

With MCP enabled, Lightspeed should return the actual deployment names from your cluster rather than a generic answer. Compare this to the RAG-only responses in Labs 8A–8C — those returned general Kubernetes knowledge. This response should reference pods you actually deployed in earlier modules.

If Lightspeed gives a generic answer about checking deployments, cluster interaction may still be initialising — wait 2 minutes and try again.

Query 2: Summarise this cluster’s node capacity

How many nodes does this cluster have? What is the total allocatable CPU and
memory across all nodes? Are any nodes in a NotReady or degraded state?
What to look for

Lightspeed should call the Kubernetes API to list nodes and return real node names, actual CPU/memory values, and accurate Ready conditions. This is the baseline data you need before running any capacity projection — and getting it via chat is faster than running oc describe nodes and mentally summing the allocatable fields.
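To spot-check what Lightspeed reports, a manual equivalent using custom columns (standard oc/kubectl syntax; sum the values yourself):

oc get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory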

Query 3: Capacity headroom from live data

Based on the nodes in this cluster, estimate how much additional workload capacity
is available. What fraction of allocatable CPU and memory is currently requested?
Reflection

Compare this response to what Lightspeed gave you in Lab 8C Query 2 (the runway calculation). In Lab 8C you supplied hypothetical numbers. Here, Lightspeed reads the actual cluster state. Notice:

  • Does it correctly distinguish between requests (what pods ask for at scheduling time) and limits (hard ceiling)?

  • Does it reference actual node names and resource totals from your cluster?

  • Is the headroom estimate consistent with what you would calculate manually from oc describe nodes?

This is the core value proposition of MCP introspection: AI-assisted capacity operations against the real environment rather than a hypothetical scenario.
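A manual cross-check for the requested fraction, run in Observe → Metrics (standard kube-state-metrics series; verify the label names in your cluster):

# Fraction of allocatable CPU currently requested
sum(kube_pod_container_resource_requests{resource="cpu"})
/
sum(kube_node_status_allocatable{resource="cpu"})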


Lab 8 Summary: AI-Assisted Capacity Operations

You have now completed all five labs. Here is what you accomplished:

Lab 8A — Developer Queries

✅ Diagnosed OOMKilled events using Lightspeed as a debugging assistant
✅ Generated QoS explanations and HPA YAML configurations
✅ Authored CPU throttling PromQL with AI assistance

Lab 8B — Infrastructure Engineer Queries

✅ Got step-by-step guidance on increasing maxPods safely
✅ Generated etcd monitoring queries and growth forecasts
✅ Explored cluster architecture trade-offs through conversational AI

Lab 8C — Forecasting Assistant

✅ Rebuilt the Module 2 Pod Velocity model with AI-assisted PromQL
✅ Calculated capacity runway with explicit maths shown by the model
✅ Generated a Grafana panel specification for capacity countdown
✅ Applied Black Friday buffer planning via conversational analysis
✅ Produced an executive capacity summary from raw data

Lab 8D — Model Comparison

✅ Compared Granite 3.2 8B and Qwen3 14B on the same capacity question
✅ Evaluated model choice as a capacity and cost trade-off

Lab 8E — Live Cluster Queries via MCP

✅ Asked Lightspeed questions that were answered with real cluster data via the MCP Kubernetes toolset
✅ Verified that MCP introspection replaces hypothetical scenarios with actual numbers

Prompt Engineering Tips for Capacity Planning

These patterns consistently improve Lightspeed responses on capacity topics:

Give context

"I am a platform engineer managing a fleet of 12 OpenShift clusters on AWS…​" is more effective than starting without context.

Specify the format

"Show the calculation step by step" or "Write a PromQL query" focuses the model on the output you need.

Ask for Red Hat specifics

"On OpenShift 4.21 specifically" or "using the OpenShift web console" narrows the answer to OCP-specific paths rather than generic Kubernetes.

Iterate

If the first response is too generic, follow up with: "That was for a generic Kubernetes cluster. How does this differ on OpenShift where [specific feature] is available?"

Validate PromQL

Always paste generated PromQL into the OCP console Observe → Metrics view before embedding it in a dashboard or alert rule.

Key Takeaways

  • Lightspeed is a force multiplier, not a replacement for understanding the capacity planning concepts from Modules 1–7 — the quality of your queries is directly proportional to the depth of your domain knowledge

  • Granite 3.2 8B is optimised for Red Hat product questions; Qwen3 14B is stronger for complex reasoning and tool-use

  • PromQL assistance is one of the highest-value Lightspeed use cases — it removes the barrier of memorising metric names and query syntax

  • Cluster interaction (MCP introspection) is enabled on your cluster — Lightspeed’s openshift-mcp-server sidecar gives it read-only Kubernetes API access, enabling capacity questions answered against real cluster state rather than hypothetical data

  • Model selection is a capacity planning decision: token cost, latency, and capability all factor in just like infrastructure sizing decisions

Workshop Complete

Congratulations — you have completed all eight modules of the Strategic Capacity Planning & Forecasting for OpenShift at Scale workshop.

You now have a complete toolkit for running data-driven capacity operations:

  • Module 1 — Planning Horizon: established baselines using the three-horizon planning framework

  • Module 2 — Mathematics of Forecasting: Pod Velocity model and predictive dashboard construction

  • Module 3 — Developer Track: QoS classes, right-sizing with Prometheus P95, HPA configuration

  • Module 4 — Infrastructure Track: node density mathematics, etcd limits, kubeletConfig tuning

  • Module 5 — Fleet Observability: RHACM multi-cluster Thanos dashboards and metric allowlists

  • Module 6 — Integration Challenge: real-time incident decision-making under Black Friday pressure

  • Module 7 — Strategic Roadmapping: 12-month capacity roadmap and executive communication

  • Module 8 — AI-Assisted Operations: Lightspeed as a debugging co-pilot, PromQL assistant, forecasting advisor, and live-cluster MCP query engine