Module 3: Baseline Performance Testing and Measurement
Module Overview
In this module, you will establish baseline performance metrics for your OpenShift cluster before implementing any low-latency optimizations. Understanding your starting point is crucial for measuring the effectiveness of performance tuning efforts.
By the end of this module, you will be able to:
- Install and configure kube-burner for performance testing
- Run baseline performance tests to measure pod creation latency
- Analyze test results to understand current cluster performance
- Interpret performance metrics, including percentiles (P50, P95, P99)
- Create a performance baseline document for future comparisons
Lab Environment
This module requires access to the target cluster that you set up and imported in Module 2.
Important: This module should be executed on your target cluster (not the hub cluster), where performance tuning will be applied.
In this hands-on lab, you’ll use kube-burner, a Kubernetes performance testing tool designed to stress-test OpenShift clusters. We’ll focus on measuring pod creation latency, which is a critical metric for applications requiring fast scaling and low startup times.
Connecting to Your Target Cluster
Before beginning performance testing, you must connect to your target cluster that was set up and imported in Module 2.
Ensure you completed Module 2 and have:
- A target cluster imported into RHACM
- SR-IOV Network Operator installed (if applicable)
- OpenShift Virtualization installed (if applicable)
- Built-in Node Tuning Operator verified
Step 1: Log into Your Target Cluster
1. From your hub cluster or workstation, connect to your target cluster:

# If using RHACM, list your managed clusters first
oc get managedclusters

# Log into your target cluster (replace with your cluster details)
# Option 1: Using cluster API URL and token
oc login --token=<your-cluster-token> --server=https://api.<cluster-name>.<domain>:6443

# Option 2: If you have multiple contexts configured
oc config get-contexts
oc config use-context <target-cluster-context>

If you don’t have the login credentials for your target cluster:
- From RHACM console, navigate to "Infrastructure" → "Clusters"
- Find your target cluster and click on it
- Use the "Access cluster" or "Launch to cluster" option
- Copy the login command from the target cluster’s web console
2. Verify you’re connected to the correct cluster:

# Confirm cluster identity
oc cluster-info

# Check you're not on the hub cluster
oc get managedclusters 2>/dev/null || echo "✅ Connected to target cluster (not hub)"

# Verify cluster version matches expectations
oc version
Step 2: Verify Target Cluster Status
Confirm the target cluster is ready for performance testing:

# Check overall cluster health
oc get nodes

# Verify installed operators from Module 2
oc get csv --all-namespaces | grep -E "(sriov|kubevirt|virtualization)"

# Check built-in Node Tuning Operator
oc get tuned -n openshift-cluster-node-tuning-operator
What is Low-Latency and Why Does it Matter?
Low-latency computing refers to minimizing the delay between an input and its corresponding output. In containerized environments, this translates to:
- Pod Startup Time: How quickly containers become ready to serve traffic
- Network Latency: Time for network packets to traverse the cluster
- Storage I/O Latency: Speed of persistent volume operations
- API Response Time: Kubernetes API server responsiveness
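To make the first of these, pod startup time, concrete before running the full test suite, you can time a single pod by hand. The sketch below is a rough check, not part of the baseline test: it assumes jq is available, uses an example pod name, runs in your current namespace, and has roughly one-second resolution because it compares Kubernetes timestamps:

# Rough, one-off pod startup check (example pod name; ~1s resolution)
oc run startup-probe --image=registry.redhat.io/ubi8/ubi:latest --restart=Never -- sleep 60
oc wait pod/startup-probe --for=condition=Ready --timeout=120s

# Compare creation time with the Ready condition's transition time
oc get pod startup-probe -o json | jq -r '
  "Created:  " + .metadata.creationTimestamp,
  "Ready at: " + (.status.conditions[] | select(.type=="Ready") | .lastTransitionTime)'

# Clean up the probe pod
oc delete pod startup-probe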
Low-latency performance is essential for:
- Financial trading systems requiring microsecond response times
- Real-time gaming and streaming platforms
- IoT edge computing applications
- High-frequency data processing workloads
- Live video/audio processing systems
Verifying Your Target Cluster Configuration
Now that you’re connected to your target cluster, let’s verify it meets the requirements for performance testing and has the components installed from Module 2.
1. Verify cluster access and basic information:

# Confirm cluster-admin access on target cluster
oc auth can-i '*' '*'

# Check OpenShift version (should be 4.11+ for modern performance features)
oc get clusterversion

# Get cluster name and basic info
oc cluster-info | head -3
2. Review the cluster nodes and their specifications:

# List all nodes with detailed information
oc get nodes -o wide

# Check node resources
oc describe nodes | grep -E "(Name:|cpu:|memory:|Capacity|Allocatable)"

# Verify worker node count (optional)
oc get nodes --selector='node-role.kubernetes.io/worker' --no-headers | wc -l
3. Confirm that the operators from Module 2 are properly installed:

# Check built-in Node Tuning Operator (OpenShift 4.11+)
oc get tuned -n openshift-cluster-node-tuning-operator

# Verify SR-IOV Network Operator (if installed)
oc get csv -n openshift-sriov-network-operator 2>/dev/null || echo "SR-IOV not installed (optional)"

# Check OpenShift Virtualization (if installed)
oc get csv -n openshift-cnv 2>/dev/null || echo "OpenShift Virtualization not installed (optional)"

# Verify Performance Profile CRD is available
oc get crd performanceprofiles.performance.openshift.io
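Since this module captures an untuned baseline, it is also worth confirming that no PerformanceProfile has been applied yet. A quick check, assuming the CRD from the previous command is present:

# Expect "No resources found" on an untuned baseline cluster
oc get performanceprofile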
Establishing Baseline Performance Metrics
Before implementing any performance optimizations, we need to establish baseline metrics on your target cluster. This provides a reference point for measuring the effectiveness of our tuning efforts in subsequent modules.
Why Test the Target Cluster? We’re running performance tests on the target cluster (not the hub cluster) because the tuning in the following modules will be applied to the target cluster, so the baseline must be captured on that same cluster to allow a valid before-and-after comparison.
Setting up the Performance Testing Environment
1. Create a dedicated namespace for performance testing:

# Create performance testing namespace
oc create namespace performance-testing

# Set the namespace as current context
oc project performance-testing

# Verify namespace creation
oc get project performance-testing
2. Label worker nodes for performance testing (optional - helps with workload placement):

# List worker nodes
oc get nodes --selector='node-role.kubernetes.io/worker' --no-headers

# Label nodes for performance testing (optional)
# Replace 'worker-node-1' with actual node name
# oc label node <worker-node-name> performance-testing=true
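If you do apply the label, you can confirm which nodes carry it. Should you later want the test pods to land only on those nodes, a nodeSelector could be added to the pod template created below; this is optional and not used in the baseline configuration:

# Confirm which nodes carry the performance-testing label
oc get nodes -l performance-testing=true

# Optional (not part of the baseline): pin test pods to labeled nodes by adding
# a nodeSelector to pod.yml, for example:
#   spec:
#     nodeSelector:
#       performance-testing: "true"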
Installing kube-burner for Performance Testing
Kube-burner is a performance testing tool designed specifically for Kubernetes clusters. It can stress-test various aspects of cluster performance.
1. Download and install kube-burner:

# Create a directory for kube-burner
mkdir -p ~/kube-burner && cd ~/kube-burner

# Download the kube-burner binary (v1.17.5) for Linux x86_64
curl -L https://github.com/kube-burner/kube-burner/releases/download/v1.17.5/kube-burner-V1.17.5-linux-x86_64.tar.gz -o kube-burner.tar.gz

# Extract the binary and move it into your PATH
tar -xzf kube-burner.tar.gz
sudo mv kube-burner /usr/local/bin/

# Verify installation
kube-burner version
2. Create a directory for kube-burner configuration files:

# Create configuration directory
mkdir -p ~/kube-burner-configs && cd ~/kube-burner-configs

# Verify current directory
pwd
3. Create a baseline performance test configuration:

cat > baseline-config.yml << 'EOF'
global:
  measurements:
    - name: podLatency
      thresholds:
        - conditionType: Ready
          metric: P99
          threshold: 30000ms

metricsEndpoints:
  - indexer:
      type: local
      metricsDirectory: collected-metrics

jobs:
  - name: baseline-workload
    jobType: create
    jobIterations: 20
    namespace: baseline-workload
    namespacedIterations: true
    cleanup: false
    podWait: false
    waitWhenFinished: true
    verifyObjects: true
    errorOnVerify: false
    objects:
      - objectTemplate: pod.yml
        replicas: 5
        inputVars:
          containerImage: registry.redhat.io/ubi8/ubi:latest
EOF
4. Create the pod template for the baseline test:

cat > pod.yml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: baseline-pod-{{.Iteration}}-{{.Replica}}
  labels:
    app: baseline-test
    iteration: "{{.Iteration}}"
spec:
  containers:
  - name: baseline-container
    image: {{.containerImage}}
    command: ["sleep"]
    args: ["300"]
    resources:
      requests:
        memory: "64Mi"
        cpu: "50m"
      limits:
        memory: "128Mi"
        cpu: "100m"
  restartPolicy: Never
EOF
5. Verify the configuration files:

# List created configuration files
ls -la ~/kube-burner-configs/

# Preview the configuration files
head -10 baseline-config.yml
head -10 pod.yml
Running Baseline Performance Tests
Now let’s execute our baseline performance test to measure the current cluster performance.
1. Execute the baseline performance test using the kube-burner CLI:

# Change to the configuration directory
cd ~/kube-burner-configs

# Run the baseline test
kube-burner init -c baseline-config.yml --log-level=info

# The test will create 20 iterations with 5 pods each (100 total pods)
# and measure pod creation latency
2. Monitor the test progress in a separate terminal:

# Watch pods being created across namespaces
watch "oc get pods --all-namespaces | grep baseline"

# Monitor cluster resource usage
oc adm top nodes
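If you want more detail than a pod count while the job runs, recent events show scheduling, image pulls, and container starts. A small sketch using standard oc commands:

# Show the most recent events for the baseline namespaces
oc get events -A --sort-by='.lastTimestamp' | grep baseline-workload | tail -20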
3. Wait for the test to complete. You should see output similar to:

INFO[2025-09-05T10:30:15Z] 📁 Creating directory: collected-metrics
INFO[2025-09-05T10:30:15Z] 🔥 Starting kube-burner with UUID 12345678-1234-1234-1234-123456789abc
INFO[2025-09-05T10:30:15Z] 📊 Job baseline-workload: 20 iterations
INFO[2025-09-05T10:30:45Z] ✅ Job baseline-workload completed in 30s
Analyzing Baseline Results
1. View the pod latency metrics from the collected data:

# Change to the kube-burner configuration directory
cd ~/kube-burner-configs

# Check if metrics were collected successfully
if [ -d "collected-metrics" ]; then
    echo "✅ Metrics collected successfully!"
    echo ""

    # View the pod latency quantiles (summary metrics)
    echo "=== Pod Latency Summary ==="
    find collected-metrics/ -name "*podLatencyQuantilesMeasurement*" -type f | head -1 | xargs cat | \
        jq -r '.[] | select(.quantileName != null) | "\(.quantileName): P99=\(.P99)ms, P95=\(.P95)ms, P50=\(.P50)ms, Avg=\(.avg)ms, Max=\(.max)ms"' | sort

    echo ""
    echo "=== Individual Pod Metrics (first 5) ==="
    find collected-metrics/ -name "*podLatencyMeasurement*" -type f | head -1 | xargs cat | \
        jq -r '.[] | select(.podName != null) | "\(.podName): Ready=\(.podReadyLatency)ms, ContainersReady=\(.containersReadyLatency)ms, Scheduled=\(.schedulingLatency)ms"' | head -5
else
    echo "❌ No metrics directory found. Checking log output..."
    LATEST_LOG=$(ls -t kube-burner-*-*-*-*-*.log | head -1)
    echo "Latest log: $LATEST_LOG"
    echo ""
    grep -E "(Ready|PodScheduled|ContainersReady|Initialized).*99th.*max.*avg" $LATEST_LOG || echo "No latency metrics found in log"
fi
2. Create a baseline results summary:

# Ensure we're in the correct directory
cd ~/kube-burner-configs

# Get the latest log file and extract UUID
LATEST_LOG=$(ls -t kube-burner-*-*-*-*-*.log | head -1)
TEST_UUID=$(grep "Finished execution with UUID" $LATEST_LOG | grep -o "[a-f0-9-]*" | tail -1)

# Create results summary
cat > baseline-results-$(date +%Y%m%d).md << EOF
# Baseline Performance Test Results - $(date)

## Test Configuration
- **Test Scale**: 100 pods (5 pods × 20 iterations)
- **Container Image**: registry.redhat.io/ubi8/ubi:latest
- **Test Type**: Pod creation latency measurement
- **Test UUID**: $TEST_UUID

## Pod Latency Results
EOF

# Check for structured metrics first (modern approach)
if [ -d "collected-metrics" ] && [ -f "collected-metrics/"*"podLatencyQuantilesMeasurement"* ]; then
    echo "" >> baseline-results-$(date +%Y%m%d).md
    echo "### Latency Metrics (from structured data)" >> baseline-results-$(date +%Y%m%d).md

    # Extract quantile metrics using jq
    find collected-metrics/ -name "*podLatencyQuantilesMeasurement*" -type f | head -1 | xargs cat | \
        jq -r '.[] | select(.quantileName != null) | "- **\(.quantileName)**: P99=\(.P99)ms, P95=\(.P95)ms, P50=\(.P50)ms, Avg=\(.avg)ms, Max=\(.max)ms"' | \
        sort >> baseline-results-$(date +%Y%m%d).md

    # Extract key insights from structured data
    echo "" >> baseline-results-$(date +%Y%m%d).md
    echo "## Key Insights" >> baseline-results-$(date +%Y%m%d).md
    READY_AVG=$(find collected-metrics/ -name "*podLatencyQuantilesMeasurement*" -type f | head -1 | xargs cat | jq -r '.[] | select(.quantileName == "Ready") | .avg')
    if [ ! -z "$READY_AVG" ] && [ "$READY_AVG" != "null" ]; then
        READY_AVG_SEC=$(echo "scale=1; $READY_AVG / 1000" | bc 2>/dev/null || awk "BEGIN {print $READY_AVG/1000}")
        echo "- Average pod startup time is ${READY_AVG_SEC} seconds" >> baseline-results-$(date +%Y%m%d).md
    fi
elif grep -q "99th.*max.*avg" $LATEST_LOG; then
    # Fall back to log parsing (legacy approach)
    echo "" >> baseline-results-$(date +%Y%m%d).md
    echo "### Latency Metrics (from log output)" >> baseline-results-$(date +%Y%m%d).md
    grep -E "(Ready|PodScheduled|ContainersReady|Initialized).*99th.*max.*avg" $LATEST_LOG | \
        sed 's/.*baseline-workload: /- **/' | \
        sed 's/ 50th:/ P50:/' | \
        sed 's/ 99th:/ P99:/' | \
        sed 's/ max:/ Max:/' | \
        sed 's/ avg:/ Avg:/' | \
        sed 's/$/ms**/' >> baseline-results-$(date +%Y%m%d).md

    # Extract key insights from log
    echo "" >> baseline-results-$(date +%Y%m%d).md
    echo "## Key Insights" >> baseline-results-$(date +%Y%m%d).md
    READY_AVG=$(grep "Ready.*avg:" $LATEST_LOG | grep -o "avg: [0-9]*" | cut -d' ' -f2)
    if [ ! -z "$READY_AVG" ]; then
        READY_AVG_SEC=$(echo "scale=1; $READY_AVG / 1000" | bc 2>/dev/null || awk "BEGIN {print $READY_AVG/1000}")
        echo "- Average pod startup time is ${READY_AVG_SEC} seconds" >> baseline-results-$(date +%Y%m%d).md
    fi
else
    echo "- No latency metrics found. Check if the test completed successfully." >> baseline-results-$(date +%Y%m%d).md
    echo "- Last few lines of log:" >> baseline-results-$(date +%Y%m%d).md
    echo '```' >> baseline-results-$(date +%Y%m%d).md
    tail -5 $LATEST_LOG >> baseline-results-$(date +%Y%m%d).md
    echo '```' >> baseline-results-$(date +%Y%m%d).md
fi

# Display the results
cat baseline-results-$(date +%Y%m%d).md
3. Clean up the test resources (optional):

# Remove all baseline test namespaces
oc get namespaces | grep baseline-workload | awk '{print $1}' | xargs -r oc delete namespace

# Verify cleanup
oc get namespaces | grep baseline-workload || echo "Cleanup completed successfully"
Using Educational Analysis Scripts
The workshop provides educational Python scripts to help you understand and analyze your baseline metrics.
1. Baseline Analyzer - Simplified analysis with educational explanations:

# Run the baseline analyzer with educational output
python3 ~/low-latency-performance-workshop/scripts/module03-baseline-analyzer.py \
    --metrics-dir ~/kube-burner-configs

# Generate a detailed report
python3 ~/low-latency-performance-workshop/scripts/module03-baseline-analyzer.py \
    --metrics-dir ~/kube-burner-configs \
    --report
This script provides:
- Educational explanations of baseline performance concepts
- Interpretation of P50, P95, P99 percentiles
- Guidance on what the metrics mean for your cluster
- Next steps in the workshop journey
2. Metrics Explainer - Interactive learning tool for performance metrics:

# Learn about percentiles
python3 ~/low-latency-performance-workshop/scripts/module03-metrics-explainer.py \
    --topic percentiles

# Understand why P99 matters
python3 ~/low-latency-performance-workshop/scripts/module03-metrics-explainer.py \
    --topic p99

# Learn about latency types
python3 ~/low-latency-performance-workshop/scripts/module03-metrics-explainer.py \
    --topic latency

# Take an interactive quiz
python3 ~/low-latency-performance-workshop/scripts/module03-metrics-explainer.py \
    --topic quiz

# See all topics
python3 ~/low-latency-performance-workshop/scripts/module03-metrics-explainer.py \
    --topic all
This interactive tool helps you understand:
- What percentiles are and why they matter
- The difference between P50, P95, and P99
- Why P99 is critical for low-latency systems
- Different types of latency in Kubernetes
- How to interpret performance metrics
Understanding Your Results
The baseline test measures several key metrics that are critical for low-latency applications:
- Pod Creation Latency: Time from API request to pod ready state
- 50th Percentile (P50): Median latency - half of requests complete faster
- 95th Percentile (P95): 95% of requests complete within this time
- 99th Percentile (P99): 99% of requests complete within this time - critical for SLA compliance
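If you want to see how these percentiles are derived, the sketch below computes P50/P95/P99 from a plain list of latency samples using sort and awk. The input file latencies.txt (one millisecond value per line) is hypothetical; kube-burner computes these quantiles for you in the collected metrics:

# Hypothetical input: latencies.txt, one latency sample (in ms) per line
sort -n latencies.txt | awk '
    { samples[NR] = $1 }
    END {
        # Nearest-rank percentile: value at index ceil(p * N)
        printf "P50=%sms  P95=%sms  P99=%sms\n",
            samples[int(NR * 0.50 + 0.999)],
            samples[int(NR * 0.95 + 0.999)],
            samples[int(NR * 0.99 + 0.999)]
    }'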
On an untuned cluster, you might see pod creation latencies like:
- P50: ~2-5 seconds
- P95: ~8-15 seconds
- P99: ~15-30 seconds
These baseline metrics will serve as your reference point. In subsequent modules, we’ll implement various performance optimizations and measure their impact against these baseline numbers.
Document these baseline metrics carefully - they represent your cluster’s current performance characteristics and will help you:
- Identify performance bottlenecks
- Measure optimization effectiveness
- Set realistic performance targets
- Validate tuning changes
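One simple way to keep the baseline available for the comparison tests in later modules is to copy the generated summary into a dedicated directory; the ~/performance-results path below is only a suggestion:

# Keep a copy of the baseline summary for later comparison (path is a suggestion)
mkdir -p ~/performance-results
cp ~/kube-burner-configs/baseline-results-*.md ~/performance-results/
ls ~/performance-results/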
Module Summary
In this module, you have successfully:
✅ Verified cluster readiness for performance testing
✅ Installed the kube-burner performance testing tool
✅ Created baseline test configurations for pod creation latency
✅ Executed baseline performance tests to measure current cluster performance
✅ Analyzed test results and documented baseline metrics
✅ Established a reference point for measuring future optimizations
In Module 4, we’ll begin implementing core performance tuning optimizations on this same target cluster, including:
- Performance Profiles for CPU isolation (using the built-in Node Tuning Operator)
- HugePages configuration
- Real-time kernel enablement
- Node tuning optimizations
These optimizations should significantly improve the latency metrics you’ve just measured in your baseline tests.
Cluster Context for Next Modules: Stay connected to your target cluster for the remaining workshop modules. All performance tuning will be applied to this cluster, and you’ll run comparison tests here to measure improvements against your baseline metrics. If you need to switch between hub and target clusters:
- Hub cluster: For RHACM management and ArgoCD operations
- Target cluster: For performance testing and tuning implementation