Module 3: Baseline Performance Testing and Measurement
Module Overview
In this module, you will establish baseline performance metrics for your OpenShift cluster before implementing any low-latency optimizations. Understanding your starting point is crucial for measuring the effectiveness of performance tuning efforts.
By the end of this module, you will be able to:
- Install and configure kube-burner for performance testing
- Run baseline performance tests to measure pod creation latency
- Analyze test results to understand current cluster performance
- Interpret performance metrics including percentiles (P50, P95, P99)
- Create a performance baseline document for future comparisons
Lab Environment
This module requires:

- Cluster Access via Bastion: SSH to your bastion host (credentials from Module 2) and run the verification steps below.

Important: This module should be executed on your target cluster (not the hub cluster), where performance tuning will be applied.
In this hands-on lab, you'll use kube-burner, a performance testing tool designed to stress-test Kubernetes and OpenShift clusters. We'll focus on measuring pod creation latency, a critical metric for applications that require fast scaling and low startup times.
Connecting to Your Target Cluster
Before beginning performance testing, ensure you’re connected to your SNO cluster via the bastion host (set up in Module 2).
Ensure you completed Module 2 and have:

- A SNO cluster with operators installed and verified
- SR-IOV Network Operator installed (if applicable)
- OpenShift Virtualization installed (if applicable)
- Built-in Node Tuning Operator verified
Step 1: Verify Cluster Access
1. SSH to your bastion host and verify cluster access:

```bash
# Verify you're authenticated
oc whoami

# Verify cluster nodes
oc get nodes
```

The bastion host has `oc`, `kubectl`, and `KUBECONFIG` pre-configured. You should see `system:admin` when running `oc whoami`.

2. Verify you're connected to the correct cluster:

```bash
# Confirm cluster identity
oc cluster-info

# Verify cluster version matches expectations
oc version
```
Step 2: Verify Target Cluster Status
Confirm the target cluster is ready for performance testing:

```bash
# Check overall cluster health
oc get nodes

# Verify installed operators from Module 2
oc get csv --all-namespaces | grep -E "(kubevirt|virtualization)"

# Check built-in Node Tuning Operator
oc get tuned -n openshift-cluster-node-tuning-operator

# Check node resources
oc describe nodes | grep -E "(Name:|cpu:|memory:|Capacity|Allocatable)"
```
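It can also help to record a point-in-time snapshot of node capacity alongside your baseline results. A small sketch (the output filename is just a suggestion):

```bash
# Record allocatable CPU and memory per node for the baseline document
oc get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory \
    | tee ~/node-allocatable-$(date +%Y%m%d).txt
```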
What is Low-Latency and Why Does it Matter?
Low-latency computing refers to minimizing the delay between an input and its corresponding output. In containerized environments, this translates to:
- Pod Startup Time: How quickly containers become ready to serve traffic (see the sketch after this list)
- Network Latency: Time for network packets to traverse the cluster
- Storage I/O Latency: Speed of persistent volume operations
- API Response Time: Kubernetes API server responsiveness
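To make pod startup time tangible before automating it, you can compute it by hand for any running pod from its status timestamps. A minimal sketch, assuming a pod named my-pod in the default namespace (both are placeholders) and GNU date on the bastion:

```bash
# Startup latency = Ready condition transition time minus creation time
POD=my-pod
NS=default
CREATED=$(oc get pod $POD -n $NS -o jsonpath='{.metadata.creationTimestamp}')
READY=$(oc get pod $POD -n $NS -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')
echo "startup: $(( $(date -d "$READY" +%s) - $(date -d "$CREATED" +%s) ))s"
```

kube-burner automates exactly this bookkeeping across every pod in a test, which is why we use it rather than scripting the measurement ourselves.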
Low-latency performance is essential for:
- Financial trading systems requiring microsecond response times
- Real-time gaming and streaming platforms
- IoT edge computing applications
- High-frequency data processing workloads
- Live video/audio processing systems
Verifying Your Target Cluster Configuration
Now that you’re connected to your SNO cluster, let’s verify it meets the requirements for performance testing and has the components installed from Module 2.
Establishing Baseline Performance Metrics
Before implementing any performance optimizations, we need to establish baseline metrics on your target cluster. This provides a reference point for measuring the effectiveness of our tuning efforts in subsequent modules.
Installing kube-burner for Performance Testing
Kube-burner is a performance testing tool designed specifically for Kubernetes clusters. It can stress-test various aspects of cluster performance.
1. Download and install kube-burner:

```bash
# Create a directory for kube-burner
mkdir -p ~/kube-burner && cd ~/kube-burner

# Download the kube-burner v1.17.5 binary for Linux
curl -L https://github.com/kube-burner/kube-burner/releases/download/v1.17.5/kube-burner-V1.17.5-linux-x86_64.tar.gz -o kube-burner.tar.gz

# Extract the binary and move it onto the PATH
tar -xzf kube-burner.tar.gz
ls ~/kube-burner
sudo mv kube-burner /usr/local/bin/

# Verify installation
kube-burner version
```

2. Create a directory for kube-burner configuration files:

```bash
# Create configuration directory
mkdir -p ~/kube-burner-configs && cd ~/kube-burner-configs

# Verify current directory
pwd
```

3. Create a baseline performance test configuration:

```bash
cat > baseline-config.yml << 'EOF'
---
global:
  measurements:
    - name: podLatency
      thresholds:
        - conditionType: Ready
          metric: P99
          threshold: 30000ms

metricsEndpoints:
  - indexer:
      type: local
      metricsDirectory: collected-metrics

jobs:
  - name: baseline-workload
    jobType: create
    jobIterations: 20
    namespace: baseline-workload
    namespacedIterations: true
    cleanup: false
    podWait: false
    waitWhenFinished: true
    verifyObjects: true
    errorOnVerify: false
    objects:
      - objectTemplate: pod.yml
        replicas: 5
        inputVars:
          containerImage: registry.redhat.io/ubi8/ubi:latest
EOF
```

4. Create the pod template for the baseline test:

```bash
cat > pod.yml << 'EOF'
---
apiVersion: v1
kind: Pod
metadata:
  name: baseline-pod-{{.Iteration}}-{{.Replica}}
  labels:
    app: baseline-test
    iteration: "{{.Iteration}}"
spec:
  containers:
    - name: baseline-container
      image: {{.containerImage}}
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: "RuntimeDefault"
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
      command: ["sleep"]
      args: ["300"]
      resources:
        requests:
          memory: "64Mi"
          cpu: "50m"
        limits:
          memory: "128Mi"
          cpu: "100m"
  restartPolicy: Never
EOF
```

5. Verify the configuration files:

```bash
# List created configuration files
ls -la ~/kube-burner-configs/

# Review the first lines of each file
cat baseline-config.yml | head -10
cat pod.yml | head -10
```
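The `head` commands above only eyeball the files. For a real syntax check of the config, one option is Python's YAML parser, assuming python3 with PyYAML is available on the bastion (pod.yml is excluded because its Go template placeholders such as {{.containerImage}} are not plain YAML):

```bash
# Parse baseline-config.yml; a traceback here means a YAML syntax error
python3 -c "import yaml; yaml.safe_load(open('baseline-config.yml')); print('baseline-config.yml: OK')"
```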
Running Baseline Performance Tests
Now let’s execute our baseline performance test to measure the current cluster performance.
1. Execute the baseline performance test using the kube-burner CLI:

```bash
# Change to the configuration directory
cd ~/kube-burner-configs

# Run the baseline test
kube-burner init -c baseline-config.yml --log-level=info

# The test will create 20 iterations with 5 pods each (100 total pods)
# and measure pod creation latency
```

2. Monitor the test progress in a separate terminal:

```bash
# Watch pods being created across namespaces
watch "oc get pods --all-namespaces | grep baseline"

# Monitor cluster resource usage
oc adm top nodes
```

3. Wait for the test to complete. You should see output similar to:

```
INFO[2025-09-05T10:30:15Z] 📁 Creating directory: collected-metrics
INFO[2025-09-05T10:30:15Z] 🔥 Starting kube-burner with UUID 12345678-1234-1234-1234-123456789abc
INFO[2025-09-05T10:30:15Z] 📊 Job baseline-workload: 20 iterations
INFO[2025-09-05T10:30:45Z] ✅ Job baseline-workload completed in 30s
```
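Note that baseline-config.yml sets a podLatency threshold (P99 of the Ready condition must stay at or below 30000ms). kube-burner evaluates thresholds at the end of the run and, in recent releases, exits non-zero when one is exceeded, so the shell exit status doubles as a quick pass/fail signal. A small sketch:

```bash
# Re-run the test and inspect the exit status:
# 0 means the job finished and stayed within configured latency thresholds
kube-burner init -c baseline-config.yml --log-level=info
echo "exit status: $?"
```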
Analyzing Baseline Results
1. View the pod latency metrics from the collected data:

```bash
# Change to the kube-burner configuration directory
cd ~/kube-burner-configs

# Check if metrics were collected successfully
if [ -d "collected-metrics" ]; then
    echo "✅ Metrics collected successfully!"
    echo ""

    # View the pod latency quantiles (summary metrics)
    echo "=== Pod Latency Summary ==="
    find collected-metrics/ -name "*podLatencyQuantilesMeasurement*" -type f | head -1 | xargs cat | \
        jq -r '.[] | select(.quantileName != null) | "\(.quantileName): P99=\(.P99)ms, P95=\(.P95)ms, P50=\(.P50)ms, Avg=\(.avg)ms, Max=\(.max)ms"' | \
        sort

    echo ""
    echo "=== Individual Pod Metrics (first 5) ==="
    find collected-metrics/ -name "*podLatencyMeasurement*" -type f | head -1 | xargs cat | \
        jq -r '.[] | select(.podName != null) | "\(.podName): Ready=\(.podReadyLatency)ms, ContainersReady=\(.containersReadyLatency)ms, Scheduled=\(.schedulingLatency)ms"' | \
        head -5
else
    echo "❌ No metrics directory found. Checking log output..."
    LATEST_LOG=$(ls -t kube-burner-*-*-*-*-*.log | head -1)
    echo "Latest log: $LATEST_LOG"
    echo ""
    grep -E "(Ready|PodScheduled|ContainersReady|Initialized).*99th.*max.*avg" $LATEST_LOG || echo "No latency metrics found in log"
fi
```
2. Create a baseline results summary:

````bash
# Ensure we're in the correct directory
cd ~/kube-burner-configs

# Get the latest log file and extract the test UUID
LATEST_LOG=$(ls -t kube-burner-*-*-*-*-*.log | head -1)
TEST_UUID=$(grep "Finished execution with UUID" $LATEST_LOG | grep -o "[a-f0-9-]*" | tail -1)

# Create results summary
cat > baseline-results-$(date +%Y%m%d).md << EOF
# Baseline Performance Test Results - $(date)

## Test Configuration
- **Test Scale**: 100 pods (5 pods × 20 iterations)
- **Container Image**: registry.redhat.io/ubi8/ubi:latest
- **Test Type**: Pod creation latency measurement
- **Test UUID**: $TEST_UUID

## Pod Latency Results
EOF

# Check for structured metrics first (modern approach); ls keeps the
# check glob-safe even when several measurement files exist
if [ -d "collected-metrics" ] && ls collected-metrics/*podLatencyQuantilesMeasurement* >/dev/null 2>&1; then
    echo "" >> baseline-results-$(date +%Y%m%d).md
    echo "### Latency Metrics (from structured data)" >> baseline-results-$(date +%Y%m%d).md

    # Extract quantile metrics using jq
    find collected-metrics/ -name "*podLatencyQuantilesMeasurement*" -type f | head -1 | xargs cat | \
        jq -r '.[] | select(.quantileName != null) | "- **\(.quantileName)**: P99=\(.P99)ms, P95=\(.P95)ms, P50=\(.P50)ms, Avg=\(.avg)ms, Max=\(.max)ms"' | \
        sort >> baseline-results-$(date +%Y%m%d).md

    # Extract key insights from structured data
    echo "" >> baseline-results-$(date +%Y%m%d).md
    echo "## Key Insights" >> baseline-results-$(date +%Y%m%d).md
    READY_AVG=$(find collected-metrics/ -name "*podLatencyQuantilesMeasurement*" -type f | head -1 | xargs cat | jq -r '.[] | select(.quantileName == "Ready") | .avg')
    if [ ! -z "$READY_AVG" ] && [ "$READY_AVG" != "null" ]; then
        READY_AVG_SEC=$(echo "scale=1; $READY_AVG / 1000" | bc 2>/dev/null || awk "BEGIN {print $READY_AVG/1000}")
        echo "- Average pod startup time is ${READY_AVG_SEC} seconds" >> baseline-results-$(date +%Y%m%d).md
    fi
elif grep -q "99th.*max.*avg" $LATEST_LOG; then
    # Fall back to log parsing (legacy approach)
    echo "" >> baseline-results-$(date +%Y%m%d).md
    echo "### Latency Metrics (from log output)" >> baseline-results-$(date +%Y%m%d).md
    grep -E "(Ready|PodScheduled|ContainersReady|Initialized).*99th.*max.*avg" $LATEST_LOG | \
        sed 's/.*baseline-workload: /- **/' | \
        sed 's/ 50th:/ P50:/' | \
        sed 's/ 99th:/ P99:/' | \
        sed 's/ max:/ Max:/' | \
        sed 's/ avg:/ Avg:/' | \
        sed 's/$/ms**/' >> baseline-results-$(date +%Y%m%d).md

    # Extract key insights from the log
    echo "" >> baseline-results-$(date +%Y%m%d).md
    echo "## Key Insights" >> baseline-results-$(date +%Y%m%d).md
    READY_AVG=$(grep "Ready.*avg:" $LATEST_LOG | grep -o "avg: [0-9]*" | cut -d' ' -f2)
    if [ ! -z "$READY_AVG" ]; then
        READY_AVG_SEC=$(echo "scale=1; $READY_AVG / 1000" | bc 2>/dev/null || awk "BEGIN {print $READY_AVG/1000}")
        echo "- Average pod startup time is ${READY_AVG_SEC} seconds" >> baseline-results-$(date +%Y%m%d).md
    fi
else
    echo "- No latency metrics found. Check if the test completed successfully." >> baseline-results-$(date +%Y%m%d).md
    echo "- Last few lines of log:" >> baseline-results-$(date +%Y%m%d).md
    echo '```' >> baseline-results-$(date +%Y%m%d).md
    tail -5 $LATEST_LOG >> baseline-results-$(date +%Y%m%d).md
    echo '```' >> baseline-results-$(date +%Y%m%d).md
fi

# Display the results
cat baseline-results-$(date +%Y%m%d).md
````
3. Clean up the test resources (optional):

```bash
# Remove all baseline test namespaces
oc get namespaces | grep baseline-workload | awk '{print $1}' | xargs -r oc delete namespace

# Verify cleanup
oc get namespaces | grep baseline-workload || echo "Cleanup completed successfully"
```
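Alternatively, kube-burner can delete the objects it created when you hand it the run's UUID via its destroy subcommand. This is a hedged sketch; verify the flag names against `kube-burner destroy --help` for your installed version:

```bash
# Recover the UUID of the latest run from its log, then tear the run down
TEST_UUID=$(grep -o "UUID [a-f0-9-]*" $(ls -t kube-burner-*.log | head -1) | head -1 | awk '{print $2}')
kube-burner destroy --uuid "$TEST_UUID"
```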
Using Educational Analysis Scripts
The workshop provides educational Python scripts to help you understand and analyze your baseline metrics.
1. Baseline Analyzer - Simplified analysis with educational explanations:

```bash
# Run the baseline analyzer with educational output
python3 ~/low-latency-performance-workshop/scripts/module03-baseline-analyzer.py \
    --metrics-dir ~/kube-burner-configs

# Generate a detailed report
python3 ~/low-latency-performance-workshop/scripts/module03-baseline-analyzer.py \
    --metrics-dir ~/kube-burner-configs \
    --report
```

This script provides:

- Educational explanations of baseline performance concepts
- Interpretation of P50, P95, P99 percentiles
- Guidance on what the metrics mean for your cluster
- Next steps in the workshop journey

2. Metrics Explainer - Interactive learning tool for performance metrics:

```bash
# Learn about percentiles
python3 ~/low-latency-performance-workshop/scripts/module03-metrics-explainer.py \
    --topic percentiles

# Understand why P99 matters
python3 ~/low-latency-performance-workshop/scripts/module03-metrics-explainer.py \
    --topic p99

# Learn about latency types
python3 ~/low-latency-performance-workshop/scripts/module03-metrics-explainer.py \
    --topic latency

# Take an interactive quiz
python3 ~/low-latency-performance-workshop/scripts/module03-metrics-explainer.py \
    --topic quiz

# See all topics
python3 ~/low-latency-performance-workshop/scripts/module03-metrics-explainer.py \
    --topic all
```

This interactive tool helps you understand:

- What percentiles are and why they matter
- The difference between P50, P95, and P99
- Why P99 is critical for low-latency systems
- Different types of latency in Kubernetes
- How to interpret performance metrics
Understanding Your Results
The baseline test measures several key metrics that are critical for low-latency applications:
- Pod Creation Latency: Time from API request to pod ready state
- 50th Percentile (P50): Median latency - half of requests complete faster
- 95th Percentile (P95): 95% of requests complete within this time
- 99th Percentile (P99): 99% of requests complete within this time - critical for SLA compliance (the sketch after this list shows the arithmetic)
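To make the percentile arithmetic concrete, here is a minimal sketch using the nearest-rank method over made-up latency samples (the millisecond values are invented for illustration):

```bash
# Nearest-rank percentile: sort ascending, take the ceil(p/100 * N)-th value
samples="2100 2400 2200 9000 2600 3100 2500 15000 2700 2300"
echo "$samples" | tr ' ' '\n' | sort -n | awk '
    { v[NR] = $1 }
    END {
        p50 = v[int(0.50 * NR + 0.9999)]
        p95 = v[int(0.95 * NR + 0.9999)]
        p99 = v[int(0.99 * NR + 0.9999)]
        printf "P50=%sms  P95=%sms  P99=%sms\n", p50, p95, p99
    }'
```

With only ten samples, P95 and P99 both land on the worst outlier (15000ms) while P50 stays at 2500ms - exactly why tail percentiles, not averages, expose latency problems, and why the test creates 100 pods rather than a handful.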
On an untuned cluster, you might see pod creation latencies like:

- P50: ~2-5 seconds
- P95: ~8-15 seconds
- P99: ~15-30 seconds
These baseline metrics will serve as your reference point. In subsequent modules, we’ll implement various performance optimizations and measure their impact against these baseline numbers.
Document these baseline metrics carefully - they represent your cluster's current performance characteristics and will help you:

- Identify performance bottlenecks
- Measure optimization effectiveness (see the sketch below)
- Set realistic performance targets
- Validate tuning changes
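When you re-run the same test after tuning in later modules, a quick way to quantify optimization effectiveness is the percentage delta between the two P99 values. A hedged sketch; both numbers are placeholders you would copy from your own results files:

```bash
# Placeholders: substitute the P99 Ready values from your baseline and tuned runs
BASELINE_P99=18000   # ms, from baseline-results-<date>.md
TUNED_P99=4500       # ms, from the post-tuning run
awk -v b="$BASELINE_P99" -v t="$TUNED_P99" \
    'BEGIN { printf "P99 improvement: %.1f%% (%sms -> %sms)\n", (b - t) * 100 / b, b, t }'
```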
Module Summary
In this module, you have successfully:
✅ Verified cluster readiness for performance testing
✅ Installed kube-burner performance testing tool
✅ Created baseline test configurations for pod creation latency
✅ Executed baseline performance tests to measure current cluster performance
✅ Analyzed test results and documented baseline metrics
✅ Established a reference point for measuring future optimizations
In Module 4, we’ll begin implementing core performance tuning optimizations on this same target cluster, including:
- Performance Profiles for CPU isolation (using built-in Node Tuning Operator)
- HugePages configuration
- Real-time kernel enablement
- Node tuning optimizations
These optimizations should significantly improve the latency metrics you’ve just measured in your baseline tests.