Module 4: Core Performance Tuning with Performance Profiles

Module Overview

This module is the heart of the workshop, where you’ll apply node-level performance tuning using OpenShift’s Performance Profile Controller. You’ll configure CPU isolation, HugePages allocation, and real-time kernel settings to dramatically improve latency characteristics.

Learning Objectives
  • Understand Performance Profiles and their components

  • Configure CPU isolation for high-performance workloads

  • Allocate and manage HugePages for reduced memory latency

  • Apply real-time kernel tuning profiles

  • Measure the performance improvements from your optimizations

Prerequisites

Before starting this module, ensure you have completed:

  • Module 1: Low-Latency Performance Fundamentals

  • Module 2: Environment Setup and Verification

  • Module 3: Baseline Performance Measurement and Analysis

  • Established baseline performance metrics using kube-burner from Module 3

Understanding Performance Profiles

The Performance Profile Controller (PPC) is integrated into the Node Tuning Operator and provides a declarative way to configure multiple low-latency optimizations through a single custom resource.

Key Components
  • CPU Management: Isolate specific CPUs for high-performance workloads

  • Memory Tuning: Configure HugePages and memory-related kernel parameters

  • Kernel Tuning: Apply real-time kernel settings and tuned profiles

  • NUMA Awareness: Optimize for Non-Uniform Memory Access architectures

Performance Profile Benefits
  • Single Configuration: One CR manages multiple complex tuning parameters

  • Declarative: Version-controlled, repeatable configuration

  • Node Pool Isolation: Apply tuning to specific worker nodes only

  • Rolling Updates: Orchestrated updates with minimal disruption
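These components and benefits come together in a single custom resource. Below is a minimal sketch of a PerformanceProfile; the profile name, CPU ranges, HugePages count, and node selector are illustrative placeholders, not the values the workshop scripts generate:

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-profile          # illustrative name
spec:
  cpu:
    reserved: "0-3"              # housekeeping: kubelet, system daemons
    isolated: "4-15"             # dedicated to latency-sensitive workloads
  hugepages:
    defaultHugepagesSize: 2M
    pages:
      - size: 2M
        count: 1024              # 2 GiB of HugePages
  realTimeKernel:
    enabled: false               # requires bare-metal instances when true
  numa:
    topologyPolicy: single-numa-node
  nodeSelector:
    node-role.kubernetes.io/worker-rt: ""
```

Applying a CR like this hands all of the node-level tuning to the Node Tuning Operator, which generates the machine configs and tuned profiles on your behalf.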

Hands-on Exercise: Creating a Performance Profile

Step 1: Identify Target Nodes and Environment

First, we’ll identify your cluster architecture and configure the appropriate nodes for performance tuning.

  1. Verify cluster access (you should already be connected via bastion):

    # Verify you're connected to the SNO cluster
    oc whoami
    oc get nodes
  2. Get your SNO node information:

    # Get the SNO node name
    TARGET_NODE=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
    echo "Target node: $TARGET_NODE"
    
    # Get detailed node information
    echo "=== Node Information ==="
    oc describe node $TARGET_NODE | grep -E "(Name:|Roles:|Capacity:|Allocatable:)" -A 10
    
    # Check CPU information
    echo "=== CPU Topology ==="
    oc debug node/$TARGET_NODE -- chroot /host lscpu | grep -E "(CPU\(s\)|Thread|Core|Socket|NUMA)"

SNO Configuration Note

Since this is a Single Node OpenShift cluster, no additional node labeling is needed. The Performance Profile will automatically target the master node (which also acts as the worker in SNO).

Step 2: Verify Machine Config Pool (SNO)

For SNO clusters, the default master Machine Config Pool is used. No additional pool creation is needed.

  1. Verify Machine Config Pool status:

    # SNO uses the default master pool
    echo "=== Machine Config Pools ==="
    oc get mcp
    
    echo ""
    echo "📊 Master Pool Details:"
    oc get mcp master -o yaml | grep -E "(readyMachineCount|updatedMachineCount|machineCount)" | head -3

SNO Machine Config Pool

Single Node OpenShift uses the built-in master Machine Config Pool. The Performance Profile will automatically apply to this pool, so no additional configuration is needed.

Step 3: Create the Performance Profile

Now we’ll create a Performance Profile that configures CPU isolation, HugePages, and real-time kernel settings optimized for your cluster architecture.

Real-Time (RT) Kernel Requirements and Cost Considerations

The Linux Real-Time kernel extension (kernel-rt) provides deterministic scheduling and preemption capabilities essential for ultra-low latency workloads. However, it has specific infrastructure requirements:

AWS Instance Requirements:
* RT kernel requires bare-metal EC2 instances (instance types ending in .metal, e.g., m5zn.metal, c5.metal)
* RT kernel cannot run on virtualized instances (e.g., m5.4xlarge, c5.xlarge)
* This is because RT kernel needs direct hardware access for precise timing control

Cost Comparison (us-east-2 region):
* Virtualized instance (m5.4xlarge): ~$0.77/hour
* Bare-metal instance (m5zn.metal): ~$3.96/hour (5x more expensive)

Workshop Recommendation:

By default, the Performance Profile script creates a profile WITHOUT RT kernel enabled. This configuration still provides:
* ✅ CPU isolation (isolcpus)
* ✅ HugePages allocation
* ✅ NUMA topology optimization
* ✅ Kernel tuning parameters

This covers roughly 80% of the low-latency tuning concepts at a fraction of the cost, making it ideal for workshop use.
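As a rough illustration of how a HugePages budget might be sized (a hypothetical rule of thumb, not the workshop script's actual policy), dedicating about 10% of node memory to 2Mi pages works out as:

```shell
# Hypothetical sizing rule (assumption, not the workshop script's policy):
# dedicate roughly 10% of node memory to 2Mi HugePages, rounding down.
hugepages_2mi_count() {
    local mem_mib=$1
    echo $(( mem_mib / 10 / 2 ))
}

hugepages_2mi_count 65536   # 64 GiB node -> 3276 pages (about 6.4 GiB)
```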

To Enable RT Kernel (if using bare-metal instances):

If your cluster is deployed on bare-metal instances and you want to enable RT kernel:

# Enable RT kernel via environment variable
ENABLE_RT_KERNEL=true bash ~/low-latency-performance-workshop/scripts/module04-create-performance-profile.sh

# Or use the command-line flag
bash ~/low-latency-performance-workshop/scripts/module04-create-performance-profile.sh --enable-rt-kernel

The script will automatically detect your instance type and validate RT kernel requirements. If you attempt to enable RT kernel on a virtualized instance, the script will fail with a clear error message explaining the requirements.
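The bare-metal check reduces to a suffix test on the instance type; a simplified sketch of that validation (assumed logic, not the script's exact code):

```shell
# Simplified sketch of the bare-metal validation (assumed logic): RT kernel
# is only permitted when the AWS instance type ends in ".metal".
is_bare_metal() {
    [[ $1 == *.metal ]]
}

is_bare_metal "m5zn.metal" && echo "RT kernel supported"
is_bare_metal "m5.4xlarge" || echo "RT kernel not supported (virtualized instance)"
```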

SNO-Specific Considerations:

For Single Node OpenShift (SNO) clusters, applying a Performance Profile will cause the entire cluster to reboot, making the API server unavailable for 10-20 minutes.

Automatic Optimizations for Virtualized Instances:
* The scripts automatically detect virtualized instances (e.g., m5.4xlarge)
* For virtualized instances, the nosmt kernel parameter is automatically skipped to preserve hyperthreading
* CPU allocation is automatically adjusted to be more conservative, reserving more CPUs for the control plane
* This ensures the API server can start successfully after reboot

If Your Node Gets Stuck After Reboot:
* Wait 20-30 minutes - configuration may still be applying
* Check AWS instance status: aws ec2 describe-instance-status --instance-ids <id>
* If node doesn’t recover, delete the Performance Profile and recreate with more conservative settings
* See the "Troubleshooting Common Issues" section for detailed recovery steps

  1. Determine optimal CPU allocation based on your cluster type:

    # Run the CPU allocation calculator script
    bash ~/low-latency-performance-workshop/scripts/module04-calculate-cpu-allocation.sh

    What This Script Does:

    • Detects CPU count on the target node

    • Calculates optimal CPU allocation based on cluster type (SNO, Multi-Node, Multi-Master)

    • Validates CPU ranges to prevent configuration errors

    • Saves configuration to /tmp/cluster-config for next steps

    Benefits of Using the Script:

    • Error Prevention: Validates CPU ranges before saving

    • Handles Edge Cases: Prevents division by zero and invalid ranges

    • Clear Output: Shows allocation strategy and percentages

    • Reusable: Can be run multiple times to recalculate

    The script implements workshop-friendly conservative allocation that preserves cluster functionality while demonstrating performance benefits.
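The kind of conservative split the calculator performs can be sketched as follows (a hypothetical SNO policy for illustration, not the script's exact algorithm): reserve half the CPUs, with a minimum of four, for the control plane and kubelet, and isolate the rest.

```shell
# Hypothetical sketch of a conservative SNO split (assumed policy, not the
# calculator script's exact algorithm): reserve half the CPUs (minimum 4)
# for the control plane and kubelet, isolate the rest for workloads.
calc_allocation() {
    local total=$1
    local reserved_count=$(( total / 2 ))
    if (( reserved_count < 4 )); then
        reserved_count=4
    fi
    local isolated_start=$reserved_count
    local isolated_end=$(( total - 1 ))
    if (( isolated_start > isolated_end )); then
        echo "error: no CPUs left to isolate" >&2
        return 1
    fi
    echo "reserved=0-$(( reserved_count - 1 )) isolated=${isolated_start}-${isolated_end}"
}

calc_allocation 16   # reserved=0-7 isolated=8-15
```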

  2. Create the Performance Profile optimized for your cluster architecture:

    # Run the Performance Profile creation script
    bash ~/low-latency-performance-workshop/scripts/module04-create-performance-profile.sh

    What This Script Does:

    • Loads and validates CPU allocation from previous step

    • Detects instance type (metal vs virtualized)

    • Validates RT kernel requirements if enabled

    • Validates CPU ranges to prevent invalid PerformanceProfile

    • Determines HugePages allocation based on cluster type

    • Creates PerformanceProfile with proper node selector

    • Conditionally enables RT kernel based on instance type and configuration

    • Shows configuration before applying (requires confirmation)

    Benefits of Using the Script:

    • Prevents Errors: Validates CPU ranges and RT kernel requirements before creating PerformanceProfile

    • Smart Defaults: RT kernel disabled by default (works on all instance types)

    • Instance Detection: Automatically detects bare-metal vs virtualized instances

    • Interactive: Shows configuration and asks for confirmation

    • Clear Output: Displays profile summary and next steps

    • Error Handling: Provides helpful error messages if creation fails

    The script will ask for confirmation before applying the PerformanceProfile.

    RT Kernel Configuration:

    • Default: RT kernel is disabled (works on all instance types)

    • To Enable: Set ENABLE_RT_KERNEL=true or use --enable-rt-kernel flag

    • Validation: Script will fail with clear error if RT kernel is requested on non-metal instances

    If you see an error like invalid range "0-2", this means the CPU allocation calculation failed. Run the CPU allocation calculator script again:

    bash ~/low-latency-performance-workshop/scripts/module04-calculate-cpu-allocation.sh

    Then retry the Performance Profile creation.
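The validation that catches this kind of error boils down to a sanity check on the range string; a hypothetical version (the real script's checks may differ):

```shell
# Hypothetical sanity check on a CPU range string (the script's real
# validation may differ): endpoints must be numeric and ordered.
valid_cpu_range() {
    [[ $1 =~ ^([0-9]+)-([0-9]+)$ ]] || return 1
    (( BASH_REMATCH[1] <= BASH_REMATCH[2] ))
}

valid_cpu_range "4-15" && echo "ok"
valid_cpu_range "0—2" || echo "rejected (em-dash instead of a hyphen)"
```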

Step 4: Monitor the Performance Profile Application

The Performance Profile will trigger a rolling update of your nodes. This process includes applying CPU isolation, HugePages, NUMA tuning, and (if enabled) installing the real-time kernel. The monitoring approach varies by cluster architecture.

Node Reboot Required:

The Performance Profile applies kernel boot parameters (CPU isolation, HugePages, NUMA tuning) which always require a node reboot, regardless of RT kernel setting. The Machine Config Pool (MCP) will automatically trigger the reboot.

Expected Timeline:
* SNO Clusters: API server will be unavailable for 10-20 minutes during reboot
* Multi-Node Clusters: Rolling update, one node at a time (10-20 minutes per node)
* RT Kernel Enabled: Additional 5-10 minutes for RT kernel installation

SNO-Specific Considerations:
* The entire cluster (API server) will be unavailable during the reboot
* Plan for 15-30 minutes of downtime
* Monitor the node status: oc get nodes (will show NotReady during reboot)
* After reboot, wait for the node to show Ready status before proceeding

Virtualized Instance Optimization:
* For virtualized AWS instances (e.g., m5.4xlarge), the nosmt kernel parameter is automatically skipped
* This preserves hyperthreading and ensures sufficient CPUs remain for the control plane
* CPU allocation is automatically adjusted to be more conservative for virtualized instances

  1. Monitor Machine Config Pool status:

    echo "📊 Current master MCP status:"
    oc get mcp master
    echo ""
    echo "⏱️  To monitor continuously (Ctrl+C to stop), run:"
    echo "watch 'oc get mcp master; echo; oc get nodes'"
  2. Monitor node updates in detail:

    # Load cluster configuration
    source /tmp/cluster-config
    
    echo "=== Detailed Node Update Monitoring ==="
    
    # Check machine config daemon status
    echo "📋 Machine Config Daemon Pods:"
    oc get pods -n openshift-machine-config-operator | grep daemon
    
    echo ""
    echo "📋 Recent Node Events:"
    oc get events --sort-by='.lastTimestamp' --field-selector involvedObject.name=$TARGET_NODE | tail -10
    
    echo ""
    echo "📋 Machine Config Status:"
    oc describe mcp master | grep -A 10 -B 5 "Conditions:"
  3. Wait for the update to complete:

    # Load cluster configuration
    source /tmp/cluster-config
    
    echo "=== Waiting for Performance Profile Application ==="
    echo "This process may take 10-20 minutes depending on your configuration"
    
    echo ""
    echo "⏳ SNO Update Process:"
    echo "   1. Machine config generation"
    echo "   2. Node cordoning and draining"
    echo "   3. Kernel argument updates and node reboot"
    if [ "${ENABLE_RT_KERNEL:-false}" = "true" ]; then
        echo "      (includes RT kernel installation)"
    fi
    echo "   4. Node rejoin and ready state"
    echo ""
    echo "🔄 Waiting for master MCP to be updated..."
    oc wait --for=condition=Updated mcp/master --timeout=1200s
    
    # Kernel boot parameters always trigger a reboot, so wait for the node
    echo "🔄 Waiting for node to be ready after reboot..."
    oc wait --for=condition=Ready node/$TARGET_NODE --timeout=600s
    
    echo ""
    echo "✅ Performance Profile application completed!"
    echo "📊 Final status:"
    oc get nodes
    oc get mcp
    
    echo ""
    echo "🧪 Testing cluster functionality after performance tuning..."
    
    # Create a simple test pod to verify the cluster is still functional
    cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: post-tuning-test
      namespace: default
      labels:
        app: post-tuning-test
    spec:
      containers:
      - name: test-container
        image: registry.redhat.io/ubi8/ubi-minimal:latest
        command: ["sleep"]
        args: ["30"]
        resources:
          requests:
            memory: "32Mi"
            cpu: "50m"
          limits:
            memory: "64Mi"
            cpu: "100m"
      restartPolicy: Never
    EOF
    
    # Wait for pod to be scheduled and running
    echo "โฑ๏ธ  Waiting for test pod to verify cluster functionality..."
    if oc wait --for=condition=Ready pod/post-tuning-test --timeout=120s -n default 2>/dev/null; then
        echo "   โœ… Cluster is functional after performance tuning!"
        echo "   ๐Ÿ“ Test pod scheduled successfully"
        oc delete pod post-tuning-test -n default --ignore-not-found=true
    else
        echo "   โš ๏ธ  Cluster may have scheduling issues after performance tuning"
        echo "   ๐Ÿ’ก Consider using the revert script if problems persist"
        echo "   ๐Ÿ” Check pod status: oc describe pod post-tuning-test -n default"
    fi

Step 5: Verify Performance Profile Effects

Once the update is complete, verify that all the performance optimizations have been applied correctly across your cluster architecture.

  1. Comprehensive verification using Python health check script:

    echo "=== Performance Profile Verification ==="
    echo "Running comprehensive cluster health check..."
    echo ""
    
    # Use Python script for thorough verification
    python3 ~/low-latency-performance-workshop/scripts/module04-cluster-health-check.py
    
    echo ""
    echo "✅ Comprehensive verification completed!"
    echo ""
    echo "💡 The health check script validates:"
    echo "   - Cluster architecture detection"
    echo "   - Performance Profile status"
    echo "   - Real-time kernel installation"
    echo "   - CPU isolation configuration"
    echo "   - Pod scheduling functionality"
  2. Get a quick performance tuning summary:

    echo "=== Performance Tuning Summary ==="
    echo ""
    
    # Get color-coded summary of current performance settings
    python3 ~/low-latency-performance-workshop/scripts/module04-performance-summary.py
    
    echo ""
    echo "💡 This summary shows:"
    echo "   - Current CPU allocation strategy"
    echo "   - Performance vs stability balance"
    echo "   - Recommendations for optimization"

Detailed Performance Tuning Validation

The workshop provides additional scripts to help with CPU allocation, Performance Profile creation, and validation.

  1. CPU Allocation Calculator - Calculate optimal CPU allocation:

    # Calculate CPU allocation based on cluster type
    bash ~/low-latency-performance-workshop/scripts/module04-calculate-cpu-allocation.sh

    This script:

    • Detects CPU count on target node

    • Calculates optimal reserved/isolated CPU allocation

    • Validates CPU ranges to prevent errors

    • Saves configuration for Performance Profile creation

  2. Performance Profile Creator - Create validated Performance Profile:

    # Create Performance Profile with validated configuration
    bash ~/low-latency-performance-workshop/scripts/module04-create-performance-profile.sh

    This script:

    • Validates CPU allocation from previous step

    • Prevents invalid CPU range errors

    • Shows configuration before applying

    • Requires confirmation for safety

  3. Performance Tuning Validator - Comprehensive validation of Performance Profile:

    # Validate Performance Profile configuration
    python3 ~/low-latency-performance-workshop/scripts/module04-tuning-validator.py

    This script validates:

    • Performance Profile existence and configuration

    • Machine Config Pool (MCP) status and readiness

    • Real-Time kernel installation on target nodes

    • Overall tuning configuration health

  4. CPU Isolation Checker - Detailed CPU allocation analysis:

    # Check CPU isolation configuration
    python3 ~/low-latency-performance-workshop/scripts/module04-cpu-isolation-checker.py

    This script provides:

    • Visual representation of CPU allocation

    • Reserved vs isolated CPU validation

    • CPU allocation strategy explanation

    • Best practices for CPU isolation

    • Configuration recommendations

  5. HugePages Validator - HugePages configuration verification:

    # Validate HugePages configuration
    python3 ~/low-latency-performance-workshop/scripts/module04-hugepages-validator.py
    
    # Check specific Performance Profile
    python3 ~/low-latency-performance-workshop/scripts/module04-hugepages-validator.py \
        --profile my-performance-profile

    This script validates:

    • HugePages configuration in Performance Profile

    • Node HugePages allocation and availability

    • HugePages benefits and use cases

    • How to use HugePages in pod specifications

    • Total HugePages memory allocation
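For reference, a pod requests HugePages as an extended resource; a minimal sketch follows (the pod name and sizes are illustrative), noting that Kubernetes requires HugePages requests and limits to be equal:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-demo            # illustrative name
spec:
  containers:
  - name: app
    image: registry.redhat.io/ubi8/ubi-minimal:latest
    command: ["sleep"]
    args: ["300"]
    volumeMounts:
    - name: hugepage
      mountPath: /dev/hugepages
    resources:
      requests:
        memory: "64Mi"
        hugepages-2Mi: "128Mi"
      limits:
        memory: "64Mi"
        hugepages-2Mi: "128Mi"    # must equal the request for HugePages
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
```

The emptyDir volume with medium: HugePages is only needed when the application maps HugePages-backed files directly; the resource request alone reserves the pages.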

These validation scripts are educational tools that help you understand:

  • What each performance tuning component does

  • How to verify configurations are correct

  • Best practices for low-latency tuning

  • Troubleshooting common issues

Run them after applying Performance Profiles to ensure everything is configured correctly.
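Tools like the CPU isolation checker have to expand kernel-style CPU lists (e.g. "0-3,8-11,14"); a small sketch of that parsing (a hypothetical helper, not code from the workshop scripts):

```shell
# Hypothetical helper (not from the workshop scripts): expand a
# kernel-style CPU list such as "0-3,8-11,14" and count the CPUs it names.
count_cpus() {
    local list=$1 total=0 part start end
    local -a parts
    IFS=',' read -ra parts <<< "$list"
    for part in "${parts[@]}"; do
        if [[ $part == *-* ]]; then
            start=${part%-*}
            end=${part#*-}
            total=$(( total + end - start + 1 ))
        else
            total=$(( total + 1 ))
        fi
    done
    echo "$total"
}

count_cpus "0-3,8-11,14"   # 9
```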

Performance Testing: Measuring Improvements

Now let’s run the same baseline test to measure the performance improvements from our optimizations.

Step 6: Re-run Performance Tests

  1. Re-run the kube-burner performance test on the optimized cluster:

    cd ~/kube-burner-configs
    
    # Load cluster configuration
    source /tmp/cluster-config
    
    echo "=== Creating Tuned Performance Test Configuration ==="
    echo "Cluster type: $CLUSTER_TYPE"
    echo "Profile name: $PROFILE_NAME"
    echo "Target node: $TARGET_NODE"
    
    # Create a new test configuration for the tuned cluster
    cat > tuned-config.yml << EOF
    global:
      measurements:
        - name: podLatency
          thresholds:
            - conditionType: Ready
              metric: P99
              threshold: 15000ms  # Expect better performance after tuning
    
    metricsEndpoints:
      - indexer:
          type: local
          metricsDirectory: collected-metrics-tuned
    
    jobs:
      - name: tuned-workload
        jobType: create
        jobIterations: 20
        namespace: tuned-workload
        namespacedIterations: true
        cleanup: false
        podWait: false
        waitWhenFinished: true
        verifyObjects: true
        errorOnVerify: false
        objects:
          - objectTemplate: tuned-pod.yml
            replicas: 5
            inputVars:
              containerImage: registry.redhat.io/ubi8/ubi:latest
    EOF
    
    # Create a tuned pod template with appropriate node selector
    if [ "$CLUSTER_TYPE" = "SNO" ]; then
        NODE_SELECTOR_YAML='nodeSelector:
        node-role.kubernetes.io/master: ""'
        echo "๐Ÿ“ SNO: Using master node selector for pod placement"
    elif [ "$CLUSTER_TYPE" = "MULTI_NODE" ]; then
        NODE_SELECTOR_YAML='nodeSelector:
        node-role.kubernetes.io/worker-rt: ""'
        echo "๐Ÿ“ Multi-Node: Using worker-rt node selector for pod placement"
    else
        NODE_SELECTOR_YAML='nodeSelector:
        node-role.kubernetes.io/master-rt: ""'
        echo "๐Ÿ“ Multi-Master: Using master-rt node selector for pod placement"
    fi
    
    cat > tuned-pod.yml << EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: tuned-pod-{{.Iteration}}-{{.Replica}}
      labels:
        app: tuned-test
        iteration: "{{.Iteration}}"
        cluster-type: "$CLUSTER_TYPE"
    spec:
      $NODE_SELECTOR_YAML
      containers:
      - name: tuned-container
        image: {{.containerImage}}
        command: ["sleep"]
        args: ["300"]
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "200m"
      restartPolicy: Never
    EOF
    
    echo ""
    echo "🚀 Running Performance Test on Tuned Cluster"
    echo "   - Test configuration: tuned-config.yml"
    echo "   - Pod template: tuned-pod.yml"
    echo "   - Target: $CLUSTER_TYPE cluster with $PROFILE_NAME"
    echo ""
    
    # Run the performance test
    kube-burner init -c tuned-config.yml --log-level=info
    
    echo ""
    echo "✅ Tuned performance test completed!"
    echo "📊 Results stored in: collected-metrics-tuned/"
  2. Analyze the tuned performance results using Python script:

    cd ~/kube-burner-configs
    
    echo "๐Ÿ” Analyzing tuned performance results..."
    echo ""
    
    # Use Python script for clean, color-coded analysis
    python3 ~/low-latency-performance-workshop/scripts/analyze-performance.py --single collected-metrics-tuned
    
    echo ""
    echo "✅ Tuned performance analysis completed!"
    echo ""
    echo "💡 The enhanced analysis provides:"
    echo "   📚 Educational context explaining what each metric means"
    echo "   🎯 Performance explanations (why some metrics are slower)"
    echo "   🚀 Color-coded results: Excellent/Good/Needs Attention"
    echo "   📋 Suggested next steps based on your results"
  3. Compare results with your baseline using Python analysis:

    cd ~/kube-burner-configs
    
    echo "📊 Comparing baseline vs tuned performance..."
    echo ""
    
    # Use module-specific analysis script for clean Module 4 results
    echo "🎯 Module 4 Focused Analysis (Container Performance Only)..."
    python3 ~/low-latency-performance-workshop/scripts/module-specific-analysis.py 4
    
    echo ""
    echo "📄 Generating Module 4 Performance Report..."
    echo "   🎯 Focus: Container performance optimization (baseline vs tuned)"
    echo "   📊 Scope: CPU isolation and HugePages impact on pod startup"
    echo ""
    
    # Generate Module 4 specific markdown report (baseline vs tuned only)
    REPORT_FILE="module4-performance-comparison-$(date +%Y%m%d-%H%M).md"
    python3 ~/low-latency-performance-workshop/scripts/analyze-performance.py \
        --baseline collected-metrics \
        --tuned collected-metrics-tuned \
        --report "$REPORT_FILE"
    
    echo ""
    echo "💡 Module 4 Analysis Scope:"
    echo "   ✅ Baseline container performance (from Module 3)"
    echo "   ✅ Tuned container performance (from Module 4)"
    echo "   ⚠️  Note: If VMI data exists from Module 5, it will also be shown"
    echo "   🎯 Focus on the 'Performance Comparison' section for Module 4 results"
    echo "   ℹ️  Comprehensive analysis across all modules happens in Module 6"
    
    echo ""
    echo "📚 How to Read the Analysis:"
    echo "   1. Individual test sections show raw performance data"
    echo "   2. 'Performance Comparison' section shows Module 4 improvements"
    echo "   3. VMI data (if shown) is for reference - focus on container metrics"
    
    echo ""
    echo "📊 Performance Comparison Summary:"
    echo "=================================="
    if [ -f "$REPORT_FILE" ]; then
        # Display key sections of the report
        head -30 "$REPORT_FILE"
        echo ""
        echo "📄 Full report available at: $REPORT_FILE"
    else
        echo "โš ๏ธ  Report generation failed - check if both baseline and tuned metrics exist"
    fi
    
    echo ""
    echo "✅ Performance comparison completed!"
    echo ""
    echo "💡 The comparison analysis explains:"
    echo "   📊 What P99/P95/P50 improvements mean for your workloads"
    echo "   🎯 Why scheduling became instant (0ms) with CPU isolation"
    echo "   ⚖️ Why container operations may be slower (expected trade-off)"
    echo "   🏆 Overall assessment of your performance tuning effectiveness"

Step 7: Validate kube-burner Test Pods and Cluster Stability

Before proceeding, let’s validate that our performance tuning doesn’t interfere with normal cluster operations and that kube-burner can successfully create and schedule pods.

  1. Check the status of kube-burner test pods:

    # Load cluster configuration
    source /tmp/cluster-config
    
    echo "=== Validating kube-burner Test Pod Scheduling ==="
    echo "Cluster type: $CLUSTER_TYPE"
    echo "Target node: $TARGET_NODE"
    
    # Check if tuned test pods were created successfully
    echo ""
    echo "📋 Tuned Test Pod Status:"
    TUNED_PODS=$(oc get pods -A -l app=tuned-test --no-headers 2>/dev/null | wc -l)
    if [ $TUNED_PODS -gt 0 ]; then
        echo "   ✅ Found $TUNED_PODS tuned test pods"
        echo ""
        echo "   📊 Pod Distribution by Node:"
        oc get pods -A -l app=tuned-test -o wide --no-headers | awk '{print $8}' | sort | uniq -c | sed 's/^/      /'

        echo ""
        echo "   📊 Pod Status Summary:"
        oc get pods -A -l app=tuned-test --no-headers | awk '{print $4}' | sort | uniq -c | sed 's/^/      /'
    
        # Check for any failed pods
        FAILED_PODS=$(oc get pods -A -l app=tuned-test --no-headers | grep -v "Running\|Completed" | wc -l)
        if [ $FAILED_PODS -gt 0 ]; then
            echo ""
            echo "   โš ๏ธ  Found $FAILED_PODS pods not in Running/Completed state:"
            oc get pods -A -l app=tuned-test --no-headers | grep -v "Running\|Completed" | sed 's/^/      /'
        fi
    else
        echo "   โš ๏ธ  No tuned test pods found - this may indicate scheduling issues"
    fi
    
    # Check baseline test pods as well
    echo ""
    echo "📋 Baseline Test Pod Status:"
    BASELINE_PODS=$(oc get pods -A -l app=baseline-test --no-headers 2>/dev/null | wc -l)
    if [ $BASELINE_PODS -gt 0 ]; then
        echo "   ✅ Found $BASELINE_PODS baseline test pods"
        echo "   📊 Baseline Pod Status Summary:"
        oc get pods -A -l app=baseline-test --no-headers | awk '{print $4}' | sort | uniq -c | sed 's/^/      /'
    else
        echo "   ℹ️  No baseline test pods found (may have been cleaned up)"
    fi
  2. Test cluster responsiveness and pod scheduling:

    # Load cluster configuration
    source /tmp/cluster-config
    
    echo "=== Testing Cluster Responsiveness ==="
    
    # Create a simple test pod to verify scheduling works
    echo "🧪 Creating test pod to verify cluster functionality..."
    
    cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: cluster-health-test
      namespace: default
      labels:
        app: cluster-health-test
    spec:
      containers:
      - name: test-container
        image: registry.redhat.io/ubi8/ubi-minimal:latest
        command: ["sleep"]
        args: ["60"]
        resources:
          requests:
            memory: "32Mi"
            cpu: "50m"
          limits:
            memory: "64Mi"
            cpu: "100m"
      restartPolicy: Never
    EOF
    
    # Wait for pod to be scheduled and running
    echo "โฑ๏ธ  Waiting for test pod to start..."
    oc wait --for=condition=Ready pod/cluster-health-test --timeout=60s -n default
    
    if [ $? -eq 0 ]; then
        echo "   ✅ Test pod started successfully - cluster is responsive"

        # Check which node it was scheduled on
        TEST_POD_NODE=$(oc get pod cluster-health-test -n default -o jsonpath='{.spec.nodeName}')
        echo "   📍 Test pod scheduled on node: $TEST_POD_NODE"
    
        # Clean up test pod
        oc delete pod cluster-health-test -n default --ignore-not-found=true
    else
        echo "   โŒ Test pod failed to start - cluster may have scheduling issues"
        echo "   ๐Ÿ” Pod events:"
        oc describe pod cluster-health-test -n default | grep -A 10 "Events:" | sed 's/^/      /'
    fi

Step 8: Optional - Revert Performance Tuning for Workshop Stability

If you experience any issues with cluster stability or want to continue with other workshop modules without the aggressive performance tuning, you can revert the changes.

  1. Create a revert script for easy cleanup:

    # Load cluster configuration
    source /tmp/cluster-config
    
    echo "=== Creating Performance Tuning Revert Script ==="
    
    # Create revert script
    cat > ~/revert-performance-tuning.sh << 'EOF'
    #!/bin/bash
    
    # Load cluster configuration
    if [ -f /tmp/cluster-config ]; then
        source /tmp/cluster-config
        echo "=== Reverting Performance Tuning ==="
        echo "Cluster type: $CLUSTER_TYPE"
        echo "Profile name: $PROFILE_NAME"
        echo "Target node: $TARGET_NODE"
    else
        echo "โŒ Cluster configuration not found. Cannot revert automatically."
        echo "๐Ÿ’ก You can manually delete performance profiles with:"
        echo "   oc get performanceprofile"
        echo "   oc delete performanceprofile <profile-name>"
        exit 1
    fi
    
    # Delete the Performance Profile
    echo ""
    echo "๐Ÿ—‘๏ธ  Removing Performance Profile: $PROFILE_NAME"
    if oc get performanceprofile $PROFILE_NAME >/dev/null 2>&1; then
        oc delete performanceprofile $PROFILE_NAME
        echo "   โœ… Performance Profile deleted"
    else
        echo "   โ„น๏ธ  Performance Profile not found (may already be deleted)"
    fi
    
    # Remove custom Machine Config Pool (if created)
    if [ "$CLUSTER_TYPE" = "MULTI_NODE" ]; then
        echo ""
        echo "๐Ÿ—‘๏ธ  Removing worker-rt Machine Config Pool..."
        if oc get mcp worker-rt >/dev/null 2>&1; then
            oc delete mcp worker-rt
            echo "   โœ… worker-rt Machine Config Pool deleted"
        else
            echo "   โ„น๏ธ  worker-rt Machine Config Pool not found"
        fi
    
        # Remove worker-rt label from nodes
        echo ""
        echo "๐Ÿท๏ธ  Removing worker-rt labels from nodes..."
        oc label nodes -l node-role.kubernetes.io/worker-rt node-role.kubernetes.io/worker-rt- --ignore-not-found=true
    
    elif [ "$CLUSTER_TYPE" = "MULTI_MASTER" ]; then
        echo ""
        echo "๐Ÿ—‘๏ธ  Removing master-rt Machine Config Pool..."
        if oc get mcp master-rt >/dev/null 2>&1; then
            oc delete mcp master-rt
            echo "   โœ… master-rt Machine Config Pool deleted"
        else
            echo "   โ„น๏ธ  master-rt Machine Config Pool not found"
        fi
    
        # Remove master-rt label from nodes
        echo ""
        echo "๐Ÿท๏ธ  Removing master-rt labels from nodes..."
        oc label nodes -l node-role.kubernetes.io/master-rt node-role.kubernetes.io/master-rt- --ignore-not-found=true
    fi
    
    echo ""
    echo "โณ Waiting for nodes to revert to standard kernel..."
    echo "   ๐Ÿ’ก This process will take 10-15 minutes as nodes reboot"
    echo "   ๐Ÿ“Š Monitor progress with: watch 'oc get nodes; oc get mcp'"
    
    # Wait for machine config pools to be updated
    if [ "$CLUSTER_TYPE" = "SNO" ]; then
        echo "   🔄 Waiting for master MCP to update..."
        oc wait --for=condition=Updated mcp/master --timeout=1200s
    elif [ "$CLUSTER_TYPE" = "MULTI_NODE" ]; then
        echo "   🔄 Waiting for worker MCP to update..."
        oc wait --for=condition=Updated mcp/worker --timeout=1200s
    else
        echo "   🔄 Waiting for master MCP to update..."
        oc wait --for=condition=Updated mcp/master --timeout=1200s
    fi
    
    echo ""
    echo "โœ… Performance tuning revert completed!"
    echo "๐Ÿ” Verify with: oc debug node/$TARGET_NODE -- chroot /host uname -r"
    echo "   (Should show standard kernel without 'rt')"
    
    EOF
    
    chmod +x ~/revert-performance-tuning.sh
    
    echo "โœ… Revert script created: ~/revert-performance-tuning.sh"
    echo ""
    echo "๐Ÿ’ก To revert performance tuning later, run:"
    echo "   ~/revert-performance-tuning.sh"
    echo ""
    echo "โš ๏ธ  Note: Reverting will cause nodes to reboot back to standard kernel"
  2. Optional: Run the revert script if needed:

    echo "=== Performance Tuning Revert Decision ==="
    echo ""
    echo "🤔 Do you want to revert the performance tuning now?"
    echo ""
    echo "✅ Keep the performance tuning if:"
    echo "   - The cluster is stable and responsive"
    echo "   - Test pods are scheduling successfully"
    echo "   - You want to continue with optimized performance"
    echo ""
    echo "🔄 Revert the performance tuning if:"
    echo "   - You are experiencing cluster stability issues"
    echo "   - Pods are failing to schedule"
    echo "   - You want to continue the workshop with standard settings"
    echo ""
    echo "💡 You can always re-apply performance tuning later by re-running this module"
    echo ""
    echo "To revert now, run: ~/revert-performance-tuning.sh"
    echo "To keep current settings, continue to the next step"

Expected Improvements

With proper performance tuning, you should see significant improvements:

Typical Improvements
  • Pod Creation P99: 50-70% reduction in latency

  • Pod Creation P95: 40-60% reduction in latency

  • Consistency: Much lower variance between P50 and P99

  • Jitter Reduction: More predictable response times
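
As a quick sanity check, you can quantify the improvement from your own numbers. The values below are placeholders standing in for a Module 3 baseline and a post-tuning kube-burner run; substitute your measured P99 latencies:

```shell
# Placeholder values -- substitute the P99 numbers from your own kube-burner runs
BASELINE_P99=5200   # ms, baseline pod-creation P99 (Module 3)
TUNED_P99=1800      # ms, post-tuning pod-creation P99

# Percent reduction = (baseline - tuned) / baseline * 100
REDUCTION=$(awk -v b="$BASELINE_P99" -v t="$TUNED_P99" \
    'BEGIN { printf "%.1f", (b - t) / b * 100 }')

echo "Pod creation P99: ${BASELINE_P99}ms -> ${TUNED_P99}ms (${REDUCTION}% reduction)"
# Prints: Pod creation P99: 5200ms -> 1800ms (65.4% reduction)
```

A result in the 50-70% band matches the typical improvements listed above; a much smaller delta suggests the profile did not fully apply (see Troubleshooting below).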

Performance Factors
  • CPU Isolation: Eliminates interference from system processes

  • Real-time Kernel: Provides deterministic scheduling

  • HugePages: Reduces memory management overhead

  • NUMA Optimization: Ensures local memory access
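
Several of these factors can be spot-checked from the command line. As one illustration, verifying the real-time kernel reduces to inspecting the kernel release string; the helper name and the sample release below are illustrative, not part of the workshop tooling:

```shell
# Illustrative helper: succeed if a kernel release string is a real-time kernel.
# RT kernels carry ".rt" in the release string.
is_rt_kernel() {
    case "$1" in
        *.rt*) return 0 ;;
        *)     return 1 ;;
    esac
}

# On your tuned node you would feed it the live value, e.g.:
#   is_rt_kernel "$(oc debug node/$TARGET_NODE -- chroot /host uname -r 2>/dev/null)"
if is_rt_kernel "5.14.0-284.41.1.rt14.328.el9_2.x86_64"; then   # sample RT release
    echo "RT kernel detected"
else
    echo "standard kernel"
fi
```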

Troubleshooting Common Issues

This section provides architecture-specific troubleshooting guidance for common Performance Profile issues.

Architecture-Specific Troubleshooting

Node Not Updating

If nodes don’t start updating after Performance Profile creation:

# Load cluster configuration
source /tmp/cluster-config

echo "=== Node Update Troubleshooting ==="
echo "Cluster type: $CLUSTER_TYPE"
echo "Target node: $TARGET_NODE"

if [ "$CLUSTER_TYPE" = "SNO" ]; then
    echo ""
    echo "๐Ÿ” SNO Troubleshooting:"
    echo "   - Check master Machine Config Pool status"
    oc describe mcp master | grep -A 10 -B 5 "Conditions:"

    echo ""
    echo "   - Check for conflicting machine configs:"
    oc get mc | grep master

elif [ "$CLUSTER_TYPE" = "MULTI_NODE" ]; then
    echo ""
    echo "๐Ÿ” Multi-Node Troubleshooting:"
    echo "   - Check worker-rt Machine Config Pool status"
    oc describe mcp worker-rt | grep -A 10 -B 5 "Conditions:"

    echo ""
    echo "   - Check for conflicting machine configs:"
    oc get mc | grep worker-rt

else
    echo ""
    echo "๐Ÿ” Multi-Master Troubleshooting:"
    echo "   - Check master-rt Machine Config Pool status"
    oc describe mcp master-rt | grep -A 10 -B 5 "Conditions:"

    echo ""
    echo "   - Check for conflicting machine configs:"
    oc get mc | grep master-rt
fi

echo ""
echo "๐Ÿ“‹ Performance Profile Status:"
oc describe performanceprofile $PROFILE_NAME | grep -A 10 -B 5 "Status:"

Real-time Kernel Issues

If the RT kernel fails to install:

# Load cluster configuration
source /tmp/cluster-config

echo "=== Real-Time Kernel Troubleshooting ==="
echo "Target node: $TARGET_NODE"

# Check node events for errors
echo "๐Ÿ“‹ Recent Node Events:"
oc get events --sort-by='.lastTimestamp' --field-selector involvedObject.name=$TARGET_NODE | tail -15

echo ""
echo "๐Ÿ“‹ Machine Config Daemon Logs:"
MCD_POD=$(oc get pods -n openshift-machine-config-operator -l k8s-app=machine-config-daemon --field-selector spec.nodeName=$TARGET_NODE -o jsonpath='{.items[0].metadata.name}')
if [ -n "$MCD_POD" ]; then
    echo "   MCD Pod: $MCD_POD"
    oc logs -n openshift-machine-config-operator $MCD_POD --tail=20
else
    echo "   โŒ No MCD pod found for node $TARGET_NODE"
fi

echo ""
echo "๐Ÿ“‹ RT Kernel Package Availability:"
oc debug node/$TARGET_NODE -- chroot /host sh -c 'yum list available | grep kernel-rt || dnf list available | grep kernel-rt' 2>/dev/null || echo "   Unable to check RT kernel packages"

echo ""
echo "๐Ÿ“‹ Current Kernel Information:"
oc debug node/$TARGET_NODE -- chroot /host uname -a 2>/dev/null || echo "   Unable to check current kernel"

HugePages Not Allocated

If HugePages aren’t configured correctly:

# Load cluster configuration
source /tmp/cluster-config

echo "=== HugePages Troubleshooting ==="
echo "Target node: $TARGET_NODE"

# Check available memory
echo "๐Ÿ“‹ Memory Information:"
oc debug node/$TARGET_NODE -- chroot /host free -h 2>/dev/null || echo "   Unable to check memory"

echo ""
echo "๐Ÿ“‹ HugePages Configuration:"
oc debug node/$TARGET_NODE -- chroot /host cat /proc/meminfo 2>/dev/null | grep -i huge || echo "   No HugePages information found"

echo ""
echo "๐Ÿ“‹ HugePages Mount Points:"
oc debug node/$TARGET_NODE -- chroot /host mount 2>/dev/null | grep huge || echo "   No HugePages mount points found"

echo ""
echo "๐Ÿ“‹ Kernel Command Line (HugePages args):"
oc debug node/$TARGET_NODE -- chroot /host cat /proc/cmdline 2>/dev/null | grep -o 'hugepages[^[:space:]]*' || echo "   No HugePages kernel arguments found"

echo ""
echo "๐Ÿ“‹ Performance Profile HugePages Spec:"
oc get performanceprofile $PROFILE_NAME -o jsonpath='{.spec.hugepages}' | jq '.' 2>/dev/null || echo "   Unable to read Performance Profile HugePages spec"

Node Stuck After Reboot (API Server Not Starting)

If the node reboots but the API server doesn’t come back up (SNO) or node remains NotReady:

# Load cluster configuration
source /tmp/cluster-config

echo "=== Node Stuck After Reboot Troubleshooting ==="
echo "Cluster type: $CLUSTER_TYPE"
echo "Target node: $TARGET_NODE"
echo "Instance type: ${INSTANCE_TYPE:-unknown}"

# Check node status
echo ""
echo "๐Ÿ“‹ Node Status:"
oc get node $TARGET_NODE -o wide 2>/dev/null || echo "   โŒ Cannot reach API server (node may be rebooting)"

# Check AWS instance status (if AWS CLI available)
if command -v aws &> /dev/null; then
    echo ""
    echo "๐Ÿ“‹ AWS Instance Status:"
    INSTANCE_ID=$(aws ec2 describe-instances \
        --filters "Name=private-ip-address,Values=$(oc get node $TARGET_NODE -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}' 2>/dev/null || echo 'unknown')" \
        --query 'Reservations[*].Instances[*].[InstanceId,State.Name]' \
        --output text 2>/dev/null | head -1)
    if [ -n "$INSTANCE_ID" ]; then
        echo "   Instance: $INSTANCE_ID"
        aws ec2 describe-instance-status --instance-ids $INSTANCE_ID \
            --query 'InstanceStatuses[*].[SystemStatus.Status,InstanceStatus.Status]' \
            --output table 2>/dev/null || echo "   Unable to check AWS status"
    fi
fi

echo ""
echo "๐Ÿ’ก Common Causes:"
echo "   1. CPU isolation too aggressive - not enough CPUs for control plane"
echo "   2. nosmt parameter on virtualized instance - halves available CPUs"
echo "   3. Insufficient memory after HugePages allocation"
echo ""
echo "๐Ÿ”ง Recovery Options:"
echo "   1. Wait 20-30 minutes - node may still be applying configuration"
echo "   2. Reboot instance: aws ec2 reboot-instances --instance-ids <instance-id>"
echo "   3. Delete Performance Profile and recreate with more conservative settings:"
echo "      oc delete performanceprofile $PROFILE_NAME"
echo "      # Then re-run CPU allocation calculator and profile creator"

Performance Profile Not Applied

If the Performance Profile exists but optimizations aren’t applied:

# Load cluster configuration
source /tmp/cluster-config

echo "=== Performance Profile Application Troubleshooting ==="

# Check Performance Profile status
echo "๐Ÿ“‹ Performance Profile Status:"
oc get performanceprofile $PROFILE_NAME -o yaml | grep -A 20 "status:" || echo "   No status information available"

echo ""
echo "๐Ÿ“‹ Node Tuning Operator Status:"
oc get pods -n openshift-cluster-node-tuning-operator

echo ""
echo "๐Ÿ“‹ TuneD Daemon Status:"
TUNED_POD=$(oc get pods -n openshift-cluster-node-tuning-operator -l openshift-app=tuned --field-selector spec.nodeName=$TARGET_NODE -o jsonpath='{.items[0].metadata.name}')
if [ -n "$TUNED_POD" ]; then
    echo "   TuneD Pod: $TUNED_POD"
    oc logs -n openshift-cluster-node-tuning-operator $TUNED_POD --tail=10
else
    echo "   โŒ No TuneD pod found for node $TARGET_NODE"
fi

echo ""
echo "๐Ÿ“‹ Generated TuneD Profiles:"
oc get tuned -n openshift-cluster-node-tuning-operator

Module Summary

In this module, you have successfully implemented architecture-aware performance tuning:

✅ Detected your cluster architecture (SNO, Multi-Node, or Multi-Master)
✅ Created an optimized Performance Profile tailored to your environment
✅ Configured CPU isolation appropriate for your cluster’s control plane needs
✅ Allocated HugePages sized correctly for your available resources
✅ Applied real-time kernel settings for deterministic scheduling
✅ Verified all performance optimizations are working correctly
✅ Measured significant performance improvements through comparative testing

Architecture-Specific Achievements

Single Node OpenShift (SNO):
* Conservative CPU allocation preserving control plane stability
* Optimized HugePages configuration for resource-constrained environments
* Master node targeting with appropriate performance isolation

Multi-Node Clusters:
* Aggressive CPU isolation on dedicated worker nodes
* Maximum performance optimization without control plane impact
* Dedicated Machine Config Pool for isolated performance tuning

Multi-Master Clusters:
* Balanced CPU allocation for control plane and workload performance
* Strategic node selection for performance optimization
* Maintained cluster stability during rolling updates

Key Takeaways
* Performance Profiles adapt automatically to different cluster architectures
* CPU isolation strategies must account for control plane requirements
* Real-time kernels provide predictable, low-latency scheduling across all architectures
* HugePages allocation should be sized appropriately for available resources
* Proper architecture-aware tuning can achieve 50-70% latency improvements
* SNO environments can achieve exceptional performance due to simplified architecture
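
For reference, the components listed in these takeaways come together in a single custom resource. The sketch below writes an example Performance Profile for review; all values (CPU ranges, HugePages count, node selector) are placeholders, so use the numbers your CPU allocation calculator produced rather than applying this file blindly:

```shell
# Write an illustrative Performance Profile for review -- placeholder values only
cat > /tmp/example-performance-profile.yaml <<'EOF'
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-performance-profile
spec:
  cpu:
    reserved: "0-1"        # kept for the OS and control plane
    isolated: "2-7"        # dedicated to latency-sensitive workloads
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
      - size: "1G"
        count: 4
  realTimeKernel:
    enabled: true
  numa:
    topologyPolicy: "single-numa-node"
  nodeSelector:
    node-role.kubernetes.io/worker-rt: ""
EOF

echo "Review with: less /tmp/example-performance-profile.yaml"
echo "Apply with:  oc apply -f /tmp/example-performance-profile.yaml"
```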

Performance Impact Summary

Based on your cluster type, you should observe:
* Pod Creation Latency: 50-70% reduction in P99 times
* Consistency: Dramatically reduced variance between P50 and P99
* Jitter: More predictable and deterministic response times
* Resource Utilization: Optimized CPU and memory usage patterns
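
To put a number on the P50-to-P99 variance claim, you can compute the spread directly from a kube-burner podLatency result. The JSON below is sample data whose layout mirrors the podLatencyQuantilesMeasurement output; check the field names against your own result files:

```shell
# Sample data standing in for a kube-burner podLatency quantiles file --
# verify field names against your actual measurement output
cat > /tmp/sample-podlatency.json <<'EOF'
[
  {"quantileName": "PodScheduled", "P50": 1000, "P99": 1200},
  {"quantileName": "Ready",        "P50": 1800, "P99": 5200}
]
EOF

# Spread between P50 and P99 for the Ready quantile: a smaller spread
# after tuning means more deterministic pod startup
jq -r '.[] | select(.quantileName=="Ready")
       | "Ready P50=\(.P50)ms P99=\(.P99)ms spread=\(.P99 - .P50)ms"' \
    /tmp/sample-podlatency.json
# Prints: Ready P50=1800ms P99=5200ms spread=3400ms
```

Run the same computation on your before and after result files; the post-tuning spread should shrink dramatically.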

Optional: Reverting Performance Tuning for Workshop Continuation

If you need to revert the performance tuning to continue with other workshop modules or if you experience any cluster stability issues, you can easily remove the performance optimizations.

Creating a Revert Script

  1. Create an automated revert script:

    # Load cluster configuration
    source /tmp/cluster-config
    
    echo "=== Creating Performance Tuning Revert Script ==="
    
    # Create revert script
    cat > ~/revert-performance-tuning.sh << 'EOF'
    #!/bin/bash
    
    # Load cluster configuration
    if [ -f /tmp/cluster-config ]; then
        source /tmp/cluster-config
        echo "=== Reverting Performance Tuning ==="
        echo "Cluster type: $CLUSTER_TYPE"
        echo "Profile name: $PROFILE_NAME"
        echo "Target node: $TARGET_NODE"
    else
        echo "โŒ Cluster configuration not found. Attempting manual cleanup..."
        PROFILE_NAME=$(oc get performanceprofile -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
        if [ -z "$PROFILE_NAME" ]; then
            echo "No Performance Profiles found to delete."
            exit 0
        fi
    fi
    
    # Delete the Performance Profile
    echo ""
    echo "๐Ÿ—‘๏ธ  Removing Performance Profile: $PROFILE_NAME"
    if oc get performanceprofile $PROFILE_NAME >/dev/null 2>&1; then
        oc delete performanceprofile $PROFILE_NAME
        echo "   โœ… Performance Profile deleted"
    else
        echo "   โ„น๏ธ  Performance Profile not found (may already be deleted)"
    fi
    
    # Remove custom Machine Config Pool and labels (if created)
    if [ "$CLUSTER_TYPE" = "MULTI_NODE" ]; then
        echo ""
        echo "๐Ÿ—‘๏ธ  Cleaning up worker-rt configuration..."
        oc delete mcp worker-rt --ignore-not-found=true
        oc label nodes -l node-role.kubernetes.io/worker-rt node-role.kubernetes.io/worker-rt- --ignore-not-found=true
        echo "   โœ… Worker-rt configuration cleaned up"
    
    elif [ "$CLUSTER_TYPE" = "MULTI_MASTER" ]; then
        echo ""
        echo "๐Ÿ—‘๏ธ  Cleaning up master-rt configuration..."
        oc delete mcp master-rt --ignore-not-found=true
        oc label nodes -l node-role.kubernetes.io/master-rt node-role.kubernetes.io/master-rt- --ignore-not-found=true
        echo "   โœ… Master-rt configuration cleaned up"
    fi
    
    echo ""
    echo "โณ Waiting for nodes to revert to standard kernel..."
    echo "   ๐Ÿ’ก This process will take 10-15 minutes as nodes reboot"
    echo "   ๐Ÿ“Š Monitor progress with: watch 'oc get nodes; oc get mcp'"
    
    # Wait for machine config pools to be updated
    if [ "$CLUSTER_TYPE" = "SNO" ] || [ "$CLUSTER_TYPE" = "MULTI_MASTER" ]; then
        echo "   ๐Ÿ”„ Waiting for master MCP to update..."
        oc wait --for=condition=Updated mcp/master --timeout=1200s
    elif [ "$CLUSTER_TYPE" = "MULTI_NODE" ]; then
        echo "   ๐Ÿ”„ Waiting for worker MCP to update..."
        oc wait --for=condition=Updated mcp/worker --timeout=1200s
    fi
    
    echo ""
    echo "โœ… Performance tuning revert completed!"
    echo "๐Ÿ” Verify standard kernel with:"
    echo "   oc debug node/\$(oc get nodes -o jsonpath='{.items[0].metadata.name}') -- chroot /host uname -r"
    echo "   (Should show standard kernel without 'rt')"
    
    EOF
    
    chmod +x ~/revert-performance-tuning.sh
    
    echo "โœ… Revert script created: ~/revert-performance-tuning.sh"
    echo ""
    echo "๐Ÿ’ก To revert performance tuning at any time, run:"
    echo "   ~/revert-performance-tuning.sh"
    echo ""
    echo "โš ๏ธ  Note: Reverting will cause nodes to reboot back to standard kernel"

When to Use the Revert Script

Revert Performance Tuning if you experience:
  • Pods failing to schedule or start

  • Cluster becoming unresponsive

  • High resource contention

  • Need to continue with other workshop modules that require standard settings

Keep Performance Tuning if:
  • Cluster is stable and responsive

  • Test pods are scheduling successfully

  • You want to continue with performance-optimized settings

  • Planning to proceed directly to Module 5 (Virtualization)
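
The keep-or-revert decision above can be boiled down to two signals: node readiness and pods stuck in Pending. The helper below is an illustrative sketch (the function name and decision rule are ours, not part of the workshop tooling):

```shell
# Illustrative decision helper: recommend KEEP or REVERT from the two
# signals matching the criteria above
recommend_action() {
    local node_ready="$1" pending_pods="$2"
    if [ "$node_ready" = "True" ] && [ "$pending_pods" -eq 0 ]; then
        echo "KEEP"
    else
        echo "REVERT"
    fi
}

# On a live cluster the inputs could come from:
#   node_ready=$(oc get node "$TARGET_NODE" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
#   pending_pods=$(oc get pods -A --field-selector=status.phase=Pending --no-headers 2>/dev/null | wc -l)
recommend_action "True" 0     # stable cluster, nothing Pending -> KEEP
recommend_action "False" 3    # NotReady node with stuck pods   -> REVERT
```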

Manual Cleanup (Alternative)

If the automated script doesn’t work, you can manually clean up:

# Manual cleanup commands
echo "=== Manual Performance Tuning Cleanup ==="

# List and delete performance profiles
echo "๐Ÿ“‹ Current Performance Profiles:"
oc get performanceprofile

echo ""
echo "๐Ÿ—‘๏ธ  Delete Performance Profile:"
echo "oc delete performanceprofile <profile-name>"

# List and delete custom MCPs
echo ""
echo "๐Ÿ“‹ Current Machine Config Pools:"
oc get mcp

echo ""
echo "๐Ÿ—‘๏ธ  Delete custom MCP (if created):"
echo "oc delete mcp worker-rt  # or master-rt"

echo ""
echo "๐Ÿ’ก After deletion, nodes will automatically reboot to standard kernel"
Next Steps

In Module 5, you will learn how to apply these performance optimizations to OpenShift Virtualization, creating high-performance virtual machines that leverage your tuned infrastructure to achieve near bare-metal latency characteristics. The architecture-aware approach you’ve learned will be essential for optimizing VM placement and resource allocation.

Workshop Flexibility: You can proceed to Module 5 either with or without the performance tuning active. The virtualization module will adapt to your current cluster configuration.