Module 5: Low-Latency Virtualization
Module Overview
This module focuses on optimizing virtual machines for low-latency performance using OpenShift Virtualization. You’ll learn how to configure VMs with dedicated CPUs, HugePages, and SR-IOV networking, then validate performance improvements using advanced kube-burner measurements.
Prerequisites
- Completed Module 3 (baseline performance metrics collected)
- OpenShift Virtualization operator installed (from Module 2)
- Single Node OpenShift (SNO) or multi-node cluster
- Baseline performance metrics from Module 3
- Optional: Completed Module 4 (Performance Profiles for enhanced VM performance)
Key Learning Objectives
- Configure OpenShift Virtualization for low-latency workloads
- Optimize Virtual Machine Instances (VMIs) with dedicated resources
- Implement SR-IOV networking for high-performance VM networking
- Measure VMI startup and network latency using kube-burner
- Validate network policy performance in virtualized environments
- Compare VM performance against containerized workloads
OpenShift Virtualization Overview
OpenShift Virtualization enables running virtual machines alongside containers on the same OpenShift cluster, providing:
- Unified Management: VMs and containers managed through the same platform
- Performance Optimization: CPU pinning, HugePages, and NUMA alignment
- Advanced Networking: SR-IOV, Multus, and high-performance networking
- Live Migration: Zero-downtime VM migration between nodes
- Security: VM isolation with OpenShift security policies
Architecture Components
Component | Purpose | Low-Latency Features |
---|---|---|
KubeVirt | VM management engine | CPU pinning, dedicated resources |
Containerized Data Importer (CDI) | VM disk image management | Optimized storage provisioning |
Multus CNI | Multiple network interfaces | SR-IOV and high-performance networking |
Node Feature Discovery | Hardware capability detection | NUMA topology awareness |
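The verification steps below focus on the KubeVirt operator itself; if you also want to spot-check the supporting components from the table, a minimal sketch looks like this (the namespaces and label selectors are typical defaults and may vary slightly between OpenShift versions):

```bash
# Spot-check the supporting components from the table above
# (namespace and label selectors are assumptions, not guaranteed on every cluster)
oc get pods -n openshift-cnv -l kubevirt.io=virt-controller       # KubeVirt control plane
oc get pods -n openshift-cnv -l app=containerized-data-importer   # CDI
oc get pods -n openshift-multus -l app=multus                     # Multus CNI
oc get pods -n openshift-nfd 2>/dev/null || echo "Node Feature Discovery not installed (optional)"
```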
Verifying OpenShift Virtualization Installation
OpenShift Virtualization was deployed in Module 2 via GitOps. Let’s verify it’s ready for low-latency workloads.
- Check if OpenShift Virtualization is installed and ready:

```bash
# Check the HyperConverged operator status
oc get hyperconverged -n openshift-cnv

# Verify virtualization components are running
oc get pods -n openshift-cnv --field-selector=status.phase=Running | head -10

# Check if KVM virtualization is available on the cluster
oc get nodes -o jsonpath='{.items[*].status.allocatable.devices\.kubevirt\.io/kvm}' | grep -q "1k" && echo "✅ KVM available on cluster nodes" || echo "❌ KVM not available"

# Verify the operator CSV status
oc get csv -n openshift-cnv | grep kubevirt-hyperconverged

# Check available VM templates
echo "Available Fedora VM templates:"
oc get templates -n openshift --field-selector metadata.name=fedora-server-small
```
- Check the cluster environment and available resources:

```bash
# Check cluster node configuration
echo "--- Cluster Node Information ---"
oc get nodes -o wide

# Check available CPU resources
echo ""
echo "--- CPU Resources ---"
oc debug node/$(oc get nodes -o jsonpath='{.items[0].metadata.name}') -- chroot /host nproc

# Check if Fedora VM DataSource is available
echo ""
echo "--- Available VM DataSources ---"
oc get datasource -n openshift-virtualization-os-images | grep fedora
```
- Check the current performance profile status (it may not exist yet):
# Check if performance profile exists (from Module 4) echo "--- Performance Profile Status ---" PERF_PROFILES=$(oc get performanceprofile --no-headers 2>/dev/null | wc -l) if [ "$PERF_PROFILES" -gt 0 ]; then echo "✅ Performance profile found:" oc get performanceprofile -o custom-columns=NAME:.metadata.name,ISOLATED:.spec.cpu.isolated,RESERVED:.spec.cpu.reserved # Get current HugePages from Performance Profile echo "" echo "--- Current HugePages Configuration ---" PROFILE_NAME=$(oc get performanceprofile -o jsonpath='{.items[0].metadata.name}') HUGEPAGES_COUNT=$(oc get performanceprofile "$PROFILE_NAME" -o jsonpath='{.spec.hugepages.pages[0].count}' 2>/dev/null || echo "0") echo "Performance Profile HugePages: ${HUGEPAGES_COUNT}GB" if [ "$HUGEPAGES_COUNT" -lt 8 ]; then echo "" echo "⚠️ Current HugePages (${HUGEPAGES_COUNT}GB) may be insufficient for Module 5" echo " Module 4 allocates minimal HugePages (1GB) for demonstration" echo " Module 5 needs more HugePages to run multiple VMs" echo "" echo "💡 Recommended HugePages for Module 5:" echo " • SNO (64GB+ RAM): 8-16GB HugePages" echo " • Multi-Node (64GB+ RAM): 16-24GB HugePages" echo "" echo " The next step will update HugePages automatically!" else echo "✅ HugePages sufficient for Module 5 (${HUGEPAGES_COUNT}GB)" fi else echo "⚠️ No performance profile found" echo " This is expected if Module 4 hasn't been completed yet" echo " VMI tests will use default cluster resources" echo "" echo "💡 Want to see enhanced VM performance?" echo " You can go back to Module 4 to configure performance profiles" echo " This will enable:" echo " • CPU isolation and dedicated CPU placement for VMs" echo " • HugePages for reduced memory latency" echo " • NUMA alignment for optimal performance" echo " • Significant improvement in VMI startup times" echo "" echo " After completing Module 4, return here to see the performance difference!" fi
Understanding HugePages Allocation:
- Module 4: Allocates 1GB HugePages (minimal, for demonstration)
- Module 5: Needs 8-16GB HugePages (for running multiple VMs)
If you see "HugePages may be insufficient", don’t worry! The next step will automatically update HugePages to the optimal amount for your cluster.
Why the difference?
- Module 4 focuses on demonstrating performance tuning concepts
- Module 5 focuses on running actual VMs with realistic workloads
- The scripts automatically handle the transition between modules

- Update the HugePages allocation for VMI testing (if a Performance Profile exists):

```bash
# Update HugePages to support multiple VMs
bash ~/low-latency-performance-workshop/scripts/module05-update-hugepages.sh
```
What This Script Does:
- Detects the current HugePages allocation from Module 4
- Calculates optimal HugePages based on total memory and cluster type
- Accounts for VMI overhead: each VMI needs ~3GB (2GB guest + 1GB virt-launcher)
- Updates the Performance Profile if more HugePages are needed
- Triggers a node reboot if changes are required
Why Update HugePages?
Module 4 allocates minimal HugePages (1GB) for demonstration purposes. Module 5 needs more HugePages to run multiple VMs:
- 1GB HugePages: only 1 small VM possible
- 12GB HugePages: 4 VMs with 2GB memory each
- 24GB HugePages: 8 VMs with 2GB memory each (Module 5 default test)
- 32GB HugePages: 10+ VMs with 2GB memory each
Updated Allocation Strategy:
- SNO (125GB+ RAM): 24GB HugePages (~8 VMIs)
- SNO (64-128GB RAM): 24GB HugePages (~8 VMIs)
- SNO (32-64GB RAM): 12GB HugePages (~4 VMIs)
- Multi-Node (128GB+ worker): 48GB HugePages (~16 VMIs)
- Multi-Node (64-128GB worker): 32GB HugePages (~10 VMIs)
The script automatically calculates the optimal allocation for your cluster!
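If you want to see which allocation tier applies to your node before running the script, a minimal check of total node memory looks like this (it assumes node memory is reported in Ki, which is typical for node status):

```bash
# Rough check of node memory to pick a HugePages tier from the list above
NODE=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
MEM_KI=$(oc get node "$NODE" -o jsonpath='{.status.capacity.memory}' | sed 's/Ki//')
echo "Node $NODE has ~$(( MEM_KI / 1024 / 1024 ))GB of RAM"
```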
If Node Reboots:
This is expected and required for HugePages changes. Wait 10-15 minutes for the node to come back online, then continue with the next step.
If No Performance Profile:
The script will inform you that no Performance Profile exists and suggest completing Module 4 first for enhanced VM performance.
- Validate resources before testing (important learning step!):

```bash
# Validate that your cluster has sufficient resources for VMI testing
bash ~/low-latency-performance-workshop/scripts/module05-validate-vmi-resources.sh
```
Why This Validation Step is Critical:
This is a key learning opportunity that demonstrates real-world capacity planning for virtualized workloads!
What You’ll Learn:
- Resource Calculation: how to calculate VMI memory requirements, including overhead
- Capacity Planning: how many VMs your cluster can support
- Pre-Flight Validation: why validating resources before deployment prevents failures
- Troubleshooting: how to identify and fix resource constraints
What the Script Validates:
- HugePages Availability: checks whether sufficient HugePages are allocated
- VMI Capacity: calculates the maximum concurrent VMIs based on available resources
- Test Scale Validation: verifies that the default test (10 VMIs) will succeed
- CPU Isolation: validates sufficient isolated CPUs for dedicated placement
- Recommendations: provides specific guidance if resources are insufficient
Understanding VMI Memory Requirements:
Each VMI requires more memory than just the guest allocation:
```
VMI guest memory:        2GB   (configured in the VMI spec)
virt-launcher overhead:  1GB   (KubeVirt management pod)
────────────────────────────
Total per VMI:           3GB
```
Example Calculation:
```
Default test:  10 VMIs × 3GB = 30GB required
Your cluster:  16GB HugePages available
Result:        ❌ Insufficient — increase HugePages (24GB at minimum, 30GB for the full test) or reduce the test scale
```
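A quick way to run this arithmetic against your own cluster is sketched below; it assumes 1Gi HugePages and the ~3GB-per-VMI figure used throughout this module:

```bash
# Minimal capacity check: allocatable HugePages vs. the planned test scale
NODE=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
HP_GB=$(oc get node "$NODE" -o jsonpath='{.status.allocatable.hugepages-1Gi}' | sed 's/Gi//')
HP_GB=${HP_GB:-0}
PER_VMI_GB=3        # 2GB guest + ~1GB virt-launcher overhead
PLANNED_VMIS=10     # default Module 5 test scale

echo "Allocatable HugePages: ${HP_GB}GB"
echo "Max concurrent VMIs:   $(( HP_GB / PER_VMI_GB ))"
echo "Planned test needs:    $(( PLANNED_VMIS * PER_VMI_GB ))GB"
if [ "$HP_GB" -ge $(( PLANNED_VMIS * PER_VMI_GB )) ]; then
  echo "✅ Enough HugePages for the default test"
else
  echo "❌ Increase HugePages or reduce the test scale"
fi
```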
If Validation Fails:
The script will provide specific recommendations:
- Option 1: Increase the HugePages allocation (recommended)
- Option 2: Reduce the test scale to match available resources
- Option 3: Run without HugePages (reduced performance)
Real-World Application:
This validation process mirrors production capacity planning:
- ✅ Always validate resources before deploying VMs
- ✅ Account for overhead (virt-launcher, QEMU, etc.)
- ✅ Plan for headroom (don't use 100% of resources)
- ✅ Monitor and adjust based on actual usage

This is exactly what you'd do in production before deploying VMs!
VM Optimization for Low-Latency
Understanding VM Performance Characteristics
Virtual machines have different performance characteristics compared to containers:
- Boot Time: VMs require OS initialization (typically 30-60 seconds)
- Resource Overhead: the hypervisor and guest OS consume additional resources
- I/O Path: the extra virtualization layer affects storage and network performance
- Memory Management: guest OS memory management plus hypervisor overhead
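You can observe the boot-time difference directly on any running VMI. The sketch below computes the time from object creation to the Running phase using the VMI's phase transition timestamps (the VMI name is a placeholder; substitute one from your cluster):

```bash
# Rough startup-time check for a single VMI (VMI_NAME is a placeholder)
VMI_NAME=fedora-perf-example
CREATED=$(oc get vmi "$VMI_NAME" -o jsonpath='{.metadata.creationTimestamp}')
RUNNING=$(oc get vmi "$VMI_NAME" -o jsonpath='{.status.phaseTransitionTimestamps[?(@.phase=="Running")].phaseTransitionTimestamp}')
echo "Created: $CREATED"
echo "Running: $RUNNING"
echo "Startup time: $(( $(date -d "$RUNNING" +%s) - $(date -d "$CREATED" +%s) ))s"
```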
Low-Latency VM Configuration
CPU Optimization
Feature | Purpose | Configuration |
---|---|---|
CPU Pinning | Dedicated CPU cores for the VM | `dedicatedCpuPlacement: true` |
NUMA Alignment | Memory and CPU on the same NUMA node | Automatic with performance profile |
CPU Model | Host CPU features exposed to the VM | `model: host-passthrough` (or `host-model` for compatibility) |
CPU Topology | Optimal vCPU to pCPU mapping | Match the host topology |
Memory Optimization
Feature | Purpose | Configuration |
---|---|---|
HugePages | Reduced TLB misses | `hugepages: pageSize: 1Gi` |
Memory Backing | Shared memory optimization | |
NUMA Policy | Memory locality | |
Memory Overcommit | Disabled for predictable performance | `requests` equal to `limits` (guaranteed memory) |
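As an illustration of how the memory settings above map to a VMI spec, the fragment below shows the relevant fields (values are examples only; this file is not used later in the module):

```bash
# Illustrative fragment only - how the memory features above appear in a VMI spec
cat << 'EOF' > memory-tuning-example.yml
spec:
  domain:
    memory:
      guest: 2Gi
      hugepages:
        pageSize: 1Gi        # must match the cluster's HugePages size
    resources:
      requests:
        memory: 2Gi          # requests == limits -> no overcommit, guaranteed memory
      limits:
        memory: 2Gi
EOF
```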
Creating VMs for Performance Testing
Instead of creating a custom template, we’ll use the existing Fedora template and customize it for our performance testing needs.
- Create a performance-optimized Fedora VM for testing:
# Create a namespace for our VM testing oc new-project vmi-performance-test || oc project vmi-performance-test # Clean up any existing VMs to avoid PVC conflicts echo "🧹 Cleaning up any existing performance test VMs..." oc delete vm --selector=app=vmi-performance-test --ignore-not-found=true oc delete dv --selector=app=vmi-performance-test --ignore-not-found=true # Wait a moment for cleanup to complete sleep 5 # Create a Fedora VM using the existing template with performance optimizations # Generate unique name to avoid PVC conflicts VM_NAME="fedora-perf-$(date +%s)" echo "Creating VM with unique name: $VM_NAME" cat << EOF | oc apply -f - apiVersion: kubevirt.io/v1 kind: VirtualMachine metadata: name: $VM_NAME labels: app: vmi-performance-test vm.kubevirt.io/template: fedora-server-small spec: dataVolumeTemplates: - apiVersion: cdi.kubevirt.io/v1beta1 kind: DataVolume metadata: name: $VM_NAME spec: sourceRef: kind: DataSource name: fedora namespace: openshift-virtualization-os-images storage: resources: requests: storage: 30Gi runStrategy: Manual template: metadata: labels: kubevirt.io/domain: $VM_NAME kubevirt.io/size: small spec: domain: cpu: cores: 2 sockets: 1 threads: 1 # Enable performance features if performance profile exists dedicatedCpuPlacement: false # Will be enabled conditionally model: host-model # More compatible than host-passthrough memory: guest: 2Gi # HugePages will be enabled conditionally based on availability devices: disks: - disk: bus: virtio name: rootdisk - disk: bus: virtio name: cloudinitdisk interfaces: - masquerade: {} model: virtio name: default rng: {} features: smm: enabled: true firmware: bootloader: efi: {} networks: - name: default pod: {} terminationGracePeriodSeconds: 180 volumes: - dataVolume: name: $VM_NAME name: rootdisk - cloudInitNoCloud: userData: | #cloud-config user: fedora password: workshop123 chpasswd: { expire: False } packages: - qemu-guest-agent runcmd: - systemctl enable --now qemu-guest-agent - echo "VM ready for performance testing" > /tmp/vm-ready name: cloudinitdisk EOF echo "✅ Fedora VM '$VM_NAME' created for performance testing" # Verify the VM and DataVolume were created echo "" echo "📋 Verifying VM creation:" oc get vm $VM_NAME echo "" echo "📋 Verifying DataVolume creation:" oc get dv $VM_NAME # Check for any PVC binding issues echo "" echo "📋 Checking for PVC issues:" if oc get events -n vmi-performance-test | grep -i "bound incorrectly\|pvc.*conflict" >/dev/null 2>&1; then echo "⚠️ PVC binding issues detected. This may be due to duplicate VM names." echo " The cleanup step above should have resolved this." echo " If issues persist, check: oc get events -n vmi-performance-test" else echo "✅ No PVC binding issues detected" fi
Troubleshooting PVC Conflicts: If you encounter PVC binding errors like "Two claims are bound to the same volume, this one is bound incorrectly", this typically happens when a previous VM with the same name left an old DataVolume or PVC behind. The cleanup commands at the start of this step remove those stale resources, and using a unique, timestamped VM name (as done above) avoids the conflict. If the issue persists, review the namespace events with `oc get events -n vmi-performance-test`.
VMI Latency Testing with Kube-burner
Now let’s measure Virtual Machine Instance startup performance using kube-burner’s VMI latency measurement capabilities. We’ll adapt the test for our SNO environment.
Understanding VirtualMachine vs VirtualMachineInstance Architecture

This is a crucial concept for understanding OpenShift Virtualization performance testing.

What exists in our cluster:
- VirtualMachine objects (the management layer), such as the fedora-perf VM created above
- VirtualMachineInstance objects (the running VMs), created either by a VirtualMachine or directly by kube-burner

Two different approaches:
- VM-managed VMI: the VirtualMachine controller creates and owns the VMI (start/stop, restart policy, persistence)
- Direct VMI: kube-burner creates the VMI directly, with no parent VirtualMachine

Why kube-burner uses direct VMIs:
- ✅ Precise timing: measures pure hypervisor startup
- ✅ No controller overhead: eliminates VM management latency
- ✅ Consistent results: no management-layer variability
- ✅ Automated testing: perfect for ephemeral performance tests

Architecture relationship: a VM-managed VMI carries an owner reference pointing to its VirtualMachine, while a directly created VMI has none.

This architectural difference is why you see different objects in different namespaces!
- Verify the architectural difference yourself:

```bash
# Compare the two approaches in your cluster
echo "--- VirtualMachine Objects (Management Layer) ---"
oc get VirtualMachine -A
echo ""
echo "--- VirtualMachineInstance Objects (Running VMs) ---"
oc get VirtualMachineInstance -A
echo ""
echo "--- Owner Relationships ---"
echo "VM-managed VMI (has owner reference):"
oc get vmi fedora-perf-1759292486 -n vmi-performance-test -o jsonpath='{.metadata.ownerReferences[0].kind}' 2>/dev/null && echo " ← Managed by VirtualMachine" || echo "No owner reference"
echo ""
echo "Direct VMI (no owner reference):"
oc get vmi fedora-vmi-0-1 -n vmi-latency-test-0 -o jsonpath='{.metadata.ownerReferences}' 2>/dev/null
if [ $? -eq 0 ] && [ -n "$(oc get vmi fedora-vmi-0-1 -n vmi-latency-test-0 -o jsonpath='{.metadata.ownerReferences}' 2>/dev/null)" ]; then
  echo "Has owner reference"
else
  echo "No owner reference ← Created directly by kube-burner"
fi
```
- Create a VMI-specific kube-burner configuration adapted for SNO:

```bash
cd ~/kube-burner-configs

cat << EOF > vmi-latency-config.yml
global:
  measurements:
    - name: vmiLatency
      thresholds:
        - conditionType: VMIRunning
          metric: P99
          threshold: 90000ms   # Increased for SNO environment
        - conditionType: VMIScheduled
          metric: P99
          threshold: 60000ms   # Increased for SNO environment

metricsEndpoints:
  - indexer:
      type: local
      metricsDirectory: collected-metrics-vmi

jobs:
  - name: vmi-latency-test
    jobType: create
    jobIterations: 5           # Reduced for SNO environment
    namespace: vmi-latency-test
    namespacedIterations: true
    cleanup: false
    podWait: false
    waitWhenFinished: true
    verifyObjects: true
    errorOnVerify: false
    objects:
      - objectTemplate: fedora-vmi.yml
        replicas: 2            # Small scale for SNO
EOF
```
- Create the Fedora VMI template for testing:

```bash
# Create VMI template using containerDisk for faster, ephemeral testing
# This approach is ideal for performance testing as it doesn't require PVC provisioning
echo "Creating Fedora VMI template for kube-burner testing"

cat << EOF > fedora-vmi.yml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: fedora-vmi-{{.Iteration}}-{{.Replica}}
  labels:
    app: vmi-latency-test
    iteration: "{{.Iteration}}"
spec:
  # No nodeSelector for SNO - will schedule on the single node
  domain:
    cpu:
      cores: 1
      sockets: 1
      threads: 1
      # Performance features will be enabled conditionally
      # Using host-model instead of host-passthrough for better compatibility
      model: host-model
    memory:
      guest: 2Gi  # Minimum required for Fedora
      # HugePages will be added conditionally if available
    devices:
      disks:
        - name: containerdisk
          disk:
            bus: virtio
        - name: cloudinitdisk
          disk:
            bus: virtio
      interfaces:
        - name: default
          masquerade: {}
          model: virtio
      rng: {}
    features:
      smm:
        enabled: true
    firmware:
      bootloader:
        efi: {}
  networks:
    - name: default
      pod: {}
  terminationGracePeriodSeconds: 180
  volumes:
    - name: containerdisk
      containerDisk:
        image: quay.io/containerdisks/fedora:latest
    - name: cloudinitdisk
      cloudInitNoCloud:
        userData: |
          #cloud-config
          user: fedora
          password: workshop123
          chpasswd: { expire: False }
          bootcmd:
            - "echo 'Fedora VMI started at' \$(date) > /tmp/vmi-start-time"
EOF
```
Why we use containerDisk instead of DataVolumes for performance testing
For kube-burner performance testing, we use containerDisk instead of DataVolumes because:
- Faster startup: no PVC provisioning or DataVolume import delays
- Simpler template: a single VMI object instead of VMI + DataVolume
- Ephemeral by design: perfect for performance testing where persistence isn't needed
- Consistent results: no storage backend variability affecting measurements
containerDisk approach:
```yaml
volumes:
  - name: containerdisk
    containerDisk:
      image: quay.io/containerdisks/fedora:latest
```
DataVolume approach (for production VMs):
```yaml
volumes:
  - name: rootdisk
    dataVolume:
      name: my-vm-disk
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: my-vm-disk
spec:
  sourceRef:
    kind: DataSource
    name: fedora
    namespace: openshift-virtualization-os-images
```
For this performance testing module, containerDisk provides the most accurate VMI startup measurements!
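Optionally, you can pre-pull the Fedora containerDisk image on the node so the first VMI's measurement is not dominated by image download time (this is a sketch that assumes node debug access and that crictl is available on the host):

```bash
# Optional: warm the node's image cache before the kube-burner run
NODE=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
oc debug node/"$NODE" -- chroot /host crictl pull quay.io/containerdisks/fedora:latest
```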
- Configure the VMI with optimal performance settings:

```bash
# Generate optimized VMI configuration
bash ~/low-latency-performance-workshop/scripts/module05-configure-vmi.sh
```
What This Script Does:
- Auto-detects Performance Profile availability
- Auto-detects the HugePages configuration
- Generates an optimized VMI YAML with:
  - CPU pinning (if a Performance Profile exists)
  - HugePages (if available)
  - An appropriate CPU model (host-passthrough or host-model)
  - Educational comments
Benefits of Using the Script:
- Dynamic Configuration: adapts to your cluster's capabilities
- Educational Feedback: explains what features are enabled and why
- Flexible Options: customize the VMI name, memory, CPUs, and namespace
- Consistent Results: the same configuration across different clusters
Script Options:
- `--name NAME`: VMI name (default: fedora-vmi)
- `--namespace NS`: Namespace (default: default)
- `--memory SIZE`: Memory size (default: 2Gi)
- `--cpus NUM`: Number of CPUs (default: 2)
- `--output FILE`: Output file (default: fedora-vmi.yml)
Example with Custom Settings:
```bash
bash ~/low-latency-performance-workshop/scripts/module05-configure-vmi.sh \
  --name my-vm \
  --memory 4Gi \
  --cpus 4 \
  --output my-vm.yml
```
If No Performance Profile:
The script will generate a VMI configuration with default settings and provide guidance on completing Module 4 for enhanced performance.
- Clean up any existing VMI test resources before starting:

```bash
# Clean up any existing VMI test resources to avoid PVC conflicts
echo "🧹 Cleaning up any existing VMI test resources..."
oc delete vmi --selector=app=vmi-latency-test --all-namespaces --ignore-not-found=true
oc delete dv --selector=app=vmi-latency-test --all-namespaces --ignore-not-found=true

# Wait for cleanup to complete
sleep 5
echo "✅ Cleanup completed"
```
- Run the VMI latency test using the configuration above:

```bash
# Execute the VMI latency test with the containerDisk approach
echo "Starting Fedora VMI latency performance test..."
echo " Test approach: Direct VMI creation with containerDisk (no PVC provisioning)"
echo " Test scale: 5 iterations × 2 replicas = 10 VMIs total"
echo " Environment: Single Node OpenShift (SNO)"
echo " Unique namespaces: vmi-latency-test-0 through vmi-latency-test-4"
echo ""

kube-burner init -c vmi-latency-config.yml --log-level=info

# The test will:
# 1. Create VMIs directly in each namespace using containerDisk
# 2. Measure pure VMI startup latency (no storage provisioning overhead)
# 3. Track VMI lifecycle phases from creation to running
# 4. Generate performance metrics in collected-metrics-vmi/
```
- Understand the test results:

The kube-burner test measures several key VMI startup phases:
# View the key metrics from the test echo "VMI Latency Test Results Summary:" echo "" echo "Key Metrics Measured:" echo "• VMICreated: Time to create VMI object (should be ~0ms)" echo "• VMIPending: Time VMI spends in Pending state" echo "• VMIScheduling: Time to schedule VMI to a node" echo "• VMIScheduled: Time until VMI is scheduled (containerDisk pull + pod creation)" echo "• VMIRunning: Total time until VMI is fully running (includes OS boot)" echo "" echo "Expected Results for SNO Environment with containerDisk:" echo "• VMIScheduled P99: ~30-45 seconds (container image pull + pod start)" echo "• VMIRunning P99: ~45-60 seconds (full VM boot from containerDisk)" echo "• VMIScheduling P99: <1 second (fast on SNO)" echo "" echo "📁 Detailed metrics saved in: collected-metrics-vmi/" ls -la collected-metrics-vmi/
- Monitor VMI creation progress:

```bash
# Watch VMIs being created (press Ctrl+C to exit watch)
echo "Monitoring VMI creation progress..."
echo " Use Ctrl+C to exit the watch command when the test completes"
echo ""

# Watch VMIs and their launcher pods being created
watch -n 5 "echo '--- VMIs ---' && oc get vmi --all-namespaces --selector=app=vmi-latency-test && echo '' && echo '--- Launcher Pods ---' && oc get pods --all-namespaces --selector=kubevirt.io=virt-launcher | grep vmi-latency"
```
- Check the VMI status and verify the architectural difference:
# Comprehensive verification of VMI test results echo "==================================================" echo "📋 VMI Latency Test - Current Status" echo "==================================================" echo "" echo "✅ VirtualMachine Objects (Management Layer):" oc get VirtualMachine -A 2>/dev/null || echo "No VMs found" echo "" echo "✅ VirtualMachineInstance Objects (Running VMs):" oc get VirtualMachineInstance -A 2>/dev/null || echo "No VMIs found" echo "" echo "==================================================" echo "� Kube-burner Test Results" echo "==================================================" echo "" echo "VMIs created by kube-burner test:" oc get vmi --all-namespaces --selector=app=vmi-latency-test -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase,IP:.status.interfaces[0].ipAddress,READY:.status.conditions[?\(@.type==\"Ready\"\)].status 2>/dev/null || echo "No test VMIs found" echo "" echo "📋 DataVolume Status (should be empty with containerDisk):" oc get dv --all-namespaces --selector=app=vmi-latency-test 2>/dev/null || echo "No DataVolumes found (expected with containerDisk)" echo "" echo "📋 VMI Launcher Pods:" oc get pods --all-namespaces --selector=kubevirt.io=virt-launcher -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName 2>/dev/null | grep -E "NAMESPACE|vmi-latency" || echo "No launcher pods found" echo "" echo "==================================================" echo "✅ Test Results Summary" echo "==================================================" TOTAL_VMIS=\$(oc get vmi --all-namespaces --selector=app=vmi-latency-test --no-headers 2>/dev/null | wc -l) RUNNING_VMIS=\$(oc get vmi --all-namespaces --selector=app=vmi-latency-test --no-headers 2>/dev/null | grep -c "Running" || echo "0") echo "Total VMIs created: \$TOTAL_VMIS" echo "VMIs in Running phase: \$RUNNING_VMIS" echo "" if [ "\$RUNNING_VMIS" -eq 10 ]; then echo "🎉 SUCCESS! All 10 test VMIs are running!" echo "📊 This demonstrates direct VMI creation with containerDisk" echo "✅ No DataVolumes needed - faster startup for performance testing" echo "" echo "Key Observations:" echo "• All VMIs have IP addresses assigned" echo "• All VMIs are in Ready state" echo "• No PVC/DataVolume provisioning delays" echo "• Pure VMI startup latency measured" elif [ "\$TOTAL_VMIS" -eq 10 ]; then echo "⚠️ All 10 VMIs created, \$RUNNING_VMIS are running" echo " Some may still be pulling containerDisk images" echo " Check: oc get pods --all-namespaces | grep virt-launcher" else echo "⚠️ Expected 10 VMIs, found \$TOTAL_VMIS" echo " Review kube-burner logs for errors" echo "" echo "💡 If VMIs failed, see troubleshooting section below" fi
Troubleshooting VMI Failures
If your VMIs are not running successfully, this section will help you diagnose and fix common issues.
- Check VMI and pod status:

```bash
# Get detailed status of all VMIs
echo "=== VMI Status ==="
oc get vmi --all-namespaces --selector=app=vmi-latency-test
echo ""
echo "=== virt-launcher Pod Status ==="
oc get pods --all-namespaces --selector=kubevirt.io=virt-launcher | grep vmi-latency
echo ""
echo "=== Failed/OOMKilled Pods ==="
oc get pods --all-namespaces --selector=kubevirt.io=virt-launcher | grep -E "OOMKilled|Error|CrashLoop" || echo "No failed pods"
```
- Diagnose OOMKilled VMIs (the most common issue):
# Check if VMIs are OOMKilled due to insufficient HugePages echo "=== Checking for OOMKilled VMIs ===" OOMKILLED_COUNT=$(oc get pods --all-namespaces --selector=kubevirt.io=virt-launcher -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}' 2>/dev/null | grep -o "OOMKilled" | wc -l) if [ "$OOMKILLED_COUNT" -gt 0 ]; then echo "❌ Found $OOMKILLED_COUNT OOMKilled virt-launcher pods" echo "" echo "Root Cause: Insufficient HugePages for VMI memory + overhead" echo "" echo "Explanation:" echo " • Each VMI needs: 2GB guest + 1GB virt-launcher overhead = 3GB total" echo " • Test creates: 10 VMIs × 3GB = 30GB required" echo " • Available HugePages: $(oc get node -o jsonpath='{.items[0].status.allocatable.hugepages-1Gi}' | sed 's/Gi//g')GB" echo "" echo "Solutions:" echo "" echo " Option 1: Increase HugePages (Recommended)" echo " ─────────────────────────────────────────" echo " bash ~/low-latency-performance-workshop/scripts/module05-update-hugepages.sh" echo "" echo " This will:" echo " • Calculate optimal HugePages for your cluster" echo " • Update Performance Profile" echo " • Trigger node reboot (wait 10-15 minutes)" echo " • Allocate sufficient HugePages for 10 VMIs" echo "" echo " Option 2: Reduce Test Scale" echo " ───────────────────────────" echo " Edit ~/kube-burner-configs/vmi-latency-config.yml:" echo "" echo " Current:" echo " jobIterations: 5" echo " replicas: 2" echo " Total: 10 VMIs" echo "" echo " Recommended for 16GB HugePages:" echo " jobIterations: 2" echo " replicas: 2" echo " Total: 4 VMIs (fits in 16GB)" echo "" echo " Then clean up and re-run:" echo " oc delete vmi --selector=app=vmi-latency-test --all-namespaces" echo " kube-burner init -c vmi-latency-config.yml" echo "" else echo "✅ No OOMKilled pods found" fi
- Check the HugePages allocation:
# Detailed HugePages analysis echo "=== HugePages Allocation Analysis ===" echo "" # Get HugePages from Performance Profile PERF_PROFILE=$(oc get performanceprofile -o jsonpath='{.items[0].metadata.name}' 2>/dev/null) if [ -n "$PERF_PROFILE" ]; then HUGEPAGES_COUNT=$(oc get performanceprofile "$PERF_PROFILE" -o jsonpath='{.spec.hugepages.pages[0].count}' 2>/dev/null) HUGEPAGES_SIZE=$(oc get performanceprofile "$PERF_PROFILE" -o jsonpath='{.spec.hugepages.pages[0].size}' 2>/dev/null) echo "Performance Profile: $PERF_PROFILE" echo " Configured: ${HUGEPAGES_COUNT} × ${HUGEPAGES_SIZE} = ${HUGEPAGES_COUNT}GB" fi echo "" # Get HugePages from node NODE=$(oc get nodes -o jsonpath='{.items[0].metadata.name}') HUGEPAGES_CAPACITY=$(oc get node "$NODE" -o jsonpath='{.status.capacity.hugepages-1Gi}' | sed 's/Gi//g') HUGEPAGES_ALLOCATABLE=$(oc get node "$NODE" -o jsonpath='{.status.allocatable.hugepages-1Gi}' | sed 's/Gi//g') echo "Node: $NODE" echo " Capacity: ${HUGEPAGES_CAPACITY}GB" echo " Allocatable: ${HUGEPAGES_ALLOCATABLE}GB" echo "" echo "VMI Capacity Calculation:" echo " • VMI memory requirement: 3GB per VMI (2GB guest + 1GB overhead)" echo " • Available HugePages: ${HUGEPAGES_ALLOCATABLE}GB" echo " • Max concurrent VMIs: ~$((HUGEPAGES_ALLOCATABLE / 3))" echo " • Test requires: 10 VMIs = 30GB" echo "" if [ "$HUGEPAGES_ALLOCATABLE" -ge 30 ]; then echo "✅ Sufficient HugePages for 10 VMIs" elif [ "$HUGEPAGES_ALLOCATABLE" -ge 24 ]; then echo "⚠️ Sufficient for 8 VMIs, reduce test scale to 8" elif [ "$HUGEPAGES_ALLOCATABLE" -ge 18 ]; then echo "⚠️ Sufficient for 6 VMIs, reduce test scale to 6" elif [ "$HUGEPAGES_ALLOCATABLE" -ge 12 ]; then echo "⚠️ Sufficient for 4 VMIs, reduce test scale to 4" else echo "❌ Insufficient HugePages, increase allocation to at least 24GB" fi
- Check VMI events for errors:

```bash
# Check events for failed VMIs
echo "=== Recent VMI Events ==="
oc get events --all-namespaces --field-selector involvedObject.kind=VirtualMachineInstance --sort-by='.lastTimestamp' | tail -20
```
- View virt-launcher pod logs:

```bash
# Get logs from a failed virt-launcher pod
echo "=== virt-launcher Pod Logs (first failed pod) ==="
FAILED_POD=$(oc get pods --all-namespaces --selector=kubevirt.io=virt-launcher -o jsonpath='{.items[?(@.status.phase!="Running")].metadata.name}' | awk '{print $1}' | head -1)
FAILED_NS=$(oc get pods --all-namespaces --selector=kubevirt.io=virt-launcher -o jsonpath='{.items[?(@.status.phase!="Running")].metadata.namespace}' | awk '{print $1}' | head -1)

if [ -n "$FAILED_POD" ]; then
  echo "Pod: $FAILED_POD (namespace: $FAILED_NS)"
  echo ""
  oc logs -n "$FAILED_NS" "$FAILED_POD" --tail=50 2>/dev/null || echo "No logs available"
else
  echo "No failed pods found"
fi
```
Common VMI Failure Patterns:
- OOMKilled virt-launcher pods: insufficient HugePages for guest memory plus overhead (see the diagnosis step above)
- VMIs stuck in Scheduling: not enough isolated CPUs for dedicated CPU placement
- Slow transitions to Running: containerDisk image pulls on first use

Prevention: always run the resource validation script (module05-validate-vmi-resources.sh, shown earlier in this module) before testing.

This will catch resource issues before they cause failures!
Analyzing VMI Latency Results
Now let’s analyze the VMI performance results and understand what the metrics tell us about virtualization performance characteristics.
- Examine the VMI latency metrics generated by kube-burner:
cd ~/kube-burner-configs # Check what metrics were generated echo "📊 VMI Latency Test Results:" ls -la collected-metrics-vmi/ # View the summary of VMI latency measurements echo "" echo "📋 VMI Latency Quantiles (Key Performance Indicators):" echo " All times in milliseconds (ms)" echo "" if [ -f "collected-metrics-vmi/vmiLatencyQuantilesMeasurement-vmi-latency-test.json" ]; then cat collected-metrics-vmi/vmiLatencyQuantilesMeasurement-vmi-latency-test.json | jq -r '.[] | "\(.quantileName) - P99: \(.P99)ms | P50: \(.P50)ms | Avg: \(.avg)ms"' | grep -v "VMReady" | sort else echo "VMI latency quantiles file not found" fi # Show job summary echo "" echo "📈 Test Execution Summary:" if [ -f "collected-metrics-vmi/jobSummary.json" ]; then cat collected-metrics-vmi/jobSummary.json | jq -r '.[] | "Job: \(.jobConfig.name) | Status: \(if .passed then "✅ PASSED" else "❌ FAILED" end) | Duration: \(.elapsedTime)s | QPS: \(.achievedQps)"' else echo "Job summary file not found" fi
- Analyze the VMI startup phases and understand the performance characteristics:
cd ~/kube-burner-configs # Analyze the detailed VMI latency measurements echo "🔍 Detailed VMI Startup Phase Analysis:" echo "" if [ -f "collected-metrics-vmi/vmiLatencyMeasurement-vmi-latency-test.json" ]; then echo "VMI Startup Phases (in chronological order):" echo "1. VMICreated → VMIPending: Object creation time" echo "2. VMIPending → VMIScheduling: Waiting for scheduling" echo "3. VMIScheduling → VMIScheduled: Node assignment + pod creation" echo "4. VMIScheduled → VMIRunning: containerDisk pull + VM boot" echo "" # Show actual timing data echo "📊 Actual Timing Results (Average across all VMIs):" cat collected-metrics-vmi/vmiLatencyMeasurement-vmi-latency-test.json | jq -r ' [.[] | { vmiCreated: .vmiCreatedLatency, vmiPending: .vmiPendingLatency, vmiScheduling: .vmiSchedulingLatency, vmiScheduled: .vmiScheduledLatency, vmiRunning: .vmiRunningLatency, podCreated: .podCreatedLatency, podScheduled: .podScheduledLatency, podInitialized: .podInitializedLatency, podContainersReady: .podContainersReadyLatency }] | { vmiCreated: ([.[].vmiCreated] | add / length), vmiPending: ([.[].vmiPending] | add / length), vmiScheduling: ([.[].vmiScheduling] | add / length), vmiScheduled: ([.[].vmiScheduled] | add / length), vmiRunning: ([.[].vmiRunning] | add / length), podCreated: ([.[].podCreated] | add / length), podScheduled: ([.[].podScheduled] | add / length), podInitialized: ([.[].podInitialized] | add / length), podContainersReady: ([.[].podContainersReady] | add / length) } | to_entries | .[] | " \(.key): \(.value | floor)ms" ' echo "" echo "🎯 Performance Analysis (containerDisk approach):" echo "• VMICreated should be ~0ms (object creation)" echo "• VMIScheduling should be <2000ms (fast scheduling on SNO)" echo "• VMIScheduled includes containerDisk image pull time (major component)" echo "• VMIRunning includes full Fedora boot time from containerDisk (~45-55s typical)" echo "" echo "💡 Key Insight: With containerDisk, most time is spent pulling the container" echo " image and booting the OS. No PVC provisioning or DataVolume import delays!" else echo "❌ VMI latency measurement file not found" echo "This may indicate the test didn't complete successfully" fi
- Analyze VMI performance using the main performance analyzer:

```bash
cd ~/kube-burner-configs

# Use the main performance analyzer for VMI metrics
echo "🎓 Running VMI Performance Analysis..."
python3 ~/low-latency-performance-workshop/scripts/analyze-performance.py \
  --single collected-metrics-vmi

# This analysis provides:
# • VMI startup phase breakdown and timing analysis
# • Performance bottleneck identification
# • Statistical analysis of latency variations
# • Comparison with performance thresholds
# • Color-coded performance assessment
```
- Compare VMI performance characteristics with the container baselines:
cd ~/kube-burner-configs # Generate comprehensive comparison between VMs and containers echo "📊 VMI vs Container Performance Comparison..." # Check what metrics are available for comparison BASELINE_AVAILABLE=false TUNED_AVAILABLE=false if [ -d "collected-metrics" ]; then echo "✅ Container baseline metrics found" BASELINE_AVAILABLE=true fi if [ -d "collected-metrics-tuned" ]; then echo "✅ Container tuned metrics found" TUNED_AVAILABLE=true fi if [ -d "collected-metrics-vmi" ]; then echo "✅ VMI metrics found" else echo "❌ VMI metrics not found - check test execution above" exit 1 fi echo "" # Module 5 focused analysis - VMI performance with intelligent container context echo "🎯 Module 5 Focused Analysis (VMI Performance with Context)..." python3 ~/low-latency-performance-workshop/scripts/module-specific-analysis.py 5 echo "" echo "💡 Module 5 Learning Focus:" echo " 🔍 VMI startup phases and timing" echo " ⚖️ Virtualization vs containerization trade-offs" echo " 🎯 When to choose VMs vs containers for workloads" if [ "$TUNED_AVAILABLE" = true ]; then echo " 🚀 How performance profiles benefit both VMs and containers" else echo " ℹ️ Performance profiles (Module 4) would improve both VMs and containers" fi echo "" echo "📚 How to Read the Module 5 Analysis:" echo " 1. Individual sections show raw performance for each test type" echo " 2. VMI metrics (🖥️ section) are the focus of this module" echo " 3. Container metrics provide context for comparison" echo " 4. Look for VMI-specific phases: VMICreated → VMIPending → VMIScheduled → VMIRunning" echo "" echo "💡 This comparison explains:" echo "• Why VMs take longer to start than containers (OS boot vs process start)" echo "• The performance trade-offs of virtualization (isolation vs overhead)" echo "• When to use VMs vs containers for different workloads" echo "• How performance profiles affect both VMs and containers"
- Generate a comprehensive performance report:
cd ~/kube-burner-configs # Generate a comprehensive markdown report with all available metrics echo "Generating Comprehensive Performance Report..." # Determine what metrics are available and generate appropriate report BASELINE_AVAILABLE=false TUNED_AVAILABLE=false VMI_AVAILABLE=false [ -d "collected-metrics" ] && BASELINE_AVAILABLE=true [ -d "collected-metrics-tuned" ] && TUNED_AVAILABLE=true [ -d "collected-metrics-vmi" ] && VMI_AVAILABLE=true # Generate Module 5 specific report with available metrics REPORT_FILE="module5-vmi-performance-report-$(date +%Y%m%d-%H%M).md" echo "📄 Generating Module 5 VMI Performance Report..." echo " 🎯 Focus: Virtual machine performance analysis" echo " 📊 Context: VMI startup vs container performance" if [ "$BASELINE_AVAILABLE" = true ] && [ "$TUNED_AVAILABLE" = true ] && [ "$VMI_AVAILABLE" = true ]; then echo " 📈 Scope: VMI + Container baseline + Container tuned" python3 ~/low-latency-performance-workshop/scripts/analyze-performance.py \ --baseline collected-metrics \ --tuned collected-metrics-tuned \ --vmi collected-metrics-vmi \ --report "$REPORT_FILE" elif [ "$BASELINE_AVAILABLE" = true ] && [ "$VMI_AVAILABLE" = true ]; then echo " 📈 Scope: VMI + Container baseline" python3 ~/low-latency-performance-workshop/scripts/analyze-performance.py \ --baseline collected-metrics \ --vmi collected-metrics-vmi \ --report "$REPORT_FILE" elif [ "$VMI_AVAILABLE" = true ]; then echo " 📈 Scope: VMI standalone analysis" python3 ~/low-latency-performance-workshop/scripts/analyze-performance.py \ --single collected-metrics-vmi \ --report "$REPORT_FILE" else echo "❌ No VMI performance metrics found for report generation" exit 1 fi echo "" echo "📄 Performance Report Generated: $REPORT_FILE" echo "📊 Report Summary:" if [ -f "$REPORT_FILE" ]; then head -20 "$REPORT_FILE" echo "" echo "💡 View the complete report: cat $REPORT_FILE" else echo "❌ Report generation failed" fi
SR-IOV Configuration for High-Performance VM Networking
SR-IOV (Single Root I/O Virtualization) provides direct hardware access to Virtual Machines, bypassing the software networking stack for maximum performance. This is particularly important for VMs that require near bare-metal network performance.
Lab Environment Considerations: This workshop supports two approaches for high-performance VM networking: SR-IOV (requires SR-IOV-capable NICs) for production-grade performance, and OVN-Kubernetes User Defined Networks (no special hardware) as a lab simulation.

This module covers both approaches so you can learn SR-IOV concepts and still test in your lab environment.
Choosing Your Networking Approach
Approach | Use Case | Hardware Required | Performance |
---|---|---|---|
Default Pod Network | Basic VMs, development | None | 2-5ms latency |
User Defined Networks | Lab environments, learning, testing | None | 1-3ms latency |
SR-IOV | Production NFV, real-time apps | SR-IOV capable NICs | <1ms latency |
Recommendation for This Workshop: unless your cluster has SR-IOV-capable NICs, use the User Defined Networks approach for the hands-on steps.

Both approaches teach the same concepts:
- Dual-interface VM design
- Network separation (management vs data plane)
- Performance optimization techniques
- Multi-network VM architecture
Understanding SR-IOV Benefits for VMs
Feature | VM with Pod Network | VM with SR-IOV |
---|---|---|
Latency | 2-5ms (through virt-launcher pod) | <1ms (direct hardware access) |
Throughput | 5-20 Gbps (limited by pod network) | Near line-rate (40-100 Gbps) |
CPU Usage | Higher (virtio + pod network overhead) | Lower (hardware offload) |
Isolation | Software-based (pod network) | Hardware-enforced (dedicated VF) |
Network Stack | VM → virtio → virt-launcher → CNI → host | VM → SR-IOV VF → physical NIC |
Why SR-IOV Matters for VMs: traffic on the pod network traverses the virtio device, the virt-launcher pod, and the host CNI stack, while SR-IOV attaches a dedicated Virtual Function directly to the VM and bypasses that software path. SR-IOV is the key technology for achieving container-like network performance in VMs.
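A quick way to see whether any node actually advertises SR-IOV Virtual Functions is to look for extended resources on the nodes; the exact resource name comes from your SriovNetworkNodePolicy (for example `openshift.io/vm_sriov_net`), so treat the filter below as an assumption:

```bash
# List any openshift.io/* extended resources advertised by the nodes (SR-IOV VFs show up here)
oc get nodes -o json | jq -r '
  .items[] | .metadata.name as $n
  | .status.allocatable | to_entries[]
  | select(.key | startswith("openshift.io/"))
  | "\($n): \(.key)=\(.value)"'
```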
Verifying SR-IOV Network Operator
The SR-IOV Network Operator was deployed in Module 2. Let’s verify it’s ready for VM networking:
- Check the SR-IOV operator status:

```bash
# Check SR-IOV operator installation
oc get csv -n openshift-sriov-network-operator

# Verify SR-IOV operator pods
oc get pods -n openshift-sriov-network-operator

# Check if SR-IOV capable nodes are detected
oc get sriovnetworknodestates -n openshift-sriov-network-operator

# List available SR-IOV network node policies
oc get sriovnetworknodepolicy -n openshift-sriov-network-operator

# Check for SR-IOV networks configured for VMs
oc get sriovnetwork -n openshift-sriov-network-operator
```
If SR-IOV hardware is not available or the operator shows no SR-IOV capable nodes, proceed to the Lab Simulation section below to use User Defined Networks instead.
Lab Simulation: High-Performance VM Networking with User Defined Networks
For lab environments without SR-IOV hardware, we can simulate high-performance VM networking using OVN-Kubernetes User Defined Networks (also called Secondary Networks). While not as fast as SR-IOV, this provides better performance than the default pod network and demonstrates the same networking concepts.
Clean Up Previous Test VMIs
Before creating the high-performance VM, clean up VMIs from the previous kube-burner test to free HugePages:
- Check current VMI resource usage:
# Check running VMIs and their HugePages usage echo "=== Current VMIs ===" oc get vmi --all-namespaces echo "" echo "=== HugePages Usage ===" oc get node -o jsonpath='{.items[0].status.allocatable.hugepages-1Gi}' | sed 's/Gi/ GB available/g' echo "" # Calculate VMIs using HugePages VMI_COUNT=$(oc get vmi --all-namespaces --no-headers 2>/dev/null | wc -l) if [ "$VMI_COUNT" -gt 0 ]; then echo "Current VMIs: $VMI_COUNT" echo "Estimated HugePages in use: ~$((VMI_COUNT * 3)) GB (assuming 2GB guest + 1GB overhead per VMI)" echo "" echo "⚠️ Cleanup recommended before creating new VMs" fi
Why Cleanup is Important:
Each VMI consumes HugePages memory that remains allocated even after testing completes:
- VMI Guest Memory: 2GB per VMI (configured in the VMI spec)
- virt-launcher Overhead: ~1GB per VMI (KubeVirt management pod)
- Total per VMI: ~3GB
Example:
```
8 running VMIs × 3GB = 24GB HugePages in use
Available HugePages:   24GB
Result: no HugePages available for new VMs! ❌
```
Best Practice: Always clean up test VMIs before starting new VM deployments to avoid resource exhaustion.
- Clean up the test VMIs and namespaces:
# Delete all VMIs from kube-burner test echo "Cleaning up test VMIs..." oc delete vmi --selector=app=vmi-latency-test --all-namespaces --wait=false # Delete test namespaces for i in {0..4}; do oc delete namespace vmi-latency-test-$i --wait=false 2>/dev/null || true done echo "" echo "Cleanup initiated. Waiting for resources to be freed..." sleep 10 # Verify cleanup echo "" echo "=== Remaining VMIs ===" oc get vmi --all-namespaces echo "" echo "=== HugePages Now Available ===" oc get node -o jsonpath='{.items[0].status.allocatable.hugepages-1Gi}' | sed 's/Gi/ GB available/g' echo ""
If you see VMIs still terminating, wait a few moments for them to fully clean up. You can monitor with `watch oc get vmi --all-namespaces` and press Ctrl+C to exit the watch command.
Create User Defined Network
Why User Defined Networks for Lab Environments:

Performance Comparison:
- Default Pod Network: 2-5ms latency
- User Defined Network: 1-3ms latency (30-50% improvement)
- SR-IOV: <1ms latency (production target)
- Create a User Defined Network for high-performance VM networking:

```bash
cat << EOF | oc apply -f -
apiVersion: k8s.ovn.org/v1
kind: UserDefinedNetwork
metadata:
  name: vm-high-perf-network
  namespace: default
spec:
  topology: Layer2
  layer2:
    role: Secondary
    subnets:
      - "192.168.100.0/24"
EOF
```
UserDefinedNetwork (UDN) - Modern OpenShift 4.18+ Approach:

This creates a Layer2 User Defined Network using native OVN-Kubernetes integration:

- API: `k8s.ovn.org/v1` (native OVN-Kubernetes, not Multus)
- Topology: Layer2 (recommended for VM networking)
- Role: Secondary (an additional network, not replacing the pod network)
- Subnet: 192.168.100.0/24 (automatic IPAM by OVN-Kubernetes)
- Benefits:
  - Simpler configuration than NetworkAttachmentDefinition
  - Native OVN-Kubernetes IPAM (no manual IPAM configuration needed)
  - Better integration with OpenShift Virtualization
  - Recommended approach for OpenShift 4.18+

Why Layer2?
- VMs can communicate at Layer 2 (like a virtual switch)
- Better for VM-to-VM communication
- Supports VM live migration with persistent IPs
- Simpler than Layer3 for most VM use cases

Note: OpenShift automatically creates a corresponding NetworkAttachmentDefinition for compatibility with VMs.
- Verify the UserDefinedNetwork was created:

```bash
# Check UserDefinedNetwork
echo "=== UserDefinedNetwork ==="
oc get userdefinednetwork vm-high-perf-network -n default -o yaml

echo ""
echo "=== Auto-Generated NetworkAttachmentDefinition ==="
# OpenShift automatically creates a NetworkAttachmentDefinition for VM compatibility
oc get net-attach-def vm-high-perf-network -n default

echo ""
echo "=== Network Details ==="
oc describe userdefinednetwork vm-high-perf-network -n default
```
What Just Happened:

When you create a UserDefinedNetwork, OpenShift automatically:

- Creates the UDN: the Layer2 network with OVN-Kubernetes IPAM
- Auto-generates a NetworkAttachmentDefinition: for backward compatibility with VMs
- Configures OVN: sets up the virtual switch and subnet

Key Point: VMs still reference the network using `multus.networkName` in their spec, but the underlying implementation is now the modern UserDefinedNetwork instead of a manually configured NetworkAttachmentDefinition.

This is why UserDefinedNetwork is better:
- ✅ You define the network once (simple YAML)
- ✅ OpenShift handles the NetworkAttachmentDefinition automatically
- ✅ Native OVN-Kubernetes integration (no manual CNI JSON)
- ✅ Built-in IPAM (no configuration needed)
- Create a high-performance VM with dual network interfaces (lab simulation):
cat << EOF | oc apply -f - apiVersion: kubevirt.io/v1 kind: VirtualMachine metadata: name: high-perf-vm-lab namespace: default labels: app: high-perf-vm spec: running: true dataVolumeTemplates: - metadata: name: high-perf-vm-lab-rootdisk spec: storage: resources: requests: storage: 30Gi sourceRef: kind: DataSource name: fedora namespace: openshift-virtualization-os-images template: metadata: labels: kubevirt.io/vm: high-perf-vm-lab app: high-perf-vm spec: domain: cpu: cores: 4 dedicatedCpuPlacement: true # Pin CPUs for low latency memory: hugepages: pageSize: 1Gi # Use 1Gi HugePages (matches cluster configuration) guest: 4Gi resources: requests: memory: 4Gi limits: memory: 4Gi devices: disks: - name: rootdisk disk: bus: virtio - name: cloudinitdisk disk: bus: virtio interfaces: # Primary interface: Pod network (for management) - name: default masquerade: {} # Secondary interface: User Defined Network (for high-performance data) - name: high-perf-net bridge: {} networkInterfaceMultiqueue: true # Enable multi-queue for better performance networks: # Pod network for management traffic - name: default pod: {} # User Defined Network for data plane traffic - name: high-perf-net multus: networkName: vm-high-perf-network volumes: - name: rootdisk dataVolume: name: high-perf-vm-lab-rootdisk - name: cloudinitdisk cloudInitNoCloud: userData: | #cloud-config user: fedora password: fedora chpasswd: { expire: False } runcmd: - nmcli con add type ethernet con-name eth1 ifname eth1 ip4 192.168.100.10/24 - nmcli con up eth1 EOF
Lab VM Configuration Explained:

- Disk Configuration (DataVolume):
  - Uses `dataVolumeTemplates` to create a persistent disk
  - Source: the `fedora` DataSource (VolumeSnapshot) in `openshift-virtualization-os-images`
  - Pre-installed Fedora image (fast boot, even without KVM)
  - 30Gi storage allocation
  - Why not containerDisk? containerDisk is slow without KVM hardware virtualization
- Dual Network Interfaces (same as production SR-IOV):
  - `default`: pod network for management (SSH, monitoring)
  - `high-perf-net`: User Defined Network for the data plane
- Performance Optimizations (educational examples):
  - `dedicatedCpuPlacement: true` pins CPUs to the VM (requires KVM for full benefit)
  - `hugepages: pageSize: 1Gi` uses 1Gi HugePages (matches the cluster config from Module 4)
  - `resources: requests/limits: 4Gi` guarantees the memory allocation
  - `networkInterfaceMultiqueue: true` enables parallel packet processing (4 queues per interface)
  - `bridge: {}` attaches the secondary interface directly to the bridge (better than masquerade)
- HugePages Configuration:
  - The VM requests 4GB of guest memory
  - Uses 4 × 1Gi HugePages (matches the Performance Profile)
  - Plus ~1GB virt-launcher overhead = ~5GB total
  - Must match the cluster's HugePages size (1Gi from Module 4)
  - Note: HugePages work with or without KVM, but provide the best performance with KVM
- Cloud-init Configuration:
  - Creates the user `fedora` with password `fedora`
  - Automatically configures eth1 with a static IP (192.168.100.10/24)
  - Sets up the network interface on boot, so the VM is ready for testing immediately

This simulates the SR-IOV architecture without special hardware!
Note: This VM demonstrates performance features (HugePages, CPU pinning, multi-queue) that are typically used with KVM hardware virtualization. The VM will boot and run successfully even without KVM (using software emulation), but performance features provide maximum benefit when KVM is available.
- Wait for the DataVolume to be created and the VM to start:
# Check DataVolume creation progress echo "=== DataVolume Status ===" oc get dv high-perf-vm-lab-rootdisk -n default # Wait for DataVolume to be ready (cloning from snapshot) echo "" echo "Waiting for DataVolume to be ready (this may take 1-2 minutes)..." oc wait --for=condition=Ready dv/high-perf-vm-lab-rootdisk -n default --timeout=300s # Check VM status echo "" echo "=== VM Status ===" oc get vm high-perf-vm-lab -n default oc get vmi high-perf-vm-lab -n default
DataVolume Creation Process:

When you create a VM with `dataVolumeTemplates`, OpenShift Virtualization:

- Creates a DataVolume: persistent storage for the VM
- Clones from the VolumeSnapshot: copies the Fedora image from the snapshot
- Creates a PVC: a Persistent Volume Claim for the disk
- Starts the VM: once the DataVolume is ready

This process takes 1-2 minutes but results in a fast-booting VM with persistent storage.

Advantages over containerDisk:
- ✅ Faster boot (pre-installed image)
- ✅ Persistent storage (survives VM restarts)
- ✅ Works well without KVM hardware virtualization
- ✅ Same image used by the OpenShift Console VM wizard
- Verify the VM has dual network interfaces:
# Wait for VM to be running oc wait --for=condition=Ready vmi/high-perf-vm-lab --timeout=300s # Check VM network interfaces oc get vmi high-perf-vm-lab -o jsonpath='{.status.interfaces}' | jq # Verify both networks are attached echo "VM Network Configuration:" oc get vmi high-perf-vm-lab -o jsonpath='{.spec.networks}' | jq # Check that VM has both pod network and user defined network oc describe vmi high-perf-vm-lab | grep -A 10 "Interfaces"
- Test the VM's network performance:
# Use the VMI network tester to validate connectivity python3 ~/low-latency-performance-workshop/scripts/module05-vmi-network-tester.py \ --namespace default # Access the VM to verify network interfaces virtctl console high-perf-vm-lab # Inside the VM, check network interfaces ip addr show # You should see: # - eth0: Pod network interface (management) - 10.x.x.x # - eth1: User Defined Network (high-performance) - 192.168.100.10 # Test connectivity on both interfaces ping -c 4 -I eth0 8.8.8.8 # Management network ping -c 4 -I eth1 192.168.100.1 # High-performance network # Check interface statistics ip -s link show eth0 ip -s link show eth1
Lab Simulation Performance Expectations:

While not as fast as SR-IOV (<1ms), this demonstrates:
- Dual-interface VM architecture
- Network separation (control vs data plane)
- Performance optimization techniques
- Production-ready patterns

This is perfect for learning and lab environments!
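For a rough sanity check of the software path, you can ping the VM's virt-launcher pod from the node. This only exercises the pod-network side (the masqueraded guest sits behind it) and the label selector is an assumption, but it gives a feel for the latency numbers above:

```bash
# Rough pod-network latency check toward the lab VM's virt-launcher pod
NODE=$(oc get nodes -o jsonpath='{.items[0].metadata.name}')
LAUNCHER_IP=$(oc get pods -n default -l vm.kubevirt.io/name=high-perf-vm-lab \
  -o jsonpath='{.items[0].status.podIP}')
echo "virt-launcher pod IP: $LAUNCHER_IP"
oc debug node/"$NODE" -- chroot /host ping -c 5 "$LAUNCHER_IP"
```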
Configuring SR-IOV for Virtual Machines (Production)
When to Use This Section: follow these steps only if your cluster has SR-IOV-capable NICs and the SR-IOV Network Operator reports SR-IOV-capable nodes (see the verification above).

For Lab Environments: use the User Defined Networks approach above instead.
Unlike pods, VMs require specific SR-IOV network configuration to attach Virtual Functions directly to the VM.
- Create an SR-IOV Network for VM use (production hardware required):

```bash
cat << EOF | oc apply -f -
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: vm-sriov-network
  namespace: openshift-sriov-network-operator
spec:
  resourceName: vm_sriov_net
  networkNamespace: default
  vlan: 100  # Optional: VLAN tagging
  capabilities: '{"ips": true, "mac": true}'
  # Important: This network will be used by VMs
  ipam: |
    {
      "type": "host-local",
      "subnet": "192.168.100.0/24",
      "rangeStart": "192.168.100.10",
      "rangeEnd": "192.168.100.100",
      "gateway": "192.168.100.1"
    }
EOF
```
This creates an SR-IOV network specifically for VM use. The `resourceName` must match the SR-IOV Network Node Policy configured in Module 2.
- Create a high-performance VM with SR-IOV networking:
cat << EOF | oc apply -f - apiVersion: kubevirt.io/v1 kind: VirtualMachine metadata: name: high-performance-vm-sriov namespace: default spec: running: true template: metadata: labels: kubevirt.io/vm: high-performance-vm-sriov spec: domain: cpu: cores: 4 dedicatedCpuPlacement: true # Pin CPUs for low latency memory: hugepages: pageSize: 2Mi # Use HugePages guest: 4Gi devices: disks: - name: containerdisk disk: bus: virtio - name: cloudinitdisk disk: bus: virtio interfaces: # Primary interface: Pod network (for management) - name: default masquerade: {} # Secondary interface: SR-IOV (for high-performance data plane) - name: sriov-net sriov: {} networks: # Pod network for management traffic - name: default pod: {} # SR-IOV network for data plane traffic - name: sriov-net multus: networkName: vm-sriov-network volumes: - name: containerdisk containerDisk: image: quay.io/containerdisks/fedora:latest - name: cloudinitdisk cloudInitNoCloud: userData: | #cloud-config password: fedora chpasswd: { expire: False } EOF
VM SR-IOV Configuration Explained:

- Two Network Interfaces:
  - `default`: pod network for management (SSH, monitoring)
  - `sriov-net`: SR-IOV for high-performance data traffic
- Why Two Interfaces?
  - Management traffic doesn't need SR-IOV performance
  - Data plane traffic gets direct hardware access
  - Separates the control and data planes
- Performance Features:
  - `dedicatedCpuPlacement: true` pins CPUs to the VM
  - `hugepages` reduces memory overhead
  - `sriov: {}` attaches an SR-IOV VF directly to the VM
- Verify the VM has SR-IOV networking:
# Wait for VM to be running oc wait --for=condition=Ready vmi/high-performance-vm-sriov --timeout=300s # Check VM network interfaces oc get vmi high-performance-vm-sriov -o jsonpath='{.status.interfaces}' | jq # Verify SR-IOV VF is attached oc describe vmi high-performance-vm-sriov | grep -A 10 "Interfaces" # Check that VM has both pod network and SR-IOV echo "VM Network Configuration:" oc get vmi high-performance-vm-sriov -o jsonpath='{.spec.networks}' | jq
Testing VM SR-IOV Network Performance
Now let’s test the network performance of the VM with SR-IOV to see the improvement over pod networking.
- Access the VM and check its network interfaces:
# Access the VM console virtctl console high-performance-vm-sriov # Inside the VM, check network interfaces ip addr show # You should see: # - eth0: Pod network interface (management) # - eth1: SR-IOV interface (high-performance) # Check SR-IOV interface details ethtool -i eth1 # Test network performance (requires iperf3 installed) # From another VM or pod, run iperf3 server # Then from this VM: iperf3 -c <server-ip> -i 1 -t 30
- Use the VMI network tester to validate SR-IOV VM connectivity:

```bash
# Test networking to the SR-IOV-enabled VM
python3 ~/low-latency-performance-workshop/scripts/module05-vmi-network-tester.py \
  --namespace default

# This will test connectivity to VMs including SR-IOV-enabled ones
# Expected results:
# - Pod network interface: 2-5ms latency
# - SR-IOV interface: <1ms latency (if tested directly)
```
SR-IOV Performance Expectations for VMs:
The SR-IOV interface provides 5-10x better latency and 2-5x better throughput compared to pod networking for VMs.
Network Policy Latency Testing
Network policies can impact VM networking performance. Let’s test network policy enforcement latency using kube-burner’s network policy latency measurement.
- Create a network policy latency test configuration adapted for SNO:
cd ~/kube-burner-configs cat << EOF > network-policy-latency-config.yml global: measurements: - name: netpolLatency metricsEndpoints: - indexer: type: local metricsDirectory: collected-metrics-netpol jobs: # Job 1: Create pods and namespaces (reduced scale for SNO) - name: network-policy-setup jobType: create jobIterations: 3 # Reduced for SNO namespace: network-policy-perf namespacedIterations: true cleanup: false podWait: true waitWhenFinished: true verifyObjects: true errorOnVerify: false namespaceLabels: kube-burner.io/skip-networkpolicy-latency: "true" objects: - objectTemplate: network-test-pod.yml replicas: 2 # Reduced for SNO inputVars: containerImage: registry.redhat.io/ubi8/ubi-minimal:latest # Job 2: Apply network policies and test connectivity - name: network-policy-test jobType: create jobIterations: 3 # Reduced for SNO namespace: network-policy-perf namespacedIterations: false cleanup: false podWait: false waitWhenFinished: true verifyObjects: true errorOnVerify: false jobPause: 30s # Reduced pause for faster testing objects: - objectTemplate: ingress-network-policy.yml replicas: 1 # Reduced for SNO inputVars: namespaces: 3 # Reduced for SNO EOF
-
Create the network test pod template:
cat << EOF > network-test-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: network-test-pod-{{.Iteration}}-{{.Replica}}
  labels:
    app: network-test
    iteration: "{{.Iteration}}"
    replica: "{{.Replica}}"
spec:
  # No nodeSelector for SNO - the pod will schedule on the single node
  containers:
    - name: network-test-container
      image: {{.containerImage}}
      command: ["/bin/bash"]
      args: ["-c", "microdnf install -y httpd && echo 'Hello from pod {{.Iteration}}-{{.Replica}}' > /var/www/html/index.html && httpd -D FOREGROUND"]
      ports:
        - containerPort: 80
          protocol: TCP
      resources:
        requests:
          memory: "128Mi"   # Increased for httpd
          cpu: "100m"
        limits:
          memory: "256Mi"
          cpu: "200m"
      readinessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 10
        periodSeconds: 5
  restartPolicy: Never
EOF
-
Create the ingress network policy template:
cat << EOF > ingress-network-policy.yml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress-policy-{{.Iteration}}-{{.Replica}}
spec:
  podSelector:
    matchLabels:
      app: network-test
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: network-policy-perf-{{.Iteration}}
        - podSelector:
            matchLabels:
              app: network-test
      ports:
        - protocol: TCP
          port: 80          # Matches the httpd default port used by the test pods
  # Allow DNS egress so name resolution keeps working once the policy applies
  egress:
    - to: []
      ports:
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 53
EOF
-
Run the network policy latency test:
# Execute the network policy latency test adapted for SNO
echo "Starting network policy latency test..."
echo "  Test scale: 3 iterations × 2 replicas = 6 pods total"
echo "  Environment: Single Node OpenShift (SNO)"
echo ""

kube-burner init -c network-policy-latency-config.yml --log-level=info

# This test will:
# 1. Create pods in multiple namespaces (reduced scale for SNO)
# 2. Apply network policies with ingress rules
# 3. Measure network policy enforcement latency
-
Monitor network policy test progress:
# Watch network policies being created (press Ctrl+C to exit)
echo "Monitoring network policy test progress..."
echo "  Use Ctrl+C to exit the watch command when the test completes"
echo ""

watch -n 5 "echo '--- Network Policies ---' && oc get networkpolicy --all-namespaces | grep network-policy-perf && echo '' && echo '--- Test Pods ---' && oc get pods --all-namespaces | grep network-test"
-
Check test results after completion:
# Check final network policy status
echo "📋 Final Network Policy Status:"
oc get networkpolicy --all-namespaces | grep network-policy-perf

# Check pod status
echo ""
echo "📋 Test Pod Status:"
oc get pods --all-namespaces | grep network-test

# Check whether pods are ready and accessible
echo ""
echo "📊 Pod Readiness:"
oc get pods --all-namespaces -o custom-columns=NAME:.metadata.name,READY:.status.containerStatuses[0].ready,STATUS:.status.phase | grep network-test
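If you want a quick look at the raw measurement data before the analysis scripts below, kube-burner writes it to the metricsDirectory configured earlier. The exact file names follow kube-burner's measurement naming and may vary by version, so treat the glob below as a sketch:

# List the raw netpolLatency measurement files
ls ~/kube-burner-configs/collected-metrics-netpol/

# Peek at the latency quantile summaries (adjust the glob if your kube-burner version names files differently)
cat ~/kube-burner-configs/collected-metrics-netpol/*Quantiles*.json | jq '.[] | {quantileName, avg, P99}'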
Educational Analysis Scripts for Virtualization
The workshop provides educational scripts to help you understand VM vs container trade-offs and test VM networking.
-
VM vs Container Comparison - Educational comparison tool:
# Compare VMs and containers comprehensively
python3 ~/low-latency-performance-workshop/scripts/module05-vm-vs-container-comparison.py

# Disable colored output for documentation
python3 ~/low-latency-performance-workshop/scripts/module05-vm-vs-container-comparison.py --no-color
This script provides:

- Architecture and design differences explained
- Startup time comparison (VMs: 60-90s vs containers: 3-10s)
- Resource usage and overhead analysis
- Isolation and security characteristics
- Networking performance comparison
- Use-case guidance for choosing VMs vs containers
-
VMI Network Tester - Test networking against Virtual Machines:
# Test networking against all VMIs in the cluster
python3 ~/low-latency-performance-workshop/scripts/module05-vmi-network-tester.py

# Test VMIs in a specific namespace
python3 ~/low-latency-performance-workshop/scripts/module05-vmi-network-tester.py \
  --namespace vmi-latency-test-0

# Skip educational explanations
python3 ~/low-latency-performance-workshop/scripts/module05-vmi-network-tester.py \
  --skip-explanation
This script tests:

- VMI connectivity and reachability
- Network latency to virtual machines (not pods!)
- VMI IP assignment and configuration
- Network policy impact on VM traffic
- It creates test pods that ping the VMIs to measure performance

This script helps you understand and validate VM networking performance.
Analyzing Network Policy Latency Results with Python
Use the educational Python scripts to analyze network policy enforcement latency and understand its impact on VM networking performance.
-
Run the network policy performance analyzer:
cd ~/low-latency-performance-workshop/scripts

# Run the educational network policy latency analyzer
echo "🔍 Analyzing Network Policy Performance Impact..."
python3 ~/low-latency-performance-workshop/scripts/module05-network-policy-analyzer.py \
  --metrics-dir ~/kube-burner-configs \
  --analysis-type latency

# The script provides:
# 1. Educational analysis of policy enforcement overhead
# 2. Color-coded performance assessment
# 3. Performance vs security trade-off explanations
# 4. Recommendations for policy optimization
-
Generate comprehensive network policy performance insights:
cd ~/low-latency-performance-workshop/scripts

# Create a detailed educational analysis with report generation
echo "📊 Generating Comprehensive Network Policy Analysis..."
python3 ~/low-latency-performance-workshop/scripts/module05-network-policy-analyzer.py \
  --metrics-dir ~/kube-burner-configs \
  --analysis-type comprehensive \
  --output-format educational

# This educational analysis includes:
# • Statistical analysis of policy enforcement latency
# • Performance vs security trade-off explanations
# • Best practices for low-latency network policies
# • A detailed markdown report with optimization strategies
# • Educational insights about CNI performance impact
Performance Optimization Best Practices
VM Configuration Best Practices
- CPU Optimization (see the sketch after this list):
  - Use dedicatedCpuPlacement: true for guaranteed CPU access
  - Match the VM vCPU count to the NUMA topology
  - Use the host-model CPU model for compatibility (or host-passthrough if supported)
  - Consider specific CPU models (e.g., Haswell-noTSX) for consistent behavior across environments
- Memory Optimization:
  - Configure HugePages to reduce TLB misses
  - Align memory allocation with the NUMA topology
  - Disable memory overcommit for predictable performance
- Storage Optimization:
  - Use high-performance storage classes
  - Configure appropriate I/O schedulers
  - Consider local storage for ultra-low latency
- Network Optimization:
  - Use SR-IOV for direct hardware access
  - Configure multiple network interfaces for traffic separation
  - Optimize network policies for minimal overhead
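As a reference for the CPU and memory items above, here is an illustrative fragment of the corresponding VMI domain settings. Values such as the core count and HugePages size are examples, not workshop defaults; match them to your node's NUMA topology and available HugePages.

spec:
  domain:
    cpu:
      cores: 4
      dedicatedCpuPlacement: true   # Guaranteed, pinned CPUs (requires the static CPU Manager policy)
      model: host-model             # Or host-passthrough if supported, or a named model such as Haswell-noTSX
      numa:
        guestMappingPassthrough: {} # Align guest NUMA topology with the host (needs HugePages + dedicated CPUs)
    memory:
      hugepages:
        pageSize: 1Gi               # Back guest memory with 1Gi HugePages
    resources:
      requests:
        memory: 4Gi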
Monitoring and Validation
- Key Metrics to Monitor (see the spot-check commands after this list):
  - VMI startup latency (target: < 90 seconds for SNO)
  - Network policy enforcement latency (target: < 10 seconds for SNO)
  - CPU utilization and isolation effectiveness
  - Memory allocation and HugePages usage
- Performance Validation Tools:
  - kube-burner for comprehensive latency testing
  - iperf3 for network throughput testing
  - stress-ng for CPU and memory stress testing
  - fio for storage performance testing
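A few quick node-level spot-checks pair well with these metrics (a sketch; substitute your node name):

# HugePages allocation and how many are currently in use
oc debug node/<node-name> -- chroot /host grep -i huge /proc/meminfo

# CPU Manager pinning assignments (present when the static policy is enabled, e.g. by a performance profile)
oc debug node/<node-name> -- chroot /host cat /var/lib/kubelet/cpu_manager_state

# Overall node CPU and memory utilization during tests
oc adm top nodes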
Module Summary
This module covered low-latency virtualization with OpenShift Virtualization:
- ✅ Verified OpenShift Virtualization deployment from Module 2
- ✅ Configured high-performance VMs with dedicated CPUs and HugePages
- ✅ Measured VMI startup latency using kube-burner's vmiLatency measurement
- ✅ Tested network policy performance with the netpolLatency measurement
- ✅ Compared VM vs container performance to understand trade-offs
- ✅ Implemented SR-IOV networking for ultra-low latency VM traffic
Key Performance Insights
Metric | Without Performance Profile | With Performance Profile | Improvement |
---|---|---|---|
Fedora VMI Startup (P99) | 90-150 seconds | 60-90 seconds | ~30-40% faster |
Network Policy Latency (P99) | 10-20 seconds | 5-10 seconds | ~50% faster |
VM vs Pod Startup | 15-25x slower | 10-15x slower | Reduced overhead |
CPU Consistency | Variable performance | Predictable performance | Eliminated jitter |
Memory Latency | Standard pages | HugePages optimization | Reduced TLB misses |
Key Architectural Learning Points
VirtualMachine vs VirtualMachineInstance Usage Patterns:
Use Case | Object Type | Management | Best For |
---|---|---|---|
Production Workloads | VirtualMachine | Full lifecycle management | Long-running VMs, interactive use |
Performance Testing | VirtualMachineInstance | Direct creation, ephemeral | Automated testing, precise metrics |
Development/Testing | VirtualMachine | Start/stop capability | Development environments |
Latency Measurement | VirtualMachineInstance | No controller overhead | Pure hypervisor performance |
What You Learned:

- ✅ Architecture: VMs create and manage VMIs, but VMIs can exist independently
- ✅ Performance Testing: Direct VMI creation eliminates management overhead
- ✅ Measurement Precision: kube-burner measures pure hypervisor startup time
- ✅ Real-world Usage: Production typically uses VMs for lifecycle management
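To make the distinction concrete, here is a minimal, illustrative pair of manifests. The names and the containerDisk image are assumptions for illustration only, not objects used elsewhere in this workshop.

# A VirtualMachineInstance created directly: no lifecycle controller, which is what kube-burner measures
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: perf-test-vmi
spec:
  domain:
    devices:
      disks:
        - name: rootdisk
          disk:
            bus: virtio
    resources:
      requests:
        memory: 1Gi
  volumes:
    - name: rootdisk
      containerDisk:
        image: quay.io/containerdisks/fedora:latest   # Example containerDisk image
---
# A VirtualMachine wraps a nearly identical spec under spec.template and adds start/stop lifecycle management
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: production-vm
spec:
  runStrategy: Always      # The controller keeps a VMI running and recreates it if it stops
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 1Gi
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest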
Performance Profile Impact on VMs

The performance improvements from Module 4 are even more significant for VMs than for containers because VMs run through an additional virtualization layer (QEMU/KVM): dedicated CPU placement removes scheduling jitter from hypervisor threads, HugePages reduce the cost of the extra guest-to-host memory translation, and NUMA alignment keeps vCPUs close to the memory they use.

Consider completing Module 4 to see these benefits in action!
SNO Environment Considerations
Performance Characteristics:

- Single Node: All workloads compete for the same resources
- Control Plane Overhead: Master components consume CPU and memory
- Storage Limitations: A single storage backend affects VM boot times
- Network Simplicity: Reduced network complexity, but shared bandwidth

Optimization Strategies:

- Resource Allocation: Careful CPU and memory allocation for VMs
- Test Scaling: Reduced test scale to prevent resource exhaustion
- Performance Profiles: Even more important in resource-constrained environments
- Monitoring: Close monitoring of resource utilization during tests
Troubleshooting Common Issues
PVC Binding Conflicts:
# Check for PVC binding issues across all namespaces
oc get events --all-namespaces | grep -i "bound incorrectly"
# Clean up orphaned PVCs if needed
oc get pvc --all-namespaces | grep -E "(Pending|Lost)"
VM Startup Issues:
# Check VM status and events
oc describe vm <vm-name> -n <namespace>
# Check DataVolume import progress
oc get dv -n <namespace> -w
# Check CDI operator logs if DataVolume import fails
oc logs -n openshift-cnv deployment/cdi-deployment
# Check virt-launcher pod logs for VM startup issues
oc logs -n <namespace> -l kubevirt.io/created-by=<vm-name>
CPU Model Compatibility Issues:
# If you see "unsupported configuration: CPU mode 'host-passthrough'" error:
# Check which CPU models the node supports (labels added by the KubeVirt node labeller)
oc get node $(oc get nodes -o jsonpath='{.items[0].metadata.name}') -o json | jq -r '.metadata.labels | keys[]' | grep '^cpu-model'
# The workshop uses 'host-model' for better compatibility
# If issues persist, you can use a specific CPU model:
# model: "Haswell-noTSX" or model: "Skylake-Client"
# Check hypervisor capabilities
oc debug node/<node-name> -- chroot /host cat /proc/cpuinfo | head -20
Resource Constraints:
# Monitor node resource usage during tests
oc adm top nodes
# Check for resource pressure
oc describe node <node-name> | grep -A 10 "Conditions:"
Workshop Progress
- ✅ Module 1: Low-latency fundamentals and concepts
- ✅ Module 2: RHACM and GitOps deployment automation
- ✅ Module 3: Baseline performance measurement and analysis
- ✅ Module 4: Performance tuning with CPU isolation (optional but recommended)
- ✅ Module 5: Low-latency virtualization with OpenShift Virtualization (current)
- 🎯 Next: Module 6 - Monitoring, alerting, and continuous validation
Performance Comparison Opportunity

If you completed this module without performance profiles from Module 4:

1. Record your current VMI performance results from the Python analysis
2. Go back and complete Module 4 to configure performance profiles
3. Return and re-run the VMI tests to see the performance improvement
4. Compare the results to understand the impact of performance tuning on virtualization

This approach provides valuable insights into the performance benefits of proper cluster tuning for virtualized workloads.
Next Steps
In Module 6, you'll learn to:

- Set up comprehensive performance monitoring
- Create alerting for performance regressions
- Validate optimizations across the entire stack
- Implement continuous performance testing
Knowledge Check
- What are the key differences between VM and container startup latency in terms of performance characteristics?
- How does SR-IOV improve network performance for VMs compared to traditional networking?
- What network policy latency thresholds are acceptable for production workloads in SNO environments?
- How do you configure a VM for maximum CPU performance using dedicated CPU placement?
- What are the trade-offs between VM isolation and performance overhead?