Skip to main content

Lab 2: Local Fake GPU Setup

BeginnerDuration: about 30 minutesEnvironment: macOS (OrbStack) ยท Linux (Ubuntu + kind) ยท no GPU requiredCost: freeVerified: 2026-05-21By: @rootsongjc, @maishivamhoo123

This lab walks you through setting up a fully local Kubernetes cluster โ€” using either OrbStack (macOS) or kind (Linux/Ubuntu) โ€” together with run-ai/fake-gpu-operator, then installing HAMi online.

This lab does not require a real NVIDIA GPU. It is designed for classroom preparation, understanding HAMi component architecture, verifying GPU Pod scheduling workflows, and quickly getting familiar with basic HAMi usage on a personal computer.

What You'll Getโ€‹

After completing this lab, you will have a local Kubernetes cluster with:

  • fake-gpu-operator simulating nvidia.com/gpu resources on CPU nodes
  • HAMi scheduler, admission webhook, and other control plane components running normally
  • Regular Pods can request simulated GPUs via nvidia.com/gpu
  • You can observe the complete chain from fake GPU resource discovery, Pod request, scheduling, to running
note

fake GPU does not represent real GPU memory isolation, compute isolation, CUDA runtime, or driver capabilities. This lab is for understanding HAMi components and basic scheduling workflows. Real memory slicing (nvidia.com/gpumem), compute limits (nvidia.com/gpucores), CUDA program execution, and performance isolation still require a real NVIDIA GPU environment.

Installation Overviewโ€‹

The entire local installation process consists of 7 steps:

Local Fake GPU Installation Overview
StepPurposeWhat It Solves
Set Up & Verify EnvironmentCreate/verify cluster, check kubectl and HelmEnsure Kubernetes cluster is available
Install fake-gpu-operatorSimulate NVIDIA GPU resourcesAllow nodes without GPUs to report nvidia.com/gpu
Install HAMiDeploy HAMi control planeObserve HAMi scheduler, webhook, and other components
Install PrometheusDeploy monitoring stackCollect GPU metrics and provide data source for HAMi WebUI
Run Simulated GPU WorkloadsVerify scheduling workflowExperience Pod GPU request and scheduling
Install HAMi WebUIDeploy visual management interfaceGraphically view GPU nodes, resource allocation, and usage trends
Observe HAMi and fake GPUUnderstand component responsibility boundariesClarify which capabilities require real GPUs

Prerequisitesโ€‹

  • macOS, Intel or Apple Silicon
  • OrbStack installed with built-in Kubernetes enabled
  • Access to GitHub, GHCR, and the HAMi Helm repository
  • At least 4 CPU and 8 GB of memory available
Why OrbStack?

OrbStack comes with built-in Kubernetes (based on k3s), so there's no need to install kind or Docker Desktop separately. It uses fewer resources, starts faster, and is the preferred choice for local labs on macOS.

Check Helm (needed later for installing fake-gpu-operator and HAMi):

helm version

If Helm is not installed:

brew install helm

Step 1: Set Up and Verify Local Environmentโ€‹

OrbStack's Kubernetes starts automatically once you enable it in the OrbStack UI. Verify the cluster is ready:

kubectl version

Example output:

Client Version: v1.33.9
Kustomize Version: v5.6.0
Server Version: v1.33.9+orb1
note

The +orb1 suffix in Server Version identifies OrbStack's built-in Kubernetes distribution.

View cluster nodes:

kubectl get nodes -o wide

Example output:

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
orbstack Ready control-plane,master 148d v1.33.9+orb1 192.168.139.2 <none> OrbStack 7.0.5-orbstack-00330-ge3df4e19b0a0-dirty docker://29.4.0

Set the NODE_NAME Variableโ€‹

The rest of this lab uses a NODE_NAME shell variable to avoid hard-coding the node name. Set it once here; all subsequent commands consume it automatically:

NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
echo "NODE_NAME=${NODE_NAME}"
PlatformExample value
macOSorbstack
Linuxhami-lab-control-plane

Steps 2โ€“7 are identical on both platforms

From here on, all commands work the same on macOS and Linux. The only difference you will notice is your node name in example outputs.

Step 2: Install fake-gpu-operatorโ€‹

fake-gpu-operator simulates GPU resources on nodes without NVIDIA GPUs and sets the node capacity to nvidia.com/gpu. This step replaces the NVIDIA GPU Operator, drivers, device-plugin, and DCGM metrics collection pipeline that would be present in a real GPU environment.

2.1 Create Namespace and Set Security Policyโ€‹

kubectl create namespace gpu-operator
kubectl label namespace gpu-operator pod-security.kubernetes.io/enforce=privileged
namespace/gpu-operator created
namespace/gpu-operator labeled

The gpu-operator namespace is dedicated to fake-gpu-operator components. The privileged label allows Pods to run in privileged mode; the fake-gpu-operator's device-plugin needs access to host device files.

2.2 Label the Nodeโ€‹

fake-gpu-operator uses node labels to determine which nodes should simulate GPUs:

kubectl label node ${NODE_NAME} run.ai/simulated-gpu-node-pool=default
node/<NODE_NAME> labeled

The label run.ai/simulated-gpu-node-pool=default tells fake-gpu-operator: "Simulate GPUs on this node."

2.3 Install fake-gpu-operatorโ€‹

export FAKE_GPU_OPERATOR_VERSION=0.0.80

helm upgrade -i gpu-operator \
oci://ghcr.io/run-ai/fake-gpu-operator/fake-gpu-operator \
--namespace gpu-operator \
--create-namespace \
--version ${FAKE_GPU_OPERATOR_VERSION}
Release "gpu-operator" does not exist. Installing it now.
Pulled: ghcr.io/run-ai/fake-gpu-operator/fake-gpu-operator:0.0.80
Digest: sha256:...
NAME: gpu-operator
LAST DEPLOYED: ...
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None

helm upgrade -i installs if the release doesn't exist, upgrades if it does. The oci:// prefix pulls the Helm Chart from GitHub Container Registry. 0.0.80 is a stable version released in 2026-04.

2.4 Wait for Components to Runโ€‹

kubectl get pods -n gpu-operator

Example output:

NAME READY STATUS RESTARTS AGE
device-plugin-l8m6j 1/1 Running 0 31m
kwok-gpu-device-plugin-5996cdf4f9-mfvpm 1/1 Running 0 31m
nvidia-dcgm-exporter-knfsn 1/1 Running 0 31m
nvidia-dcgm-exporter-kwok-b8fd4976-blb8c 1/1 Running 0 31m
status-updater-59965d7bc6-fbkmk 1/1 Running 0 31m
topology-server-9d57b6c79-7dv6h 1/1 Running 0 31m

Component descriptions:

  • device-plugin: DaemonSet running on each GPU node, reporting GPU resources to Kubernetes
  • kwok-gpu-device-plugin: Uses KWOK (Kubernetes WithOut Kubelet) to simulate GPU devices
  • nvidia-dcgm-exporter: Simulates DCGM metrics export (GPU temperature, utilization, etc.)
  • status-updater: Updates node GPU status
  • topology-server: Manages GPU topology information

If all Pods are Running with READY at 1/1, the installation is successful.

2.5 Verify Simulated GPU Resourcesโ€‹

Check whether the node reports GPU capacity:

kubectl get node ${NODE_NAME} \
-o custom-columns=NAME:.metadata.name,GPU:.status.capacity.nvidia\\.com/gpu
NAME GPU
<NODE_NAME> 2

The GPU column shows 2, meaning fake-gpu-operator has simulated 2 GPUs on the node.

View the node's detailed GPU labels:

kubectl get node ${NODE_NAME} --show-labels | tr ',' '\n' | grep -E 'nvidia.com/gpu|run.ai'
nvidia.com/gpu.count=2
nvidia.com/gpu.deploy.dcgm-exporter=true
nvidia.com/gpu.deploy.device-plugin=true
nvidia.com/gpu.memory=11441
nvidia.com/gpu.present=true
nvidia.com/gpu.product=Tesla-K80
run.ai/fake.gpu=true
run.ai/simulated-gpu-node-pool=default

These labels simulate a real GPU node:

  • nvidia.com/gpu.count=2: 2 GPUs simulated
  • nvidia.com/gpu.product=Tesla-K80: Simulated GPU model
  • nvidia.com/gpu.memory=11441: 11441 MiB VRAM per GPU
  • run.ai/fake.gpu=true: Marks this as a simulated GPU

If the GPU column is empty, confirm the node label was applied:

kubectl get node ${NODE_NAME} --show-labels | grep run.ai/simulated-gpu-node-pool

Step 3: Install HAMiโ€‹

This step installs HAMi's control plane components:

  • hami-scheduler: Scheduling enhancement component that participates in GPU Pod scheduling decisions
  • Admission webhook: Automatically rewrites GPU Pod scheduler configuration
  • Helm release: Manages all HAMi-related Kubernetes resources

In the fake GPU environment, GPU resources are provided by fake-gpu-operator. To avoid two device-plugins simultaneously registering nvidia.com/gpu, this lab does not let the HAMi device-plugin take over the fake nodes.

3.1 Add the HAMi Helm Repositoryโ€‹

helm repo add hami-charts https://project-hami.github.io/HAMi/
helm repo update
"hami-charts" has been added to your repositories
...Successfully got an update from the "hami-charts" chart repository
Update Complete. โŽˆHappy Helming!โŽˆ

3.2 Install HAMiโ€‹

helm install hami hami-charts/hami \
-n kube-system \
--set devicePlugin.enabled=false
NAME: hami
LAST DEPLOYED: ...
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None

--set devicePlugin.enabled=false is the key parameter. Because fake-gpu-operator is already managing GPU devices, starting HAMi's device-plugin as well would cause a conflict. Only HAMi's scheduling enhancement components are installed here.

3.3 Verify HAMi Componentsโ€‹

kubectl get pods -n kube-system | grep hami
hami-scheduler-5d9678f989-dnf65 2/2 Running 0 28m

2/2 indicates this Pod has 2 containers (scheduler + webhook), both running normally.

View HAMi's control plane resources:

kubectl get deploy,svc,cm,sa -n kube-system | grep hami
deployment.apps/hami-scheduler 1/1 1 1 28m
service/hami-scheduler NodePort 192.168.194.156 <none> 443:31998/TCP,31993:31993/TCP 28m
configmap/hami-scheduler 1 28m
configmap/hami-scheduler-device 1 28m
serviceaccount/hami-scheduler 0 28m

3.4 Confirm All Helm Releasesโ€‹

helm list -A
NAME NAMESPACE REVISION STATUS CHART APP VERSION
gpu-operator gpu-operator 1 deployed fake-gpu-operator-0.0.80 0.0.80
hami kube-system 1 deployed hami-2.9.0 2.9.0

Both Helm Releases show deployed status.

Step 4: Install Prometheusโ€‹

HAMi WebUI needs to read GPU metrics from Prometheus. This step deploys a complete monitoring stack using kube-prometheus-stack.

Why install Prometheus?

HAMi WebUI's cluster overview, GPU utilization, memory usage, and other chart data all come from Prometheus. Without Prometheus, the WebUI can only display blank pages.

4.1 Add the Helm Repositoryโ€‹

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

4.2 Install kube-prometheus-stackโ€‹

helm install prometheus prometheus-community/kube-prometheus-stack \
-n monitoring --create-namespace \
--set grafana.enabled=false \
--version=75.15.1
NAME: prometheus
LAST DEPLOYED: ...
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1

--set grafana.enabled=false skips Grafana installation. This lab only uses Prometheus as the data source for HAMi WebUI. Remove this parameter if you want Grafana for richer visualizations.

4.3 Wait for Prometheus to Be Readyโ€‹

kubectl get pods -n monitoring

Example output:

NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 0 2m
prometheus-kube-prometheus-operator-d89fb8945-htjjd 1/1 Running 0 2m
prometheus-kube-state-metrics-7f5f75c85d-mbsbh 1/1 Running 0 2m
prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 2m
prometheus-prometheus-node-exporter-77pxd 1/1 Running 0 2m

4.4 Create ServiceMonitor to Collect GPU Metricsโ€‹

The ServiceMonitors included with kube-prometheus-stack do not cover fake-gpu-operator's GPU metrics. Create one manually:

kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: nvidia-dcgm-exporter
namespace: gpu-operator
labels:
release: prometheus
spec:
selector:
matchLabels:
app: nvidia-dcgm-exporter
namespaceSelector:
matchNames:
- gpu-operator
endpoints:
- port: gpu-metrics
path: /metrics
interval: 15s
EOF
servicemonitor.monitoring.coreos.com/nvidia-dcgm-exporter created

The ServiceMonitor tells Prometheus: "Go to the gpu-operator namespace, find the Service with label app: nvidia-dcgm-exporter, and scrape /metrics from its gpu-metrics port every 15 seconds." The release: prometheus label is required by kube-prometheus-stack's ServiceMonitor selector.

Wait about 30 seconds, then verify GPU metrics are being collected:

kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -- \
promtool query instant http://localhost:9090 'DCGM_FI_DEV_GPU_UTIL'
DCGM_FI_DEV_GPU_UTIL{..., device="nvidia1", ..., modelName="Tesla-K80", ...} => 0
DCGM_FI_DEV_GPU_UTIL{..., device="nvidia0", ..., modelName="Tesla-K80", ...} => 0

Seeing DCGM_FI_DEV_GPU_UTIL data means Prometheus is collecting fake GPU metrics. A utilization of 0 is normal โ€” there are no real GPU compute tasks running.

Step 5: Run Simulated GPU Workloadsโ€‹

Verify that Kubernetes can schedule Pods requesting nvidia.com/gpu to the fake GPU node. fake-gpu-operator injects a simulated nvidia-smi tool into GPU Pods for observing GPU visibility.

Since this lab does not enable the HAMi device-plugin, the test Pod explicitly bypasses the HAMi webhook and uses the default Kubernetes scheduler with the simulated GPU resources provided by fake-gpu-operator.

5.1 Create a Test Podโ€‹

Review the Pod YAML before applying:

apiVersion: v1
kind: Pod
metadata:
name: fake-gpu-pod
labels:
hami.io/webhook: ignore
annotations:
run.ai/simulated-gpu-utilization: "10-30"
spec:
restartPolicy: Never
containers:
- name: app
image: ubuntu:22.04
command: [ "bash", "-lc", "sleep 3600" ]
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "512Mi"
nvidia.com/gpu: 1
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName

YAML key points:

  • hami.io/webhook: ignore: Tells the HAMi webhook not to intercept this Pod, using the default scheduler instead
  • run.ai/simulated-gpu-utilization: "10-30": fake-gpu-operator will make nvidia-smi report 10%โ€“30% GPU utilization
  • resources.limits.nvidia.com/gpu: 1: Requests 1 GPU
  • sleep 3600: Keeps the container running for 1 hour for observation

Apply the Pod:

kubectl apply -f fake-gpu-pod.yaml
pod/fake-gpu-pod created

5.2 Wait for the Pod to Runโ€‹

kubectl get pod fake-gpu-pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fake-gpu-pod 1/1 Running 0 7m <pod-ip> <NODE_NAME> <none> <none>

STATUS: Running and your node name in the NODE column confirm the Pod was successfully scheduled. If pulling ubuntu:22.04 for the first time, it may take a few dozen seconds.

5.3 View the Pod's GPU Resource Requestโ€‹

kubectl describe pod fake-gpu-pod | grep -A6 "Limits"
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1

5.4 View Node GPU Resource Allocationโ€‹

kubectl describe node ${NODE_NAME} | grep -A10 "Allocated resources"
Allocated resources:
Resource Requests Limits
-------- -------- ------
cpu 750m (7%) 1700m (17%)
memory 870Mi (10%) 1996Mi (24%)
nvidia.com/gpu 1 1

The nvidia.com/gpu row shows both Requests and Limits as 1, meaning 1 GPU is occupied. The node has 2 GPUs total, so 1 more can be allocated.

5.5 Execute Simulated nvidia-smiโ€‹

Run nvidia-smi inside the Pod to confirm fake-gpu-operator successfully injected the simulated GPU tool:

kubectl exec fake-gpu-pod -- nvidia-smi
Thu May 21 08:44:31 2026
+------------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4 |
+--------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
+--------------------------------+----------------------+----------------------+
| 0 Tesla-K80 Off | 00000001:00:00.0 Off | Off |
| N/A 33C P8 11W / 70W | 11441MiB / 11441MiB | 18% Default |
| | | N/A |
+--------------------------------+----------------------+----------------------+

+------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
+------------------------------------------------------------------------------+
| 0 N/A N/A 23 G sleep 3600 11441MiB |
+------------------------------------------------------------------------------+

Output interpretation:

  • GPU 0: Tesla-K80: Simulated GPU model, consistent with node label nvidia.com/gpu.product=Tesla-K80
  • 11441MiB / 11441MiB: Memory used/total, consistent with nvidia.com/gpu.memory=11441
  • GPU-Util: 18%: Within the range specified by run.ai/simulated-gpu-utilization: "10-30"
  • Processes: sleep 3600: The running process visible on the simulated GPU

This output is identical on macOS and Linux โ€” fake-gpu-operator produces the same simulation on both platforms.

Step 6: Install HAMi WebUIโ€‹

HAMi WebUI provides a graphical GPU resource management interface.

6.1 Add the HAMi WebUI Helm Repositoryโ€‹

helm repo add hami-webui https://Project-HAMi.github.io/HAMi-WebUI/
helm repo update

6.2 Add GPU Label to the Nodeโ€‹

HAMi WebUI discovers GPU nodes via the gpu=on node label:

kubectl label node ${NODE_NAME} gpu=on

6.3 Add Simulated GPU Registration Informationโ€‹

In a real environment, the HAMi device-plugin automatically writes the hami.io/node-nvidia-register annotation. Because this lab disables the device-plugin, add it manually:

kubectl annotate node ${NODE_NAME} \
hami.io/node-nvidia-register='[{"id":"GPU-3cef3724-8228-5a66-b391-b0901788f5d0","count":10,"devmem":11441,"devcore":100,"type":"NVIDIA-Tesla-K80","mode":"hami-core","health":true},{"id":"GPU-5127182e-f297-5a25-bb44-0444c3be540c","index":1,"count":10,"devmem":11441,"devcore":100,"type":"NVIDIA-Tesla-K80","mode":"hami-core","health":true}]' \
hami.io/node-handshake="Requesting_$(date '+%Y.%m.%d %H:%M:%S')"

One JSON object per GPU. id is the device UUID, count is vGPU partitions per card (default 10), devmem is VRAM in MiB, devcore is compute capacity in %, mode is hami-core for software-level partitioning.

6.4 Install HAMi WebUIโ€‹

helm install my-hami-webui hami-webui/hami-webui \
--set externalPrometheus.enabled=true \
--set externalPrometheus.address="http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090" \
--set dcgm-exporter.enabled=false \
-n kube-system

6.5 Wait for WebUI to Be Readyโ€‹

kubectl get pods -n kube-system | grep webui
my-hami-webui-85686fd65-77crx 2/2 Running 0 2m

2/2 indicates both the frontend and backend containers are running normally.

6.6 Access the WebUIโ€‹

kubectl -n kube-system port-forward svc/my-hami-webui 8080:3000

Open http://localhost:8080/admin/vgpu/monitor/overview in your browser.

HAMi WebUI Cluster Overview

Click Node Management in the left sidebar to view GPU node details:

HAMi WebUI Node Management

Step 7: Observe the Boundaries of HAMi and fake GPUโ€‹

What HAMi Is Responsible for in This Labโ€‹

kubectl get deploy,svc,cm,sa -n kube-system | grep hami

HAMi only deployed the scheduler in this lab. Since devicePlugin.enabled=false was set, HAMi's core GPU slicing capability is not enabled.

What fake-gpu-operator Is Responsible for in This Labโ€‹

kubectl get daemonset,deploy,pod -n gpu-operator

fake-gpu-operator handles the full lifecycle of simulated devices: device discovery, resource reporting, and metrics export.

View node capacity to confirm registration:

kubectl describe node ${NODE_NAME} | grep -A10 "Capacity:"
Capacity:
cpu: 10
ephemeral-storage: 148577276Ki
memory: 8185404Ki
nvidia.com/gpu: 2
pods: 110

What This Lab Cannot Verifyโ€‹

Real GPU required for the following
  • HAMi device-plugin actually registering GPUs and writing hami.io/node-nvidia-register
  • nvidia.com/gpumem memory slicing
  • nvidia.com/gpucores compute ratio limits
  • CUDA programs actually running
  • Memory overcommit, memory analysis, memory override
  • Real DCGM GPU metrics

To continue with these capabilities, use a real GPU environment as described in Lab 1: Online HAMi Installation.

Cleanupโ€‹

Delete the test Pod:

kubectl delete pod fake-gpu-pod

Uninstall HAMi WebUI, HAMi, and Prometheus:

helm uninstall my-hami-webui -n kube-system
helm uninstall hami -n kube-system
helm uninstall prometheus -n monitoring
kubectl delete namespace monitoring

Uninstall fake-gpu-operator:

helm uninstall gpu-operator -n gpu-operator
kubectl delete namespace gpu-operator

Clean up node labels and annotations:

kubectl label node ${NODE_NAME} gpu- run.ai/simulated-gpu-node-pool-
kubectl annotate node ${NODE_NAME} hami.io/node-nvidia-register- hami.io/node-handshake-

To stop the Kubernetes cluster, disable it from the OrbStack UI, or quit OrbStack entirely.

tip

If you want to keep the environment for further experimentation, skip the cleanup steps.

Next Stepsโ€‹

After completing this lab, we recommend continuing with HAMi Cluster Architecture, focusing on understanding the responsibility boundaries of the scheduler, device-plugin, webhook, and GPU Operator.

CNCFHAMi is a CNCF Sandbox project