From Device Plugin to DRA: GPU Scheduling Paradigm Upgrade and HAMi-DRA Practice Review

Author: HAMi Community
Published: 3/23/2026

KCD Beijing 2026 was one of the largest Kubernetes community events in recent years.

Over 1,000 people registered, setting a new record for KCD Beijing.

The HAMi community not only gave a technical talk but also set up a booth, engaging deeply with developers and enterprise users from the cloud-native and AI infrastructure fields.

The topic of this talk was:

From Device Plugin to DRA: GPU Scheduling Paradigm Upgrade and HAMi-DRA Practice

This article combines the on-site presentation and slides for a more complete technical review. Slides download: GitHub - HAMi-DRA KCD Beijing 2026.

HAMi Community at the Event

The talk was delivered by two core contributors of the HAMi community:

  • Wang Jifei (Dynamia, HAMi Approver, main HAMi-DRA contributor)
  • James Deng (Fourth Paradigm, HAMi Reviewer)

They have long focused on:

  • GPU scheduling and virtualization
  • Kubernetes resource models
  • Heterogeneous compute management

At the booth, the HAMi community discussed questions with attendees, such as:

  • Is Kubernetes really suitable for AI workloads?
  • Should GPUs be treated as "scheduling resources" rather than "devices"?
  • How can DRA be introduced without breaking the ecosystem?

Event Recap

Photos from the event:

  • Main conference hall
  • Attendee registration
  • Attendees visiting the HAMi booth
  • Volunteers stamping for attendees
  • Wang Jifei presenting
  • James Deng presenting

The GPU Scheduling Paradigm Is Changing

The core of this talk extends beyond DRA itself, pointing to a broader shift:

GPUs are evolving from "devices" to "resource objects".

1. The Ceiling of Device Plugin

The problem with the traditional model is its limited expressiveness:

  • Can only describe "quantity" (nvidia.com/gpu: 1)
  • Cannot express:
    • Multi-dimensional resources (memory / core / slice)
    • Multi-card combinations
    • Topology (NUMA / NVLink)

This directly leads to:

  • Scheduling logic leaking out of the scheduler (into extenders / sidecars)
  • Increased system complexity
  • Limited concurrency

2. DRA: A Leap in Resource Modeling

DRA's core advantages are:

  • Multi-dimensional resource modeling
  • Complete device lifecycle management
  • Fine-grained resource allocation

Key change:

Resource requests move from fields in the Pod spec to independent ResourceClaim objects.
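
To make this shift concrete, here is a minimal sketch in Python that builds a Pod manifest as a plain dictionary. It assumes the upstream Pod fields `spec.resourceClaims` and `resources.claims`; the names `demo`, `gpu`, `gpu-claim` and the image are hypothetical. The point is that the Pod no longer states how much GPU it wants, it only points at a separate ResourceClaim object by name.

```python
def pod_with_claim(pod_name: str, image: str, claim_name: str) -> dict:
    """Build a Pod manifest whose container refers to a ResourceClaim by name."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            # Pod-level list: which ResourceClaim objects this Pod uses.
            "resourceClaims": [{"name": "gpu", "resourceClaimName": claim_name}],
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # Container-level reference to the Pod-level entry above.
                    # Note there is no "limits" block describing the GPU.
                    "resources": {"claims": [{"name": "gpu"}]},
                }
            ],
        },
    }


# Hypothetical names, for illustration only.
pod = pod_with_claim("demo", "example.invalid/cuda-app:latest", "gpu-claim")
```

The device details (count, memory, selectors) live in the claim object, which has its own lifecycle and can be inspected independently of the Pod.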

Key Reality: DRA is Too Complex

A key slide in the deck that is often overlooked:

A DRA request looks like this

spec:
  devices:
    requests:
    - exactly:
        allocationMode: ExactCount
        capacity:
          requests:
            memory: 4194304k
        count: 1

You also need to write a CEL selector:

device.attributes["gpu.hami.io"].type == "hami-gpu"
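
As a rough illustration of why tooling helps here, the sketch below assembles that CEL expression from its parts and wraps it in the `selectors` entry that a DRA device request accepts (`cel.expression`). The helper names are hypothetical, and the quoting relies on `json.dumps`, which is an assumption that holds for simple string values like the ones in this article.

```python
import json


def cel_attribute_equals(domain: str, attribute: str, value: str) -> str:
    """Build a CEL expression comparing one device attribute to a string."""
    # json.dumps yields a double-quoted, escaped string literal.
    return f"device.attributes[{json.dumps(domain)}].{attribute} == {json.dumps(value)}"


def cel_selector(expression: str) -> dict:
    """Wrap an expression in the shape used under a request's `selectors` list."""
    return {"cel": {"expression": expression}}


expr = cel_attribute_equals("gpu.hami.io", "type", "hami-gpu")
selector = cel_selector(expr)
print(expr)
```

Several such terms can be joined with `&&` when more than one attribute has to match.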

Compared to Device Plugin

resources:
  limits:
    nvidia.com/gpu: 1

The conclusion is clear:

DRA upgrades capability, but the user experience clearly gets worse.

HAMi-DRA's Key Breakthrough: Automation

One of the most valuable parts of this talk:

Webhook Automatically Generates ResourceClaim

HAMi's approach is not to have users "use DRA directly". Instead:

Let users keep writing Device Plugin-style requests, and have the system convert them to DRA automatically.

How it works

Input (user):

nvidia.com/gpu: 1
nvidia.com/gpumemory: 4000

↓

Webhook conversion:

  • Generate ResourceClaim
  • Build CEL selector
  • Inject device constraints (UUID / GPU type)

↓

Output (system internal):

  • Standard DRA objects
  • Schedulable resource expression
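
The conversion step can be sketched roughly as follows. This is not HAMi-DRA's actual webhook code: the function name, the device class name `hami-gpu`, the request name `gpu` and the assumption that `nvidia.com/gpumemory` is expressed in MiB are all illustrative. Only the two input keys, the CEL expression and the ResourceClaim field layout come from the examples in this article.

```python
GPU_COUNT_KEY = "nvidia.com/gpu"
GPU_MEMORY_KEY = "nvidia.com/gpumemory"
TYPE_SELECTOR = 'device.attributes["gpu.hami.io"].type == "hami-gpu"'


def limits_to_resource_claim(claim_name: str, limits: dict) -> dict:
    """Translate Device Plugin-style limits into a ResourceClaim manifest."""
    exactly = {
        "deviceClassName": "hami-gpu",  # hypothetical DeviceClass name
        "allocationMode": "ExactCount",
        "count": int(limits[GPU_COUNT_KEY]),
        "selectors": [{"cel": {"expression": TYPE_SELECTOR}}],
    }
    if GPU_MEMORY_KEY in limits:
        # Assumption: the extended resource value is a number of MiB.
        memory = f"{int(limits[GPU_MEMORY_KEY])}Mi"
        exactly["capacity"] = {"requests": {"memory": memory}}
    return {
        "apiVersion": "resource.k8s.io/v1",
        "kind": "ResourceClaim",
        "metadata": {"name": claim_name},
        "spec": {"devices": {"requests": [{"name": "gpu", "exactly": exactly}]}},
    }


claim = limits_to_resource_claim(
    "demo-gpu", {"nvidia.com/gpu": 1, "nvidia.com/gpumemory": 4000}
)
```

Constraints such as a specific UUID or GPU type would be additional terms in the CEL expression; they are left out to keep the sketch short.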

Core value

Turn DRA from an "expert interface" into an interface ordinary users can use.

DRA Driver: Real Implementation Complexity

A DRA driver encompasses more than resource registration; it provides full lifecycle management:

Three core interfaces

  • Publish Resources
  • Prepare Resources
  • Unprepare Resources
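
The shape of these three interfaces can be sketched as an abstract class plus a toy in-memory implementation. Real DRA drivers are typically written in Go against the kubelet plugin API, so the class names, method signatures and the idempotency behaviour shown here are assumptions made for illustration, not the HAMi-DRA driver.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class DeviceDriver(ABC):
    """Hypothetical interface mirroring the three driver responsibilities."""

    @abstractmethod
    def publish_resources(self) -> list[dict]:
        """Advertise the device inventory of this node."""

    @abstractmethod
    def prepare_resources(self, claim_uid: str, devices: list[str]) -> list[str]:
        """Make the allocated devices usable before the container starts."""

    @abstractmethod
    def unprepare_resources(self, claim_uid: str) -> None:
        """Undo whatever prepare_resources set up for this claim."""


class InMemoryDriver(DeviceDriver):
    def __init__(self, devices: list[str]) -> None:
        self._devices = list(devices)
        self._prepared: dict[str, list[str]] = {}

    def publish_resources(self) -> list[dict]:
        return [{"name": name} for name in self._devices]

    def prepare_resources(self, claim_uid: str, devices: list[str]) -> list[str]:
        # Calls may be retried, so preparing the same claim twice is a no-op.
        if claim_uid not in self._prepared:
            unknown = set(devices) - set(self._devices)
            if unknown:
                raise ValueError(f"unknown devices: {sorted(unknown)}")
            self._prepared[claim_uid] = list(devices)
        return self._prepared[claim_uid]

    def unprepare_resources(self, claim_uid: str) -> None:
        # Also safe to repeat: a missing claim is simply ignored.
        self._prepared.pop(claim_uid, None)


driver = InMemoryDriver(["gpu-0", "gpu-1"])
prepared = driver.prepare_resources("claim-1", ["gpu-0"])
```

Keeping prepare and unprepare safe to repeat is a common design choice for this kind of lifecycle hook, since the caller may retry after a restart.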

Real challenges

  • libvgpu.so injection
  • ld.so.preload
  • Environment variable management
  • Temporary directories (cache / lock)

This means:

GPU scheduling has entered the runtime orchestration layer, not just simple resource allocation.
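
A rough sketch of what such a prepare step might produce is shown below. It writes a per-claim `ld.so.preload` file and a cache directory on the host, then returns environment variables and mounts in a CDI-like `env` / `mounts` shape. The in-container paths, the `VGPU_MEMORY_LIMIT_MIB` variable name and the function itself are hypothetical and chosen only to illustrate the four challenges listed above.

```python
import os
import tempfile

# Hypothetical in-container locations, chosen for illustration only.
CONTAINER_LIB = "/usr/local/vgpu/libvgpu.so"
CONTAINER_CACHE = "/tmp/vgpu"


def build_container_edits(claim_uid, memory_limit_mib, host_lib, state_root):
    """Create per-claim host files and describe how to expose them to a container."""
    claim_dir = os.path.join(state_root, claim_uid)
    cache_dir = os.path.join(claim_dir, "cache")
    os.makedirs(cache_dir, exist_ok=True)  # safe to call again on a retried prepare

    # The dynamic linker loads every library listed in /etc/ld.so.preload.
    preload_file = os.path.join(claim_dir, "ld.so.preload")
    with open(preload_file, "w") as f:
        f.write(CONTAINER_LIB + "\n")

    return {
        # Hypothetical variable name carrying the memory limit to the library.
        "env": [f"VGPU_MEMORY_LIMIT_MIB={memory_limit_mib}"],
        "mounts": [
            {"hostPath": host_lib, "containerPath": CONTAINER_LIB,
             "options": ["ro", "bind"]},
            {"hostPath": preload_file, "containerPath": "/etc/ld.so.preload",
             "options": ["ro", "bind"]},
            {"hostPath": cache_dir, "containerPath": CONTAINER_CACHE,
             "options": ["rw", "bind"]},
        ],
    }


state_root = tempfile.mkdtemp()
edits = build_container_edits("claim-1", 4000, "/opt/vgpu/libvgpu.so", state_root)
```

The matching unprepare step would remove the per-claim directory again, which is why the files are grouped under the claim UID.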

Performance Comparison: DRA is Not Just "More Elegant"

A key benchmark from the slides:

Pod creation time comparison

  • HAMi (traditional): up to ~42,000
  • HAMi-DRA: significantly lower (an improvement of roughly 30% or more)

This shows:

DRA's resource pre-binding mechanism can reduce scheduling conflicts and retries

Observability Paradigm Shift

An underestimated change:

Traditional model

  • Resource info: from Node
  • Usage: from Pod
  • → Requires aggregation and inference

DRA model

  • ResourceSlice: device inventory
  • ResourceClaim: resource allocation
  • → Resource perspective is first-class

The change:

Observability shifts from "inference" to "direct modeling"
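
A small sketch of what "direct modeling" means in practice: given ResourceSlice and ResourceClaim objects as dictionaries, the inventory and the allocations can be read straight from their fields and joined on (driver, pool, device). The field paths follow the upstream API as I understand it (`spec.driver`, `spec.pool.name`, `spec.devices[].name`, `status.allocation.devices.results[]`); the sample driver, pool and object names are made up.

```python
def inventory(slices):
    """Collect every advertised device as a (driver, pool, device) key."""
    keys = set()
    for s in slices:
        spec = s["spec"]
        for dev in spec.get("devices", []):
            keys.add((spec["driver"], spec["pool"]["name"], dev["name"]))
    return keys


def allocations(claims):
    """Map each allocated device key to the claims that hold it."""
    held = {}
    for claim in claims:
        allocation = claim.get("status", {}).get("allocation")
        if not allocation:
            continue  # claim exists but has not been allocated yet
        for result in allocation["devices"]["results"]:
            key = (result["driver"], result["pool"], result["device"])
            held.setdefault(key, []).append(claim["metadata"]["name"])
    return held


# Made-up sample objects, trimmed to the fields used above.
slices = [{"spec": {"driver": "gpu.example.com", "pool": {"name": "node-a"},
                    "devices": [{"name": "gpu-0"}, {"name": "gpu-1"}]}}]
claims = [
    {"metadata": {"name": "train"}, "status": {"allocation": {"devices": {
        "results": [{"driver": "gpu.example.com", "pool": "node-a",
                     "device": "gpu-0"}]}}}},
    {"metadata": {"name": "pending"}},
]

held = allocations(claims)
free = inventory(slices) - set(held)
```

No Node or Pod object is consulted here; a device maps to a list of claims because a shared GPU may back more than one claim.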

Unified Modeling for Heterogeneous Devices

A key future direction from the slides:

If device attributes are standardized, a vendor-agnostic scheduling model is possible

For example:

  • PCIe root
  • PCI bus ID
  • GPU attributes

This is a bigger narrative:

DRA is the starting point for heterogeneous compute abstraction
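
To illustrate what a shared attribute buys, the sketch below groups devices from two different (made-up) drivers by a common PCIe root attribute. The attribute name `resource.kubernetes.io/pcieRoot` reflects my understanding of the upstream standardized attribute and should be treated as an assumption, as should the device names and attribute values.

```python
PCIE_ROOT = "resource.kubernetes.io/pcieRoot"  # assumed standardized name


def group_by_attribute(devices, attribute):
    """Group device names by the string value of one attribute."""
    groups = {}
    for dev in devices:
        value = dev.get("attributes", {}).get(attribute, {}).get("string")
        if value is not None:
            groups.setdefault(value, []).append(dev["name"])
    return groups


# Made-up devices published by two different vendors' drivers.
devices = [
    {"name": "vendor-a-gpu-0", "attributes": {PCIE_ROOT: {"string": "pci0000:00"}}},
    {"name": "vendor-b-nic-0", "attributes": {PCIE_ROOT: {"string": "pci0000:00"}}},
    {"name": "vendor-a-gpu-1", "attributes": {PCIE_ROOT: {"string": "pci0000:80"}}},
    {"name": "legacy-dev", "attributes": {}},
]

groups = group_by_attribute(devices, PCIE_ROOT)
```

Because both drivers report the same attribute key, the grouping logic needs no vendor-specific branches, which is the property a vendor-agnostic scheduling model depends on.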

Bigger Trend: Kubernetes is Becoming the AI Control Plane

Connecting these points reveals a bigger trend:

1. Node → Resource

  • From "scheduling machines"
  • To "scheduling resource objects"

2. Device → Virtual Resource

  • GPU is no longer just a card
  • But a divisible, composable resource

3. Imperative → Declarative

  • Scheduling logic → resource declaration

Essentially:

Kubernetes is evolving into the AI Infra Control Plane

HAMi's Position

HAMi's positioning is becoming clearer:

GPU Resource Layer on Kubernetes

  • Downward: adapts to heterogeneous GPUs
  • Upward: supports AI workloads (training / inference / Agent)
  • Middle: scheduling + virtualization + abstraction

HAMi-DRA is the key step in aligning this resource layer with Kubernetes-native models.

Community Significance

Another important point from this talk:

  • Contributors from different companies collaborated
  • Validated in real production environments
  • Shared experience through the community

This is the approach HAMi has always followed:

Promoting AI infrastructure through community, not closed systems

Summary

The real value of this talk lies in answering a key question, beyond introducing DRA:

How do you turn a "correct but hard to use" model into a system you can use today?

HAMi-DRA's answer:

  • Don't change user habits
  • Absorb DRA capabilities
  • Handle complexity internally

HAMi is a CNCF Sandbox project