# GPU Fleet Operations
This guide covers fleet design and day-2 operations for GPU bare metal servers managed by vMetal. For initial NodeProvider setup and provisioning a single GPU server, start with the GPU Quickstart.
## Label strategy
Labels on BareMetalHost resources drive node type selection. The platform's label selector matches servers to node types, so a consistent labeling scheme is the foundation of a manageable fleet.
### Recommended labels
| Label | Example values | Purpose |
|---|---|---|
| `gpu-model` | `h100`, `a100`, `l40s` | GPU model identifier |
| `gpu-count` | `8`, `4`, `2` | Number of GPUs per server |
| `rack` | `rack-a`, `rack-b` | Physical location |
| `datacenter` | `us-east-1`, `eu-west-1` | Site identifier |
| `nvlink-domain` | `nvlink-0`, `nvlink-1` | NVLink fabric group for multi-server NVLink topologies |
Apply labels when registering servers:
```shell
kubectl label baremetalhost server-01 -n metal3-system \
  gpu-model=h100 \
  gpu-count=8 \
  rack=rack-a \
  datacenter=us-east-1
```
### Updating labels without re-provisioning
You can update labels on a BareMetalHost at any time. Label changes do not trigger re-provisioning. The platform reads labels only when selecting a server for a new Machine claim.
```shell
kubectl label baremetalhost server-01 -n metal3-system nvlink-domain=nvlink-0 --overwrite
```
Labels on a claimed server have no effect until you delete the Machine and provision a new one.
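Before defining a node type, it can help to preview which servers its selector would match. `kubectl get` accepts the same equality-based label selectors (the label values here are illustrative):

```shell
# List BareMetalHosts that a selector on gpu-model and gpu-count would match
kubectl get baremetalhost -n metal3-system \
  -l gpu-model=h100,gpu-count=8 \
  -o wide
```

An empty result means no registered server carries that label combination, so a node type with that selector could never provision.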
## Node type design
### One node type per hardware profile
Define a separate node type for each distinct GPU hardware profile. Servers with different GPU models, GPU counts, or CPU configurations should have distinct node types with matching selectors.
```yaml
nodeTypes:
  - name: "h100-8x"
    displayName: "H100 8-GPU"
    resources:
      cpu: "96"
      memory: 768Gi
      nvidia.com/gpu: "8"
    bareMetalHosts:
      selector:
        matchLabels:
          gpu-model: h100
          gpu-count: "8"
    properties:
      vcluster.com/os-image: ubuntu-noble-gpu
  - name: "a100-4x"
    displayName: "A100 4-GPU"
    resources:
      cpu: "64"
      memory: 256Gi
      nvidia.com/gpu: "4"
    bareMetalHosts:
      selector:
        matchLabels:
          gpu-model: a100
          gpu-count: "4"
    properties:
      vcluster.com/os-image: ubuntu-noble-gpu
```
### Per-tenant node types
AI Clouds often dedicate capacity to specific tenants by combining hardware labels with a tenant label. Add a tenant label to the BareMetalHost resources assigned to that tenant:
```shell
kubectl label baremetalhost server-01 -n metal3-system tenant=customer-a
```
Then define a node type whose selector includes both hardware and tenant attributes:
```yaml
nodeTypes:
  - name: "customer-a-h100-8x"
    displayName: "Customer A H100 8-GPU"
    resources:
      cpu: "96"
      memory: 768Gi
      nvidia.com/gpu: "8"
    bareMetalHosts:
      selector:
        matchLabels:
          gpu-model: h100
          gpu-count: "8"
          tenant: customer-a
    properties:
      vcluster.com/os-image: ubuntu-noble-gpu
```
Only servers labeled `tenant: customer-a` match this node type. Servers assigned to other tenants are not eligible, even if their hardware labels match.
Node types are also composable. You can define them along orthogonal dimensions. For example, you might create one set by hardware profile and another set by tenant. A server labeled with both dimensions matches both types simultaneously. Tenants select from the pool that combines whichever dimensions apply to their request.
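As a sketch of two orthogonal dimensions (the names are illustrative; `resources` and `properties` are omitted for brevity), a server labeled `gpu-model: h100`, `gpu-count: "8"`, and `tenant: customer-a` is eligible for both of these types:

```yaml
nodeTypes:
  # Hardware dimension: any H100 8-GPU server, regardless of tenant
  - name: "h100-8x"
    bareMetalHosts:
      selector:
        matchLabels:
          gpu-model: h100
          gpu-count: "8"
  # Tenant dimension: only Customer A's H100 servers, any GPU count
  - name: "customer-a-h100"
    bareMetalHosts:
      selector:
        matchLabels:
          gpu-model: h100
          tenant: customer-a
```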
## Cost configuration
Each node type has an associated cost. Karpenter uses cost to select the cheapest matching node type when a provisioning request matches multiple types. For GPU node types, the default cost contribution is 10000 per GPU resource unit, so an 8-GPU node type contributes 80000 by default.
Override cost explicitly when you want to control scheduling preference independently of resource counts:
```yaml
nodeTypes:
  - name: "h100-8x"
    cost: 500000
    resources:
      cpu: "96"
      memory: 768Gi
      nvidia.com/gpu: "8"
```
A lower cost makes the node type preferred over higher-cost alternatives when both match.
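As an illustration (the cost values here are arbitrary, and selectors and resources are omitted for brevity), giving the A100 type the lower cost steers requests toward it whenever both types satisfy the request:

```yaml
nodeTypes:
  - name: "a100-4x"
    cost: 200000   # preferred when both types match a request
  - name: "h100-8x"
    cost: 500000   # chosen only when a100-4x cannot satisfy the request
```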
## Scheduling behavior
### GPU resources inside a tenant cluster
When a bare metal server joins a tenant cluster as a private node, the platform sets the node's capacity based on the node type's resources field. Workloads inside the tenant cluster schedule to the node using standard Kubernetes resource requests.
```yaml
resources:
  limits:
    nvidia.com/gpu: "1"
```
For GPU scheduling to work, the NVIDIA device plugin or GPU Operator must run inside the tenant cluster. Without it, the node does not advertise `nvidia.com/gpu` capacity to the scheduler, even if the node type specifies it.
### NVIDIA device plugin vs. GPU Operator
The NVIDIA device plugin is a lightweight DaemonSet that discovers GPUs and exposes them to the Kubernetes scheduler. Deploy it directly when you control the driver installation and want minimal overhead.
The NVIDIA GPU Operator manages the device plugin alongside driver installation, container runtime configuration, and monitoring. It is the most common approach for production fleets where the OS image does not include pre-installed drivers.
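A typical GPU Operator install uses NVIDIA's published Helm chart; verify the chart version and values against NVIDIA's documentation for your driver strategy, and note that the release name and namespace below are conventions, not requirements:

```shell
# Add NVIDIA's Helm repository and install the GPU Operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  -n gpu-operator --create-namespace
```

If your OS image already ships the NVIDIA driver, the operator's driver installation can typically be disabled via chart values so only the device plugin and supporting components are deployed.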
## Fleet lifecycle
### Rotating OS images
To update the OS image across a fleet, create a new `OSImage` resource and update the node type to reference it. Do not edit the existing `OSImage` in place: updating an existing resource does not trigger re-provisioning for running Machines.
1. Create a new `OSImage` resource with the updated URL and checksum.
2. Update the node type's `vcluster.com/os-image` property to reference the new `OSImage`.
The node type change causes drift. The platform detects that running Machines no longer match the node type and rolls them over automatically, deprovisioning and re-provisioning servers with the updated image.
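You can follow the rollover from the management cluster; `kubectl get baremetalhost` shows each Metal3 server's current provisioning state:

```shell
# Watch servers cycle through deprovisioning and re-provisioning
kubectl get baremetalhost -n metal3-system -w
```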
## Heterogeneous GPU fleets
A single NodeProvider can manage servers with different GPU models. Define one node type per hardware profile and use label selectors to route requests to the right servers. The platform selects the cheapest matching type when multiple types satisfy a request.
```yaml
nodeTypes:
  - name: "h100-8x"
    bareMetalHosts:
      selector:
        matchLabels:
          gpu-model: h100
  - name: "a100-4x"
    bareMetalHosts:
      selector:
        matchLabels:
          gpu-model: a100
```
Tenants request a specific node type by name using `nodeTypeSelector`. If the tenant omits a selector, the platform picks the cheapest available type.
## Server pinning
Pin a Machine to a specific physical server when a workload requires a particular server, such as a node in a specific NVLink fabric or a server with a known hardware configuration.
Set `metal3.vcluster.com/server-name` on the Machine, not on the node type:
```yaml
privateNodes:
  enabled: true
  autoNodes:
    - provider: metal3-provider
      static:
        - name: pinned-gpu
          quantity: 1
          nodeTypeSelector:
            - property: vcluster.com/node-type
              value: h100-8x
          properties:
            metal3.vcluster.com/server-name: server-01
```
The platform claims only the named server. If that server is unavailable, provisioning fails rather than selecting a different server.
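To confirm the claim landed on the intended server, inspect the BareMetalHost's `spec.consumerRef`, the standard Metal3 field recording which consumer currently holds the host:

```shell
# Prints the name of the consumer that claimed server-01 (empty if unclaimed)
kubectl get baremetalhost server-01 -n metal3-system \
  -o jsonpath='{.spec.consumerRef.name}'
```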