
Production and Security

This guide covers the security practices that matter most for production vMetal deployments. The four main risk areas are:

  • Baseboard Management Controller (BMC) credential handling
  • OS image integrity
  • platform role-based access control (RBAC)
  • tenant isolation boundaries

BMC credential management

Storing BMC credentials

The Metal3 Bare Metal Operator (BMO) connects to each server's BMC using credentials stored in a Kubernetes Secret. Every BareMetalHost (BMH) references this Secret by name with spec.bmc.credentialsName. The Secret must be in the same namespace as the BareMetalHost on the control plane cluster.

The Secret requires exactly two keys, username and password:

apiVersion: v1
kind: Secret
metadata:
  name: server-01-bmc-credentials
  namespace: <nodeprovider-namespace>
type: Opaque
stringData:
  username: admin
  password: <redacted>

Use a distinct Secret per server or per logical group of servers. A single shared Secret across all BMHs means one rotation event affects all of them at once. It also makes it harder to scope access.
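If you manage many servers, generating one Secret per server from an inventory keeps the per-server layout consistent. The following is a minimal sketch, not a vMetal tool: the `<server>-bmc-credentials` naming follows the example above, and the function names, inventory format, and namespace are hypothetical.

```python
import json

def bmc_secret_manifest(server: str, namespace: str, username: str, password: str) -> dict:
    """Build a Secret manifest holding one server's BMC credentials.

    The "<server>-bmc-credentials" name is a convention taken from the example
    above (hypothetical), not something the BMO requires.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": f"{server}-bmc-credentials", "namespace": namespace},
        "type": "Opaque",
        # The BMO reads exactly these two keys.
        "stringData": {"username": username, "password": password},
    }

def bmc_secret_manifests(credentials: dict[str, tuple[str, str]], namespace: str) -> list[dict]:
    """One Secret per server, so rotating one password touches one Secret."""
    return [
        bmc_secret_manifest(server, namespace, user, pw)
        for server, (user, pw) in sorted(credentials.items())
    ]

if __name__ == "__main__":
    manifests = bmc_secret_manifests(
        {"server-01": ("admin", "example-1"), "server-02": ("admin", "example-2")},
        namespace="nodeprovider-ns",  # hypothetical namespace
    )
    # A v1 List can be piped to `kubectl apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Each BareMetalHost then sets spec.bmc.credentialsName to its own Secret name.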

Rotating BMC credentials

Rotating a BMC credential does not disturb an already-provisioned server. The BMO uses BMC credentials only for active operations (power on/off, boot device change, hardware inspection). A server in the provisioned state is running its OS and is not subject to ongoing BMC calls.

To rotate:

  1. Update the BMC password on the server's BMC management interface.
  2. Update the Kubernetes Secret with the new password.

The BMO picks up the new credentials on its next reconcile cycle. You do not need to restart the Metal3 pod or deprovision the server.
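The two rotation steps can be scripted. This sketch, under stated assumptions, generates a random password with Python's `secrets` module and builds a merge-patch body that replaces only the password key of the Secret. The 16-character alphanumeric format is an assumption: many BMCs cap password length or restrict characters, so check your vendor's limits. Applying the patch with `kubectl patch secret <name> -n <namespace> --type merge -p '<patch>'` is one option; how you set the password on the BMC itself is vendor-specific and not shown.

```python
import json
import secrets
import string

# Assumption: the BMC accepts 16-character passwords of letters and digits.
PASSWORD_LENGTH = 16
ALPHABET = string.ascii_letters + string.digits

def new_bmc_password(length: int = PASSWORD_LENGTH) -> str:
    """Generate a password from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def secret_password_patch(password: str) -> str:
    """Merge-patch body that replaces only the `password` key of the Secret.

    stringData is write-only; the API server merges it into `data`.
    """
    return json.dumps({"stringData": {"password": password}})

if __name__ == "__main__":
    pw = new_bmc_password()
    # Step 1: set `pw` on the BMC. Step 2: apply this patch to the Secret.
    # Printing a credential is for illustration only.
    print(secret_password_patch(pw))
```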

If the BMH enters an error state after rotation (for example, a brief out-of-sync window between old and new passwords), annotate it to trigger a retry:

kubectl annotate baremetalhost <name> -n <namespace> \
baremetalhost.metal3.io/reboot='{"force":false}' --overwrite

Disabling certificate verification

The spec.bmc.disableCertificateVerification: true field in a BareMetalHost tells Ironic to skip TLS certificate validation when connecting to the BMC over HTTPS. Do not set this in production. It allows a man-in-the-middle to intercept BMC traffic, including credentials.

If your BMC uses a self-signed certificate or an internal CA, add that CA to Ironic's trust store instead of disabling verification. Pass the CA bundle to the Ironic container using the NodeProvider's metal3.deploy.metal3.helmValues field:

spec:
  metal3:
    deploy:
      metal3:
        enabled: true
        helmValues: |
          ironic:
            extraEnv:
              - name: IRONIC_CACERT_FILE
                value: /etc/ironic/ca/ca.crt
            extraVolumeMounts:
              - name: bmc-ca
                mountPath: /etc/ironic/ca
                readOnly: true
            extraVolumes:
              - name: bmc-ca
                secret:
                  secretName: bmc-ca-bundle

Create the bmc-ca-bundle Secret in the NodeProvider namespace, with the PEM-encoded CA certificate under the key ca.crt.
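`kubectl create secret generic bmc-ca-bundle --from-file=ca.crt=<path>` creates this Secret directly. If you generate manifests instead, the sketch below builds the same Secret and rejects input that contains no PEM certificate block. The function name is hypothetical, and the PEM check is a cheap sanity test, not certificate validation.

```python
import base64

def ca_bundle_secret(pem: str, namespace: str, name: str = "bmc-ca-bundle") -> dict:
    """Build the Secret that Ironic mounts at /etc/ironic/ca (key: ca.crt)."""
    # Sanity check only: a PEM bundle must contain at least one certificate block.
    if "-----BEGIN CERTIFICATE-----" not in pem:
        raise ValueError("no PEM certificate found in CA bundle")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Values under `data` must be base64-encoded.
        "data": {"ca.crt": base64.b64encode(pem.encode("ascii")).decode("ascii")},
    }
```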


OS image trust

Checksums required

When Ironic provisions a server, it downloads the OS image from the URL in spec.image.url and verifies it against spec.image.checksum before writing anything to disk. If the checksum does not match, Ironic aborts provisioning, so corrupted or tampered images never reach the server.

The checksum type defaults to MD5, which is not collision-resistant. Set spec.image.checksumType to sha256 or sha512 for stronger verification:

spec:
  image:
    url: https://images.internal/ubuntu-22.04.raw.gz
    checksum: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    checksumType: sha256
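To produce the value for spec.image.checksum, hash the image file. This sketch streams the file in 1 MiB chunks with `hashlib` so multi-gigabyte images do not need to fit in memory. It assumes the checksum is computed over the file exactly as served at spec.image.url (here the compressed .raw.gz); confirm that against your Metal3 version.

```python
import hashlib

def image_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of the file at `path`, read in fixed-size chunks."""
    h = hashlib.new(algorithm)  # "sha256" or "sha512" match the checksumType values above
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For reference, the digest in the example above is the SHA-256 of empty input, so treat it as a placeholder rather than a real image checksum.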

Hosting OS images internally

Ironic fetches images at provisioning time from the control plane cluster. If the image URL is external, a network outage or a change at the remote host blocks all provisioning.

Host OS images on infrastructure you control, such as an internal artifact registry or object storage on a private endpoint. Platform admins create OSImage resources with the URL and checksum, and NodeTypes reference them by name. Because tenants never supply raw image URLs, the OSImage resource is the only place an image reference can be tampered with. Restrict write access to OSImage resources to platform admins.

Image signing

Ironic does not natively verify image signatures. Checksum verification confirms integrity against a known-good value but does not prove provenance. Where image provenance matters, mitigate this by:

  • Generating checksums in your image build pipeline and storing them alongside the image in your artifact registry.
  • Restricting write access to the image storage so that only your build system can publish new images.
  • Restricting which image names and URLs OSImage resources can reference, so that they point only at storage your build system publishes to.
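One way to enforce the last point is a check over OSImage URLs, for example in CI before manifests are applied. This is a sketch with hypothetical names: the allowlist contents and the HTTPS-only rule are assumptions, and it does not describe a built-in vMetal admission check. Parsing the URL (instead of matching string prefixes) rejects look-alike hosts such as `images.internal.evil.example` and userinfo tricks such as `images.internal@evil.example`.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the internal hosts your build system publishes to.
ALLOWED_IMAGE_HOSTS = {"images.internal"}

def image_url_allowed(url: str, allowed_hosts: set[str] = ALLOWED_IMAGE_HOSTS) -> bool:
    """Accept only HTTPS URLs whose parsed hostname is on the allowlist."""
    parsed = urlparse(url)
    # .hostname is lowercased and excludes any user:password@ prefix and port.
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts
```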

RBAC and project isolation

Which roles control what

vCluster Platform enforces a strict two-tier separation between infrastructure configuration and workload consumption.

Platform admins have full access to NodeProvider and NodeType resources. These are cluster-scoped resources, and only platform admins can create or modify them. Tenants have no direct access to these objects.

Project roles determine what members of a project can do with NodeClaim (Machine) resources:

Role              NodeClaim permissions
Project Admin     Full CRUD
Project User      Get, List
Project Viewer    Get, List, Watch

No project role has access to NodeProvider or NodeType resources. Tenants can request and observe Machines but cannot change the infrastructure configuration that backs them.

Restricting which NodeTypes a project can claim

By default, projects can reference any NodeType when creating a Machine. Restrict this with spec.allowedNodeTypes on the Project:

spec:
  allowedNodeTypes:
    - name: metal3.gpu-a100
    - name: metal3.gpu-h100

Use a provider wildcard to allow all NodeTypes from a specific NodeProvider:

spec:
  allowedNodeTypes:
    - name: metal3.*

An empty allowedNodeTypes list disallows all NodeTypes for the project. An unset (nil) field allows all.
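The three cases (unset, empty, and populated with exact names or a provider wildcard) are easy to mix up. This sketch encodes the rules as described above so you can check a proposed Project spec. It assumes NodeType names have the form `<provider>.<type>` and that a wildcard only appears as a trailing `.*`; the platform's own matcher may differ in edge cases, and the function name is hypothetical.

```python
from typing import Optional

def node_type_allowed(node_type: str, allowed: Optional[list[dict]]) -> bool:
    """Sketch of spec.allowedNodeTypes semantics.

    allowed is None  -> field unset: every NodeType is allowed.
    allowed == []    -> empty list: no NodeType is allowed.
    "<provider>.*"   -> any NodeType whose name starts with "<provider>."
    any other name   -> exact match.
    """
    if allowed is None:
        return True
    for entry in allowed:
        pattern = entry["name"]
        if pattern.endswith(".*"):
            # Strip only the "*" so the provider prefix keeps its trailing ".".
            if node_type.startswith(pattern[:-1]):
                return True
        elif node_type == pattern:
            return True
    return False
```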

Preventing cross-project server access

Each Machine claim maps to a NodeClaim resource in the project's namespace (loft-p-<project>). The platform controller annotates the claimed BareMetalHost with metal3.vcluster.com/node-claim: <nodeclaim-name> and labels it with vcluster.com/project: <project>. Tenants cannot edit this annotation or label.

BareMetalHost resources live on the control plane cluster, not in any tenant namespace. Tenants cannot directly list or modify them. The platform mediates all access through NodeClaim objects, which are namespaced to the project.
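Because the project label and node-claim annotation live on the BareMetalHost, platform operators can audit assignments from the control plane cluster. This sketch takes the `items` list from `kubectl get baremetalhosts -A -o json` and returns each host labeled for a project along with its node-claim annotation. A host that carries the project label but no annotation (value `None`) is worth investigating. The function name is hypothetical.

```python
from typing import Optional

PROJECT_LABEL = "vcluster.com/project"
CLAIM_ANNOTATION = "metal3.vcluster.com/node-claim"

def claimed_hosts(hosts: list[dict], project: str) -> dict[str, Optional[str]]:
    """Map "<namespace>/<name>" of each BMH labeled for `project` to its
    node-claim annotation, or None if the annotation is missing."""
    result: dict[str, Optional[str]] = {}
    for host in hosts:
        meta = host.get("metadata", {})
        if (meta.get("labels") or {}).get(PROJECT_LABEL) != project:
            continue
        key = f"{meta.get('namespace', '')}/{meta['name']}"
        result[key] = (meta.get("annotations") or {}).get(CLAIM_ANNOTATION)
    return result
```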

Tenant isolation boundaries

What vMetal provides

vMetal assigns a dedicated physical server to each Machine claim. No two projects share the same CPU, memory, or storage on a provisioned server. No hypervisor layer exists. The server runs the provisioned OS directly on hardware.

This model suits workloads with strict requirements for physical resource predictability, PCIe device exclusivity (GPUs, FPGAs, DPUs), or regulatory requirements that prohibit shared compute.

What vMetal does not provide

Network isolation on the provisioning network. During PXE boot and OS installation, all servers share a provisioning network segment. A compromised server can observe PXE traffic from other servers on this network during provisioning. Segment the provisioning network from production tenant traffic and treat it as an operations-only network.

Kernel-level isolation between workloads on the same server. Once a server is provisioned to a project and joined as a Kubernetes node, the workloads scheduled onto it share the kernel. A privileged container, a container with hostPID, or a Docker-in-Docker workload can affect other workloads on the same node.

When to add vNode

If you need to run privileged or hostile workloads on a shared server, add vNode as the container runtime. vNode uses Linux user namespaces and seccomp to sandbox pods at the node level. A container running as root inside vNode does not have real root on the host kernel.

vNode is the right choice when:

  • Multiple tenants schedule workloads onto the same physical server.
  • Workloads require privileged capabilities (Docker-in-Docker, hostPID, raw device access) and you cannot dedicate a server exclusively to them.
  • You want defense-in-depth on shared GPU nodes where workloads run user-supplied model inference code.

When physical isolation is the right answer

Use dedicated server assignment (one project per server, no vNode) when:

  • Workloads require direct kernel access that vNode cannot proxy, such as custom kernel modules or bare-metal DPDK.
  • Compliance requirements mandate physical isolation and disallow any software-layer sharing, even with user namespace sandboxing.
  • Performance isolation requirements are absolute: no noisy neighbor on the same server at the kernel scheduler level.

The tradeoff is utilization. A server assigned to one project sits idle when that project has no scheduled workloads. vNode allows higher density at the cost of a software isolation boundary.

Hardening the control plane cluster

The control plane cluster that runs Metal3 and Ironic has a uniquely elevated trust level. The Ironic container holds BMC credentials and can power cycle, reimage, or wipe every registered server. Treat access to this cluster as equivalent to physical access to your server room.

Limit access to BareMetalHost resources

Only the controller-manager ServiceAccount and the vCluster Platform controller need permission to create, update, or delete BareMetalHost resources on the control plane cluster. Define a dedicated ClusterRole for that access:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: bmh-admin
rules:
  - apiGroups: ["metal3.io"]
    resources: ["baremetalhosts"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

Bind this ClusterRole only to the ServiceAccounts that need it, and revoke any bindings that grant human users wildcard (*) verbs on metal3.io resources.
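To find bindings to revoke, inspect the output of `kubectl get clusterroles -o json` and `kubectl get clusterrolebindings -o json`. This sketch flags rules that cover metal3.io (directly or through a `*` API group) with wildcard verbs or resources, then lists ClusterRoleBindings that attach a flagged role to a User or Group subject. Treating wildcard resources as "broad" is an assumption beyond the text above; RoleBindings that reference ClusterRoles would need the same check and are not covered here.

```python
def broad_metal3_rules(rules: list[dict]) -> list[dict]:
    """Return RBAC rules granting wildcard verbs or resources on metal3.io."""
    flagged = []
    for rule in rules:
        groups = rule.get("apiGroups", [])
        if "metal3.io" not in groups and "*" not in groups:
            continue
        if "*" in rule.get("verbs", []) or "*" in rule.get("resources", []):
            flagged.append(rule)
    return flagged

def human_bindings(bindings: list[dict], role_names: set[str]) -> list[str]:
    """Names of ClusterRoleBindings that bind a flagged ClusterRole to a User or Group."""
    hits = []
    for binding in bindings:
        ref = binding.get("roleRef", {})
        if ref.get("kind") != "ClusterRole" or ref.get("name") not in role_names:
            continue
        subjects = binding.get("subjects") or []
        if any(s.get("kind") in ("User", "Group") for s in subjects):
            hits.append(binding["metadata"]["name"])
    return hits
```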

Control plane network segmentation

The most effective way to restrict BMC network access is to dedicate specific control plane cluster nodes to Metal3 and Ironic, and block other workloads from scheduling there.

Taint the nodes that have BMC network access:

kubectl taint nodes <node-name> dedicated=metal3:NoSchedule
kubectl label nodes <node-name> dedicated=metal3

Configure the Metal3 deployment to schedule only on those nodes using the NodeProvider's metal3.deploy.metal3.helmValues:

spec:
  metal3:
    deploy:
      metal3:
        enabled: true
        helmValues: |
          tolerations:
            - key: dedicated
              operator: Equal
              value: metal3
              effect: NoSchedule
          nodeSelector:
            dedicated: metal3

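With this configuration, Metal3 and Ironic run only on nodes that have BMC network access: the nodeSelector keeps them on the dedicated nodes, and the taint keeps off any pod that lacks a matching toleration. As long as no other workload tolerates the taint, no other pod runs on those nodes, so no other workload can reach the BMC network even if it obtains a credential Secret. Pods that tolerate every taint (a toleration with operator: Exists and no key, common on DaemonSets) still schedule there, so review them.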
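To review which pods could land on the dedicated nodes, check every pod's tolerations against the taint. This sketch mirrors the Kubernetes toleration-matching rules (an empty effect matches any effect, an empty key skips the key check, operator defaults to Equal, and Exists ignores the value) and runs them over the `items` from `kubectl get pods -A -o json`. The function names are hypothetical, and the check covers tolerations only, not other scheduling constraints.

```python
METAL3_TAINT = {"key": "dedicated", "value": "metal3", "effect": "NoSchedule"}

def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches the taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect"):
        return False
    key = toleration.get("key", "")
    if key and key != taint.get("key"):
        return False
    operator = toleration.get("operator") or "Equal"
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return operator == "Exists"

def pods_tolerating(pods: list[dict], taint: dict = METAL3_TAINT) -> list[str]:
    """List "<namespace>/<name>" for pods that tolerate the taint."""
    hits = []
    for pod in pods:
        tolerations = pod.get("spec", {}).get("tolerations") or []
        if any(tolerates(t, taint) for t in tolerations):
            meta = pod["metadata"]
            hits.append(f"{meta.get('namespace', '')}/{meta['name']}")
    return hits
```

Anything in the result other than the Metal3 and Ironic pods can reach the BMC network.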

Audit logging

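Enable Kubernetes audit logging on the control plane cluster's API server. Configure the audit policy to record all requests on Secrets at the Metadata level, which logs who accessed which Secret without logging its contents, and write operations on BareMetalHosts at the Request level, which also logs the request body: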

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  - level: Request
    resources:
      - group: "metal3.io"
        resources: ["baremetalhosts"]
    verbs: ["create", "update", "patch", "delete"]

Forward audit logs to your SIEM or log aggregation system. Alert on any BareMetalHost update, patch, or delete event from an identity other than the controller-manager and vCluster Platform controller ServiceAccounts.
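If your SIEM does not already express this rule, the sketch below applies it to a JSON-lines audit log from the API server's log backend. It uses standard audit event fields (stage, verb, user.username, objectRef) and keeps only ResponseComplete events so each request is counted once. The two ServiceAccount usernames are hypothetical placeholders in the standard `system:serviceaccount:<namespace>:<name>` form; replace them with your actual identities.

```python
import json
from typing import Iterable, Iterator

# Hypothetical identities: substitute the real namespaces and names of the
# controller-manager and vCluster Platform controller ServiceAccounts.
EXPECTED_WRITERS = {
    "system:serviceaccount:metal3-system:controller-manager",
    "system:serviceaccount:vcluster-platform:platform-controller",
}
WATCHED_VERBS = {"update", "patch", "delete"}

def unexpected_bmh_writes(lines: Iterable[str], expected: set[str] = EXPECTED_WRITERS) -> Iterator[dict]:
    """Yield BareMetalHost writes made by identities outside `expected`."""
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        if ref.get("apiGroup") != "metal3.io" or ref.get("resource") != "baremetalhosts":
            continue
        if event.get("verb") not in WATCHED_VERBS:
            continue
        # A request can emit one event per stage; count it once.
        if event.get("stage") != "ResponseComplete":
            continue
        user = (event.get("user") or {}).get("username", "")
        if user not in expected:
            yield {"user": user, "verb": event["verb"],
                   "namespace": ref.get("namespace"), "name": ref.get("name")}
```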

Minimize human access

Treat the control plane cluster as an operations cluster, not a development cluster. Human access should be:

  • Limited to platform operators by identity, not role.
  • Time-bounded where possible (break-glass access with audit trail).
  • Prohibited for tenant users entirely.

Do not attach the control plane cluster to the vCluster Platform as a cluster that tenants can target for workloads. The cluster exists to run infrastructure components, not to host tenant environments.