feat: add bare metal support for Intel TDX and AMD SEV-SNP #73

Open
butler54 wants to merge 17 commits into validatedpatterns:main from butler54:baremetal-tp-releases-squashed

butler54 (Collaborator) commented Mar 9, 2026

Summary

  • Adds a new baremetal clusterGroup for deploying CoCo on bare metal with Intel TDX or AMD SEV-SNP hardware
  • NFD auto-detects CPU TEE capabilities and labels nodes accordingly
  • RuntimeClasses for kata-tdx and kata-snp created automatically
  • MachineConfigs for kernel parameters (TDX) and vsock device access
  • Intel DCAP chart with PCCS and QGS services for TDX attestation
  • Storage support via HPP, LVMS, or external providers
  • PCCS secrets generation added to gen-secrets.sh
  • Platform override files for BareMetal and None platforms
  • Documentation for Dell TDX configuration, NFD notes, and bare metal PCR reference values

Test plan

  • Deploy baremetal clusterGroup on Intel TDX hardware
  • Deploy baremetal clusterGroup on AMD SEV-SNP hardware
  • Verify NFD correctly labels nodes with TEE capabilities
  • Verify kata-tdx/kata-snp RuntimeClasses are created
  • Verify PCCS and QGS services deploy on Intel nodes
  • Verify existing Azure deployments (simple, trusted-hub, spoke) are unaffected

🤖 Generated with Claude Code

butler54 requested a review from a team March 9, 2026 06:30
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
butler54 force-pushed the baremetal-tp-releases-squashed branch from b4eaf36 to bad2552 March 10, 2026 02:22
Comment thread ansible/detect-runtime-class.yaml Outdated
Comment thread charts/all/baremetal/bm-kernel-params.yaml Outdated
butler54 and others added 4 commits March 10, 2026 15:01
Replace git branch references (repoURL/targetRevision/path) with
released Helm chart references (chart/chartVersion) for trustee,
sandboxed-containers, and sandboxed-policies in values-baremetal.yaml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tdx.enabled flag (default true) to baremetal chart to conditionally
set kvm_intel.tdx=1 kernel argument. Without this, the kvm_intel module
does not activate TDX and NFD cannot detect it.

Enable intel-dcap application in values-baremetal.yaml for PCCS/QGS
attestation services.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mplates

Address PR review feedback:
- Remove detect-runtime-class.yaml (OSC operator manages RuntimeClass)
- Remove bm-kernel-params.yaml and kernel-params-mco.yaml (config should
  be provided via initdata or pod annotations to avoid inconsistencies)
- Remove commented-out runtimeclass templates for AMD SNP and Intel TDX

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
butler54 requested a review from bpradipt March 23, 2026 07:50
butler54 and others added 12 commits April 20, 2026 08:44
Signed-off-by: Chris Butler <chris.butler@redhat.com>
Conflicts resolved:
- _helpers.tpl: kept runtimeClassName override support from baremetal
- kbs-access/values.yaml: merged main's structure with runtimeClassName param
- kbs-access/secure-pod.yaml: accepted deletion (replaced by secure-deployment.yaml)
- kbs-access/secure-deployment.yaml: added runtimeClassName values override support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Kyverno chart and coco-kyverno-policies to baremetal values
- Update trustee chart to 0.3.* with kbs.admin.format v1.1
- Remove bypassAttestation (proper attestation via init_data)
- Remove explicit runtimeClassName overrides (auto-detected by platform)
- Add syncPolicy prune to hello-openshift and kbs-access
- Reset default clusterGroupName to simple

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The policy only fired on Pod/Deployment CREATE, so pods created before
the initdata ConfigMap existed never got the cc_init_data annotation.
Adding UPDATE allows Kyverno to inject the annotation when a Deployment
is updated (e.g. by ArgoCD sync), triggering a rolling restart with
the correct initdata.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e generation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds RAW_HASH field to both initdata and debug-initdata ConfigMaps.

PCR8_HASH = SHA256(zeros || SHA256(toml)) — used by Azure vTPM attestation
RAW_HASH = SHA256(toml) — used by baremetal TDX/SNP attestation

Both are needed because Azure and baremetal present initdata differently
in their attestation evidence. A single Trustee attestation server must
accept both formats to support multi-platform deployments.

Future: integrate veritas for comprehensive reference value generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
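
The two digests described in the RAW_HASH commit above can be sketched in bash as follows. This is an illustration, not the PR's actual script: the helper names (`sha256_hex`, `hex_to_bytes`, `raw_hash`, `pcr8_hash`) and the sample TOML are hypothetical, and the sketch assumes that `zeros` means the 32 zero bytes of an initial SHA-256 PCR and that the inner digest is concatenated as raw bytes rather than as hex text.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the sample TOML are hypothetical.

# Print the SHA-256 hex digest of whatever arrives on stdin.
sha256_hex() {
  sha256sum | cut -d' ' -f1
}

# Write the raw bytes encoded by a hex string to stdout.
# Relies on bash's printf '%b' expanding \xHH escapes.
hex_to_bytes() {
  printf '%b' "$(printf '%s' "$1" | sed 's/../\\x&/g')"
}

# RAW_HASH = SHA256(toml)
raw_hash() {
  printf '%s' "$1" | sha256_hex
}

# PCR8_HASH = SHA256(zeros || SHA256(toml))
# Assumption: zeros is 32 zero bytes, written here as 64 hex characters.
pcr8_hash() {
  local initial_pcr digest
  initial_pcr=$(printf '%064d' 0)
  digest=$(raw_hash "$1")
  hex_to_bytes "${initial_pcr}${digest}" | sha256_hex
}

toml='algorithm = "sha256"'
echo "RAW_HASH=$(raw_hash "$toml")"
echo "PCR8_HASH=$(pcr8_hash "$toml")"
```

Both values derive from the same TOML, so publishing them side by side in the ConfigMap lets one set of reference values cover both evidence formats.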
Temporarily uses butler54/trustee-chart feature/baremetal-attestation
branch instead of released chart. This branch includes:
- Baremetal TDX and SNP attestation rules
- Conditional pcr-stash (no error on baremetal without vTPM)
- Raw init_data hash (zero-padded) for baremetal attestation
- TDX QCNL config with use_secure_cert: false for local PCCS

Revert to chartVersion after merging and releasing trustee chart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The kbs-access-app container image is ~1GB which causes container
creation timeouts with the default 2GB kata VM memory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The autogen Deployment rule causes admission failures when the initdata
ConfigMap hasn't been propagated to the workload namespace yet. By
targeting Pods only (autogen-controllers: none), Deployments are admitted
without ConfigMap resolution. Pods get cc_init_data injected at creation
time when the ConfigMap is available. A rollout restart picks up new
initdata values.

Also removes UPDATE operation — only CREATE is needed since a rollout
restart creates new Pods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without braces, bash treats $initial_pcr followed by the hex hash
as a single undefined variable name, producing SHA-256 of empty
string instead of the correct PCR extend value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
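
The expansion bug fixed in the last commit can be reproduced in a few lines of bash. The variable name `initial_pcr` comes from the commit message; the shortened values and the literal `deadbeef` suffix are hypothetical stand-ins for the real 64-character hex strings, and the sketch assumes the hash was appended as literal text (for example by a template).

```shell
#!/usr/bin/env bash
# Sketch only: values are shortened, hypothetical stand-ins.
initial_pcr=0000

# Buggy: bash reads "initial_pcrdeadbeef" as one (unset) variable name,
# so the whole expression expands to the empty string.
buggy="$initial_pcrdeadbeef"

# Fixed: braces end the variable name before the literal hex text.
fixed="${initial_pcr}deadbeef"

# Hashing the buggy value therefore hashes an empty input.
buggy_digest=$(printf '%s' "$buggy" | sha256sum | cut -d' ' -f1)

echo "buggy='${buggy}' fixed='${fixed}'"
echo "buggy digest: ${buggy_digest}"
```

The buggy digest is the well-known SHA-256 of empty input (`e3b0c442...7852b855`), which matches the symptom in the commit message. Running such a script under `set -u` would turn the silent empty expansion into an "unbound variable" error.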

2 participants