
AMD discrete GPU

hal0 supports AMD discrete GPUs through two paths: Vulkan (the default — already works in v1) and ROCm (opt-in via a separate toolbox image).

AMD discrete is a supported target, not the reference platform. The Strix Halo page covers the iGPU / unified-memory story; this page is for desktop and workstation cards where you trade unified memory for raw VRAM bandwidth.

  • Radeon RX 7900 XTX (24 GB GDDR6)
  • Radeon RX 7900 XT (20 GB GDDR6)

Both run Q4 7B–14B class models comfortably with room for a small embed slot. Q4 30B-A3B MoE works with shorter context. Q4 70B needs partial CPU offload.

  • Radeon RX 7800 XT (16 GB GDDR6)
  • Radeon RX 7700 XT (12 GB GDDR6)
  • Radeon PRO W7800 / W7900 (32 / 48 GB GDDR6 — workstation parts)

As a rule, the 16 GB and smaller cards run one serious slot at a time: chat-only with a Q4 7B–13B, or a smaller chat model alongside a lightweight embed.

The Vulkan toolbox is the same one Strix Halo uses. No ROCm headers required, no kernel modules beyond amdgpu, no version-pinning hell. The hardware probe picks Vulkan automatically when it sees an AMD discrete GPU.

The trade is throughput: Vulkan llama.cpp on AMD discrete is solid but typically lags ROCm builds by a meaningful margin on chat throughput. For a daily-driver chat box it’s perfectly fine; for raw benchmarks you’ll want ROCm.

When the ROCm toolbox lands, opting in is one slot config change:

```sh
hal0 slot swap primary --provider llama-cpp-rocm
```

The slot lifecycle handles the toolbox swap atomically — unloading → starting → warming → ready.
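After the swap reaches ready, the active provider can be confirmed from the slot list. A minimal sketch; the exact JSON shape of `hal0 slot list --json` is an assumption here, so it greps for the provider name rather than parsing fields, and it degrades gracefully if hal0 is not installed:

```sh
# Confirm the active provider after a swap (sketch; JSON shape assumed).
if command -v hal0 >/dev/null 2>&1; then
  provider_report=$(hal0 slot list --json | grep -o 'llama-cpp-[a-z]*' | head -n 1)
else
  provider_report="hal0 not on PATH"
fi
echo "provider (first match): $provider_report"
```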

These are repeated from the Strix Halo loadouts page with the discrete-card framing.

24/20 GB cards (RX 7900 XTX / XT):

  • primary: Qwen3-30B-A3B-Instruct-2507-Q4_K_M (~18.6 GB) fits with shorter context, or gemma-3-12b-it-Q4_K_M (~6.6 GB) for a longer window.
  • embed: small Q4 embed only (nomic-embed-text-v2-moe ~140 MB).
  • Q4 70B requires partial CPU offload — works, but drops well below VRAM-resident speeds.

16/12 GB cards (RX 7800 XT / 7700 XT):

  • primary: Hermes-4-14B-Q4_K_M (~9 GB) or gemma-3-12b-it-Q4_K_M (~6.6 GB) with several GB for context.
  • embed: nomic-embed-text-v2-moe-Q4_K_M (~140 MB) on the 16 GB variant; skip on 12 GB.
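The fit claims above reduce to simple budget arithmetic. A sketch for the 24 GB case: the model size is the ~18.6 GB figure from this page, while the runtime-overhead allowance is an illustrative assumption, not a measured number:

```sh
# Illustrative VRAM budget for a 24 GB card (all figures in MiB).
vram=24576            # 24 GB card
model=19046           # Qwen3-30B-A3B Q4_K_M, ~18.6 GB (from this page)
overhead=1024         # compute buffers / runtime -- assumed, not measured
headroom=$((vram - model - overhead))
echo "headroom for KV cache: ${headroom} MiB"
```

Whatever is left after model weights and runtime buffers is what bounds the context window, which is why the 30B-A3B loadout only fits with shorter context.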

Workstation cards (W7800/W7900, 32–48 GB VRAM)

Closer to Strix Halo territory: Q4 70B fits cleanly with room for an embed, and concurrent slots become realistic.

The standard one-liner from the install page handles everything for the Vulkan path:

```sh
curl -fsSL https://hal0.dev/install | bash
```

You’ll want:

  • A recent Mesa with RADV Vulkan installed.
  • The amdgpu kernel module loaded (dmesg | grep amdgpu).
  • The service user in the render group for /dev/dri/* access.
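The three prerequisites can be sanity-checked before running the installer. A minimal sketch; nothing here is hal0-specific, and every check degrades to a warning rather than failing:

```sh
# Preflight sketch for the three prerequisites above.
checks=0
if command -v vulkaninfo >/dev/null 2>&1; then echo "vulkaninfo: found"; else echo "vulkaninfo: MISSING (install Mesa / vulkan-tools)"; fi
checks=$((checks + 1))
if lsmod 2>/dev/null | grep -q '^amdgpu'; then echo "amdgpu: loaded"; else echo "amdgpu: not loaded"; fi
checks=$((checks + 1))
if id -nG 2>/dev/null | grep -qw render; then echo "render group: yes"; else echo "render group: no (usermod -aG render <user>)"; fi
checks=$((checks + 1))
echo "$checks checks run"
```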

The hardware probe will detect the GPU’s VRAM correctly and size slot fit warnings to it.
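The probe's VRAM figure can be cross-checked against the amdgpu sysfs node directly. A sketch, assuming the dGPU enumerates as card0; multi-GPU hosts may put it at another index:

```sh
# Read total VRAM from the amdgpu sysfs node (card0 index is an assumption).
vram_file=/sys/class/drm/card0/device/mem_info_vram_total
if [ -r "$vram_file" ]; then
  vram_report="VRAM: $(( $(cat "$vram_file") / 1024 / 1024 )) MiB"
else
  vram_report="no amdgpu VRAM info at $vram_file"
fi
echo "$vram_report"
```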

No GPU detected by probe. Run vulkaninfo --summary to confirm the Vulkan runtime sees the card. If that’s empty, fix the host Vulkan install before troubleshooting hal0.

Slot won’t start, journal mentions permission denied on /dev/dri. Add the service user to the render group:

```sh
sudo usermod -aG render hal0
sudo systemctl restart hal0-api
```

Throughput much lower than expected. Check three things:

  • The card is not power-limited (cat /sys/class/drm/card0/device/power_dpm_force_performance_level).
  • The toolbox is the Vulkan build (hal0 slot list --json).
  • You are not running a larger context window than the model documentation suggests.
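The power-limit check can be scripted. A sketch of the sysfs read, again assuming the card enumerates as card0; on a healthy desktop card this file usually reports "auto", while "low" pins the card to its lowest clocks:

```sh
# Read the amdgpu performance level (card0 index is an assumption).
dpm=/sys/class/drm/card0/device/power_dpm_force_performance_level
if [ -r "$dpm" ]; then
  level=$(cat "$dpm")
else
  level="unavailable (no amdgpu sysfs at $dpm)"
fi
echo "performance level: $level"
```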