Slot lifecycle
Every slot in hal0 is in exactly one state at any moment. The set
of states is fixed; the transitions between them are validated against
LEGAL_TRANSITIONS; every transition is persisted atomically to
state.json and streamed over SSE to clients. The dashboard reflects
real state, not just systemctl is-active snapshots.
The canonical enum lives in src/hal0/slots/state.py — a StrEnum, wire-stable across versions.
States
| State | Meaning |
|---|---|
| offline | No systemd unit active. |
| pulling | Model files downloading / verifying. |
| starting | systemd unit up; container booting. |
| warming | Container live; model loading into VRAM / GTT. |
| ready | Passed health probe (non-empty /v1/models + sentinel completion). |
| serving | Inference request in-flight. |
| idle | Ready but no traffic past the idle timeout — unload candidate. |
| unloading | Graceful stop in progress. |
| error | Failed; details in state.json + journald. |
error is a sideband — it’s reachable from most other states
when something goes wrong, and the slot returns to offline from
there once the failure is acknowledged.
Transition diagram
The canonical happy-path flow:

```
┌─────────┐  pull  ┌─────────┐  done  ┌──────────┐  spawn  ┌─────────┐
│ offline │───────▶│ pulling │───────▶│ starting │────────▶│ warming │
└─────────┘        └─────────┘        └──────────┘         └────┬────┘
     ▲                                                          │
     │                                        health probe pass │
     │                                                          ▼
     │                                                     ┌─────────┐
     │                                                     │  ready  │
     │                    idle timer fires                 └────┬────┘
     │        ┌─────────────────────────────────────────────────┤
     │        ▼                                                 │
     │   ┌────────┐                                             │
     │   │  idle  │                          inference request  │
     │   └───┬────┘                                             ▼
     │       │                                             ┌─────────┐
     │       ▼                                             │ serving │
     │  ┌───────────┐               done                   └────┬────┘
     │  │ unloading │◀──────────────────────────────────────────┘
     │  └─────┬─────┘
     └────────┘

error sideband — reachable from pulling / starting / warming / ready /
serving / unloading; returns to offline when ack'd
```

Legal transitions (text table)
| From | Allowed to |
|---|---|
| offline | pulling, starting, error |
| pulling | starting, error, offline |
| starting | warming, error |
| warming | ready, error |
| ready | serving, idle, unloading, error |
| serving | ready, error, unloading |
| idle | serving, unloading, ready |
| unloading | offline, error |
| error | offline |
Any transition not listed is rejected by the manager with a
slot.invalid_transition error. This is a hard invariant — there is
no escape hatch.
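The validation the manager performs can be sketched as a lookup against the table above. This is an illustrative reconstruction, not hal0's actual code — the real LEGAL_TRANSITIONS mapping lives in the source tree, and the exception name here is hypothetical:

```python
# Reconstruction of the legal-transition table above; the real mapping
# ships with hal0 alongside the state enum.
LEGAL_TRANSITIONS: dict[str, set[str]] = {
    "offline":   {"pulling", "starting", "error"},
    "pulling":   {"starting", "error", "offline"},
    "starting":  {"warming", "error"},
    "warming":   {"ready", "error"},
    "ready":     {"serving", "idle", "unloading", "error"},
    "serving":   {"ready", "error", "unloading"},
    "idle":      {"serving", "unloading", "ready"},
    "unloading": {"offline", "error"},
    "error":     {"offline"},
}


class InvalidTransition(Exception):
    """Hypothetical stand-in for the slot.invalid_transition error."""


def transition(current: str, target: str) -> str:
    """Reject any edge not listed in the table — no escape hatch."""
    if target not in LEGAL_TRANSITIONS.get(current, set()):
        raise InvalidTransition(
            f"slot.invalid_transition: {current} -> {target}"
        )
    return target
```

For example, `transition("offline", "pulling")` is accepted, while `transition("offline", "ready")` raises, because cold slots must pass through the pull/start/warm pipeline first.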
Persistence & streaming
- Every transition writes /var/lib/hal0/slots/<name>/state.json through the atomic-write path (NamedTemporaryFile → fsync → os.replace()).
- The slot manager emits one SSE event per transition on GET /api/slots/events.
- The dashboard’s Slots view subscribes to that stream — what you see in the UI is the same wire format the daemon writes to disk.
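The atomic-write path described above can be sketched in a few lines of standard-library Python. The function name and payload shape are illustrative, not hal0's actual API; the NamedTemporaryFile → fsync → os.replace() sequence is the part the docs specify:

```python
import json
import os
import tempfile
from pathlib import Path


def write_state_atomically(state_path: Path, payload: dict) -> None:
    """Write payload as JSON so readers never observe a torn file.

    The temp file lives in the same directory as the target so that
    os.replace() is a same-filesystem rename, which is atomic on POSIX.
    """
    state_path.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile(
        "w", dir=state_path.parent, suffix=".tmp", delete=False
    ) as tmp:
        json.dump(payload, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())  # durable before the rename
    os.replace(tmp.name, state_path)  # atomic swap over state.json
```

A reader doing `json.loads(state_path.read_text())` sees either the old state or the new one, never a partial write — which is what lets the SSE stream and the on-disk file carry the same wire format safely.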
What each state implies for API behaviour
| Slot state | API behaviour |
|---|---|
| offline | Requests fail fast with slot.not_loaded. |
| pulling | Requests fail with slot.pulling and a progress hint. |
| starting | Requests block briefly; if the slot doesn’t reach warming quickly, they return 503. |
| warming | Requests block on the adaptive cold-boot probe; they succeed if the slot reaches ready within the request deadline. |
| ready | Requests are served; the slot transitions to serving for the duration. |
| serving | Concurrent requests stack on the slot’s queue. |
| idle | Requests are served as in ready; the first request resets the idle timer. |
| unloading | Requests fail with slot.unloading. |
| error | Requests fail with the structured error envelope from the failure. |
Why this matters
A naive setup conflates “the systemd unit is up” with “the model is ready to serve”. They aren’t the same — the model still has to load into VRAM, the backend has to pass its first health probe, and the sentinel completion has to succeed. hal0’s state machine encodes that distinction explicitly, which is why the dashboard can show “warming — 12s elapsed” instead of “service is up” while the request still 503s.
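The readiness decision itself is small once the HTTP plumbing is stripped away. This sketch shows only the two-part check the docs describe — a non-empty /v1/models listing plus a successful sentinel completion — with the transport omitted and the function name invented for illustration:

```python
def probe_passed(model_ids: list[str], sentinel_completion: str) -> bool:
    """Decide warming -> ready: the backend must list at least one
    model AND have returned a non-empty sentinel completion.

    A unit that is merely "up" fails both checks until the model has
    actually loaded, which is exactly the gap the state machine makes
    visible as the warming state.
    """
    return bool(model_ids) and bool(sentinel_completion.strip())
```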