Host architecture: events, settings, scheduler

Three decisions changed how I think about this host. None is dramatic on its own. Together, they make all the difference.

Three decisions changed how I think about this host. None is dramatic on its own; they’re all “the thing you should do”. Together they have already spared me three months of bugs I now know I won’t have.

Host architecture diagram: blocks (core, protocol, transport, importers, server) wired through the event bus.

The context

The host has three simultaneous jobs to coordinate:

  1. Read media (images, videos) and turn it into RGB frames at the strip’s resolution.
  2. Apply color transforms (gamma, kelvin, brightness) frame by frame.
  3. Encode and ship over serial at the rate the scheduler dictates.
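To make step 2 concrete, here is a minimal sketch of a brightness and gamma pass. It assumes a frame is a flat list of 8-bit (r, g, b) tuples and that gamma is applied as out = in ** gamma on values normalized to 0..1; the post doesn’t say which direction the host uses. build_lut and apply_color are hypothetical names, and kelvin correction is left out because it needs a color-temperature model the post doesn’t describe. The table is built once per settings snapshot, so the per-frame cost is three lookups per pixel.

```python
# Hypothetical sketch: brightness + gamma for one frame via a 256-entry lookup table.
# Assumption: a frame is a list of (r, g, b) tuples with 8-bit channels.

Pixel = tuple[int, int, int]


def build_lut(gamma: float, brightness: float) -> list[int]:
    """Precompute the output level for every possible 8-bit input.

    Assumed convention: out = brightness * (in / 255) ** gamma, scaled back to 0..255.
    """
    lut = []
    for v in range(256):
        level = brightness * (v / 255) ** gamma
        lut.append(max(0, min(255, round(level * 255))))  # clamp to the 8-bit range
    return lut


def apply_color(frame: list[Pixel], lut: list[int]) -> list[Pixel]:
    """Map every channel of every pixel through the table."""
    return [(lut[r], lut[g], lut[b]) for r, g, b in frame]


if __name__ == "__main__":
    table = build_lut(gamma=0.4, brightness=0.5)  # rebuild only when settings change
    print(apply_color([(0, 0, 0), (255, 255, 255)], table))
```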

On top of that, it has to be drivable by two different clients, a Unity process over stdin (legacy contract) and a web UI over WebSocket (new), without duplicating logic across the two paths.

If all of that lives in a single thread with print() as the reporting channel, you’re back in the same mess as before. Three decisions change the mechanics.

The decision

1. An EventBus instead of print().

scripts/core/events.py exposes a thread-safe pub/sub with 14 well-defined event types:

  • PLAYBACK_* — started, paused, stopped, frame-sent, metrics.
  • TRANSPORT_* — connected, disconnected, error, rx.
  • RESOURCE_* — loaded, error.
  • CACHE_* — evicted, cleared.
  • SETTINGS_CHANGED.

Producers (orchestrator, scheduler, transport) emit events. Consumers (cli/ for Unity, server/ for the web UI) format them. Core doesn’t know who it’s talking to — it only knows what happened. Adding a new client doesn’t touch a single line of core.
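A minimal sketch of that shape, assuming a lock-protected registry of callbacks keyed by an Enum. The member names are guesses derived from the families listed above (only TRANSPORT_ERROR, PLAYBACK_METRICS and SETTINGS_CHANGED appear verbatim in this post), and the two formatter functions are hypothetical stand-ins for cli/ and server/. The point is that the producer publishes one fact and each consumer decides how to render it.

```python
import json
import threading
from collections import defaultdict
from enum import Enum, auto
from typing import Any, Callable


class EventType(Enum):
    # Assumed member names, one per event in the families above (14 total).
    PLAYBACK_STARTED = auto()
    PLAYBACK_PAUSED = auto()
    PLAYBACK_STOPPED = auto()
    PLAYBACK_FRAME_SENT = auto()
    PLAYBACK_METRICS = auto()
    TRANSPORT_CONNECTED = auto()
    TRANSPORT_DISCONNECTED = auto()
    TRANSPORT_ERROR = auto()
    TRANSPORT_RX = auto()
    RESOURCE_LOADED = auto()
    RESOURCE_ERROR = auto()
    CACHE_EVICTED = auto()
    CACHE_CLEARED = auto()
    SETTINGS_CHANGED = auto()


Handler = Callable[[EventType, dict[str, Any]], None]


class EventBus:
    """Thread-safe pub/sub: producers publish facts, consumers decide how to render them."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subs: dict[EventType, list[Handler]] = defaultdict(list)

    def subscribe(self, event: EventType, handler: Handler) -> None:
        with self._lock:
            self._subs[event].append(handler)

    def publish(self, event: EventType, payload: dict[str, Any]) -> None:
        # Snapshot the handler list under the lock, then call handlers without it,
        # so a handler can subscribe or publish without deadlocking.
        with self._lock:
            handlers = list(self._subs[event])
        for handler in handlers:  # runs on the publisher's thread
            handler(event, payload)


# Hypothetical consumer for the stdio client: one text line per event.
def stdout_line(event: EventType, payload: dict[str, Any]) -> None:
    print(f"{event.name} {json.dumps(payload)}")


# Hypothetical consumer for the web UI: queue a JSON message for the socket.
ws_outbox: list[str] = []


def websocket_message(event: EventType, payload: dict[str, Any]) -> None:
    ws_outbox.append(json.dumps({"type": event.name, "data": payload}))


if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe(EventType.SETTINGS_CHANGED, stdout_line)
    bus.subscribe(EventType.SETTINGS_CHANGED, websocket_message)
    bus.publish(EventType.SETTINGS_CHANGED, {"gamma": 0.4})  # core never knows who listens
```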

2. Immutable settings.

Settings is a pydantic.BaseModel with frozen=True. It isn’t mutated — it’s replaced. SettingsStore.patch(...) does exactly that: take the keys in the patch, merge them with the current snapshot, re-validate the whole thing, swap the reference atomically. Then emit SETTINGS_CHANGED and schedule a debounced disk write (0.5 s).

The debounce isn’t cosmetic: it’s what stops a thousand slider movements from causing a thousand disk writes. The UI sends patches at 60 Hz; the disk sees at most 2/s.
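A sketch of that write limiter using threading.Timer. It coalesces rather than resets: the first patch arms a 0.5 s timer and every patch that lands before it fires shares the same write, which caps the disk at two writes per second even while the slider is moving. A debounce that restarted its timer on every patch would also stay under the cap, but would hold off writing until the slider stops. CoalescingWriter and its write callback are hypothetical; in the store, _schedule_write() would play the role of schedule().

```python
import threading
from typing import Callable, Optional


class CoalescingWriter:
    """Collapse bursts of change notifications into at most one write per `delay` seconds.

    Hypothetical helper: `write` is whatever persists the *current* snapshot.
    """

    def __init__(self, write: Callable[[], None], delay: float = 0.5) -> None:
        self._write = write
        self._delay = delay
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None

    def schedule(self) -> None:
        with self._lock:
            if self._timer is not None:
                return  # a write is already pending; this change rides along
            self._timer = threading.Timer(self._delay, self._fire)
            self._timer.daemon = True
            self._timer.start()

    def _fire(self) -> None:
        with self._lock:
            self._timer = None  # later patches arm a fresh timer
        self._write()  # reads the latest snapshot, so all coalesced patches are saved

    def flush(self) -> None:
        """Cancel any pending timer and write immediately (e.g. on shutdown)."""
        with self._lock:
            timer, self._timer = self._timer, None
        if timer is not None:
            timer.cancel()
            self._write()
```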

3. FrameScheduler on time.perf_counter.

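The old code did time.sleep(1/fps) between frames. Looks right. It isn’t, for two reasons. time.time(), the wall clock it was timed against, is not monotonic: it can jump backward on an NTP correction. And each sleep(1/fps) ignores the time spent producing the frame, plus whatever the OS oversleeps, so every frame starts a little late and the error accumulates. It’s invisible at first; after an hour of playback, frame 100,000 no longer falls where it should.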

The FrameScheduler computes target_t = start_t + idx / fps with time.perf_counter() (monotonic, high-resolution) and sleeps until that moment. Because every target is measured from start_t rather than from the previous frame, an oversleep on one frame is absorbed by the next one instead of piling up. If we arrive late, because the previous frame took longer than its budget, we don’t sleep and we count the frame as “dropped” in the metrics window.
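A minimal sketch of that loop as a plain function, assuming frames arrive as an iterable of encoded byte strings and sending is a callback. FrameScheduler’s actual interface isn’t shown in the post, so run_schedule, send and the 1 ms jitter tolerance are assumptions; the tolerance keeps sub-millisecond lateness (frame 0 is always a few microseconds past its target) from being counted as a drop. The post also doesn’t say whether a late frame is still sent; this sketch sends it.

```python
import time
from typing import Callable, Iterable


def run_schedule(
    frames: Iterable[bytes],
    fps: float,
    send: Callable[[bytes], None],
    tolerance_s: float = 0.001,  # hypothetical: lateness below this is jitter, not a drop
) -> tuple[int, int]:
    """Send frames on an absolute timeline; return (sent, dropped)."""
    start_t = time.perf_counter()  # monotonic, high-resolution clock
    sent = dropped = 0
    for idx, frame in enumerate(frames):
        # Deadline from the start, not "previous frame + 1/fps": if one sleep
        # overshoots, the next delay is simply shorter, so error doesn't accumulate.
        target_t = start_t + idx / fps
        delay = target_t - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        elif -delay > tolerance_s:
            dropped += 1  # arrived late: skip the sleep and count the drop
        send(frame)  # assumption: a late frame is still sent
        sent += 1
    return sent, dropped
```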

Every second it emits PLAYBACK_METRICS with measured_fps, dropped, latency_p50_ms, latency_p99_ms. The UI charts that without knowing anything about the scheduler.
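The one-second window can be as simple as a list of per-frame samples that gets summarized and reset. A sketch, assuming “latency” means how late each frame went out relative to its target (the post doesn’t define it) and using statistics.quantiles for the percentiles; MetricsWindow, record and summarize are hypothetical names. The scheduler loop would call record once per frame and publish PLAYBACK_METRICS whenever it returns a payload.

```python
import statistics
import time
from typing import Optional


def summarize(latencies_ms: list[float], dropped: int, elapsed_s: float) -> dict:
    """Build a PLAYBACK_METRICS-style payload from one window of samples."""
    if len(latencies_ms) >= 2:
        # 99 cut points: index 49 is the 50th percentile, index 98 the 99th.
        cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
        p50, p99 = cuts[49], cuts[98]
    else:
        p50 = p99 = latencies_ms[0] if latencies_ms else 0.0
    return {
        "measured_fps": len(latencies_ms) / elapsed_s if elapsed_s > 0 else 0.0,
        "dropped": dropped,
        "latency_p50_ms": p50,
        "latency_p99_ms": p99,
    }


class MetricsWindow:
    """Accumulate per-frame samples and summarize them once per `period_s`."""

    def __init__(self, period_s: float = 1.0) -> None:
        self.period_s = period_s
        self._t0 = time.perf_counter()
        self._latencies_ms: list[float] = []
        self._dropped = 0

    def record(self, latency_ms: float, dropped: bool) -> Optional[dict]:
        """Add one frame; return a payload when the window closes, else None."""
        self._latencies_ms.append(latency_ms)
        self._dropped += int(dropped)
        now = time.perf_counter()
        elapsed = now - self._t0
        if elapsed < self.period_s:
            return None
        payload = summarize(self._latencies_ms, self._dropped, elapsed)
        self._t0, self._latencies_ms, self._dropped = now, [], 0  # start a new window
        return payload
```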

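The settings store from decision 2, in code. frozen=True makes instances immutable, and extra="forbid" makes an unknown key in a patch fail validation:

import threading
from pathlib import Path
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict, Field
from scripts.core.events import EventBus, EventType  # assumed: both live in scripts/core/events.py

class Settings(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid")
    requested_fps: int = Field(default=30, gt=0, le=120)
    gamma: float = Field(default=0.4, gt=0)
    # ...

class SettingsStore:
    def __init__(self, bus: EventBus, persist_path: Optional[Path] = None) -> None:  # assumed signature
        self._lock = threading.Lock()
        self._current = Settings()
        self._bus = bus
        self._persist_path = persist_path

    def patch(self, partial: dict[str, Any]) -> Settings:
        with self._lock:
            merged = {**self._current.model_dump(), **partial}
            new = Settings.model_validate(merged)  # validators run; a bad patch raises before the swap
            self._current = new  # swap the reference; the old snapshot is never mutated
        self._bus.publish(EventType.SETTINGS_CHANGED, ...)  # published outside the lock
        if self._persist_path:
            self._schedule_write()  # debounced (0.5 s); defined elsewhere
        return new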

Why the EventBus survives a misbehaving subscriber. Subscribers run on the publisher’s thread (zero per-event thread allocations), but inside a try/except: if a subscriber raises, the exception is caught and re-emitted as TRANSPORT_ERROR. The cascade is guarded: a TRANSPORT_ERROR subscriber that also raises does not trigger another TRANSPORT_ERROR; its exception is simply dropped. A listener that raises (a dropped WebSocket mid-message, say) can never stop the other listeners from receiving their event. It can still delay them, since delivery is synchronous.
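A sketch of that guard. Event names are plain strings so the example stands alone; GuardedBus and the payload keys are hypothetical. The cascade guard here is the simplest one that matches the description: any exception raised while delivering a TRANSPORT_ERROR is swallowed instead of being re-published.

```python
import threading
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[str, dict[str, Any]], None]
TRANSPORT_ERROR = "TRANSPORT_ERROR"


class GuardedBus:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        with self._lock:
            self._subs[event].append(handler)

    def publish(self, event: str, payload: dict[str, Any]) -> None:
        with self._lock:
            handlers = list(self._subs[event])  # lock is released before delivery
        for handler in handlers:  # synchronous: runs on the publisher's thread
            try:
                handler(event, payload)
            except Exception as exc:
                if event == TRANSPORT_ERROR:
                    continue  # cascade guard: a failing error handler is dropped
                # Report the failure as an event, then keep delivering to the rest.
                self.publish(TRANSPORT_ERROR, {"source_event": event, "error": repr(exc)})
```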

What comes next

We have a structured host. But none of this would have been viable without a way to plan the work before writing it. Next post: BMAD and the discipline of one branch per story.