Desktop snapshots

A desktop target snapshots a structured accessibility tree (role + name + state) rather than pixels, completing the web / desktop / terminal triad. In keeping with the project's no-native-deps ethos, the raw tree is normally produced externally (by an OS accessibility helper, a UI test harness, or a saved fixture) and then ingested. Capture drivers, such as the experimental macos-ax driver described below, are layered on top of that ingestion path.

Configure a target

```json
{
  "kind": "desktop",
  "name": "settings-window",
  "tree": "desktop/tree.json"
}
```

Options:

  • tree — a saved accessibility-tree JSON file ({ tool?, root } or a bare node).
  • command — alternatively, a command that prints an accessibility tree as JSON on stdout (plug in your own OS capture helper for the app under test).
  • driver: "macos-ax" — capture a running macOS app's tree natively (below).
  • driver: "ocr" / ocrFallback — for apps without reliable accessibility access, snapshot a screenshot through OCR instead (below).
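
The two file shapes accepted by tree ({ tool?, root } or a bare node) suggest a small unwrapping step before normalization. The following is a minimal sketch of that idea, assuming a wrapper is recognized by an object-valued root key. The names RawNode and unwrapTree are hypothetical and are not taken from Dungbeetle's source.

```typescript
// Hypothetical shapes for illustration only, not Dungbeetle's actual types.
type RawNode = { [key: string]: unknown };

function isRecord(value: unknown): value is RawNode {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Accept either { tool?, root } or a bare node and return the root node.
function unwrapTree(input: unknown): { tool?: string; root: RawNode } {
  if (!isRecord(input)) {
    throw new Error("accessibility tree must be a JSON object");
  }
  const root = input["root"];
  const tool = input["tool"];
  if (isRecord(root)) {
    // Wrapped form: keep the optional tool label alongside the root.
    return typeof tool === "string" ? { tool, root } : { root };
  }
  // Bare form: the object itself is the root node.
  return { root: input };
}

const wrapped = unwrapTree({ tool: "fixture", root: { role: "window" } });
const bare = unwrapTree({ role: "button", name: "OK" });
// wrapped.tool is "fixture"; bare.root.role is "button"
```

One consequence of this assumed rule is that a bare node carrying its own object-valued root property would be mistaken for a wrapper, so a real implementation may use a stricter check.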

Snapshot model

Each node is normalized to a stable, diffable shape:

```json
{
  "kind": "desktop",
  "root": {
    "role": "window",
    "name": "Settings",
    "children": [
      { "role": "button", "name": "Sign out", "state": ["enabled", "focusable"] }
    ]
  }
}
```

  • Known fields (role, name/title/label, value, description, state) are selected; everything else (bounds, coordinates, pids, handles) is dropped as volatile.
  • state is normalized to a sorted flag list (from an array or a boolean map).
  • name/value/description text is run through the configured mask rules.
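
The normalization rules above can be sketched as a small recursive function. This is an illustration under stated assumptions, not Dungbeetle's implementation: the type names are hypothetical, name / title / label are assumed to be aliases checked in that order, only string values are kept, and masking is left out.

```typescript
// Hypothetical types; a sketch of the rules above, not the real implementation.
type RawNode = { [key: string]: unknown };

interface SnapshotNode {
  role: string;
  name?: string;
  value?: string;
  description?: string;
  state?: string[];
  children?: SnapshotNode[];
}

// Accept state as a flag array or a boolean map; return a sorted flag list.
function normalizeState(state: unknown): string[] {
  let flags: string[] = [];
  if (Array.isArray(state)) {
    flags = state.filter((flag): flag is string => typeof flag === "string");
  } else if (typeof state === "object" && state !== null) {
    flags = Object.entries(state)
      .filter(([, on]) => on === true)
      .map(([flag]) => flag);
  }
  return Array.from(new Set(flags)).sort();
}

function text(value: unknown): string | undefined {
  return typeof value === "string" && value !== "" ? value : undefined;
}

// Copy only known fields; bounds, pids, and handles are dropped by omission.
function normalizeNode(raw: RawNode): SnapshotNode {
  const node: SnapshotNode = { role: text(raw["role"]) ?? "unknown" };
  // Assumption: name, title, and label are aliases, checked in this order.
  const name = text(raw["name"]) ?? text(raw["title"]) ?? text(raw["label"]);
  if (name !== undefined) node.name = name;
  const value = text(raw["value"]);
  if (value !== undefined) node.value = value;
  const description = text(raw["description"]);
  if (description !== undefined) node.description = description;
  const state = normalizeState(raw["state"]);
  if (state.length > 0) node.state = state;
  const kids = raw["children"];
  if (Array.isArray(kids)) {
    const children = kids
      .filter((c): c is RawNode => typeof c === "object" && c !== null)
      .map((c) => normalizeNode(c));
    if (children.length > 0) node.children = children;
  }
  return node;
}

const normalized = normalizeNode({
  role: "button",
  title: "Sign out",
  state: { focusable: true, enabled: true, selected: false },
  bounds: { x: 10, y: 20 },
  pid: 4242,
});
// normalized: { role: "button", name: "Sign out", state: ["enabled", "focusable"] }
```

Sorting the flags is what makes two captures comparable when the OS helper reports the same state in a different order.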

How it's compared

Desktop trees reuse the structural DOM tree diff. Sibling nodes are aligned by role, so a renamed control surfaces as a changed @name rather than a remove + add, and an inserted node does not shift the alignment of the nodes after it:

```diff
~ button[1] @name: "Sign out" → "Log out"
```
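
To make the alignment idea concrete, here is a minimal sketch that aligns siblings with a longest common subsequence over their roles. The LCS is a swapped-in technique chosen for illustration; the source does not say which algorithm Dungbeetle uses. The path syntax and zero-based indices in the output are also hypothetical.

```typescript
interface UiNode {
  role: string;
  name?: string;
  children?: UiNode[];
}

// Longest common subsequence over sibling roles; returns matched index pairs.
function alignByRole(a: UiNode[], b: UiNode[]): Array<[number, number]> {
  const n = a.length;
  const m = b.length;
  const dp: number[][] = Array.from({ length: n + 1 }, () =>
    new Array<number>(m + 1).fill(0)
  );
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      dp[i][j] =
        a[i].role === b[j].role
          ? dp[i + 1][j + 1] + 1
          : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  const pairs: Array<[number, number]> = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i].role === b[j].role) {
      pairs.push([i, j]);
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      i++;
    } else {
      j++;
    }
  }
  return pairs;
}

// Emit "~" for renamed nodes, "-" for removed ones, "+" for added ones.
function diffNodes(before: UiNode, after: UiNode, path: string, out: string[]): void {
  const oldName = before.name ?? "";
  const newName = after.name ?? "";
  if (oldName !== newName) {
    out.push(`~ ${path} @name: ${JSON.stringify(oldName)} → ${JSON.stringify(newName)}`);
  }
  const a = before.children ?? [];
  const b = after.children ?? [];
  // A sentinel pair at the end flushes trailing removals and additions.
  const stops: Array<[number, number]> = [...alignByRole(a, b), [a.length, b.length]];
  let i = 0;
  let j = 0;
  for (const [pi, pj] of stops) {
    for (; i < pi; i++) out.push(`- ${path} > ${a[i].role}[${i}]`);
    for (; j < pj; j++) out.push(`+ ${path} > ${b[j].role}[${j}]`);
    if (pi < a.length) diffNodes(a[pi], b[pj], `${path} > ${a[pi].role}[${pi}]`, out);
    i = pi + 1;
    j = pj + 1;
  }
}

const oldTree: UiNode = {
  role: "window",
  name: "Settings",
  children: [{ role: "button", name: "Sign out" }],
};
const newTree: UiNode = {
  role: "window",
  name: "Settings",
  children: [
    { role: "checkbox", name: "Dark mode" },
    { role: "button", name: "Log out" },
  ],
};
const changes: string[] = [];
diffNodes(oldTree, newTree, "window", changes);
// changes:
//   + window > checkbox[0]
//   ~ window > button[0] @name: "Sign out" → "Log out"
```

Because the button is matched by role, the inserted checkbox yields one addition and the rename yields one change. Role-only alignment is ambiguous when a node is inserted among siblings of the same role, which a fuller implementation would need to handle.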

Try it

The examples/desktop example uses a committed tree.json, so it needs no native capture:

```sh
dungbeetle update --config examples/desktop/dungbeetle.config.json
dungbeetle test   --config examples/desktop/dungbeetle.config.json
```

Native macOS driver

The macos-ax driver captures a running macOS app's accessibility tree natively, with no native dependency: it walks the tree through System Events via JXA (osascript -l JavaScript) and feeds the result through the same normalization and diff as ingested trees.

```json
{
  "kind": "desktop",
  "name": "calculator",
  "driver": "macos-ax",
  "app": "Calculator",
  "maxDepth": 40
}
```
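
The maxDepth option implies a depth-limited walk of the element tree. The sketch below shows one way such a walk could look, using a hypothetical AxElement interface that mirrors the accessor style of JXA's System Events objects (role(), name(), uiElements()). It assumes the root sits at depth 0 and that children are captured only while depth is below maxDepth; the driver's actual rule may differ.

```typescript
// Hypothetical interface modelled on JXA System Events accessors.
interface AxElement {
  role(): string;
  name(): string | null;
  uiElements(): AxElement[];
}

interface CapturedNode {
  role: string;
  name?: string;
  children?: CapturedNode[];
}

// Walk the element tree depth-first, without descending past maxDepth levels.
function walk(element: AxElement, maxDepth: number, depth = 0): CapturedNode {
  const node: CapturedNode = { role: element.role() };
  const name = element.name();
  if (name) node.name = name;
  if (depth < maxDepth) {
    const children = element
      .uiElements()
      .map((child) => walk(child, maxDepth, depth + 1));
    if (children.length > 0) node.children = children;
  }
  return node;
}

// Stand-in element so the walk can be tried without a live macOS app.
function fake(role: string, name: string | null, kids: AxElement[] = []): AxElement {
  return { role: () => role, name: () => name, uiElements: () => kids };
}

const calculator = fake("AXWindow", "Calculator", [
  fake("AXGroup", null, [fake("AXButton", "7")]),
]);
const shallow = walk(calculator, 1);
// shallow: { role: "AXWindow", name: "Calculator", children: [{ role: "AXGroup" }] }
```

A depth cap like this bounds capture time on apps with very deep or cyclic-looking hierarchies, at the cost of truncating anything below the limit.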

Experimental — requirements and limitations

  • macOS only. Off macOS the driver fails fast with a clear message; doctor reports it as a warning rather than a hard failure.
  • Accessibility permission. The terminal / Node process must be granted Accessibility access in System Settings → Privacy & Security → Accessibility, and you must approve the System Events Automation prompt on first run.
  • Local / interactive only. It drives a live UI, so it can't run headless in CI — capture and commit baselines locally, then dungbeetle test compares them.
  • The target app must already be running; the driver reports if it isn't.

Windows (UIA) and Linux (AT-SPI) drivers are not yet implemented; for those platforms today, use a command that emits the tree as JSON.

Screenshot + OCR fallback

Some apps expose no usable accessibility tree (custom-drawn UIs, games, canvas apps, or platforms where access is denied). For these, Dungbeetle can instead take a screenshot and run it through OCR. The recognized text becomes one AXStaticText node per line under the app root, so it is masked, normalized, and diffed exactly like a structured tree, with no new snapshot type and no pixel comparison.
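
The mapping from OCR output to a tree can be sketched in a few lines. This is an illustration only: the function name is hypothetical, the root role "AXApplication" is an assumption (the text above only says "app root"), and trimming lines and skipping blank ones is a guess at sensible behaviour.

```typescript
interface OcrNode {
  role: string;
  name?: string;
  children?: OcrNode[];
}

// Turn raw OCR output into a tree: one AXStaticText node per non-empty line.
function ocrTextToTree(app: string, ocrText: string): OcrNode {
  const children: OcrNode[] = ocrText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => ({ role: "AXStaticText", name: line }));
  // Assumption: the root role is "AXApplication".
  return { role: "AXApplication", name: app, children };
}

const ocrTree = ocrTextToTree("CanvasApp", "Score: 10\n\n  Start  \n");
// ocrTree.children:
//   [{ role: "AXStaticText", name: "Score: 10" }, { role: "AXStaticText", name: "Start" }]
```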

True to the no-native-deps ethos, the screenshot and OCR steps shell out to tools you choose, so nothing heavy is bundled:

  • screenshotCommand — captures an image to the {out} path (e.g. macOS screencapture). Or point screenshot at a pre-captured image file.
  • ocrCommand — turns the {image} into text on stdout. Defaults to tesseract {image} stdout; if Tesseract isn't installed you get a clear message to install it or set your own.
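
The {out} and {image} placeholders suggest simple template substitution before the command is handed to the shell. Here is a minimal sketch of that step, assuming placeholder values are single-quoted for a POSIX shell. The function names are hypothetical, and whether Dungbeetle quotes the substituted path is not stated in the source.

```typescript
// Quote a value for a POSIX shell by wrapping it in single quotes.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Replace each {placeholder} that has a value; leave unknown ones untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key)
      ? shellQuote(values[key])
      : match
  );
}

const shot = fillTemplate("screencapture -x {out}", { out: "/tmp/shot 1.png" });
// shot: screencapture -x '/tmp/shot 1.png'
const ocr = fillTemplate("tesseract {image} stdout", { image: "/tmp/shot 1.png" });
// ocr: tesseract '/tmp/shot 1.png' stdout
```

Quoting matters here because temporary paths can contain spaces, which would otherwise split into separate shell arguments.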

Use it two ways:

```json
{
  "kind": "desktop",
  "name": "legacy-app",
  "driver": "macos-ax",
  "app": "LegacyApp",
  "ocrFallback": true,
  "screenshotCommand": "screencapture -x {out}"
}
```

ocrFallback keeps the structured tree as the primary source and only drops to OCR when that capture comes back empty or fails — so apps gain a safety net without losing fidelity where accessibility works.
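
The fallback order described above can be sketched as a wrapper around two capture functions. This is a hedged illustration: the names are hypothetical, the captures are modelled as synchronous callbacks, and "empty" is assumed to mean a missing root or a root with no children.

```typescript
interface TreeNode {
  role: string;
  name?: string;
  children?: TreeNode[];
}

type Capture = () => TreeNode | null;

// Assumption: a missing root, or a root with no children, counts as empty.
function isEmpty(tree: TreeNode | null): boolean {
  return tree === null || (tree.children ?? []).length === 0;
}

// Try the structured capture first; use OCR only if it throws or is empty.
function captureWithFallback(
  structured: Capture,
  ocr: Capture
): { source: "structured" | "ocr"; tree: TreeNode | null } {
  try {
    const tree = structured();
    if (!isEmpty(tree)) return { source: "structured", tree };
  } catch {
    // Capture failed (for example, permission denied): fall through to OCR.
  }
  return { source: "ocr", tree: ocr() };
}

const ocrResult: TreeNode = {
  role: "AXApplication",
  children: [{ role: "AXStaticText", name: "Welcome" }],
};
const denied = captureWithFallback(
  () => {
    throw new Error("accessibility access denied");
  },
  () => ocrResult
);
// denied.source is "ocr"
```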

```json
{
  "kind": "desktop",
  "name": "canvas-app",
  "driver": "ocr",
  "app": "CanvasApp",
  "screenshotCommand": "screencapture -x {out}"
}
```

driver: "ocr" goes straight to pixels + OCR for UIs that never expose a tree.

Keep OCR snapshots stable

OCR text is noisier than a structured tree (recognition wobble, reordering). Use mask rules for dynamic text, capture at a consistent resolution, and prefer the structured drivers whenever an app supports them. screenshotCommand / ocrCommand run via a POSIX shell.
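
As an illustration of masking dynamic text, the sketch below applies regular-expression rules to a recognized line. The MaskRule shape and the two example rules are hypothetical and do not reflect Dungbeetle's actual mask configuration format, which this page does not show.

```typescript
// Hypothetical rule shape: a regular expression plus its replacement text.
interface MaskRule {
  pattern: RegExp;
  replacement: string;
}

// Apply every rule, in order, to one piece of text.
function applyMasks(input: string, rules: MaskRule[]): string {
  return rules.reduce(
    (acc, rule) => acc.replace(rule.pattern, rule.replacement),
    input
  );
}

// Illustrative rules for text that changes on every capture.
const maskRules: MaskRule[] = [
  { pattern: /\b\d{1,2}:\d{2}(:\d{2})?\b/g, replacement: "<time>" },
  { pattern: /\bv\d+\.\d+\.\d+\b/g, replacement: "<version>" },
];

const masked = applyMasks("Last sync 09:41:07 (v1.4.2)", maskRules);
// masked: Last sync <time> (<version>)
```

Masking before comparison means a clock or build number in the UI no longer shows up as a change on every run.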

Source-available: CLI under FSL-1.1-ALv2, cloud server under BUSL-1.1. See Licensing.