Desktop snapshots
A desktop target snapshots a structured accessibility tree (role + name + state) rather than pixels — completing the web / desktop / terminal triad. Following the project's no-native-deps ethos, the raw tree is produced externally (an OS accessibility helper, a UI test harness, or a saved fixture) and ingested; native capture drivers can be layered on later.
Configure a target
{
"kind": "desktop",
"name": "settings-window",
"tree": "desktop/tree.json"
}Options:
tree— a saved accessibility-tree JSON file ({ tool?, root }or a bare node).command— alternatively, a command that prints an accessibility tree as JSON on stdout (plug in your own OS capture helper for the app under test).driver: "macos-ax"— capture a running macOS app's tree natively (below).driver: "ocr"/ocrFallback— for apps without reliable accessibility access, snapshot a screenshot through OCR instead (below).
Snapshot model
Each node is normalized to a stable, diffable shape:
{
"kind": "desktop",
"root": {
"role": "window",
"name": "Settings",
"children": [
{ "role": "button", "name": "Sign out", "state": ["enabled", "focusable"] }
]
}
}- Known fields (
role,name/title/label,value,description,state) are selected; everything else (bounds, coordinates, pids, handles) is dropped as volatile. stateis normalized to a sorted flag list (from an array or a boolean map).name/value/descriptiontext is run through the configured mask rules.
How it's compared
Desktop trees reuse the structural DOM tree diff: nodes align by role, so a renamed control surfaces as a changed @name rather than a remove + add, and insertions don't cascade:
~ button[1] @name: "Sign out" → "Log out"Try it
The examples/desktop example uses a committed tree.json, so it needs no native capture:
dungbeetle update --config examples/desktop/dungbeetle.config.json
dungbeetle test --config examples/desktop/dungbeetle.config.jsonNative macOS driver
The macos-ax driver captures a running macOS app's accessibility tree natively, with no native dependency: it walks the tree through System Events via JXA (osascript -l JavaScript) and feeds the result through the same normalization and diff as ingested trees.
{
"kind": "desktop",
"name": "calculator",
"driver": "macos-ax",
"app": "Calculator",
"maxDepth": 40
}Experimental — requirements and limitations
- macOS only. Off macOS the driver fails fast with a clear message;
doctorreports it as a warning rather than a hard failure. - Accessibility permission. The terminal / Node process must be granted Accessibility access in System Settings → Privacy & Security → Accessibility (and approve the System Events Automation prompt on first run).
- Local / interactive only. It drives a live UI, so it can't run headless in CI — capture and commit baselines locally, then
dungbeetle testcompares them. - The target app must already be running; the driver reports if it isn't.
Windows (UIA) and Linux (AT-SPI) drivers are not yet implemented; for those platforms today, use a command that emits the tree as JSON.
Screenshot + OCR fallback
Some apps expose no usable accessibility tree (custom-drawn UIs, games, canvas apps, or platforms where access is denied). For these, Dungbeetle can snapshot a screenshot run through OCR instead: the recognized text becomes one AXStaticText node per line under the app root, so it's masked, normalized, and diffed exactly like a structured tree — no new snapshot type, no pixel comparison.
True to the no-native-deps ethos, the screenshot and OCR steps shell out to tools you choose, so nothing heavy is bundled:
screenshotCommand— captures an image to the{out}path (e.g. macOSscreencapture). Or pointscreenshotat a pre-captured image file.ocrCommand— turns the{image}into text on stdout. Defaults totesseract {image} stdout; if Tesseract isn't installed you get a clear message to install it or set your own.
Use it two ways:
{
"kind": "desktop",
"name": "legacy-app",
"driver": "macos-ax",
"app": "LegacyApp",
"ocrFallback": true,
"screenshotCommand": "screencapture -x {out}"
}ocrFallback keeps the structured tree as the primary source and only drops to OCR when that capture comes back empty or fails — so apps gain a safety net without losing fidelity where accessibility works.
{
"kind": "desktop",
"name": "canvas-app",
"driver": "ocr",
"app": "CanvasApp",
"screenshotCommand": "screencapture -x {out}"
}driver: "ocr" goes straight to pixels + OCR for UIs that never expose a tree.
Keep OCR snapshots stable
OCR text is noisier than a structured tree (recognition wobble, reordering). Use mask rules for dynamic text, capture at a consistent resolution, and prefer the structured drivers whenever an app supports them. screenshotCommand / ocrCommand run via a POSIX shell.