Pitfalls Research
Domain: Unraid Update Status Sync for Existing Docker Management Bot
Researched: 2026-02-08
Confidence: MEDIUM
Research combines verified Unraid architecture (HIGH confidence) with integration patterns from community sources (MEDIUM confidence). File format and API internals have LIMITED documentation — risk areas flagged for phase-specific investigation.
Critical Pitfalls
Pitfall 1: State Desync Between Docker API and Unraid's Internal Tracking
What goes wrong: After bot-initiated updates via Docker API (pull + recreate), Unraid's Docker tab continues showing "update ready" status. Unraid doesn't detect that the container was updated externally. This creates user confusion ("I just updated, why does it still show?") and leads to duplicate update attempts.
Why it happens: Unraid tracks update status through multiple mechanisms that aren't automatically synchronized with Docker API operations:
- /var/lib/docker/unraid-update-status.json — cached update status file (stale after external updates)
- DockerManifestService cache — compares local image digests to registry manifests
- Real-time DockerEventService — monitors Docker daemon events but doesn't trigger update status recalculation
The bot bypasses Unraid's template system entirely, so Unraid "probably doesn't check if a container has magically been updated and change its UI" (watchtower discussion).
How to avoid: Phase 1 (Investigation) must determine ALL state locations:
- Verify update status file format — inspect the /var/lib/docker/unraid-update-status.json structure (undocumented, requires reverse engineering)
- Document cache invalidation triggers — what causes DockerManifestService to recompute?
- Test event-based refresh — does recreating a container trigger update check, or only on manual "Check for Updates"?
Phase 2 (Sync Implementation) options (in order of safety):
- Option A (safest): Delete stale entries from unraid-update-status.json for updated containers (forces recalculation on next check)
- Option B (if A insufficient): Call Unraid API update check endpoint after bot updates (triggers full recalc)
- Option C (last resort): Directly modify unraid-update-status.json with current digest (highest risk of corruption)
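A minimal sketch of Option A in Python. The flat top-level-object structure of the status file and the `drop_stale_entries` helper name are assumptions that Phase 1 must confirm — the file format is undocumented:

```python
import json
import os
import tempfile

STATUS_FILE = "/var/lib/docker/unraid-update-status.json"  # location from forum reports, unverified

def drop_stale_entries(status_path, updated_names):
    """Remove entries for containers the bot just updated, forcing Unraid
    to recalculate their status on its next update check.
    ASSUMPTION: the file is a top-level JSON object keyed by container name."""
    with open(status_path, "r", encoding="utf-8") as f:
        status = json.load(f)
    if not isinstance(status, dict):
        raise ValueError("Unexpected status file structure; refusing to modify")
    for name in updated_names:
        status.pop(name, None)  # idempotent: missing keys are fine
    # Atomic replace: write to a temp file in the same directory, then rename
    dir_name = os.path.dirname(os.path.abspath(status_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(status, tmp)
        os.replace(tmp_path, status_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The delete-then-rename shape keeps the operation idempotent (Pitfall 2) and never leaves a half-written file visible to Unraid's reader.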
Warning signs:
- "Apply Update" shown in Unraid UI immediately after bot reports successful update
- Unraid notification shows update available for container that bot just updated
- /var/lib/docker/unraid-update-status.json modified timestamp doesn't change after bot update
Phase to address:
- Phase 1 (Investigation & File Format Analysis) — understand state structure
- Phase 2 (Sync Implementation) — implement chosen sync strategy
- Phase 3 (UAT) — verify sync works across Unraid versions
Pitfall 2: Race Condition Between Unraid's Periodic Update Check and Bot Sync-Back
What goes wrong:
Unraid periodically checks for updates (user-configurable interval, often 15-60 minutes). If the bot writes to unraid-update-status.json while Unraid's update check is running, data corruption or lost updates occur. Symptoms: Unraid shows containers as "update ready" immediately after sync, or sync writes are silently discarded.
Why it happens: Two processes writing to the same file without coordination:
- Unraid's update check: reads file → queries registries → writes full file
- Bot sync: reads file → modifies entry → writes full file
If both run concurrently, last writer wins (lost update problem). No evidence of file locking in Unraid's update status handling.
How to avoid:
- Read-modify-write atomicity: Use file locking or atomic write (write to temp file, atomic rename)
- Timestamp verification: Read file, modify, check mtime before write — retry if changed
- Idempotent sync: Deleting entries (Option A above) is safer than modifying — delete is idempotent
- Rate limiting: Don't sync immediately after update — wait 5-10 seconds to avoid collision with Unraid's Docker event handler
Phase 2 implementation requirements:
- Use Python's fcntl.flock() or atomic file operations
- Include retry logic with exponential backoff (max 3 attempts)
- Log all file modification failures for debugging
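These requirements could be sketched as follows. The delays and attempt count are illustrative, and whether Unraid's own update check respects flock() is unverified — so the atomic-rename pattern remains the primary defence, with the lock as belt-and-braces:

```python
import fcntl
import json
import time

def locked_modify(path, mutate, attempts=3, base_delay=0.5):
    """Read-modify-write under an exclusive advisory lock, retrying with
    exponential backoff. Returns True on success, False if all attempts
    failed (caller should log the failure and skip the sync)."""
    for attempt in range(attempts):
        try:
            with open(path, "r+", encoding="utf-8") as f:
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # raises if held
                data = json.load(f)
                mutate(data)      # caller edits the parsed structure in place
                f.seek(0)
                json.dump(data, f)
                f.truncate()
                return True       # lock released when the file closes
        except (BlockingIOError, OSError):
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s
    return False
```

Deliberately narrow: a JSON parse failure propagates rather than being retried, because a malformed file is a format problem (Pitfall 3), not contention.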
Warning signs:
- Sync reports success but Unraid state unchanged
- File modification timestamp inconsistent with sync execution time
- "Resource temporarily unavailable" errors when accessing the file
Phase to address: Phase 2 (Sync Implementation) — implement atomic file operations and retry logic
Pitfall 3: Unraid Version Compatibility — Internal Format Changes Break Integration
What goes wrong:
Unraid updates change the structure of /var/lib/docker/unraid-update-status.json or introduce new update tracking mechanisms. Bot's sync logic breaks silently (no status updates) or corrupts the file (containers disappear from UI, update checks fail).
Why it happens:
- File format is undocumented (no schema, no version field)
- Unraid 7.x introduced major API changes (GraphQL, new DockerService architecture)
- Past example: Unraid 6.12.8 template errors that "previously were silently ignored could cause Docker containers to fail to start"
- No backward compatibility guarantees for internal files
Historical evidence of breaking changes:
- Unraid 7.2.1 (Nov 2025): Docker localhost networking broke
- Unraid 6.12.8: Docker template validation strictness increased
- Unraid API open-sourced Jan 2025 — likely more changes incoming
How to avoid:
- Version detection: Read Unraid version from /etc/unraid-version or the API
- Format validation: Before modifying the file, validate the expected structure (reject unknown formats)
- Graceful degradation: If file format unrecognized, log error and skip sync (preserve existing bot functionality)
- Testing matrix: Test against Unraid 6.11, 6.12, 7.0, 7.2 (Phase 3)
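A possible shape for the validation and graceful-degradation steps — the dict-of-dicts structure checked here is an assumed format from Phase 1 investigation, not an official schema:

```python
import json
import logging

log = logging.getLogger("unraid-sync")

def load_validated_status(path):
    """Parse the status file and check it against the structure we expect
    (ASSUMPTION: top-level dict of per-container dicts). Returns None on
    any mismatch so callers skip the sync instead of corrupting state."""
    try:
        with open(path, encoding="utf-8") as f:
            status = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        log.error("Cannot read status file: %s", exc)
        return None
    if not isinstance(status, dict) or not all(
        isinstance(v, dict) for v in status.values()
    ):
        log.error("Unrecognized status file format; skipping sync")
        return None
    return status
```

Returning None instead of raising keeps the failure mode aligned with "preserve existing bot functionality": the sync feature degrades, the bot keeps working.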
Phase 1 requirements:
- Document current file format for Unraid 7.x
- Check Unraid forums for known format changes across versions
- Identify version-specific differences (if any)
Phase 2 implementation:
SUPPORTED_VERSIONS = ['6.11', '6.12', '7.0', '7.1', '7.2']
version = read_unraid_version()
if not version_compatible(version):
    log_error(f"Unsupported Unraid version: {version}")
    return  # Skip sync, preserve bot functionality
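`read_unraid_version()` and `version_compatible()` above are hypothetical helpers. One way they might look, assuming /etc/unraid-version contains a line of the form version="7.0.1" (verify the exact contents during Phase 1):

```python
SUPPORTED_VERSIONS = ("6.11", "6.12", "7.0", "7.1", "7.2")

def read_unraid_version(path="/etc/unraid-version"):
    """ASSUMPTION: the file contains a line like: version="7.0.1" """
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("version="):
                return line.split("=", 1)[1].strip().strip('"')
    return None

def version_compatible(version):
    """Match on major.minor only, so e.g. 7.0.1 falls under '7.0'."""
    if not version:
        return False
    major_minor = ".".join(version.split(".")[:2])
    return major_minor in SUPPORTED_VERSIONS
```

Matching on major.minor avoids whitelisting every patch release while still gating on the versions the testing matrix actually covers.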
Warning signs:
- After Unraid upgrade, sync stops working (no errors, just no state change)
- Unraid Docker tab shows errors or missing containers after bot update
- File size changes significantly after Unraid upgrade (format change)
Phase to address:
- Phase 1 (Investigation) — document current format, check version differences
- Phase 2 (Implementation) — add version detection and validation
- Phase 3 (UAT) — test across Unraid versions
Pitfall 4: Docker Socket Proxy Blocks Filesystem Access — n8n Can't Reach Unraid State Files
What goes wrong:
The bot runs inside n8n container, which accesses Docker via socket proxy (security layer). Socket proxy filters Docker API endpoints but doesn't provide filesystem access. /var/lib/docker/unraid-update-status.json is on the Unraid host, unreachable from n8n container.
Attempting to mount host paths into n8n violates security boundary and creates maintenance burden (n8n updates require preserving mounts).
Why it happens: Current architecture (from ARCHITECTURE.md):
n8n container → docker-socket-proxy → Docker Engine
Socket proxy security model:
- Grants specific Docker API endpoints (containers, images, exec)
- Blocks direct filesystem access
- n8n has no /host mount (intentional security decision)
Mounting /var/lib/docker into n8n container:
- Bypasses socket proxy security (defeats the purpose)
- Requires n8n container restart when file path changes
- Couples n8n deployment to Unraid internals
How to avoid: Three architectural options (order of preference):
Option A: Unraid API Integration (cleanest, highest effort)
- Use Unraid's native API (GraphQL or REST) if update status endpoints exist
- Requires: API key management, authentication flow, endpoint documentation
- Benefits: Version-safe, no direct file access, official interface
- Risk: API may not expose update status mutation endpoints
Option B: Helper Script on Host (recommended for v1.3)
- Small Python script runs on Unraid host (not in container)
- n8n triggers it via docker exec to the host helper, or via webhook
- Helper has direct filesystem access, performs the sync
- Benefits: Clean separation, no n8n filesystem access, minimal coupling
- Implementation: .planning/research/ARCHITECTURE.md should detail this pattern
Option C: Controlled Host Mount (fallback, higher risk)
- Mount only /var/lib/docker/unraid-update-status.json (not the entire /var/lib/docker)
- Read-only mount + separate write mechanism (requires Docker API or exec)
- Benefits: Direct access
- Risk: Tight coupling, version fragility
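Option B's helper could be as small as a localhost-only webhook running on the host. The port, path, and payload shape below are illustrative, not an existing interface:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

STATUS_FILE = "/var/lib/docker/unraid-update-status.json"

class SyncHandler(BaseHTTPRequestHandler):
    """Accepts POST /sync with a JSON list of container names from the bot."""

    def do_POST(self):
        if self.path != "/sync":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        names = json.loads(self.rfile.read(length))  # e.g. ["plex", "sonarr"]
        # Here: drop each name's entry from STATUS_FILE (Pitfall 1, Option A),
        # using atomic write + locking. Whitelist-validate `names` first.
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    # Bind to loopback only -- the helper must not be reachable off-host.
    HTTPServer(("127.0.0.1", 9901), SyncHandler).serve_forever()
```

Binding to 127.0.0.1 and accepting only a fixed operation keeps the helper from becoming the "execute arbitrary commands on host" mistake flagged under Security Mistakes.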
Phase 1 must investigate:
- Does Unraid API expose update status endpoints? (check GraphQL schema)
- Can Docker exec reach host scripts? (test in current deployment)
- Security implications of each option
Warning signs:
- "Permission denied" when attempting to read/write status file from n8n
- File not found errors (path doesn't exist in container filesystem)
- n8n container has no visibility of host filesystem
Phase to address:
- Phase 1 (Architecture Decision) — choose integration pattern
- Phase 2 (Implementation) — implement chosen pattern
Pitfall 5: Unraid Update Check Triggers While Bot Is Syncing — Notification Spam
What goes wrong: Bot updates container → syncs status back to Unraid → Unraid's periodic update check runs during sync → update check sees partially-written file or stale cache → sends duplicate "update available" notification to user. User receives notification storm when updating multiple containers.
Why it happens: Unraid's update check is asynchronous and periodic:
- Notification service triggers on update detection
- No debouncing for rapid state changes
- File write + cache invalidation not atomic
Community evidence:
- "Excessive notifications from unRAID" — users report notification spam
- "Duplicate notifications" — longstanding issue in notification system
- System excludes duplicates from archive but not from active stream
How to avoid:
- Sync timing: Delay sync by 10-30 seconds after update completion (let Docker events settle)
- Batch sync: If updating multiple containers, sync all at once (not per-container)
- Cache invalidation signal: If Unraid API provides cache invalidation, trigger AFTER all syncs complete
- Idempotent sync: Delete entries (forces recalc) rather than writing new digests (avoids partial state)
Phase 2 implementation pattern:
// In Update sub-workflow
if (responseMode === 'batch') {
  return { success: true, skipSync: true } // Sync after batch completes
}

// In main workflow (after batch completion)
const updatedContainers = [...] // Collect all updated
await syncAllToUnraid(updatedContainers) // Single sync operation
Warning signs:
- Multiple "update available" notifications for same container within 1 minute
- Notifications triggered immediately after bot update completes
- Unraid notification log shows duplicate entries with close timestamps
Phase to address:
- Phase 2 (Sync Implementation) — add batch sync and timing delays
- Phase 3 (UAT) — verify no notification spam during batch updates
Pitfall 6: n8n Workflow State Doesn't Persist — Can't Queue Sync Operations
What goes wrong: Developer assumes n8n workflow static data persists between executions (like Phase 10.2 error logging attempt). Builds queue of "pending syncs" to batch them. Queue is lost between workflow executions. Each update triggers immediate sync attempt → file access contention, race conditions.
Why it happens: Known limitation from STATE.md:
n8n workflow static data does NOT persist between executions (execution-scoped, not workflow-scoped)
Phase 10.2 attempted ring buffer + debug commands — entirely removed due to this limitation.
Implications for sync-back:
- Can't queue sync operations across multiple update requests
- Can't implement retry queue for failed syncs
- Each workflow execution is stateless
How to avoid: Don't rely on workflow state for sync coordination. Options:
Option A: Synchronous sync (simplest)
- Update container → immediately sync (no queue)
- Atomic file operations handle contention
- Acceptable for single updates, problematic for batch
Option B: External queue (Redis, file-based)
- Write pending syncs to external queue
- Separate workflow polls queue and processes batch
- Higher complexity, requires infrastructure
Option C: Batch-aware sync (recommended)
- Single updates: sync immediately
- Batch updates: collect all container IDs in batch loop, sync once after completion
- No cross-execution state needed (batch completes in single execution)
Implementation in Phase 2:
// Batch loop already collects results
const batchResults = []
for (const container of containers) {
  const result = await updateContainer(container)
  batchResults.push({ containerId: container.id, updated: result.updated })
}
// After loop completes (still in same execution):
const toSync = batchResults.filter(r => r.updated).map(r => r.containerId)
await syncToUnraid(toSync) // Single sync call
Warning signs:
- Developer adds static data writes for sync queue
- Testing shows queue is empty on next execution
- Sync attempts happen per-container instead of batched
Phase to address:
- Phase 1 (Architecture) — document stateless constraint, reject queue-based designs
- Phase 2 (Implementation) — use in-execution batching, not cross-execution state
Pitfall 7: Unraid's br0 Network Recreate Breaks Container Resolution After Bot Update
What goes wrong:
Bot updates container using Docker API (remove + create) → Unraid recreates bridge network (br0) → Docker network ID changes → other containers using br0 fail to resolve updated container by name → service disruption beyond just the updated container.
Why it happens: Community report: "Unraid recreates 'br0' when the docker service restarts, and then services using 'br0' cannot be started because the ID of 'br0' has changed."
Bot update flow: docker pull → docker stop → docker rm → docker run with same config
- If container uses custom bridge network, recreation may trigger network ID change
- Unraid's Docker service monitors for container lifecycle events
- Network recreation is asynchronous to container operations
How to avoid:
- Preserve network settings: Ensure container recreation uses identical network config (Phase 2)
- Test network-dependent scenarios: UAT must include containers with custom networks (Phase 3)
- Graceful degradation: If network issue detected (container unreachable after update), log error and notify user
- Documentation: Warn users about potential network disruption during updates (README)
Phase 2 implementation check:
- Current update sub-workflow uses Docker API recreate — verify network config preservation
- Check whether n8n-update.json copies network settings from the old container to the new one
- Test: update a container on br0, verify other containers still resolve it
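The "verify network config preservation" check could start from a pure helper that pulls the preservable fields out of docker inspect output. Field names follow the Docker Engine API's ContainerInspect response; exactly which fields the bot must carry over is the Phase 2 verification item:

```python
def extract_endpoint_settings(inspect_data):
    """Collect per-network settings worth preserving across remove/create,
    from `docker inspect` output (a plain dict). Which fields matter for
    the bot's containers is an assumption to verify in Phase 2."""
    preserved = {}
    for net_name, cfg in inspect_data["NetworkSettings"]["Networks"].items():
        ipam = cfg.get("IPAMConfig") or {}  # None when no static IP assigned
        preserved[net_name] = {
            "Aliases": cfg.get("Aliases") or [],
            "IPv4Address": ipam.get("IPv4Address"),
            "MacAddress": cfg.get("MacAddress"),
        }
    return preserved
```

Keeping extraction separate from the create call makes it testable without a Docker daemon, and makes a pre-update/post-update diff trivial to log.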
Warning signs:
- Container starts successfully but is unreachable by hostname
- Other containers report DNS resolution failures after update
- docker network ls shows a new network ID for br0 after container update
Phase to address:
- Phase 2 (Update Flow Verification) — ensure network config preservation
- Phase 3 (UAT) — test multi-container network scenarios
Technical Debt Patterns
Shortcuts that seem reasonable but create long-term problems.
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|---|---|---|---|
| Skip Unraid version detection | Faster implementation | Silent breakage on Unraid upgrades | Never — breaking changes across Unraid versions are a documented pattern |
| Mount /var/lib/docker into n8n | Direct file access | Security bypass, tight coupling, upgrade fragility | Only if helper script impossible |
| Sync immediately after update (no delay) | Simpler code | Race conditions with Unraid update check | Only for single-container updates (not batch) |
| Assume file format from one Unraid version | Works on dev system | Breaks for users on different versions | Only during Phase 1 investigation (must validate before Phase 2) |
| Write directly to status file without locking | Avoids complexity | File corruption on concurrent access | Never — use atomic operations |
| Hardcode file paths | Works today | Breaks if Unraid changes internal structure | Acceptable if combined with version detection + validation |
Integration Gotchas
Common mistakes when connecting to external services.
| Integration | Common Mistake | Correct Approach |
|---|---|---|
| Unraid update status file | Assume JSON structure is stable | Validate structure before modification, reject unknown formats |
| Docker socket proxy | Expect filesystem access like Docker socket mount | Use helper script on host OR Unraid API if available |
| Unraid API (if used) | Assume unauthenticated localhost access | Check auth requirements, API key management |
| File modification timing | Write immediately after container update | Delay 5-10 seconds to avoid collision with Docker event handlers |
| Batch operations | Sync after each container update | Collect all updates, sync once after batch completes |
| Network config preservation | Assume Docker API preserves settings | Explicitly copy network settings from old container inspect to new create |
Performance Traps
Patterns that work at small scale but fail as usage grows.
| Trap | Symptoms | Prevention | When It Breaks |
|---|---|---|---|
| Sync per container in batch | File contention, slow batch updates | Batch sync after all updates complete | 5+ containers in batch |
| Full file rewrite for each sync | High I/O, race window increases | Delete stale entries OR modify only changed entries | 10+ containers tracked |
| No retry logic for file access | Silent sync failures | Exponential backoff retry (max 3 attempts) | Concurrent Unraid update check |
| Sync blocks workflow execution | Slow Telegram responses | Async sync (fire and forget) OR move to separate workflow | 3+ second file operations |
Note: Current system has 8-15 containers (from UAT scenarios). Performance traps unlikely to manifest, but prevention is low-cost.
Security Mistakes
Domain-specific security issues beyond general web security.
| Mistake | Risk | Prevention |
|---|---|---|
| Mount entire /var/lib/docker into n8n | n8n gains root-level access to all Docker data | Mount only the specific file OR use a helper script |
| World-writable status file permissions | Any container can corrupt Unraid state | Verify file permissions, use host-side helper with proper permissions |
| No validation before writing to status file | Malformed data corrupts Unraid Docker UI | Validate JSON structure, reject unknown formats |
| Expose Unraid API key in workflow | API key visible in n8n execution logs | Use n8n credentials, not hardcoded keys |
| Execute arbitrary commands on host | Container escape vector | Whitelist allowed operations in helper script |
UX Pitfalls
Common user experience mistakes in this domain.
| Pitfall | User Impact | Better Approach |
|---|---|---|
| Silent sync failure | User thinks status updated, Unraid still shows "update ready" | Log error to correlation ID, send Telegram notification on sync failure |
| No indication of sync status | User doesn't know if sync worked | Include in update success message: "Updated + synced to Unraid" |
| Sync delay causes confusion | User checks Unraid immediately, sees old status | Document 10-30 second sync delay in README troubleshooting |
| Unraid badge still shows after sync | User thinks update failed | README: explain Unraid caches aggressively, manual "Check for Updates" forces refresh |
| Batch update spam notifications | 10 updates = 10 Unraid notifications | Batch sync prevents this (if implemented correctly) |
"Looks Done But Isn't" Checklist
Things that appear complete but are missing critical pieces.
- File modification: Wrote to status file — verify atomic operation (temp file + rename, not direct write)
- Batch sync: Syncs after each update — verify batching for multi-container operations
- Version compatibility: Works on dev Unraid — verify against 6.11, 6.12, 7.0, 7.2
- Error handling: Sync returns success — verify retry logic for file contention
- Network preservation: Container starts after update — verify DNS resolution from other containers
- Race condition testing: Works in sequential tests — verify concurrent update + Unraid check scenario
- Filesystem access: Works on dev system — verify n8n container can actually reach file (or helper script exists)
- Notification validation: No duplicate notifications in single test — verify batch scenario (5+ containers)
Recovery Strategies
When pitfalls occur despite prevention, how to recover.
| Pitfall | Recovery Cost | Recovery Steps |
|---|---|---|
| Corrupted status file | LOW | Delete /var/lib/docker/unraid-update-status.json, Unraid recreates on next update check |
| State desync (Unraid shows stale) | LOW | Manual "Check for Updates" in Unraid UI forces recalculation |
| Unraid version breaks format | MEDIUM | Disable sync feature via feature flag, update sync logic for new format |
| Network resolution broken | MEDIUM | Restart Docker service in Unraid (Settings -> Docker -> Enable: No -> Yes) |
| File permission errors | LOW | Helper script with proper permissions, OR mount file read-only + use API |
| n8n can't reach status file | HIGH | Architecture change required (add helper script OR switch to API) |
| Notification spam | LOW | Unraid notification settings: disable Docker update notifications temporarily |
Pitfall-to-Phase Mapping
How roadmap phases should address these pitfalls.
| Pitfall | Prevention Phase | Verification |
|---|---|---|
| State desync (Docker API vs Unraid) | Phase 1 (Investigation) + Phase 2 (Sync) | UAT: update via bot, verify Unraid shows "up to date" |
| Race condition (concurrent access) | Phase 2 (Sync Implementation) | Stress test: simultaneous bot update + manual Unraid check |
| Unraid version compatibility | Phase 1 (Format Documentation) + Phase 3 (Multi-version UAT) | Test on Unraid 6.12, 7.0, 7.2 |
| Filesystem access from container | Phase 1 (Architecture Decision) | Deploy to prod, verify file access or helper script works |
| Notification spam | Phase 2 (Batch Sync) | UAT: batch update 5+ containers, count notifications |
| n8n state persistence assumption | Phase 1 (Architecture) | Code review: reject any staticData usage for sync queue |
| Network recreation (br0) | Phase 2 (Update Flow) + Phase 3 (UAT) | Test: update container on custom network, verify resolution |
Sources
HIGH confidence (official/authoritative):
- Unraid API — Docker and VM Integration — DockerService, DockerEventService architecture
- Unraid API — Notifications Service — Race condition handling, duplicate detection
- Docker Socket Proxy Security — Security model, endpoint filtering
- Docker Socket Security Critical Vulnerability Guide — Filesystem access risks
- n8n Docker File System Access — Container filesystem limitations
MEDIUM confidence (community-verified):
- Watchtower Discussion #1389 — Unraid doesn't detect external updates
- Unraid Docker Troubleshooting — br0 network recreation issue
- Unraid Forums: Docker Update Check — Status file location
- Unraid Forums: 7.2.1 Docker Issues — Version upgrade breaking changes
LOW confidence (single source, needs validation):
- File format structure (/var/lib/docker/unraid-update-status.json) — inferred from forum posts, not officially documented
- Unraid update check timing/frequency — user-configurable, no default documented
- Cache invalidation triggers — inferred from API docs, not explicitly tested
Project-specific (from existing codebase):
- STATE.md — n8n static data limitation (Phase 10.2 findings)
- ARCHITECTURE.md — Current system architecture, socket proxy usage
- CLAUDE.md — n8n workflow patterns, sub-workflow contracts