unraid-docker-manager/.planning/research/PITFALLS.md

Pitfalls Research

Domain: Unraid Update Status Sync for Existing Docker Management Bot
Researched: 2026-02-08
Confidence: MEDIUM

Research combines verified Unraid architecture (HIGH confidence) with integration patterns from community sources (MEDIUM confidence). File format and API internals have LIMITED documentation — risk areas flagged for phase-specific investigation.

Critical Pitfalls

Pitfall 1: State Desync Between Docker API and Unraid's Internal Tracking

What goes wrong: After bot-initiated updates via Docker API (pull + recreate), Unraid's Docker tab continues showing "update ready" status. Unraid doesn't detect that the container was updated externally. This creates user confusion ("I just updated, why does it still show?") and leads to duplicate update attempts.

Why it happens: Unraid tracks update status through multiple mechanisms that aren't automatically synchronized with Docker API operations:

  • /var/lib/docker/unraid-update-status.json — cached update status file (stale after external updates)
  • DockerManifestService cache — compares local image digests to registry manifests
  • Real-time DockerEventService — monitors Docker daemon events but doesn't trigger update status recalculation

The bot bypasses Unraid's template system entirely, so Unraid "probably doesn't check if a container has magically been updated and change its UI" (watchtower discussion).

How to avoid: Phase 1 (Investigation) must determine ALL state locations:

  1. Verify update status file format — inspect /var/lib/docker/unraid-update-status.json structure (undocumented, requires reverse engineering)
  2. Document cache invalidation triggers — what causes DockerManifestService to recompute?
  3. Test event-based refresh — does recreating a container trigger update check, or only on manual "Check for Updates"?

Phase 2 (Sync Implementation) options (in order of safety):

  • Option A (safest): Delete stale entries from unraid-update-status.json for updated containers (forces recalculation on next check)
  • Option B (if A insufficient): Call Unraid API update check endpoint after bot updates (triggers full recalc)
  • Option C (last resort): Directly modify unraid-update-status.json with current digest (highest risk of corruption)
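Option A can be sketched as a small Python routine. The file structure here is an assumption (a flat JSON object keyed by container name, which Phase 1 must verify against a real file), and the temp-file-plus-rename write anticipates the concurrency concerns in Pitfall 2:

```python
import json
import os
import tempfile

def clear_update_status(status_path, container_names):
    """Remove entries for the given containers from the status file.

    Assumes the (undocumented) file is a JSON object keyed by container
    name -- verify against a real unraid-update-status.json in Phase 1.
    A missing file or missing keys count as already-synced, so repeated
    calls are idempotent. Returns the names actually removed.
    """
    try:
        with open(status_path) as f:
            status = json.load(f)
    except FileNotFoundError:
        return []  # nothing to clear
    removed = [n for n in container_names if status.pop(n, None) is not None]
    if removed:
        # Atomic replace: write a temp file in the same directory, then rename
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(status_path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(status, f)
        os.replace(tmp, status_path)
    return removed
```

Deleting rather than rewriting entries keeps the blast radius small: if the assumed structure is wrong, the worst case is a no-op rather than a corrupted digest.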

Warning signs:

  • "Apply Update" shown in Unraid UI immediately after bot reports successful update
  • Unraid notification shows update available for container that bot just updated
  • /var/lib/docker/unraid-update-status.json modified timestamp doesn't change after bot update

Phase to address:

  • Phase 1 (Investigation & File Format Analysis) — understand state structure
  • Phase 2 (Sync Implementation) — implement chosen sync strategy
  • Phase 3 (UAT) — verify sync works across Unraid versions


Pitfall 2: Race Condition Between Unraid's Periodic Update Check and Bot Sync-Back

What goes wrong: Unraid periodically checks for updates (user-configurable interval, often 15-60 minutes). If the bot writes to unraid-update-status.json while Unraid's update check is running, data corruption or lost updates occur. Symptoms: Unraid shows containers as "update ready" immediately after sync, or sync writes are silently discarded.

Why it happens: Two processes writing to the same file without coordination:

  • Unraid's update check: reads file → queries registries → writes full file
  • Bot sync: reads file → modifies entry → writes full file

If both run concurrently, last writer wins (lost update problem). No evidence of file locking in Unraid's update status handling.

How to avoid:

  1. Read-modify-write atomicity: Use file locking or atomic write (write to temp file, atomic rename)
  2. Timestamp verification: Read file, modify, check mtime before write — retry if changed
  3. Idempotent sync: Deleting entries (Option A above) is safer than modifying — delete is idempotent
  4. Rate limiting: Don't sync immediately after update — wait 5-10 seconds to avoid collision with Unraid's Docker event handler

Phase 2 implementation requirements:

  • Use Python's fcntl.flock() or atomic file operations
  • Include retry logic with exponential backoff (max 3 attempts)
  • Log all file modification failures for debugging
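The requirements above can be combined into one read-modify-write helper, sketched here. Note the advisory lock only serializes writers that honor it (i.e. the bot itself, since there is no evidence Unraid takes locks); the atomic rename is what protects concurrent readers:

```python
import fcntl
import json
import os
import tempfile
import time

def locked_modify(status_path, modify, attempts=3):
    """Read-modify-write the status file under an exclusive advisory lock.

    `modify` receives the parsed dict and mutates it in place. On lock
    contention ("Resource temporarily unavailable") we back off
    exponentially, up to `attempts` tries. Returns True on success,
    False if all attempts failed (caller should log and skip sync).
    """
    for attempt in range(attempts):
        try:
            with open(status_path, "r+") as f:
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                status = json.load(f)
                modify(status)
                # Write a sibling temp file, then rename over the original
                fd, tmp = tempfile.mkstemp(dir=os.path.dirname(status_path) or ".")
                with os.fdopen(fd, "w") as out:
                    json.dump(status, out)
                os.replace(tmp, status_path)  # atomic on POSIX
                return True
        except BlockingIOError:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s backoff
    return False
```

A caller would pass the mutation as a function, e.g. `locked_modify(path, lambda s: s.pop("plex", None))`, and treat a False return as a sync failure to surface to the user.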

Warning signs:

  • Sync reports success but Unraid state unchanged
  • File modification timestamp inconsistent with sync execution time
  • "Resource temporarily unavailable" errors when accessing the file

Phase to address: Phase 2 (Sync Implementation) — implement atomic file operations and retry logic


Pitfall 3: Unraid Version Compatibility — Internal Format Changes Break Integration

What goes wrong: Unraid updates change the structure of /var/lib/docker/unraid-update-status.json or introduce new update tracking mechanisms. Bot's sync logic breaks silently (no status updates) or corrupts the file (containers disappear from UI, update checks fail).

Why it happens:

  • File format is undocumented (no schema, no version field)
  • Unraid 7.x introduced major API changes (GraphQL, new DockerService architecture)
  • Past example: Unraid 6.12.8 template errors that "previously were silently ignored could cause Docker containers to fail to start"
  • No backward compatibility guarantees for internal files

Historical evidence of breaking changes:

  • Unraid 7.2.1 (Nov 2025): Docker localhost networking broke
  • Unraid 6.12.8: Docker template validation strictness increased
  • Unraid API open-sourced Jan 2025 — likely more changes incoming

How to avoid:

  1. Version detection: Read Unraid version from /etc/unraid-version or API
  2. Format validation: Before modifying file, validate expected structure (reject unknown formats)
  3. Graceful degradation: If file format unrecognized, log error and skip sync (preserve existing bot functionality)
  4. Testing matrix: Test against Unraid 6.11, 6.12, 7.0, 7.2 (Phase 3)
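Format validation (point 2) can start as a structural sanity check. The expected shape is an assumption pending the Phase 1 file-format documentation:

```python
def looks_like_status_file(status):
    """Heuristic structure check before touching the file.

    The real schema is undocumented; the assumption here is a JSON
    object whose values are per-container dicts. Anything else is
    treated as an unknown format: reject it and skip sync.
    """
    if not isinstance(status, dict):
        return False
    return all(isinstance(v, dict) for v in status.values())
```

A False result should feed straight into the graceful-degradation path (point 3): log the unrecognized format and leave the file untouched.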

Phase 1 requirements:

  • Document current file format for Unraid 7.x
  • Check Unraid forums for known format changes across versions
  • Identify version-specific differences (if any)

Phase 2 implementation:

```python
SUPPORTED_VERSIONS = ['6.11', '6.12', '7.0', '7.1', '7.2']

def sync_if_supported(containers):
    version = read_unraid_version()
    if not version_compatible(version):
        log_error(f"Unsupported Unraid version: {version}")
        return  # Skip sync, preserve existing bot functionality
    sync_to_unraid(containers)
```
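The snippet above assumes two helpers. A possible implementation, assuming /etc/unraid-version contains a line like version="7.0.1" (the exact format is inferred from forum posts and should be confirmed in Phase 1):

```python
import re
from pathlib import Path

SUPPORTED_VERSIONS = ['6.11', '6.12', '7.0', '7.1', '7.2']  # major.minor only

def read_unraid_version(path="/etc/unraid-version"):
    """Parse the Unraid version string, e.g. from: version="7.0.1"

    Returns None if the file is missing or unparseable, which
    version_compatible() treats as unsupported (fail closed).
    """
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    match = re.search(r'version="([\d.]+)"', text)
    return match.group(1) if match else None

def version_compatible(version):
    """True if the major.minor prefix is on the tested list."""
    if version is None:
        return False
    major_minor = ".".join(version.split(".")[:2])
    return major_minor in SUPPORTED_VERSIONS
```

Failing closed matters here: an unreadable or reformatted version file should disable sync, not let it run against an unknown Unraid release.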

Warning signs:

  • After Unraid upgrade, sync stops working (no errors, just no state change)
  • Unraid Docker tab shows errors or missing containers after bot update
  • File size changes significantly after Unraid upgrade (format change)

Phase to address:

  • Phase 1 (Investigation) — document current format, check version differences
  • Phase 2 (Implementation) — add version detection and validation
  • Phase 3 (UAT) — test across Unraid versions


Pitfall 4: Docker Socket Proxy Blocks Filesystem Access — n8n Can't Reach Unraid State Files

What goes wrong: The bot runs inside the n8n container, which accesses Docker through a socket proxy (a security layer). The socket proxy filters Docker API endpoints but provides no filesystem access, and /var/lib/docker/unraid-update-status.json lives on the Unraid host, unreachable from the n8n container.

Attempting to mount host paths into n8n violates the security boundary and creates a maintenance burden (n8n updates require preserving the mounts).

Why it happens: Current architecture (from ARCHITECTURE.md):

n8n container → docker-socket-proxy → Docker Engine

Socket proxy security model:

  • Grants specific Docker API endpoints (containers, images, exec)
  • Blocks direct filesystem access
  • n8n has no /host mount (intentional security decision)

Mounting /var/lib/docker into n8n container:

  • Bypasses socket proxy security (defeats the purpose)
  • Requires n8n container restart when file path changes
  • Couples n8n deployment to Unraid internals

How to avoid: Three architectural options (order of preference):

Option A: Unraid API Integration (cleanest, highest effort)

  • Use Unraid's native API (GraphQL or REST) if update status endpoints exist
  • Requires: API key management, authentication flow, endpoint documentation
  • Benefits: Version-safe, no direct file access, official interface
  • Risk: API may not expose update status mutation endpoints

Option B: Helper Script on Host (recommended for v1.3)

  • Small Python script runs on Unraid host (not in container)
  • n8n triggers via docker exec to host helper or webhook
  • Helper has direct filesystem access, performs sync
  • Benefits: Clean separation, no n8n filesystem access, minimal coupling
  • Implementation: .planning/research/ARCHITECTURE.md should detail this pattern

Option C: Controlled Host Mount (fallback, higher risk)

  • Mount only /var/lib/docker/unraid-update-status.json (not entire /var/lib/docker)
  • Read-only mount + separate write mechanism (requires Docker API or exec)
  • Benefits: Direct access
  • Risk: Tight coupling, version fragility
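Option B can be sketched as a small host-side CLI. The script name, path, and trigger mechanism are all assumptions to be settled in Phase 1; the important design point is the operation whitelist, so the container side can only name predefined operations, never run arbitrary host commands (see Security Mistakes below):

```python
#!/usr/bin/env python3
"""Hypothetical host-side sync helper (Option B sketch).

Runs on the Unraid host with direct filesystem access; n8n triggers it
(e.g. via SSH or a thin webhook) instead of touching host files itself.
Usage: sync-helper.py clear <container> [<container> ...]
"""
import argparse
import json
import os
import tempfile

STATUS_FILE = "/var/lib/docker/unraid-update-status.json"  # assumed path

def op_clear(names, status_file=STATUS_FILE):
    """Drop entries so Unraid recalculates them on its next update check."""
    try:
        with open(status_file) as f:
            status = json.load(f)
    except FileNotFoundError:
        return  # nothing to clear
    for name in names:
        status.pop(name, None)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(status_file) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(status, f)
    os.replace(tmp, status_file)  # atomic on POSIX

# Whitelist: the caller can only name these operations
OPERATIONS = {"clear": op_clear}

def main(argv):
    parser = argparse.ArgumentParser(description="Unraid status sync helper")
    parser.add_argument("operation", choices=sorted(OPERATIONS))
    parser.add_argument("containers", nargs="+")
    args = parser.parse_args(argv)
    OPERATIONS[args.operation](args.containers)
```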

Phase 1 must investigate:

  1. Does Unraid API expose update status endpoints? (check GraphQL schema)
  2. Can Docker exec reach host scripts? (test in current deployment)
  3. Security implications of each option

Warning signs:

  • "Permission denied" when attempting to read/write status file from n8n
  • File not found errors (path doesn't exist in container filesystem)
  • n8n container has no visibility of host filesystem

Phase to address:

  • Phase 1 (Architecture Decision) — choose integration pattern
  • Phase 2 (Implementation) — implement chosen pattern


Pitfall 5: Unraid Update Check Triggers While Bot Is Syncing — Notification Spam

What goes wrong: Bot updates container → syncs status back to Unraid → Unraid's periodic update check runs during sync → update check sees partially-written file or stale cache → sends duplicate "update available" notification to user. User receives notification storm when updating multiple containers.

Why it happens: Unraid's update check is asynchronous and periodic:

  • Notification service triggers on update detection
  • No debouncing for rapid state changes
  • File write + cache invalidation not atomic

Community evidence:

  • "Excessive notifications from unRAID" — users report notification spam
  • "Duplicate notifications" — longstanding issue in notification system
  • System excludes duplicates from archive but not from active stream

How to avoid:

  1. Sync timing: Delay sync by 10-30 seconds after update completion (let Docker events settle)
  2. Batch sync: If updating multiple containers, sync all at once (not per-container)
  3. Cache invalidation signal: If Unraid API provides cache invalidation, trigger AFTER all syncs complete
  4. Idempotent sync: Delete entries (forces recalc) rather than writing new digests (avoids partial state)

Phase 2 implementation pattern:

```javascript
// In the Update sub-workflow
if (responseMode === 'batch') {
  return { success: true, skipSync: true };  // sync happens after the batch completes
}

// In the main workflow (after batch completion)
const updatedContainers = [/* collected from batch results */];
await syncAllToUnraid(updatedContainers);  // single sync operation
```

Warning signs:

  • Multiple "update available" notifications for same container within 1 minute
  • Notifications triggered immediately after bot update completes
  • Unraid notification log shows duplicate entries with close timestamps

Phase to address:

  • Phase 2 (Sync Implementation) — add batch sync and timing delays
  • Phase 3 (UAT) — verify no notification spam during batch updates


Pitfall 6: n8n Workflow State Doesn't Persist — Can't Queue Sync Operations

What goes wrong: A developer assumes n8n workflow static data persists between executions (as in the Phase 10.2 error-logging attempt) and builds a queue of "pending syncs" to batch them. The queue is lost between workflow executions, so each update triggers an immediate sync attempt, leading to file access contention and race conditions.

Why it happens: Known limitation from STATE.md:

n8n workflow static data does NOT persist between executions (execution-scoped, not workflow-scoped)

Phase 10.2 attempted ring buffer + debug commands — entirely removed due to this limitation.

Implications for sync-back:

  • Can't queue sync operations across multiple update requests
  • Can't implement retry queue for failed syncs
  • Each workflow execution is stateless

How to avoid: Don't rely on workflow state for sync coordination. Options:

Option A: Synchronous sync (simplest)

  • Update container → immediately sync (no queue)
  • Atomic file operations handle contention
  • Acceptable for single updates, problematic for batch

Option B: External queue (Redis, file-based)

  • Write pending syncs to external queue
  • Separate workflow polls queue and processes batch
  • Higher complexity, requires infrastructure

Option C: Batch-aware sync (recommended)

  • Single updates: sync immediately
  • Batch updates: collect all container IDs in batch loop, sync once after completion
  • No cross-execution state needed (batch completes in single execution)

Implementation in Phase 2:

```javascript
// The batch loop already collects results (all within one execution)
const batchResults = [];
for (const container of containers) {
  const result = await updateContainer(container);
  batchResults.push({ containerId: container.id, updated: result.updated });
}
// After the loop completes (still in the same execution):
const toSync = batchResults.filter(r => r.updated).map(r => r.containerId);
await syncToUnraid(toSync);  // single sync call
```

Warning signs:

  • Developer adds static data writes for sync queue
  • Testing shows queue is empty on next execution
  • Sync attempts happen per-container instead of batched

Phase to address:

  • Phase 1 (Architecture) — document stateless constraint, reject queue-based designs
  • Phase 2 (Implementation) — use in-execution batching, not cross-execution state


Pitfall 7: Unraid's br0 Network Recreate Breaks Container Resolution After Bot Update

What goes wrong: Bot updates container using Docker API (remove + create) → Unraid recreates bridge network (br0) → Docker network ID changes → other containers using br0 fail to resolve updated container by name → service disruption beyond just the updated container.

Why it happens: Community report: "Unraid recreates 'br0' when the docker service restarts, and then services using 'br0' cannot be started because the ID of 'br0' has changed."

Bot update flow: docker pull → docker stop → docker rm → docker run with the same config

  • If container uses custom bridge network, recreation may trigger network ID change
  • Unraid's Docker service monitors for container lifecycle events
  • Network recreation is asynchronous to container operations

How to avoid:

  1. Preserve network settings: Ensure container recreation uses identical network config (Phase 2)
  2. Test network-dependent scenarios: UAT must include containers with custom networks (Phase 3)
  3. Graceful degradation: If network issue detected (container unreachable after update), log error and notify user
  4. Documentation: Warn users about potential network disruption during updates (README)

Phase 2 implementation check:

  • Current update sub-workflow uses Docker API recreate — verify network config preservation
  • Check if n8n-update.json copies network settings from old container to new
  • Test: update container on br0, verify other containers still resolve it
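To make the network-preservation check concrete, here is a sketch that extracts the settings worth copying from docker inspect output (the NetworkSettings.Networks structure is standard Docker Engine API). Re-attaching them needs a live daemon, so it appears only as a comment, using docker-py's connect_container_to_network():

```python
def extract_network_config(inspect_data):
    """Collect per-network settings to reapply after container recreation.

    Returns {network_name: {"aliases": [...], "ipv4_address": str or None}}.
    Note: Docker may include the old container's short ID in Aliases;
    that auto-generated alias does not need to be copied forward.

    Re-attach sketch (docker-py), per network:
        client.api.connect_container_to_network(
            new_container_id, network_name,
            aliases=cfg["aliases"], ipv4_address=cfg["ipv4_address"])
    """
    networks = inspect_data.get("NetworkSettings", {}).get("Networks", {})
    config = {}
    for name, settings in networks.items():
        ipam = settings.get("IPAMConfig") or {}
        config[name] = {
            "aliases": list(settings.get("Aliases") or []),
            "ipv4_address": ipam.get("IPv4Address"),
        }
    return config
```

Comparing this extraction before and after a bot update is also a cheap UAT assertion: the two dicts should be identical.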

Warning signs:

  • Container starts successfully but is unreachable by hostname
  • Other containers report DNS resolution failures after update
  • docker network ls shows new network ID for br0 after container update

Phase to address:

  • Phase 2 (Update Flow Verification) — ensure network config preservation
  • Phase 3 (UAT) — test multi-container network scenarios


Technical Debt Patterns

Shortcuts that seem reasonable but create long-term problems.

| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
| --- | --- | --- | --- |
| Skip Unraid version detection | Faster implementation | Silent breakage on Unraid upgrades | Never — version changes are documented |
| Mount /var/lib/docker into n8n | Direct file access | Security bypass, tight coupling, upgrade fragility | Only if helper script impossible |
| Sync immediately after update (no delay) | Simpler code | Race conditions with Unraid update check | Only for single-container updates (not batch) |
| Assume file format from one Unraid version | Works on dev system | Breaks for users on different versions | Only during Phase 1 investigation (must validate before Phase 2) |
| Write directly to status file without locking | Avoids complexity | File corruption on concurrent access | Never — use atomic operations |
| Hardcode file paths | Works today | Breaks if Unraid changes internal structure | Acceptable if combined with version detection + validation |

Integration Gotchas

Common mistakes when connecting to external services.

| Integration | Common Mistake | Correct Approach |
| --- | --- | --- |
| Unraid update status file | Assume JSON structure is stable | Validate structure before modification, reject unknown formats |
| Docker socket proxy | Expect filesystem access like Docker socket mount | Use helper script on host OR Unraid API if available |
| Unraid API (if used) | Assume unauthenticated localhost access | Check auth requirements, API key management |
| File modification timing | Write immediately after container update | Delay 5-10 seconds to avoid collision with Docker event handlers |
| Batch operations | Sync after each container update | Collect all updates, sync once after batch completes |
| Network config preservation | Assume Docker API preserves settings | Explicitly copy network settings from old container inspect to new create |

Performance Traps

Patterns that work at small scale but fail as usage grows.

| Trap | Symptoms | Prevention | When It Breaks |
| --- | --- | --- | --- |
| Sync per container in batch | File contention, slow batch updates | Batch sync after all updates complete | 5+ containers in batch |
| Full file rewrite for each sync | High I/O, race window increases | Delete stale entries OR modify only changed entries | 10+ containers tracked |
| No retry logic for file access | Silent sync failures | Exponential backoff retry (max 3 attempts) | Concurrent Unraid update check |
| Sync blocks workflow execution | Slow Telegram responses | Async sync (fire and forget) OR move to separate workflow | 3+ second file operations |
Note: Current system has 8-15 containers (from UAT scenarios). Performance traps unlikely to manifest, but prevention is low-cost.

Security Mistakes

Domain-specific security issues beyond general web security.

| Mistake | Risk | Prevention |
| --- | --- | --- |
| Mount entire /var/lib/docker into n8n | n8n gains root-level access to all Docker data | Mount only specific file OR use helper script |
| World-writable status file permissions | Any container can corrupt Unraid state | Verify file permissions, use host-side helper with proper permissions |
| No validation before writing to status file | Malformed data corrupts Unraid Docker UI | Validate JSON structure, reject unknown formats |
| Expose Unraid API key in workflow | API key visible in n8n execution logs | Use n8n credentials, not hardcoded keys |
| Execute arbitrary commands on host | Container escape vector | Whitelist allowed operations in helper script |

UX Pitfalls

Common user experience mistakes in this domain.

| Pitfall | User Impact | Better Approach |
| --- | --- | --- |
| Silent sync failure | User thinks status updated, Unraid still shows "update ready" | Log error to correlation ID, send Telegram notification on sync failure |
| No indication of sync status | User doesn't know if sync worked | Include in update success message: "Updated + synced to Unraid" |
| Sync delay causes confusion | User checks Unraid immediately, sees old status | Document 10-30 second sync delay in README troubleshooting |
| Unraid badge still shows after sync | User thinks update failed | README: explain Unraid caches aggressively, manual "Check for Updates" forces refresh |
| Batch update spam notifications | 10 updates = 10 Unraid notifications | Batch sync prevents this (if implemented correctly) |

"Looks Done But Isn't" Checklist

Things that appear complete but are missing critical pieces.

  • File modification: Wrote to status file — verify atomic operation (temp file + rename, not direct write)
  • Batch sync: Syncs after each update — verify batching for multi-container operations
  • Version compatibility: Works on dev Unraid — verify against 6.11, 6.12, 7.0, 7.2
  • Error handling: Sync returns success — verify retry logic for file contention
  • Network preservation: Container starts after update — verify DNS resolution from other containers
  • Race condition testing: Works in sequential tests — verify concurrent update + Unraid check scenario
  • Filesystem access: Works on dev system — verify n8n container can actually reach file (or helper script exists)
  • Notification validation: No duplicate notifications in single test — verify batch scenario (5+ containers)

Recovery Strategies

When pitfalls occur despite prevention, how to recover.

| Pitfall | Recovery Cost | Recovery Steps |
| --- | --- | --- |
| Corrupted status file | LOW | Delete /var/lib/docker/unraid-update-status.json; Unraid recreates it on next update check |
| State desync (Unraid shows stale) | LOW | Manual "Check for Updates" in Unraid UI forces recalculation |
| Unraid version breaks format | MEDIUM | Disable sync feature via feature flag, update sync logic for new format |
| Network resolution broken | MEDIUM | Restart Docker service in Unraid (Settings -> Docker -> Enable: No -> Yes) |
| File permission errors | LOW | Helper script with proper permissions, OR mount file read-only + use API |
| n8n can't reach status file | HIGH | Architecture change required (add helper script OR switch to API) |
| Notification spam | LOW | Unraid notification settings: disable Docker update notifications temporarily |

Pitfall-to-Phase Mapping

How roadmap phases should address these pitfalls.

| Pitfall | Prevention Phase | Verification |
| --- | --- | --- |
| State desync (Docker API vs Unraid) | Phase 1 (Investigation) + Phase 2 (Sync) | UAT: update via bot, verify Unraid shows "up to date" |
| Race condition (concurrent access) | Phase 2 (Sync Implementation) | Stress test: simultaneous bot update + manual Unraid check |
| Unraid version compatibility | Phase 1 (Format Documentation) + Phase 3 (Multi-version UAT) | Test on Unraid 6.12, 7.0, 7.2 |
| Filesystem access from container | Phase 1 (Architecture Decision) | Deploy to prod, verify file access or helper script works |
| Notification spam | Phase 2 (Batch Sync) | UAT: batch update 5+ containers, count notifications |
| n8n state persistence assumption | Phase 1 (Architecture) | Code review: reject any staticData usage for sync queue |
| Network recreation (br0) | Phase 2 (Update Flow) + Phase 3 (UAT) | Test: update container on custom network, verify resolution |

Sources

HIGH confidence (official/authoritative):

MEDIUM confidence (community-verified):

LOW confidence (single source, needs validation):

  • File format structure (/var/lib/docker/unraid-update-status.json) — inferred from forum posts, not officially documented
  • Unraid update check timing/frequency — user-configurable, no default documented
  • Cache invalidation triggers — inferred from API docs, not explicitly tested

Project-specific (from existing codebase):

  • STATE.md — n8n static data limitation (Phase 10.2 findings)
  • ARCHITECTURE.md — Current system architecture, socket proxy usage
  • CLAUDE.md — n8n workflow patterns, sub-workflow contracts

Pitfalls research for: Unraid Update Status Sync
Researched: 2026-02-08