From 5880dc45734a779f6c63649f3ab5514e33da44bc Mon Sep 17 00:00:00 2001 From: Lucas Berger Date: Mon, 9 Feb 2026 09:10:14 -0500 Subject: [PATCH] docs(16): research Unraid GraphQL API migration patterns --- .../phases/16-api-migration/16-RESEARCH.md | 767 ++++++++++++++++++ 1 file changed, 767 insertions(+) create mode 100644 .planning/phases/16-api-migration/16-RESEARCH.md diff --git a/.planning/phases/16-api-migration/16-RESEARCH.md b/.planning/phases/16-api-migration/16-RESEARCH.md new file mode 100644 index 0000000..51db554 --- /dev/null +++ b/.planning/phases/16-api-migration/16-RESEARCH.md @@ -0,0 +1,767 @@ +# Phase 16: API Migration - Research + +**Researched:** 2026-02-09 +**Domain:** Unraid GraphQL API migration for Docker container operations +**Confidence:** HIGH + +## Summary + +Phase 16 replaces all Docker socket proxy API calls with Unraid GraphQL API mutations and queries. This is a **pure substitution migration** — the user experience remains identical (same Telegram commands, same responses, same timing), but the backend switches from Docker Engine REST API to Unraid's GraphQL API. + +The migration complexity is mitigated by Phase 15 infrastructure: Container ID Registry handles ID translation (Docker 64-char hex → Unraid 129-char PrefixedID), GraphQL Response Normalizer transforms API responses to Docker contract format, and GraphQL Error Handler standardizes error checking. The workflows already have 60+ Code nodes expecting Docker API response shapes — the normalizer ensures zero changes to these downstream nodes. + +Key architectural wins: (1) Single `updateContainer` GraphQL mutation replaces the 5-step Docker flow (inspect → stop → remove → create → start → cleanup), (2) Batch operations use efficient `updateContainers` plural mutation instead of N serial API calls, (3) Unraid update badges clear automatically (no manual "Apply Update" clicks), (4) No Docker socket proxy security boundary to manage. 
+ +**Primary recommendation:** Migrate workflows in dependency order (n8n-status.json first for container listing, then n8n-actions.json for lifecycle, then n8n-update.json for updates), using the Phase 15 utility nodes as drop-in replacements for Docker API HTTP Request nodes. Keep existing Code node logic unchanged — let normalizer/error handler bridge the API differences. + +--- + +## Standard Stack + +### Core + +| Library | Version | Purpose | Why Standard | +|---------|---------|---------|--------------| +| Unraid GraphQL API | 7.2+ native | Container lifecycle and update operations | Official Unraid interface, same mechanism as WebGUI, v1.3 Phase 14 verified | +| Phase 15 utility nodes | Current | Data transformation layer | Container ID Registry, GraphQL Normalizer, Error Handler — purpose-built for this migration | +| n8n HTTP Request node | Built-in | GraphQL client | GraphQL-over-HTTP with POST method, 15s timeout for myunraid.net relay | + +### Supporting + +| Library | Version | Purpose | When to Use | +|---------|---------|---------|-------------| +| Unraid API HTTP Template | Phase 15-02 | Pre-configured HTTP node | Duplicate and modify query for each GraphQL call | +| Container ID Registry | Phase 15-01 | Name ↔ PrefixedID mapping | All GraphQL mutations (require 129-char PrefixedID format) | +| Callback Token Encoder/Decoder | Phase 15-01 | Telegram callback data encoding | Inline keyboard callbacks with PrefixedIDs (64-byte limit) | + +### Alternatives Considered + +| Instead of | Could Use | Tradeoff | +|------------|-----------|----------| +| GraphQL API | Keep Docker socket proxy | Misses architectural goal (single API), no update badge sync, security boundary remains | +| Single updateContainer mutation | 5-step Docker flow via GraphQL | Unraid doesn't expose low-level primitives — GraphQL abstracts container recreation | +| Normalizer layer | Rewrite 60+ Code nodes for Unraid response shape | High risk, massive changeset, testing nightmare | 
+| Container ID Registry | Store only container names, fetch ID on each mutation | N extra API calls, latency overhead, cache staleness risk | + +**Installation:** + +No new dependencies. Phase 15 utility nodes already deployed in n8n-workflow.json. Migration uses existing HTTP Request nodes (duplicate template, wire to normalizer/error handler). + +--- + +## Architecture Patterns + +### Pattern 1: GraphQL Query Migration (Container Listing) + +**What:** Replace Docker API `GET /containers/json` with Unraid GraphQL `containers` query + +**When to use:** n8n-status.json (container list/status), n8n-batch-ui.json (batch selection), main workflow (container lookups) + +**Example migration:** + +```javascript +// BEFORE (Docker API): +// HTTP Request node: GET http://docker-socket-proxy:2375/containers/json?all=true +// Response: [{ "Id": "abc123", "Names": ["/plex"], "State": "running" }] + +// AFTER (Unraid GraphQL): +// 1. Duplicate "Unraid API HTTP Template" node +// 2. Set query body: +{ + "query": "query { docker { containers { id names state image } } }" +} + +// 3. Wire: HTTP Request → GraphQL Response Normalizer → (existing downstream Code nodes) +// Normalizer output: [{ "Id": "server_hash:container_hash", "Names": ["/plex"], "State": "running", "_unraidId": "..." }] +``` + +**Key pattern:** Normalizer transforms Unraid response to Docker contract — downstream nodes see identical data structure. 
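To make the transformation concrete, here is a minimal sketch of what the normalizer does, written as a plain function (illustrative only — `normalizeContainers` is a hypothetical name, and the actual Phase 15 node may map more fields):

```javascript
// Sketch of the normalizer's core transform (illustrative — the real
// Phase 15 node may differ; fields follow the GraphQL query above).
function normalizeContainers(graphqlResponse) {
  const containers =
    (graphqlResponse.data &&
      graphqlResponse.data.docker &&
      graphqlResponse.data.docker.containers) || [];
  return containers.map((c) => ({
    Id: c.id,                             // PrefixedID carried through as Id
    Names: c.names,                       // already "/plex"-style
    State: (c.state || "").toLowerCase(), // "RUNNING" -> "running"
    Image: c.image,
    _unraidId: c.id,                      // original PrefixedID kept explicitly
  }));
}

// Example input in Unraid's response shape:
const sample = {
  data: { docker: { containers: [
    { id: "srv_hash:ctr_hash", names: ["/plex"], state: "RUNNING", image: "plexinc/pms-docker:latest" },
  ] } },
};
const normalized = normalizeContainers(sample);
```

Downstream Code nodes consume `normalized` exactly as they consumed the Docker `/containers/json` response.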
+ +**Source:** Phase 15-02 Plan (GraphQL Response Normalizer implementation) + +--- + +### Pattern 2: GraphQL Mutation Migration (Container Start/Stop/Restart) + +**What:** Replace Docker API `POST /containers/{id}/start` with Unraid GraphQL `start(id: PrefixedID!)` mutation + +**When to use:** n8n-actions.json (start/stop/restart operations) + +**Example migration:** + +```javascript +// BEFORE (Docker API): +// HTTP Request: POST http://docker-socket-proxy:2375/v1.47/containers/abc123/start +// On 304: Container already started (handled by existing Code node checking statusCode === 304) + +// AFTER (Unraid GraphQL): +// 1. Look up PrefixedID from Container ID Registry (by container name) +// 2. Call GraphQL mutation: +{ + "query": "mutation { docker { start(id: \"server_hash:container_hash\") { id state } } }" +} + +// 3. Wire: HTTP Request → GraphQL Error Handler → (existing downstream Code nodes) +// Error Handler maps ALREADY_IN_STATE error to { statusCode: 304, alreadyInState: true } +// Existing Code node: if (response.statusCode === 304) { /* already started */ } +``` + +**RESTART special case:** No native `restart` mutation in Unraid GraphQL. Implement as sequential `stop` + `start`: + +```javascript +// GraphQL has no restart mutation — use two operations: +// 1. mutation { docker { stop(id: "...") { id state } } } +// 2. mutation { docker { start(id: "...") { id state } } } +// Wire: Stop HTTP → Error Handler → Start HTTP → Error Handler → Success Response +``` + +**Key pattern:** Error Handler maps GraphQL error codes to HTTP status codes (ALREADY_IN_STATE → 304) — existing Code nodes unchanged. 
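A sketch of the error-mapping logic (assumes Unraid reports failures in the standard GraphQL `errors` array with an `extensions.code`; `mapGraphqlErrors` is a hypothetical name, not the actual Phase 15 node):

```javascript
// GraphQL responds HTTP 200 even on failure; errors arrive in an `errors`
// array. This sketch re-expresses them as the Docker-style status codes
// the existing Code nodes already check.
function mapGraphqlErrors(response) {
  const errors = response.errors || [];
  if (errors.length === 0) {
    return { success: true, statusCode: 200, data: response.data };
  }
  const code = errors[0].extensions && errors[0].extensions.code;
  if (code === "ALREADY_IN_STATE") {
    // Docker returns 304 Not Modified for start-on-running / stop-on-stopped
    return { success: false, statusCode: 304, alreadyInState: true };
  }
  if (code === "NOT_FOUND") {
    return { success: false, statusCode: 404, message: errors[0].message };
  }
  return { success: false, statusCode: 500, message: errors[0].message };
}
```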
**Source:** Unraid GraphQL schema (DockerMutations type), Phase 15-02 Plan (GraphQL Error Handler implementation)

---

### Pattern 3: Single Container Update Migration (5-Step Flow → 1 Mutation)

**What:** Replace Docker's 5-step update flow with a single `updateContainer(id: PrefixedID!)` mutation

**When to use:** n8n-update.json (single container update), main workflow (text command `update <container>`)

**Current 5-step Docker flow:**
1. Inspect container (get current config)
2. Stop container
3. Remove container
4. Create container (with new image)
5. Start container
6. Remove old image (cleanup)

**New 1-step Unraid flow:**
```javascript
// Single GraphQL mutation replaces entire flow:
{
  "query": "mutation { docker { updateContainer(id: \"server_hash:container_hash\") { id state image imageId } } }"
}

// Unraid internally handles: pull new image, stop, remove, recreate, start
// Returns: Updated container object (normalized by GraphQL Response Normalizer)
```

**Success criteria verification:**
- **Before:** Check old vs new image digest to confirm update happened
- **After:** Unraid mutation updates `imageId` field — compare before/after values

**Migration steps:**
1. Get container name from user input
2. Look up current container state (for "before" imageId comparison)
3. Look up PrefixedID from Container ID Registry
4. Call `updateContainer` mutation
5. Normalize response
6. Compare imageId: if different → updated, if same → no update available
7. Return same success/failure messages as before

**Key win:** Simpler flow, Unraid handles retry logic and state management, update badge clears automatically.
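The step-6 imageId comparison reduces to a small pure function — a sketch, with illustrative message strings:

```javascript
// Decide "updated" vs "no update available" by comparing the imageId
// captured before the mutation with the one returned after (sketch;
// buildUpdateResult is a hypothetical name).
function buildUpdateResult(containerName, oldImageId, newImageId) {
  const updated = newImageId !== oldImageId;
  return {
    success: true,
    updated,
    message: updated
      ? `Updated ${containerName}: ${oldImageId.slice(0, 12)} → ${newImageId.slice(0, 12)}`
      : `No update available for ${containerName}`,
  };
}
```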
+ +**Source:** Unraid GraphQL schema (DockerMutations.updateContainer), WebSearch results (Unraid update implementation shells to Dynamix Docker Manager) + +--- + +### Pattern 4: Batch Update Migration (Serial → Parallel) + +**What:** Replace N serial Docker update flows with single `updateContainers(ids: [PrefixedID!]!)` mutation + +**When to use:** Batch update (multiple container selection), "Update All :latest" feature + +**Example migration:** + +```javascript +// BEFORE (Docker API): Loop over selected containers, call update flow N times serially +// for (const container of selectedContainers) { +// await updateDockerContainer(container.id); // 5-step flow each +// } + +// AFTER (Unraid GraphQL): +// 1. Look up all PrefixedIDs from Container ID Registry (by names) +// 2. Single mutation: +{ + "query": "mutation { docker { updateContainers(ids: [\"id1\", \"id2\", \"id3\"]) { id state imageId } } }" +} + +// Returns: Array of updated containers (each normalized) +``` + +**"Update All :latest" special case:** + +```javascript +// Option 1: Filter in workflow Code node, call updateContainers +// 1. Query all containers: query { docker { containers { id image } } } +// 2. Filter where image.endsWith(':latest') +// 3. Call updateContainers(ids: [...filteredIds]) + +// Option 2: Use updateAllContainers mutation (updates everything, slower) +{ + "query": "mutation { docker { updateAllContainers { id state imageId } } }" +} + +// Recommendation: Option 1 (filtered updateContainers) — matches current ":latest" filter behavior +``` + +**Key pattern:** Batch efficiency — 1 API call instead of N, Unraid handles parallelization internally. 
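The Option 1 filter-and-batch step can be sketched as follows (`lookupPrefixedId` is a stand-in for the Container ID Registry lookup; all names are illustrative):

```javascript
// Filter to :latest tags, then build one updateContainers mutation body
// (sketch of the Code-node logic, not the actual workflow node).
function buildBatchUpdateQuery(containers, lookupPrefixedId) {
  const latest = containers.filter((c) => c.image.endsWith(":latest"));
  const ids = latest.map((c) => lookupPrefixedId(c.name));
  const idList = ids.map((id) => `"${id}"`).join(", ");
  return {
    count: latest.length,
    query: `mutation { docker { updateContainers(ids: [${idList}]) { id state imageId } } }`,
  };
}

const demoContainers = [
  { name: "plex", image: "plexinc/pms-docker:latest" },
  { name: "postgres", image: "postgres:16.2" },          // pinned tag, excluded
  { name: "sonarr", image: "linuxserver/sonarr:latest" },
];
const body = buildBatchUpdateQuery(demoContainers, (name) => `srv_hash:${name}_hash`);
```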
+ +**Source:** Unraid GraphQL schema (DockerMutations.updateContainers, updateAllContainers) + +--- + +### Pattern 5: Container ID Registry Usage + +**What:** All GraphQL mutations require Unraid's 129-character PrefixedID format — use Container ID Registry to map container names to IDs + +**When to use:** Every mutation call (start, stop, update), every inline keyboard callback (encode PrefixedID into 64-byte limit) + +**Workflow integration:** + +```javascript +// 1. User input: container name (e.g., "plex") +// 2. Look up in Container ID Registry: +// Input: { action: "lookup", containerName: "plex" } +// Output: { prefixedId: "server_hash:container_hash", found: true } +// 3. Use prefixedId in GraphQL mutation +// 4. Store result back in registry (cache refresh) + +// Cache refresh pattern: +// After GraphQL query/mutation returns container data: +// Input: { action: "updateCache", containers: [...normalizedContainers] } +// Registry extracts Names[0] and Id, updates internal map +``` + +**Callback encoding:** + +```javascript +// Inline keyboard callbacks (64-byte limit): +// BEFORE: "s:abc123" (status, Docker ID) +// AFTER: Use Callback Token Encoder +// Input: { containerName: "plex", action: "status" } +// Output: "s:1a2b3c4d" (8-char hash token, deterministic) +// Decoder: "s:1a2b3c4d" → lookup in registry → "plex" → get PrefixedID +``` + +**Key pattern:** Registry is the single source of truth for name ↔ PrefixedID mapping. Update it after every GraphQL query/mutation that returns container data. 
+ +**Source:** Phase 15-01 Plan (Container ID Registry implementation) + +--- + +### Anti-Patterns to Avoid + +- **Rewriting existing Code nodes:** GraphQL Normalizer exists to prevent this — use it +- **Storing PrefixedIDs in Telegram callback data directly:** Too long (129 chars vs 64-byte limit) — use Callback Token Encoder +- **Calling GraphQL mutations without Error Handler:** Skips ALREADY_IN_STATE → 304 mapping, breaks existing error logic +- **Querying containers without updating Registry cache:** Stale ID lookups, mutations fail with "container not found" +- **Using Docker container IDs in GraphQL calls:** Unraid expects PrefixedID format, Docker IDs are incompatible +- **Implementing custom restart via low-level operations:** Unraid doesn't expose container create/remove — use stop + start pattern + +--- + +## Don't Hand-Roll + +| Problem | Don't Build | Use Instead | Why | +|---------|-------------|-------------|-----| +| GraphQL response transformation | Custom mapping for each Code node | Phase 15 GraphQL Response Normalizer | 60+ Code nodes expect Docker contract, normalizer handles all | +| Container ID translation | Ad-hoc lookups in each workflow | Phase 15 Container ID Registry | Single source of truth, cache management, name resolution | +| Error code mapping | Custom error checks per node | Phase 15 GraphQL Error Handler | Standardized ALREADY_IN_STATE → 304, NOT_FOUND handling | +| Callback data encoding | Custom compression/truncation | Phase 15 Callback Token Encoder | Deterministic 8-char hash, 64-byte limit compliance | +| Restart mutation | Try to recreate container via GraphQL | Sequential stop + start | Unraid abstracts low-level ops, no create/remove exposed | + +**Key insight:** Phase 15 infrastructure was built specifically to make this migration low-risk. Using it prevents cascading changes across 60+ nodes. 
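The sequential stop + start restart from the table can be sketched end-to-end (assuming `stopFn`/`startFn` wrap the GraphQL mutations and return Docker-style status codes after the Error Handler; all names are illustrative):

```javascript
// Restart = stop (tolerating 304 / ALREADY_IN_STATE) then start (sketch).
async function restartContainer(prefixedId, stopFn, startFn) {
  const stopResult = await stopFn(prefixedId);
  // 304 on stop is fine: the container was already stopped
  if (stopResult.statusCode !== 200 && stopResult.statusCode !== 304) {
    return { success: false, message: `Stop failed (${stopResult.statusCode})` };
  }
  const startResult = await startFn(prefixedId);
  if (startResult.statusCode !== 200) {
    // 304 here means the container started mid-restart — treat as failure
    return { success: false, message: `Start failed (${startResult.statusCode})` };
  }
  return { success: true, message: "Container restarted" };
}
```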
+ +--- + +## Common Pitfalls + +### Pitfall 1: Forgetting to Update Container ID Registry Cache + +**What goes wrong:** User updates container via bot. Next command uses stale registry cache, mutation fails with "container not found: server_hash:old_container_hash". + +**Why it happens:** `updateContainer` mutation recreates the container with a new ID (same as Docker update flow). Registry still has the old PrefixedID. + +**How to avoid:** +1. After every GraphQL query/mutation that returns container data, wire through Registry's "updateCache" action +2. Extract normalized containers from response, pass to Registry +3. Registry refreshes name → PrefixedID mappings + +**Warning signs:** +- Mutation succeeds, but next command on same container fails +- "Container not found" errors after successful updates +- Registry lookup returns PrefixedID that doesn't exist in Unraid + +**Prevention pattern:** +```javascript +// After updateContainer mutation: +// 1. Normalize response (get updated container object) +// 2. Update Registry cache: +// Input: { action: "updateCache", containers: [normalizedContainer] } +// 3. Proceed with success message +``` + +**Source:** Docker behavior (container ID changes on recreate), Phase 15-01 design + +--- + +### Pitfall 2: GraphQL Timeout on Slow Update Operations + +**What goes wrong:** `updateContainer` mutation for large container (10GB+ image) times out at 15 seconds, leaving container in intermediate state (stopped, old image removed). + +**Why it happens:** Phase 15 HTTP Template uses 15-second timeout for myunraid.net cloud relay latency. Container updates can take 30+ seconds for large images. + +**How to avoid:** +1. **Increase timeout for update mutations specifically:** Duplicate HTTP Template, set timeout to 60000ms (60s) for updateContainer/updateContainers nodes +2. **Keep 15s timeout for queries and quick mutations** (start/stop) +3. 
Document in ARCHITECTURE.md: "Update operations have 60s timeout to accommodate large image pulls" + +**Warning signs:** +- Timeout errors during container updates (not start/stop) +- Containers stuck in "stopped" state after timeout +- Unraid shows "pulling image" in Docker tab, but bot reports failure + +**Recommended timeouts by operation:** +- Queries (containers list): 15s (current) +- Start/stop/restart: 15s (current) +- Single container update: 60s (increase) +- Batch updates: 120s (increase further) + +**Source:** Real-world Docker image pull times (10GB+ images take 20-30s on gigabit), myunraid.net relay adds 200-500ms per request + +--- + +### Pitfall 3: ALREADY_IN_STATE Not Mapped to HTTP 304 + +**What goes wrong:** User taps "Start" on running container. GraphQL returns ALREADY_IN_STATE error. Existing Code node expects `statusCode === 304`, throws generic error instead of "already started" message. + +**Why it happens:** Forgetting to wire GraphQL Error Handler between HTTP Request and existing Code node. + +**How to avoid:** +1. **Every GraphQL mutation HTTP Request node MUST wire through GraphQL Error Handler** +2. Error Handler maps `error.extensions.code === "ALREADY_IN_STATE"` → `{ statusCode: 304, alreadyInState: true }` +3. Existing Code nodes check `response.statusCode === 304` unchanged + +**Warning signs:** +- Generic error messages instead of "Container already started" +- Errors when user repeats same action (stop stopped container, etc.) 
+- Code nodes throwing on ALREADY_IN_STATE instead of graceful handling + +**Correct wiring:** +``` +HTTP Request (GraphQL mutation) + ↓ +GraphQL Error Handler (maps ALREADY_IN_STATE → 304) + ↓ +Existing Code node (checks statusCode === 304) +``` + +**Source:** Phase 15-02 Plan (GraphQL Error Handler implementation), n8n-actions.json existing pattern + +--- + +### Pitfall 4: Restart Implementation Without Error Handling + +**What goes wrong:** Restart operation calls `stop` mutation, which fails with ALREADY_IN_STATE (container already stopped). Sequential `start` mutation never executes, user sees error. + +**Why it happens:** Implementing restart as sequential `stop` + `start` without ALREADY_IN_STATE tolerance. + +**How to avoid:** +1. **Stop mutation:** Wire through Error Handler, **continue on 304** (already stopped is OK) +2. **Start mutation:** Wire through Error Handler, fail on ALREADY_IN_STATE for start (indicates logic error) +3. Use n8n "Continue On Fail" or explicit error checking in Code node + +**Correct implementation:** +``` +1. Stop mutation → Error Handler + - On 304: Continue to start (container was already stopped, fine) + - On error: Fail restart operation +2. Start mutation → Error Handler + - On success: Return "restarted" message + - On 304: Fail (container started during restart, unexpected) + - On error: Fail restart operation +``` + +**Alternative:** Check container state first, only stop if running. Adds latency but avoids ALREADY_IN_STATE on stop. + +**Source:** Unraid GraphQL schema (no native restart mutation), standard restart logic patterns + +--- + +### Pitfall 5: Batch Update Progress Not Visible + +**What goes wrong:** User selects 10 containers for batch update. Bot sends "Updating..." then silence for 2 minutes, then "Done". User doesn't know if bot is working or stuck. + +**Why it happens:** `updateContainers` mutation is atomic — returns only after all containers updated. No progress events. + +**How to avoid:** +1. 
**Keep existing Docker pattern:** Serial updates with Telegram message edits per container
2. **Alternative (faster but no progress):** Use `updateContainers` mutation, send initial "Updating X containers..." then final result
3. **Hybrid (recommended):** Small batches (≤5) use `updateContainers` for speed, large batches (>5) use serial with progress

**Implementation for hybrid:**
```javascript
// In batch update Code node:
if (selectedContainers.length <= 5) {
  // Fast path: Single updateContainers mutation
  const ids = selectedContainers.map(c => lookupPrefixedId(c.name));
  await updateContainers(ids);
  return { message: `Updated ${selectedContainers.length} containers` };
} else {
  // Progress path: Serial updates with Telegram edits
  const total = selectedContainers.length;
  for (const [i, container] of selectedContainers.entries()) {
    await updateContainer(container.prefixedId);
    await editTelegramMessage(`Updated ${i + 1}/${total}: ${container.name}`);
  }
}
```

**Tradeoff:** Progress visibility vs speed. User decision from v1.2 batch work: progress is important.

**Source:** v1.2 batch operations design, user feedback on "silent operations"

---

### Pitfall 6: Update Badge Still Shows After Bot Update

**What goes wrong:** User updates a container via the bot. The Unraid Docker tab still shows the "apply update" badge. User clicks the badge, and the update completes instantly (image already cached).

**Why it happens:** This is **the problem v1.4 solves**. If it still occurs, the GraphQL mutation isn't properly clearing Unraid's internal update tracking.

**How to avoid:**
1. **Verify GraphQL mutation returns success** (not just HTTP 200, but a valid container object)
2. **Check Unraid version:** Update badge sync requires Unraid 7.2+ or a recent Connect plugin version
3. **Test in a real environment:** Synthetic tests may not reveal badge state issues

**Verification test:**
```bash
# 1. Via bot: Update container
# 2. Check Unraid Docker tab: Badge should be GONE
# 3.
If badge remains: Check Unraid logs for GraphQL mutation execution +# 4. If logs show success but badge remains: Unraid bug, report to Unraid team +``` + +**Expected behavior (success):** After `updateContainer` mutation completes, refreshing Unraid Docker tab shows no update badge for that container. + +**If badge persists:** Check Unraid API version, verify mutation actually executed (not just HTTP success), check Unraid internal logs (`/var/log/syslog`). + +**Source:** v1.3 Known Limitations (update badge issue), v1.4 migration goal, Unraid GraphQL API design + +--- + +## Code Examples + +### Container List Query Migration + +```javascript +// BEFORE (Docker API): +// HTTP Request node: GET http://docker-socket-proxy:2375/containers/json?all=true +// Next node (Code): processes response as-is + +// AFTER (Unraid GraphQL): +// HTTP Request node (duplicate "Unraid API HTTP Template"): +{ + "method": "POST", + "url": "={{ $env.UNRAID_HOST }}/graphql", + "body": { + "query": "query { docker { containers { id names state image } } }" + } +} + +// Wire: HTTP Request → GraphQL Response Normalizer → Update Container ID Registry → (existing Code nodes) + +// Normalizer transforms: +// IN: { data: { docker: { containers: [{ id: "hash:hash", names: ["/plex"], state: "RUNNING" }] } } } +// OUT: [{ Id: "hash:hash", Names: ["/plex"], State: "running", _unraidId: "hash:hash" }] + +// Registry update (Code node after normalizer): +const containers = $input.all().map(item => item.json); +const registryInput = { + action: "updateCache", + containers: containers +}; +// Pass to Container ID Registry node + +// Existing Code nodes see Docker API format unchanged +``` + +**Source:** Phase 15-02 normalizer implementation, ARCHITECTURE.md Docker API contract + +--- + +### Container Start Mutation Migration + +```javascript +// BEFORE (Docker API): +// HTTP Request: POST http://docker-socket-proxy:2375/v1.47/containers/abc123/start +// Code node checks: if (response.statusCode === 304) 
{ /* already started */ } + +// AFTER (Unraid GraphQL): +// Step 1: Lookup PrefixedID (Code node before HTTP Request) +const containerName = $json.containerName; // From upstream input +const registryLookup = { + action: "lookup", + containerName: containerName +}; +// Pass to Container ID Registry → returns { prefixedId: "...", found: true } + +// Step 2: Build mutation (Code node prepares GraphQL body) +const prefixedId = $('Container ID Registry').item.json.prefixedId; +return { + json: { + query: `mutation { docker { start(id: "${prefixedId}") { id state } } }` + } +}; + +// Step 3: Execute mutation (HTTP Request, uses Unraid API HTTP Template) +// Body: {{ $json.query }} + +// Step 4: Handle errors (wire through GraphQL Error Handler) +// Error Handler maps ALREADY_IN_STATE → { statusCode: 304, alreadyInState: true } + +// Step 5: Existing Code node (unchanged) +const response = $input.item.json; +if (response.statusCode === 304) { + return { json: { message: "Container already started" } }; +} +if (response.success) { + return { json: { message: "Container started successfully" } }; +} +``` + +**Source:** Phase 15 utility node integration, n8n-actions.json existing error handling + +--- + +### Single Container Update Mutation Migration + +```javascript +// BEFORE (Docker API 5-step flow in n8n-update.json): +// 1. Inspect container → get image digest +// 2. Stop container +// 3. Remove container +// 4. Create container (pulls new image) +// 5. Start container +// 6. 
Remove old image +// Total: 6 HTTP Request nodes, 8 Code nodes for orchestration + +// AFTER (Unraid GraphQL): +// Step 1: Get current container state (for imageId comparison) +const containerName = $json.containerName; +// Query: { docker { containers { id image imageId } } } (filter by name) + +// Step 2: Lookup PrefixedID +// Registry input: { action: "lookup", containerName: containerName } + +// Step 3: Single mutation +const prefixedId = $('Container ID Registry').item.json.prefixedId; +const oldImageId = $json.currentImageId; // From step 1 +return { + json: { + query: `mutation { docker { updateContainer(id: "${prefixedId}") { id state image imageId } } }` + } +}; + +// Step 4: Execute mutation (HTTP Request with 60s timeout) + +// Step 5: Normalize response and check if updated +// GraphQL Response Normalizer → Code node: +const response = $input.item.json; +const newImageId = response.imageId; +const updated = (newImageId !== oldImageId); + +if (updated) { + return { + json: { + success: true, + updated: true, + message: `Updated ${containerName}: ${oldImageId.slice(0,12)} → ${newImageId.slice(0,12)}` + } + }; +} else { + return { + json: { + success: true, + updated: false, + message: `No update available for ${containerName}` + } + }; +} + +// Total: 3 HTTP Request nodes (query current, lookup ID, update mutation), 3 Code nodes +// Reduction: 6 → 3 HTTP nodes, 8 → 3 Code nodes +``` + +**Source:** n8n-update.json current implementation, Unraid GraphQL schema updateContainer mutation + +--- + +### Batch Update Migration + +```javascript +// BEFORE (Docker API): Loop in Code node, Execute Workflow sub-workflow call per container (serial) + +// AFTER (Unraid GraphQL): +// Option A: Small batch (≤5 containers) — parallel mutation +const selectedNames = $json.selectedContainers.split(','); + +// Lookup all PrefixedIDs +const ids = []; +for (const name of selectedNames) { + const result = lookupInRegistry(name); // Call Registry node + 
ids.push(result.prefixedId); +} + +// Single mutation +return { + json: { + query: `mutation { docker { updateContainers(ids: ${JSON.stringify(ids)}) { id state imageId } } }` + } +}; + +// HTTP Request (120s timeout for batch) → Normalizer → Success message + +// Option B: Large batch (>5 containers) — serial with progress +// Keep existing pattern: loop + Execute Workflow calls, replace inner logic with GraphQL mutation + +// Hybrid recommendation: +const batchSize = selectedNames.length; +if (batchSize <= 5) { + // Use updateContainers mutation (Option A) +} else { + // Use serial loop with Telegram progress updates (Option B) +} +``` + +**Source:** n8n-batch-ui.json, Unraid GraphQL schema updateContainers mutation + +--- + +### Restart Implementation (Sequential Stop + Start) + +```javascript +// Unraid has no native restart mutation — implement as two operations + +// Step 1: Stop mutation (tolerate ALREADY_IN_STATE) +const prefixedId = $json.prefixedId; +return { + json: { + query: `mutation { docker { stop(id: "${prefixedId}") { id state } } }` + } +}; + +// HTTP Request → GraphQL Error Handler +// Error Handler output: { statusCode: 304, alreadyInState: true } OR { success: true } + +// Step 2: Check stop result (Code node) +const stopResult = $input.item.json; +if (stopResult.statusCode === 304 || stopResult.success) { + // Container stopped (or was already stopped) — proceed to start + return { json: { proceedToStart: true } }; +} +// Other errors fail the restart + +// Step 3: Start mutation +return { + json: { + query: `mutation { docker { start(id: "${prefixedId}") { id state } } }` + } +}; + +// HTTP Request → Error Handler → Success + +// Wiring: Stop HTTP → Error Handler → Check Result IF → Start HTTP → Error Handler → Format Result +``` + +**Source:** Unraid GraphQL schema (no restart mutation), standard restart implementation pattern + +--- + +## State of the Art + +| Old Approach | Current Approach | When Changed | Impact | 
+|--------------|------------------|--------------|--------| +| Docker REST API via socket proxy | Unraid GraphQL API via myunraid.net relay | This phase (v1.4) | Single API, update badge sync, no proxy security boundary | +| 5-step update flow (stop/remove/create/start) | Single `updateContainer` mutation | This phase | Simpler, faster, Unraid handles retry logic | +| Serial batch updates with progress | `updateContainers` plural mutation for small batches | This phase | Parallel execution, faster for ≤5 containers | +| Docker 64-char container IDs | Unraid 129-char PrefixedID with Registry mapping | Phase 15-16 | Requires translation layer, but enables GraphQL API | +| Manual "Apply Update" in Unraid UI | Automatic badge clear via GraphQL | This phase | Core user pain point solved | + +**Deprecated/outdated:** +- **docker-socket-proxy container:** Removed in Phase 17, GraphQL API replaces Docker socket access +- **Container logs feature:** Removed in Phase 17, not valuable enough to maintain hybrid architecture +- **Direct Docker container ID storage:** Replaced by Container ID Registry lookups (PrefixedID required) + +**Current best practice (post-Phase 16):** All container operations via Unraid GraphQL API. Docker socket proxy is legacy artifact. + +--- + +## Open Questions + +1. **Actual updateContainer mutation timeout needs** + - What we know: Large images (10GB+) can take 30+ seconds to pull + - What's unclear: Does myunraid.net relay timeout separately? Will 60s be enough for all cases? + - Recommendation: Start with 60s timeout, add workflow logging to capture actual duration, adjust if needed + +2. **Batch update progress tradeoff** + - What we know: `updateContainers` is fast but silent, serial updates show progress but slow + - What's unclear: User preference — speed or visibility? + - Recommendation: Hybrid approach (≤5 fast, >5 with progress), can adjust threshold based on user feedback + +3. 
**Restart error handling edge cases** + - What we know: Stop + start pattern works, need to tolerate ALREADY_IN_STATE on stop + - What's unclear: What if container exits between stop and start? Retry logic needed? + - Recommendation: Implement basic stop→start, add retry if real-world issues occur + +4. **Container ID Registry cache invalidation** + - What we know: Registry caches name → PrefixedID mapping, must refresh after updates + - What's unclear: Cache expiry strategy? Time-based TTL or event-driven only? + - Recommendation: Event-driven only (update after every GraphQL query/mutation), no TTL needed + +--- + +## Sources + +### Primary (HIGH confidence) +- [Unraid GraphQL Schema](https://raw.githubusercontent.com/unraid/api/main/api/generated-schema.graphql) — Mutation signatures, DockerContainer type fields +- [Using the Unraid API](https://docs.unraid.net/API/how-to-use-the-api/) — Authentication, endpoint, rate limiting +- Phase 15-01 Plan — Container ID Registry, Callback Token Encoder/Decoder implementation +- Phase 15-02 Plan — GraphQL Response Normalizer, Error Handler, HTTP Template implementation +- ARCHITECTURE.md — Current Docker API contracts, workflow node breakdown, error patterns + +### Secondary (MEDIUM confidence) +- [Docker and VM Integration | Unraid API](https://deepwiki.com/unraid/api/2.4.2-notification-system) — Unraid update implementation details (shells to Dynamix Docker Manager) +- [Core Services | Unraid API](https://deepwiki.com/unraid/api/2.4-docker-integration) — DockerService retry logic (5 polling attempts at 500ms intervals) +- n8n-update.json — Current 5-step Docker update flow implementation +- n8n-actions.json — Current start/stop error handling pattern (statusCode === 304 check) +- n8n-status.json — Current container list query pattern + +### Tertiary (LOW confidence) +- Community forum posts on Unraid container updates — Anecdotal timing data for large image pulls +- Real-world myunraid.net relay latency observations — 
200-500ms baseline from Phase 14 testing + +--- + +## Metadata + +**Confidence breakdown:** +- Standard stack: HIGH — Unraid GraphQL API verified in Phase 14, Phase 15 infrastructure already built +- Architecture: HIGH — Migration patterns are straightforward substitutions, Phase 15 utilities handle complexity +- Pitfalls: MEDIUM-HIGH — Most are standard API migration issues, actual timeout needs and batch tradeoffs require real-world testing + +**Research date:** 2026-02-09 +**Valid until:** 60 days (Unraid GraphQL API stable, schema changes infrequent) + +**Critical dependencies for planning:** +- Phase 15 utility nodes deployed and tested (Container ID Registry, GraphQL Normalizer, Error Handler, HTTP Template) +- Phase 14 Unraid API access verified (credentials, network connectivity, authentication working) +- n8n workflow JSON structure understood (node IDs, connections, typeVersion patterns from CLAUDE.md) + +**Migration risk assessment:** +- **Low risk:** Container queries (status, list) — direct substitution, normalizer handles response shape +- **Medium risk:** Container lifecycle (start/stop/restart) — ALREADY_IN_STATE error mapping critical, restart needs sequential implementation +- **Medium risk:** Single container update — timeout configuration important, imageId comparison for success detection +- **Medium-high risk:** Batch updates — tradeoff between speed and progress visibility, hybrid approach recommended + +**Ready for planning:** YES — Clear migration patterns identified, Phase 15 infrastructure ready, pitfalls documented, code examples provided for each operation type.