From 5880dc45734a779f6c63649f3ab5514e33da44bc Mon Sep 17 00:00:00 2001 From: Lucas Berger Date: Mon, 9 Feb 2026 09:10:14 -0500 Subject: [PATCH] docs(16): research Unraid GraphQL API migration patterns --- .../phases/16-api-migration/16-RESEARCH.md | 767 ++++++++++++++++++ 1 file changed, 767 insertions(+) create mode 100644 .planning/phases/16-api-migration/16-RESEARCH.md diff --git a/.planning/phases/16-api-migration/16-RESEARCH.md b/.planning/phases/16-api-migration/16-RESEARCH.md new file mode 100644 index 0000000..51db554 --- /dev/null +++ b/.planning/phases/16-api-migration/16-RESEARCH.md @@ -0,0 +1,767 @@ +# Phase 16: API Migration - Research + +**Researched:** 2026-02-09 +**Domain:** Unraid GraphQL API migration for Docker container operations +**Confidence:** HIGH + +## Summary + +Phase 16 replaces all Docker socket proxy API calls with Unraid GraphQL API mutations and queries. This is a **pure substitution migration** — the user experience remains identical (same Telegram commands, same responses, same timing), but the backend switches from Docker Engine REST API to Unraid's GraphQL API. + +The migration complexity is mitigated by Phase 15 infrastructure: Container ID Registry handles ID translation (Docker 64-char hex → Unraid 129-char PrefixedID), GraphQL Response Normalizer transforms API responses to Docker contract format, and GraphQL Error Handler standardizes error checking. The workflows already have 60+ Code nodes expecting Docker API response shapes — the normalizer ensures zero changes to these downstream nodes. + +Key architectural wins: (1) Single `updateContainer` GraphQL mutation replaces the 5-step Docker flow (inspect → stop → remove → create → start → cleanup), (2) Batch operations use efficient `updateContainers` plural mutation instead of N serial API calls, (3) Unraid update badges clear automatically (no manual "Apply Update" clicks), (4) No Docker socket proxy security boundary to manage. 
+ +**Primary recommendation:** Migrate workflows in dependency order (n8n-status.json first for container listing, then n8n-actions.json for lifecycle, then n8n-update.json for updates), using the Phase 15 utility nodes as drop-in replacements for Docker API HTTP Request nodes. Keep existing Code node logic unchanged — let normalizer/error handler bridge the API differences. + +--- + +## Standard Stack + +### Core + +| Library | Version | Purpose | Why Standard | +|---------|---------|---------|--------------| +| Unraid GraphQL API | 7.2+ native | Container lifecycle and update operations | Official Unraid interface, same mechanism as WebGUI, v1.3 Phase 14 verified | +| Phase 15 utility nodes | Current | Data transformation layer | Container ID Registry, GraphQL Normalizer, Error Handler — purpose-built for this migration | +| n8n HTTP Request node | Built-in | GraphQL client | GraphQL-over-HTTP with POST method, 15s timeout for myunraid.net relay | + +### Supporting + +| Library | Version | Purpose | When to Use | +|---------|---------|---------|-------------| +| Unraid API HTTP Template | Phase 15-02 | Pre-configured HTTP node | Duplicate and modify query for each GraphQL call | +| Container ID Registry | Phase 15-01 | Name ↔ PrefixedID mapping | All GraphQL mutations (require 129-char PrefixedID format) | +| Callback Token Encoder/Decoder | Phase 15-01 | Telegram callback data encoding | Inline keyboard callbacks with PrefixedIDs (64-byte limit) | + +### Alternatives Considered + +| Instead of | Could Use | Tradeoff | +|------------|-----------|----------| +| GraphQL API | Keep Docker socket proxy | Misses architectural goal (single API), no update badge sync, security boundary remains | +| Single updateContainer mutation | 5-step Docker flow via GraphQL | Unraid doesn't expose low-level primitives — GraphQL abstracts container recreation | +| Normalizer layer | Rewrite 60+ Code nodes for Unraid response shape | High risk, massive changeset, testing nightmare | 
+| Container ID Registry | Store only container names, fetch ID on each mutation | N extra API calls, latency overhead, cache staleness risk | + +**Installation:** + +No new dependencies. Phase 15 utility nodes already deployed in n8n-workflow.json. Migration uses existing HTTP Request nodes (duplicate template, wire to normalizer/error handler). + +--- + +## Architecture Patterns + +### Pattern 1: GraphQL Query Migration (Container Listing) + +**What:** Replace Docker API `GET /containers/json` with Unraid GraphQL `containers` query + +**When to use:** n8n-status.json (container list/status), n8n-batch-ui.json (batch selection), main workflow (container lookups) + +**Example migration:** + +```javascript +// BEFORE (Docker API): +// HTTP Request node: GET http://docker-socket-proxy:2375/containers/json?all=true +// Response: [{ "Id": "abc123", "Names": ["/plex"], "State": "running" }] + +// AFTER (Unraid GraphQL): +// 1. Duplicate "Unraid API HTTP Template" node +// 2. Set query body: +{ + "query": "query { docker { containers { id names state image } } }" +} + +// 3. Wire: HTTP Request → GraphQL Response Normalizer → (existing downstream Code nodes) +// Normalizer output: [{ "Id": "server_hash:container_hash", "Names": ["/plex"], "State": "running", "_unraidId": "..." }] +``` + +**Key pattern:** Normalizer transforms Unraid response to Docker contract — downstream nodes see identical data structure. 
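To make the transformation concrete, here is a minimal sketch of what the normalizer does, written as a plain function (illustrative only — `normalizeContainers` is a hypothetical name, and the actual Phase 15 node may map more fields):

```javascript
// Sketch of the normalizer's core transform (illustrative — the real
// Phase 15 node may differ; fields follow the GraphQL query above).
function normalizeContainers(graphqlResponse) {
  const containers =
    (graphqlResponse.data &&
      graphqlResponse.data.docker &&
      graphqlResponse.data.docker.containers) || [];
  return containers.map((c) => ({
    Id: c.id,                             // PrefixedID carried through as Id
    Names: c.names,                       // already "/plex"-style
    State: (c.state || "").toLowerCase(), // "RUNNING" -> "running"
    Image: c.image,
    _unraidId: c.id,                      // original PrefixedID kept explicitly
  }));
}

// Example input in Unraid's response shape:
const sample = {
  data: { docker: { containers: [
    { id: "srv_hash:ctr_hash", names: ["/plex"], state: "RUNNING", image: "plexinc/pms-docker:latest" },
  ] } },
};
const normalized = normalizeContainers(sample);
```

Downstream Code nodes consume `normalized` exactly as they consumed the Docker `/containers/json` response.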
+ +**Source:** Phase 15-02 Plan (GraphQL Response Normalizer implementation) + +--- + +### Pattern 2: GraphQL Mutation Migration (Container Start/Stop/Restart) + +**What:** Replace Docker API `POST /containers/{id}/start` with Unraid GraphQL `start(id: PrefixedID!)` mutation + +**When to use:** n8n-actions.json (start/stop/restart operations) + +**Example migration:** + +```javascript +// BEFORE (Docker API): +// HTTP Request: POST http://docker-socket-proxy:2375/v1.47/containers/abc123/start +// On 304: Container already started (handled by existing Code node checking statusCode === 304) + +// AFTER (Unraid GraphQL): +// 1. Look up PrefixedID from Container ID Registry (by container name) +// 2. Call GraphQL mutation: +{ + "query": "mutation { docker { start(id: \"server_hash:container_hash\") { id state } } }" +} + +// 3. Wire: HTTP Request → GraphQL Error Handler → (existing downstream Code nodes) +// Error Handler maps ALREADY_IN_STATE error to { statusCode: 304, alreadyInState: true } +// Existing Code node: if (response.statusCode === 304) { /* already started */ } +``` + +**RESTART special case:** No native `restart` mutation in Unraid GraphQL. Implement as sequential `stop` + `start`: + +```javascript +// GraphQL has no restart mutation — use two operations: +// 1. mutation { docker { stop(id: "...") { id state } } } +// 2. mutation { docker { start(id: "...") { id state } } } +// Wire: Stop HTTP → Error Handler → Start HTTP → Error Handler → Success Response +``` + +**Key pattern:** Error Handler maps GraphQL error codes to HTTP status codes (ALREADY_IN_STATE → 304) — existing Code nodes unchanged. 
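A sketch of the error-mapping logic (assumes Unraid reports failures in the standard GraphQL `errors` array with an `extensions.code`; `mapGraphqlErrors` is a hypothetical name, not the actual Phase 15 node):

```javascript
// GraphQL responds HTTP 200 even on failure; errors arrive in an `errors`
// array. This sketch re-expresses them as the Docker-style status codes
// the existing Code nodes already check.
function mapGraphqlErrors(response) {
  const errors = response.errors || [];
  if (errors.length === 0) {
    return { success: true, statusCode: 200, data: response.data };
  }
  const code = errors[0].extensions && errors[0].extensions.code;
  if (code === "ALREADY_IN_STATE") {
    // Docker returns 304 Not Modified for start-on-running / stop-on-stopped
    return { success: false, statusCode: 304, alreadyInState: true };
  }
  if (code === "NOT_FOUND") {
    return { success: false, statusCode: 404, message: errors[0].message };
  }
  return { success: false, statusCode: 500, message: errors[0].message };
}
```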
**Source:** Unraid GraphQL schema (DockerMutations type), Phase 15-02 Plan (GraphQL Error Handler implementation)

---

### Pattern 3: Single Container Update Migration (5-Step Flow → 1 Mutation)

**What:** Replace Docker's 5-step update flow with a single `updateContainer(id: PrefixedID!)` mutation

**When to use:** n8n-update.json (single container update), main workflow (text command `update <container>`)

**Current 5-step Docker flow:**
1. Inspect container (get current config)
2. Stop container
3. Remove container
4. Create container (with new image)
5. Start container
6. Remove old image (cleanup)

**New 1-step Unraid flow:**
```javascript
// Single GraphQL mutation replaces entire flow:
{
  "query": "mutation { docker { updateContainer(id: \"server_hash:container_hash\") { id state image imageId } } }"
}

// Unraid internally handles: pull new image, stop, remove, recreate, start
// Returns: Updated container object (normalized by GraphQL Response Normalizer)
```

**Success criteria verification:**
- **Before:** Check old vs new image digest to confirm update happened
- **After:** Unraid mutation updates `imageId` field — compare before/after values

**Migration steps:**
1. Get container name from user input
2. Look up current container state (for "before" imageId comparison)
3. Look up PrefixedID from Container ID Registry
4. Call `updateContainer` mutation
5. Normalize response
6. Compare imageId: if different → updated, if same → no update available
7. Return same success/failure messages as before

**Key win:** Simpler flow, Unraid handles retry logic and state management, update badge clears automatically.
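The step-6 imageId comparison reduces to a small pure function — a sketch, with illustrative message strings:

```javascript
// Decide "updated" vs "no update available" by comparing the imageId
// captured before the mutation with the one returned after (sketch;
// buildUpdateResult is a hypothetical name).
function buildUpdateResult(containerName, oldImageId, newImageId) {
  const updated = newImageId !== oldImageId;
  return {
    success: true,
    updated,
    message: updated
      ? `Updated ${containerName}: ${oldImageId.slice(0, 12)} → ${newImageId.slice(0, 12)}`
      : `No update available for ${containerName}`,
  };
}
```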
+ +**Source:** Unraid GraphQL schema (DockerMutations.updateContainer), WebSearch results (Unraid update implementation shells to Dynamix Docker Manager) + +--- + +### Pattern 4: Batch Update Migration (Serial → Parallel) + +**What:** Replace N serial Docker update flows with single `updateContainers(ids: [PrefixedID!]!)` mutation + +**When to use:** Batch update (multiple container selection), "Update All :latest" feature + +**Example migration:** + +```javascript +// BEFORE (Docker API): Loop over selected containers, call update flow N times serially +// for (const container of selectedContainers) { +// await updateDockerContainer(container.id); // 5-step flow each +// } + +// AFTER (Unraid GraphQL): +// 1. Look up all PrefixedIDs from Container ID Registry (by names) +// 2. Single mutation: +{ + "query": "mutation { docker { updateContainers(ids: [\"id1\", \"id2\", \"id3\"]) { id state imageId } } }" +} + +// Returns: Array of updated containers (each normalized) +``` + +**"Update All :latest" special case:** + +```javascript +// Option 1: Filter in workflow Code node, call updateContainers +// 1. Query all containers: query { docker { containers { id image } } } +// 2. Filter where image.endsWith(':latest') +// 3. Call updateContainers(ids: [...filteredIds]) + +// Option 2: Use updateAllContainers mutation (updates everything, slower) +{ + "query": "mutation { docker { updateAllContainers { id state imageId } } }" +} + +// Recommendation: Option 1 (filtered updateContainers) — matches current ":latest" filter behavior +``` + +**Key pattern:** Batch efficiency — 1 API call instead of N, Unraid handles parallelization internally. 
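The Option 1 filter-and-batch step can be sketched as follows (`lookupPrefixedId` is a stand-in for the Container ID Registry lookup; all names are illustrative):

```javascript
// Filter to :latest tags, then build one updateContainers mutation body
// (sketch of the Code-node logic, not the actual workflow node).
function buildBatchUpdateQuery(containers, lookupPrefixedId) {
  const latest = containers.filter((c) => c.image.endsWith(":latest"));
  const ids = latest.map((c) => lookupPrefixedId(c.name));
  const idList = ids.map((id) => `"${id}"`).join(", ");
  return {
    count: latest.length,
    query: `mutation { docker { updateContainers(ids: [${idList}]) { id state imageId } } }`,
  };
}

const demoContainers = [
  { name: "plex", image: "plexinc/pms-docker:latest" },
  { name: "postgres", image: "postgres:16.2" },          // pinned tag, excluded
  { name: "sonarr", image: "linuxserver/sonarr:latest" },
];
const body = buildBatchUpdateQuery(demoContainers, (name) => `srv_hash:${name}_hash`);
```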
+ +**Source:** Unraid GraphQL schema (DockerMutations.updateContainers, updateAllContainers) + +--- + +### Pattern 5: Container ID Registry Usage + +**What:** All GraphQL mutations require Unraid's 129-character PrefixedID format — use Container ID Registry to map container names to IDs + +**When to use:** Every mutation call (start, stop, update), every inline keyboard callback (encode PrefixedID into 64-byte limit) + +**Workflow integration:** + +```javascript +// 1. User input: container name (e.g., "plex") +// 2. Look up in Container ID Registry: +// Input: { action: "lookup", containerName: "plex" } +// Output: { prefixedId: "server_hash:container_hash", found: true } +// 3. Use prefixedId in GraphQL mutation +// 4. Store result back in registry (cache refresh) + +// Cache refresh pattern: +// After GraphQL query/mutation returns container data: +// Input: { action: "updateCache", containers: [...normalizedContainers] } +// Registry extracts Names[0] and Id, updates internal map +``` + +**Callback encoding:** + +```javascript +// Inline keyboard callbacks (64-byte limit): +// BEFORE: "s:abc123" (status, Docker ID) +// AFTER: Use Callback Token Encoder +// Input: { containerName: "plex", action: "status" } +// Output: "s:1a2b3c4d" (8-char hash token, deterministic) +// Decoder: "s:1a2b3c4d" → lookup in registry → "plex" → get PrefixedID +``` + +**Key pattern:** Registry is the single source of truth for name ↔ PrefixedID mapping. Update it after every GraphQL query/mutation that returns container data. 
+ +**Source:** Phase 15-01 Plan (Container ID Registry implementation) + +--- + +### Anti-Patterns to Avoid + +- **Rewriting existing Code nodes:** GraphQL Normalizer exists to prevent this — use it +- **Storing PrefixedIDs in Telegram callback data directly:** Too long (129 chars vs 64-byte limit) — use Callback Token Encoder +- **Calling GraphQL mutations without Error Handler:** Skips ALREADY_IN_STATE → 304 mapping, breaks existing error logic +- **Querying containers without updating Registry cache:** Stale ID lookups, mutations fail with "container not found" +- **Using Docker container IDs in GraphQL calls:** Unraid expects PrefixedID format, Docker IDs are incompatible +- **Implementing custom restart via low-level operations:** Unraid doesn't expose container create/remove — use stop + start pattern + +--- + +## Don't Hand-Roll + +| Problem | Don't Build | Use Instead | Why | +|---------|-------------|-------------|-----| +| GraphQL response transformation | Custom mapping for each Code node | Phase 15 GraphQL Response Normalizer | 60+ Code nodes expect Docker contract, normalizer handles all | +| Container ID translation | Ad-hoc lookups in each workflow | Phase 15 Container ID Registry | Single source of truth, cache management, name resolution | +| Error code mapping | Custom error checks per node | Phase 15 GraphQL Error Handler | Standardized ALREADY_IN_STATE → 304, NOT_FOUND handling | +| Callback data encoding | Custom compression/truncation | Phase 15 Callback Token Encoder | Deterministic 8-char hash, 64-byte limit compliance | +| Restart mutation | Try to recreate container via GraphQL | Sequential stop + start | Unraid abstracts low-level ops, no create/remove exposed | + +**Key insight:** Phase 15 infrastructure was built specifically to make this migration low-risk. Using it prevents cascading changes across 60+ nodes. 
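The sequential stop + start restart from the table can be sketched end-to-end (assuming `stopFn`/`startFn` wrap the GraphQL mutations and return Docker-style status codes after the Error Handler; all names are illustrative):

```javascript
// Restart = stop (tolerating 304 / ALREADY_IN_STATE) then start (sketch).
async function restartContainer(prefixedId, stopFn, startFn) {
  const stopResult = await stopFn(prefixedId);
  // 304 on stop is fine: the container was already stopped
  if (stopResult.statusCode !== 200 && stopResult.statusCode !== 304) {
    return { success: false, message: `Stop failed (${stopResult.statusCode})` };
  }
  const startResult = await startFn(prefixedId);
  if (startResult.statusCode !== 200) {
    // 304 here means the container started mid-restart — treat as failure
    return { success: false, message: `Start failed (${startResult.statusCode})` };
  }
  return { success: true, message: "Container restarted" };
}
```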
+ +--- + +## Common Pitfalls + +### Pitfall 1: Forgetting to Update Container ID Registry Cache + +**What goes wrong:** User updates container via bot. Next command uses stale registry cache, mutation fails with "container not found: server_hash:old_container_hash". + +**Why it happens:** `updateContainer` mutation recreates the container with a new ID (same as Docker update flow). Registry still has the old PrefixedID. + +**How to avoid:** +1. After every GraphQL query/mutation that returns container data, wire through Registry's "updateCache" action +2. Extract normalized containers from response, pass to Registry +3. Registry refreshes name → PrefixedID mappings + +**Warning signs:** +- Mutation succeeds, but next command on same container fails +- "Container not found" errors after successful updates +- Registry lookup returns PrefixedID that doesn't exist in Unraid + +**Prevention pattern:** +```javascript +// After updateContainer mutation: +// 1. Normalize response (get updated container object) +// 2. Update Registry cache: +// Input: { action: "updateCache", containers: [normalizedContainer] } +// 3. Proceed with success message +``` + +**Source:** Docker behavior (container ID changes on recreate), Phase 15-01 design + +--- + +### Pitfall 2: GraphQL Timeout on Slow Update Operations + +**What goes wrong:** `updateContainer` mutation for large container (10GB+ image) times out at 15 seconds, leaving container in intermediate state (stopped, old image removed). + +**Why it happens:** Phase 15 HTTP Template uses 15-second timeout for myunraid.net cloud relay latency. Container updates can take 30+ seconds for large images. + +**How to avoid:** +1. **Increase timeout for update mutations specifically:** Duplicate HTTP Template, set timeout to 60000ms (60s) for updateContainer/updateContainers nodes +2. **Keep 15s timeout for queries and quick mutations** (start/stop) +3. 
Document in ARCHITECTURE.md: "Update operations have 60s timeout to accommodate large image pulls" + +**Warning signs:** +- Timeout errors during container updates (not start/stop) +- Containers stuck in "stopped" state after timeout +- Unraid shows "pulling image" in Docker tab, but bot reports failure + +**Recommended timeouts by operation:** +- Queries (containers list): 15s (current) +- Start/stop/restart: 15s (current) +- Single container update: 60s (increase) +- Batch updates: 120s (increase further) + +**Source:** Real-world Docker image pull times (10GB+ images take 20-30s on gigabit), myunraid.net relay adds 200-500ms per request + +--- + +### Pitfall 3: ALREADY_IN_STATE Not Mapped to HTTP 304 + +**What goes wrong:** User taps "Start" on running container. GraphQL returns ALREADY_IN_STATE error. Existing Code node expects `statusCode === 304`, throws generic error instead of "already started" message. + +**Why it happens:** Forgetting to wire GraphQL Error Handler between HTTP Request and existing Code node. + +**How to avoid:** +1. **Every GraphQL mutation HTTP Request node MUST wire through GraphQL Error Handler** +2. Error Handler maps `error.extensions.code === "ALREADY_IN_STATE"` → `{ statusCode: 304, alreadyInState: true }` +3. Existing Code nodes check `response.statusCode === 304` unchanged + +**Warning signs:** +- Generic error messages instead of "Container already started" +- Errors when user repeats same action (stop stopped container, etc.) 
+- Code nodes throwing on ALREADY_IN_STATE instead of graceful handling + +**Correct wiring:** +``` +HTTP Request (GraphQL mutation) + ↓ +GraphQL Error Handler (maps ALREADY_IN_STATE → 304) + ↓ +Existing Code node (checks statusCode === 304) +``` + +**Source:** Phase 15-02 Plan (GraphQL Error Handler implementation), n8n-actions.json existing pattern + +--- + +### Pitfall 4: Restart Implementation Without Error Handling + +**What goes wrong:** Restart operation calls `stop` mutation, which fails with ALREADY_IN_STATE (container already stopped). Sequential `start` mutation never executes, user sees error. + +**Why it happens:** Implementing restart as sequential `stop` + `start` without ALREADY_IN_STATE tolerance. + +**How to avoid:** +1. **Stop mutation:** Wire through Error Handler, **continue on 304** (already stopped is OK) +2. **Start mutation:** Wire through Error Handler, fail on ALREADY_IN_STATE for start (indicates logic error) +3. Use n8n "Continue On Fail" or explicit error checking in Code node + +**Correct implementation:** +``` +1. Stop mutation → Error Handler + - On 304: Continue to start (container was already stopped, fine) + - On error: Fail restart operation +2. Start mutation → Error Handler + - On success: Return "restarted" message + - On 304: Fail (container started during restart, unexpected) + - On error: Fail restart operation +``` + +**Alternative:** Check container state first, only stop if running. Adds latency but avoids ALREADY_IN_STATE on stop. + +**Source:** Unraid GraphQL schema (no native restart mutation), standard restart logic patterns + +--- + +### Pitfall 5: Batch Update Progress Not Visible + +**What goes wrong:** User selects 10 containers for batch update. Bot sends "Updating..." then silence for 2 minutes, then "Done". User doesn't know if bot is working or stuck. + +**Why it happens:** `updateContainers` mutation is atomic — returns only after all containers updated. No progress events. + +**How to avoid:** +1. 
**Keep existing Docker pattern:** Serial updates with Telegram message edits per container
2. **Alternative (faster but no progress):** Use `updateContainers` mutation, send initial "Updating X containers..." then final result
3. **Hybrid (recommended):** Small batches (≤5) use `updateContainers` for speed, large batches (>5) use serial with progress

**Implementation for hybrid:**
```javascript
// In batch update Code node:
if (selectedContainers.length <= 5) {
  // Fast path: Single updateContainers mutation
  const ids = selectedContainers.map(c => lookupPrefixedId(c.name));
  await updateContainers(ids);
  return { message: `Updated ${selectedContainers.length} containers` };
} else {
  // Progress path: Serial updates with Telegram edits
  const total = selectedContainers.length;
  for (const [i, container] of selectedContainers.entries()) {
    await updateContainer(container.prefixedId);
    await editTelegramMessage(`Updated ${i + 1}/${total}: ${container.name}`);
  }
}
```

**Tradeoff:** Progress visibility vs speed. User decision from v1.2 batch work: progress is important.

**Source:** v1.2 batch operations design, user feedback on "silent operations"

---

### Pitfall 6: Update Badge Still Shows After Bot Update

**What goes wrong:** User updates a container via the bot. The Unraid Docker tab still shows the "apply update" badge. User clicks the badge, and the update completes instantly (image already cached).

**Why it happens:** This is **the problem v1.4 solves**. If it still occurs, the GraphQL mutation isn't properly clearing Unraid's internal update tracking.

**How to avoid:**
1. **Verify GraphQL mutation returns success** (not just HTTP 200, but a valid container object)
2. **Check Unraid version:** Update badge sync requires Unraid 7.2+ or a recent Connect plugin version
3. **Test in a real environment:** Synthetic tests may not reveal badge state issues

**Verification test:**
```bash
# 1. Via bot: Update container
# 2. Check Unraid Docker tab: Badge should be GONE
# 3.
If badge remains: Check Unraid logs for GraphQL mutation execution +# 4. If logs show success but badge remains: Unraid bug, report to Unraid team +``` + +**Expected behavior (success):** After `updateContainer` mutation completes, refreshing Unraid Docker tab shows no update badge for that container. + +**If badge persists:** Check Unraid API version, verify mutation actually executed (not just HTTP success), check Unraid internal logs (`/var/log/syslog`). + +**Source:** v1.3 Known Limitations (update badge issue), v1.4 migration goal, Unraid GraphQL API design + +--- + +## Code Examples + +### Container List Query Migration + +```javascript +// BEFORE (Docker API): +// HTTP Request node: GET http://docker-socket-proxy:2375/containers/json?all=true +// Next node (Code): processes response as-is + +// AFTER (Unraid GraphQL): +// HTTP Request node (duplicate "Unraid API HTTP Template"): +{ + "method": "POST", + "url": "={{ $env.UNRAID_HOST }}/graphql", + "body": { + "query": "query { docker { containers { id names state image } } }" + } +} + +// Wire: HTTP Request → GraphQL Response Normalizer → Update Container ID Registry → (existing Code nodes) + +// Normalizer transforms: +// IN: { data: { docker: { containers: [{ id: "hash:hash", names: ["/plex"], state: "RUNNING" }] } } } +// OUT: [{ Id: "hash:hash", Names: ["/plex"], State: "running", _unraidId: "hash:hash" }] + +// Registry update (Code node after normalizer): +const containers = $input.all().map(item => item.json); +const registryInput = { + action: "updateCache", + containers: containers +}; +// Pass to Container ID Registry node + +// Existing Code nodes see Docker API format unchanged +``` + +**Source:** Phase 15-02 normalizer implementation, ARCHITECTURE.md Docker API contract + +--- + +### Container Start Mutation Migration + +```javascript +// BEFORE (Docker API): +// HTTP Request: POST http://docker-socket-proxy:2375/v1.47/containers/abc123/start +// Code node checks: if (response.statusCode === 304) 
{ /* already started */ } + +// AFTER (Unraid GraphQL): +// Step 1: Lookup PrefixedID (Code node before HTTP Request) +const containerName = $json.containerName; // From upstream input +const registryLookup = { + action: "lookup", + containerName: containerName +}; +// Pass to Container ID Registry → returns { prefixedId: "...", found: true } + +// Step 2: Build mutation (Code node prepares GraphQL body) +const prefixedId = $('Container ID Registry').item.json.prefixedId; +return { + json: { + query: `mutation { docker { start(id: "${prefixedId}") { id state } } }` + } +}; + +// Step 3: Execute mutation (HTTP Request, uses Unraid API HTTP Template) +// Body: {{ $json.query }} + +// Step 4: Handle errors (wire through GraphQL Error Handler) +// Error Handler maps ALREADY_IN_STATE → { statusCode: 304, alreadyInState: true } + +// Step 5: Existing Code node (unchanged) +const response = $input.item.json; +if (response.statusCode === 304) { + return { json: { message: "Container already started" } }; +} +if (response.success) { + return { json: { message: "Container started successfully" } }; +} +``` + +**Source:** Phase 15 utility node integration, n8n-actions.json existing error handling + +--- + +### Single Container Update Mutation Migration + +```javascript +// BEFORE (Docker API 5-step flow in n8n-update.json): +// 1. Inspect container → get image digest +// 2. Stop container +// 3. Remove container +// 4. Create container (pulls new image) +// 5. Start container +// 6. 
Remove old image +// Total: 6 HTTP Request nodes, 8 Code nodes for orchestration + +// AFTER (Unraid GraphQL): +// Step 1: Get current container state (for imageId comparison) +const containerName = $json.containerName; +// Query: { docker { containers { id image imageId } } } (filter by name) + +// Step 2: Lookup PrefixedID +// Registry input: { action: "lookup", containerName: containerName } + +// Step 3: Single mutation +const prefixedId = $('Container ID Registry').item.json.prefixedId; +const oldImageId = $json.currentImageId; // From step 1 +return { + json: { + query: `mutation { docker { updateContainer(id: "${prefixedId}") { id state image imageId } } }` + } +}; + +// Step 4: Execute mutation (HTTP Request with 60s timeout) + +// Step 5: Normalize response and check if updated +// GraphQL Response Normalizer → Code node: +const response = $input.item.json; +const newImageId = response.imageId; +const updated = (newImageId !== oldImageId); + +if (updated) { + return { + json: { + success: true, + updated: true, + message: `Updated ${containerName}: ${oldImageId.slice(0,12)} → ${newImageId.slice(0,12)}` + } + }; +} else { + return { + json: { + success: true, + updated: false, + message: `No update available for ${containerName}` + } + }; +} + +// Total: 3 HTTP Request nodes (query current, lookup ID, update mutation), 3 Code nodes +// Reduction: 6 → 3 HTTP nodes, 8 → 3 Code nodes +``` + +**Source:** n8n-update.json current implementation, Unraid GraphQL schema updateContainer mutation + +--- + +### Batch Update Migration + +```javascript +// BEFORE (Docker API): Loop in Code node, Execute Workflow sub-workflow call per container (serial) + +// AFTER (Unraid GraphQL): +// Option A: Small batch (≤5 containers) — parallel mutation +const selectedNames = $json.selectedContainers.split(','); + +// Lookup all PrefixedIDs +const ids = []; +for (const name of selectedNames) { + const result = lookupInRegistry(name); // Call Registry node + 
ids.push(result.prefixedId); +} + +// Single mutation +return { + json: { + query: `mutation { docker { updateContainers(ids: ${JSON.stringify(ids)}) { id state imageId } } }` + } +}; + +// HTTP Request (120s timeout for batch) → Normalizer → Success message + +// Option B: Large batch (>5 containers) — serial with progress +// Keep existing pattern: loop + Execute Workflow calls, replace inner logic with GraphQL mutation + +// Hybrid recommendation: +const batchSize = selectedNames.length; +if (batchSize <= 5) { + // Use updateContainers mutation (Option A) +} else { + // Use serial loop with Telegram progress updates (Option B) +} +``` + +**Source:** n8n-batch-ui.json, Unraid GraphQL schema updateContainers mutation + +--- + +### Restart Implementation (Sequential Stop + Start) + +```javascript +// Unraid has no native restart mutation — implement as two operations + +// Step 1: Stop mutation (tolerate ALREADY_IN_STATE) +const prefixedId = $json.prefixedId; +return { + json: { + query: `mutation { docker { stop(id: "${prefixedId}") { id state } } }` + } +}; + +// HTTP Request → GraphQL Error Handler +// Error Handler output: { statusCode: 304, alreadyInState: true } OR { success: true } + +// Step 2: Check stop result (Code node) +const stopResult = $input.item.json; +if (stopResult.statusCode === 304 || stopResult.success) { + // Container stopped (or was already stopped) — proceed to start + return { json: { proceedToStart: true } }; +} +// Other errors fail the restart + +// Step 3: Start mutation +return { + json: { + query: `mutation { docker { start(id: "${prefixedId}") { id state } } }` + } +}; + +// HTTP Request → Error Handler → Success + +// Wiring: Stop HTTP → Error Handler → Check Result IF → Start HTTP → Error Handler → Format Result +``` + +**Source:** Unraid GraphQL schema (no restart mutation), standard restart implementation pattern + +--- + +## State of the Art + +| Old Approach | Current Approach | When Changed | Impact | 
+|--------------|------------------|--------------|--------| +| Docker REST API via socket proxy | Unraid GraphQL API via myunraid.net relay | This phase (v1.4) | Single API, update badge sync, no proxy security boundary | +| 5-step update flow (stop/remove/create/start) | Single `updateContainer` mutation | This phase | Simpler, faster, Unraid handles retry logic | +| Serial batch updates with progress | `updateContainers` plural mutation for small batches | This phase | Parallel execution, faster for ≤5 containers | +| Docker 64-char container IDs | Unraid 129-char PrefixedID with Registry mapping | Phase 15-16 | Requires translation layer, but enables GraphQL API | +| Manual "Apply Update" in Unraid UI | Automatic badge clear via GraphQL | This phase | Core user pain point solved | + +**Deprecated/outdated:** +- **docker-socket-proxy container:** Removed in Phase 17, GraphQL API replaces Docker socket access +- **Container logs feature:** Removed in Phase 17, not valuable enough to maintain hybrid architecture +- **Direct Docker container ID storage:** Replaced by Container ID Registry lookups (PrefixedID required) + +**Current best practice (post-Phase 16):** All container operations via Unraid GraphQL API. Docker socket proxy is legacy artifact. + +--- + +## Open Questions + +1. **Actual updateContainer mutation timeout needs** + - What we know: Large images (10GB+) can take 30+ seconds to pull + - What's unclear: Does myunraid.net relay timeout separately? Will 60s be enough for all cases? + - Recommendation: Start with 60s timeout, add workflow logging to capture actual duration, adjust if needed + +2. **Batch update progress tradeoff** + - What we know: `updateContainers` is fast but silent, serial updates show progress but slow + - What's unclear: User preference — speed or visibility? + - Recommendation: Hybrid approach (≤5 fast, >5 with progress), can adjust threshold based on user feedback + +3. 
**Restart error handling edge cases** + - What we know: Stop + start pattern works, need to tolerate ALREADY_IN_STATE on stop + - What's unclear: What if container exits between stop and start? Retry logic needed? + - Recommendation: Implement basic stop→start, add retry if real-world issues occur + +4. **Container ID Registry cache invalidation** + - What we know: Registry caches name → PrefixedID mapping, must refresh after updates + - What's unclear: Cache expiry strategy? Time-based TTL or event-driven only? + - Recommendation: Event-driven only (update after every GraphQL query/mutation), no TTL needed + +--- + +## Sources + +### Primary (HIGH confidence) +- [Unraid GraphQL Schema](https://raw.githubusercontent.com/unraid/api/main/api/generated-schema.graphql) — Mutation signatures, DockerContainer type fields +- [Using the Unraid API](https://docs.unraid.net/API/how-to-use-the-api/) — Authentication, endpoint, rate limiting +- Phase 15-01 Plan — Container ID Registry, Callback Token Encoder/Decoder implementation +- Phase 15-02 Plan — GraphQL Response Normalizer, Error Handler, HTTP Template implementation +- ARCHITECTURE.md — Current Docker API contracts, workflow node breakdown, error patterns + +### Secondary (MEDIUM confidence) +- [Docker and VM Integration | Unraid API](https://deepwiki.com/unraid/api/2.4.2-notification-system) — Unraid update implementation details (shells to Dynamix Docker Manager) +- [Core Services | Unraid API](https://deepwiki.com/unraid/api/2.4-docker-integration) — DockerService retry logic (5 polling attempts at 500ms intervals) +- n8n-update.json — Current 5-step Docker update flow implementation +- n8n-actions.json — Current start/stop error handling pattern (statusCode === 304 check) +- n8n-status.json — Current container list query pattern + +### Tertiary (LOW confidence) +- Community forum posts on Unraid container updates — Anecdotal timing data for large image pulls +- Real-world myunraid.net relay latency observations — 
200-500ms baseline from Phase 14 testing + +--- + +## Metadata + +**Confidence breakdown:** +- Standard stack: HIGH — Unraid GraphQL API verified in Phase 14, Phase 15 infrastructure already built +- Architecture: HIGH — Migration patterns are straightforward substitutions, Phase 15 utilities handle complexity +- Pitfalls: MEDIUM-HIGH — Most are standard API migration issues, actual timeout needs and batch tradeoffs require real-world testing + +**Research date:** 2026-02-09 +**Valid until:** 60 days (Unraid GraphQL API stable, schema changes infrequent) + +**Critical dependencies for planning:** +- Phase 15 utility nodes deployed and tested (Container ID Registry, GraphQL Normalizer, Error Handler, HTTP Template) +- Phase 14 Unraid API access verified (credentials, network connectivity, authentication working) +- n8n workflow JSON structure understood (node IDs, connections, typeVersion patterns from CLAUDE.md) + +**Migration risk assessment:** +- **Low risk:** Container queries (status, list) — direct substitution, normalizer handles response shape +- **Medium risk:** Container lifecycle (start/stop/restart) — ALREADY_IN_STATE error mapping critical, restart needs sequential implementation +- **Medium risk:** Single container update — timeout configuration important, imageId comparison for success detection +- **Medium-high risk:** Batch updates — tradeoff between speed and progress visibility, hybrid approach recommended + +**Ready for planning:** YES — Clear migration patterns identified, Phase 15 infrastructure ready, pitfalls documented, code examples provided for each operation type.