diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 00e5392..0b99dfd 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -52,10 +52,12 @@
-### 🚧 v1.4 Unraid API Native (In Progress)
+### ⏸️ v1.4 Unraid API Native (PAUSED — waiting for Unraid 7.3)
 **Milestone Goal:** Replace Docker socket proxy with Unraid's GraphQL API for all container operations, remove container logs feature, and clean up all proxy artifacts.
+**PAUSED:** UAT revealed `updateContainer` mutation only ships in Unraid 7.3+ (not yet released). Status queries and start/stop/restart work via GraphQL, but update operations require the missing mutation. v1.3 workflows restored to n8n for stable production use. Resume when Unraid 7.3 ships.
+
 #### Phase 15: Infrastructure Foundation
 **Goal**: Data transformation layers ready for Unraid API integration
 **Depends on**: Phase 14
@@ -139,11 +141,12 @@ Phases execute in numeric order: 1-14 (complete) → 15 → 16 → 17 → 18
 | 13 | Documentation Overhaul | v1.2 | 1/1 | Complete | 2026-02-08 |
 | 14 | Unraid API Access | v1.3 | 2/2 | Complete | 2026-02-08 |
 | 15 | Infrastructure Foundation | v1.4 | 2/2 | Complete | 2026-02-09 |
-| 16 | API Migration | v1.4 | 6/6 | Complete | 2026-02-09 |
-| 17 | Cleanup | v1.4 | 0/? | Not started | - |
-| 18 | Documentation | v1.4 | 0/? | Not started | - |
+| 16 | API Migration | v1.4 | 6/6 | UAT: 6/9 pass, blocked on Unraid 7.3 | 2026-02-09 |
+| 17 | Cleanup | v1.4 | 0/? | PAUSED | - |
+| 18 | Documentation | v1.4 | 0/? | PAUSED | - |
-**Total: 4 milestones shipped (14 phases, 50 plans), v1.4 in progress (Phase 15-16 complete, 8/10 plans)**
+**Total: 4 milestones shipped (14 phases, 50 plans), v1.4 PAUSED (blocked on Unraid 7.3 updateContainer mutation)**
+**Production: v1.3 workflows running on n8n**
 ---
-*Updated: 2026-02-09 — Phase 16 complete (6/6 plans, all container operations use GraphQL)*
+*Updated: 2026-02-09 — v1.4 PAUSED, v1.3 restored to n8n.
Resume when Unraid 7.3 ships.*
diff --git a/.planning/STATE.md b/.planning/STATE.md
index 57cc46a..6e788c3 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -2,10 +2,10 @@
 ## Current Position
-- **Milestone:** v1.4 Unraid API Native
-- **Phase:** 16 of 18 (API Migration) - Complete (6/6 plans)
-- **Status:** Phase 16 fully complete, all 6 plans finished (including gap closure)
-- **Last activity:** 2026-02-09 — Phase 16-06 complete (text command paths migrated to GraphQL, zero Execute Command nodes remain)
+- **Milestone:** v1.4 Unraid API Native — PAUSED
+- **Phase:** 16 of 18 (API Migration) - Paused (UAT revealed API limitation)
+- **Status:** PAUSED — Unraid GraphQL `updateContainer` mutation requires Unraid 7.3+ (not yet released). v1.3 workflows restored to n8n.
+- **Last activity:** 2026-02-09 — v1.4 paused, v1.3 workflows pushed to n8n as stable rollback
 ## Project Reference
@@ -13,7 +13,7 @@ See: .planning/PROJECT.md (updated 2026-02-09)
 **Core value:** When you get a container update notification or notice a service is down, you can immediately investigate and act from your phone.
-**Current focus:** v1.4 Unraid API Native — replace Docker socket proxy with Unraid GraphQL API
+**Current focus:** PAUSED — waiting for Unraid 7.3 to ship `updateContainer` GraphQL mutation
 ## Progress
@@ -22,11 +22,28 @@ v1.0: [**********] 100% SHIPPED (Phases 1-5, 12 plans)
 v1.1: [**********] 100% SHIPPED (Phases 6-9, 11 plans)
 v1.2: [**********] 100% SHIPPED (Phases 10-13 + 10.1-10.2, 25 plans)
 v1.3: [**********] 100% SHIPPED (Phase 14, 2 plans — descoped)
-v1.4: [********..] 80% IN PROGRESS (Phases 15-18, 8 of 10 plans)
+v1.4: [******....] 60% PAUSED (Phases 15-18 — blocked on Unraid 7.3)
-Overall: 4 milestones shipped (14 phases, 50 plans), v1.4 in progress (Phase 15: 2/2, Phase 16: 6/6, Phase 17: 0/?
pending)
+Overall: 4 milestones shipped (14 phases, 50 plans), v1.4 paused
+Running in production: v1.3 (Docker socket proxy architecture)
 ```
+## Why Paused
+
+**UAT on Phase 16 revealed:** The Unraid GraphQL API (v4.25-4.28, Unraid 7.2.x) only exposes `start` and `stop` Docker mutations. The `updateContainer`, `updateContainers`, and `updateAllContainers` mutations exist in the API source code (commit 277ac42046, 2025-12-18) but are tagged for **Unraid 7.3+** which has not been released.
+
+**UAT results (6 passed, 3 blocked):**
+- PASS: Container list, status submenu, start, stop, restart, idempotent start
+- BLOCKED: Single container update, batch update, text commands (all depend on `updateContainer` mutation)
+
+**What's ready for Unraid 7.3:**
+- All Phase 15 infrastructure (Container ID Registry, GraphQL Normalizer, Error Handler)
+- Phase 16 workflow code for status queries, start/stop/restart (all working)
+- Phase 16 workflow code for updates (correct mutation signatures, just needs API availability)
+- Debug fixes: batch cancel wiring, text command paired item fix, batch confirmation HTTP node
+
+**Resume trigger:** Unraid 7.3 release → re-run UAT → fix any remaining issues → continue Phase 17-18
+
 ## Performance Metrics
 **Velocity:**
@@ -44,93 +61,48 @@ Overall: 4 milestones shipped (14 phases, 50 plans), v1.4 in progress (Phase 15:
 | v1.3 | 2 | 1 day | ~2 minutes |
 | v1.4 | 8 | 29 minutes | 3.6 minutes |
-**Phase 15 Details:**
-
-| Plan | Duration | Tasks | Files |
-|------|----------|-------|-------|
-| 15-01 | 6 min | 2 | 1 |
-| 15-02 | 5 min | 2 | 1 |
-
-**Phase 16 Details:**
-
-| Plan | Duration | Tasks | Files |
-|------|----------|-------|-------|
-| 16-01 | 2 min | 1 | 1 |
-| 16-02 | 3 min | 2 | 1 |
-| 16-03 | 2 min | 1 | 1 |
-| 16-04 | 2 min | 1 | 1 |
-| 16-05 | 8 min | 3 | 1 |
-| 16-06 | 3 min | 1 | 1 |
-
 ## Accumulated Context
 ### Decisions
 Decisions are logged in PROJECT.md Key Decisions table.
-Key decisions from v1.3 and v1.4 planning:
+- [v1.4] PAUSE — Unraid 7.2.x lacks updateContainer mutation, resume when 7.3 ships
+- [v1.4] ROLLBACK — v1.3 workflows restored to n8n for stable production use
 - [v1.4] Remove container logs feature entirely (not valuable enough to justify hybrid architecture)
 - [v1.4] Remove docker-socket-proxy completely (clean single-API architecture)
 - [v1.3] Descope to Phase 14 only — Phases 15-16 superseded by v1.4 Unraid API Native
 - [v1.3] myunraid.net cloud relay for Unraid API (direct LAN IP fails due to nginx redirect)
-- [v1.3] Environment variables for Unraid API auth (more reliable than n8n Header Auth)
-- [Phase 15-02]: GraphQL normalizer keeps full Unraid PrefixedID (Container ID Registry handles translation)
-- [Phase 15-02]: ALREADY_IN_STATE error maps to HTTP 304 (matches Docker API pattern)
-- [Phase 15-02]: 15-second timeout for myunraid.net cloud relay (200-500ms latency + safety margin)
-- [Phase 15]: Token encoder uses 8-char hex (not base64) for deterministic collision avoidance via hash window offsets
-- [Phase 15]: Container ID Registry stores full PrefixedID (129-char) as-is for downstream consumers
-- [Phase 16-01]: Use inline Code nodes for normalizer and registry updates (sub-workflows cannot cross-reference parent workflow utility nodes)
-- [Phase 16-01]: Same GraphQL query for all 3 status paths (downstream Code nodes filter/process as needed)
-- [Phase 16-01]: Update Container ID Registry after every status query (keeps mapping fresh for mutations)
-- [Phase 16-02]: Restart as sequential stop+start (no native GraphQL restart mutation)
-- [Phase 16-02]: ALREADY_IN_STATE errors map to HTTP 304 (idempotent operation tolerance)
-- [Phase 16-02]: Format Result nodes unchanged (GraphQL Error Handler maps to existing patterns)
-- [Phase 16-03]: 60-second timeout for updateContainer (accommodates 10GB+ images, was 600s for docker pull)
-- [Phase 16-03]: ImageId field comparison determines update success
(not image digest like Docker)
-- [Phase 16-03]: Error routing uses IF node after Handle Update Response (Code nodes have single output)
-- [Phase 16-04]: 5 identical normalizer nodes per query path (n8n architectural constraint)
-- [Phase 16-04]: 15-second timeout for myunraid.net cloud relay (200-500ms latency + safety margin)
-- [Phase 16-05]: Callback data uses names, not IDs - token encoding unnecessary (names fit within 64-byte limit)
-- [Phase 16-05]: Batch size threshold of 5 containers for parallel vs serial update (small batches parallel, large batches show progress)
-- [Phase 16-05]: 120-second timeout for batch updateContainers mutation (accommodates multiple large image pulls)
 ### Pending Todos
-None.
+- Monitor Unraid 7.3 release for `updateContainer` mutation availability
+- When 7.3 ships: re-run `/gsd:verify-work 16` to validate update operations
 ### Blockers/Concerns
-**v1.4 architectural risks (from research):**
-- Container ID format translation critical (Docker 64-char hex vs Unraid 129-char PrefixedID)
-- Telegram callback data 64-byte limit with longer IDs requires encoding redesign
-- GraphQL response normalization must prevent cascading failures across 60+ Code nodes
-- myunraid.net cloud relay adds 200-500ms latency (timeout configuration needed)
-
-**Next phase readiness:**
-- Phase 15 complete (both plans) — All infrastructure utility nodes ready
-- Phase 16 complete (all 6 plans) — Full GraphQL migration successful, gap closure done
-- Complete utility node suite: Container ID Registry, Token Encoder/Decoder, GraphQL Normalizer, Error Handler
-- Hybrid batch update: parallel for small batches (<=5), serial with progress for large batches
-- Phase 17 ready: Remove docker-socket-proxy from infrastructure
-- No blockers
+**BLOCKING: Unraid 7.3 not released**
+- `updateContainer(id: PrefixedID!)` — single container update
+- `updateContainers(ids: [PrefixedID!]!)` — batch update
+- `updateAllContainers` — update all with available
updates
+- All three mutations exist in API source (commit 277ac42046) but only ship in Unraid 7.3+
+- Current server runs Unraid 7.2.x with API v4.25-4.28 (only `start`/`stop` mutations)
 ## Key Artifacts
-- `n8n-workflow.json` -- Main workflow (187 nodes — fully migrated to GraphQL, zero Execute Command nodes)
-- `n8n-batch-ui.json` -- Batch UI sub-workflow (migrated to GraphQL) -- ID: `ZJhnGzJT26UUmW45`
-- `n8n-status.json` -- Container Status sub-workflow (17 nodes, migrated to GraphQL) -- ID: `lqpg2CqesnKE2RJQ`
-- `n8n-confirmation.json` -- Confirmation Dialogs sub-workflow (16 nodes) -- ID: `fZ1hu8eiovkCk08G`
-- `n8n-update.json` -- Container Update sub-workflow (29 nodes, migrated to GraphQL) -- ID: `7AvTzLtKXM2hZTio92_mC`
-- `n8n-actions.json` -- Container Actions sub-workflow (22 nodes, migrated to GraphQL) -- ID: `fYSZS5PkH0VSEaT5`
-- `n8n-logs.json` -- Container Logs sub-workflow (9 nodes) -- ID: `oE7aO2GhbksXDEIw` -- TO BE REMOVED
-- `n8n-matching.json` -- Container Matching sub-workflow (23 nodes) -- ID: `kL4BoI8ITSP9Oxek`
-- `ARCHITECTURE.md` -- Full architecture docs, contracts, and node analysis
+**Production (v1.3 — running on n8n):**
+- `n8n-workflow.json` -- Main workflow (v1.3, Docker socket proxy architecture)
+- All 7 sub-workflows at v1.3 state pushed to n8n
+
+**Development (v1.4 — on branch gsd/v1.0-unraid-api-native):**
+- Phase 15-16 work preserved in git (GraphQL migration code ready for Unraid 7.3)
+- UAT and debug reports in `.planning/phases/16-api-migration/`
 ## Session Continuity
 Last session: 2026-02-09
-Stopped at: Phase 16-06 complete (gap closure — all text command paths migrated to GraphQL)
-Next step: Phase 17 (Cleanup) - remove container logs feature, docker-socket-proxy references, and proxy artifacts
+Stopped at: v1.4 PAUSED — v1.3 restored to n8n, waiting for Unraid 7.3
+Next step: When Unraid 7.3 releases → re-run Phase 16 UAT → continue to Phase 17-18
 ---
 *Auto-maintained by GSD workflow*
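
Note on the pending todo "Monitor Unraid 7.3 release for `updateContainer` mutation availability": the check could be scripted with standard GraphQL introspection rather than done by hand. The sketch below is an illustration only and is not part of this change. It assumes the server allows introspection, and it scans the fields of every type because the diff does not say where the Docker mutations sit in the schema. The helper name `missing_mutations` and the sample type name `DockerMutations` are hypothetical; only the three mutation names come from STATE.md. A field with a matching name on an unrelated type would give a false "available", so treat the result as a hint before re-running UAT.

```python
from typing import Any, Iterable

# Mutation names copied from the "BLOCKING" list in STATE.md.
REQUIRED_MUTATIONS = ("updateContainer", "updateContainers", "updateAllContainers")

# Standard introspection query (field names defined by the GraphQL spec).
# Sending it to the Unraid endpoint is left out; URL and auth are deployment-specific.
INTROSPECTION_QUERY = "{ __schema { types { name fields { name } } } }"


def missing_mutations(
    introspection: dict[str, Any],
    required: Iterable[str] = REQUIRED_MUTATIONS,
) -> list[str]:
    """Return the required field names that no type in the schema exposes."""
    schema = (introspection.get("data") or {}).get("__schema") or {}
    types = schema.get("types") or []
    exposed = {
        field["name"]
        for gql_type in types
        # Scalars, enums and similar types report `fields: null`.
        for field in (gql_type.get("fields") or [])
    }
    return [name for name in required if name not in exposed]


# Hypothetical response shaped like a 7.2.x server that only has start/stop.
sample_72 = {
    "data": {
        "__schema": {
            "types": [
                {"name": "DockerMutations", "fields": [{"name": "start"}, {"name": "stop"}]},
                {"name": "PrefixedID", "fields": None},
            ]
        }
    }
}

print(missing_mutations(sample_72))
# ['updateContainer', 'updateContainers', 'updateAllContainers']
```

An empty list would be the signal to re-run `/gsd:verify-work 16`; a non-empty list means the blocker still stands.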