# Project State -- Unraid Docker Manager

## Current Position

- **Milestone:** v1.2 SHIPPED
- **Status:** Milestone archived, tagged v1.2
- **Last activity:** 2026-02-08 -- v1.2 milestone completed and archived

## Project Reference

See: .planning/PROJECT.md (updated 2026-02-08)

**Core value:** When you get a container update notification or notice a service is down, you can immediately investigate and act from your phone.
**Current focus:** Planning next milestone

## Progress

```
v1.0: [**********] 100% SHIPPED
v1.1: [**********] 100% SHIPPED

v1.2: [**********] 100% SHIPPED

Phase 10: Workflow Modularization [**********] 100% COMPLETE (+ 10-07 UAT fixes)
Phase 10.1: Aggressive Modularization [**********] 100% COMPLETE (9/9 plans + UAT closure)
Phase 10.2: Better Logging & Log Management [**********] 100% COMPLETE (4/4 plans complete)
Phase 11: Update All & Callback Limits [**********] 100% COMPLETE (2/2 plans, UAT 6/6 pass)
Phase 12: Polish & Audit [**********] 100% COMPLETE (2/2 plans, all requirements closed)
Phase 13: Documentation Overhaul [**********] 100% COMPLETE (1/1 plan, README overhaul)
```

## Phase 10 Completion Summary

| Plan | Description | Status |
|------|-------------|--------|
| 10-01 | Orphan node cleanup | Complete |
| 10-02 | Container Update sub-workflow | Complete |
| 10-03 | Container Actions sub-workflow | Complete |
| 10-04 | Integration verification | Complete |
| 10-05 | Complete modularization (batch, logs) | Complete |
| 10-06 | Remediation: routing, logs, cleanup | Complete |
| 10-07 | UAT gap closure (5 fixes) | Complete |

**Achievements:**
- 3 sub-workflows created and deployed (Update, Actions, Logs)
- All container operations consolidated (no duplicate logic)
- Old inline batch execution path removed
- Legacy callbacks modernized to new format
- Main workflow: 209 -> 192 nodes (-8%)
- 6 Python helper scripts removed
- UAT gaps closed: race condition, data chain errors, fuzzy matching, refresh errors

## Key Artifacts

- `n8n-workflow.json` -- Main workflow (166 nodes: structural minimum achieved, orphan callback chain removed)
- `n8n-batch-ui.json` -- Batch UI sub-workflow (17 nodes: 16 baseline + 1 Fetch Containers For Exec) -- ID: `ZJhnGzJT26UUmW45`
- `n8n-status.json` -- Container Status sub-workflow (11 nodes) -- ID: `lqpg2CqesnKE2RJQ`
- `n8n-confirmation.json` -- Confirmation Dialogs sub-workflow (16 nodes) -- ID: `fZ1hu8eiovkCk08G`
- `n8n-update.json` -- Container Update sub-workflow (34 nodes) -- ID: `7AvTzLtKXM2hZTio92_mC`
- `n8n-actions.json` -- Container Actions sub-workflow (11 nodes) -- ID: `fYSZS5PkH0VSEaT5`
- `n8n-logs.json` -- Container Logs sub-workflow (9 nodes) -- ID: `oE7aO2GhbksXDEIw`
- `n8n-matching.json` -- Container Matching sub-workflow (23 nodes) -- ID: `kL4BoI8ITSP9Oxek`
- `ARCHITECTURE.md` -- Full architecture docs, contracts, and node analysis

## Technical Notes

**n8n typeVersion 1.2 requirement:**
```json
"workflowId": { "__rl": true, "mode": "list", "value": "<id>" }
```

**Docker API success detection:**
- 204 No Content = success (empty response body)
- Check `!response.message && !response.error`

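A minimal sketch of this check, combined with the 10.1-08 decisions recorded below (status codes first, HTTP 304 treated as success). The function name `isDockerActionSuccess` and its two parameters are illustrative assumptions, not names taken from the workflows.

```javascript
// Hypothetical helper: decide whether a Docker API container action succeeded.
// `statusCode` and `body` are assumed inputs; the real nodes may expose them differently.
function isDockerActionSuccess(statusCode, body) {
  // Explicit status codes are checked before any message parsing (10.1-08).
  if (statusCode === 204) return true; // No Content: action succeeded
  if (statusCode === 304) return true; // container already in the requested state
  if (typeof statusCode === "number" && statusCode >= 400) return false;

  // Fallback when no usable status code is available:
  // an empty body, or one without message/error fields, counts as success.
  const response = body || {};
  return !response.message && !response.error;
}
```

For example, `isDockerActionSuccess(404, { message: "No such container" })` is `false`, while an empty 204 response is `true`.
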
**Sub-workflow input contracts:**
- Container Update: containerId, containerName, chatId, messageId, responseMode
- Container Actions: containerId, containerName, action, chatId, messageId, responseMode
- Container Logs: containerId/containerName, lineCount, chatId, messageId, responseMode
- Batch UI: chatId, messageId, queryId, callbackData, action, batchPage, selectedCsv, toggleName, batchAction
- Container Status: chatId, messageId, action, containerId, containerName, page, queryId, searchTerm
- Confirmation: chatId, messageId, action, containerId, containerName, confirmAction, confirmationToken, expired, responseMode
- Matching: action, containerList, searchTerm, selectedContainers, chatId, messageId

**Sub-workflow output patterns:**
- Batch UI returns `action` field (keyboard/execute/cancel)
- Container Status returns `action` field (list/status/paginate)
- Confirmation returns `action` field (show_stop/show_update/confirm_stop_result/confirm_update/cancel/expired)
- Matching returns `action` field (matched/multiple/no_match/error/suggestion/batch_matched/disambiguation/not_found + update variants)
- Main workflow routes based on action to appropriate Telegram response handler

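As an illustration of action-based routing, the sketch below maps the Batch UI `action` values listed above to handler names. The handler names and the `unknownAction` fallback are hypothetical; in the actual workflow this routing is done by n8n nodes rather than a single function.

```javascript
// Hypothetical route table for the Batch UI sub-workflow result.
// Keys come from the output pattern above; values are made-up handler names.
const BATCH_UI_ROUTES = new Map([
  ["keyboard", "showBatchKeyboard"],
  ["execute", "runBatchExecution"],
  ["cancel", "cancelBatch"],
]);

// Pick the Telegram response handler for a sub-workflow result.
function routeBatchUiResult(result) {
  const action = result ? result.action : undefined;
  // Unrecognized or missing actions go to an explicit fallback instead of failing silently.
  return BATCH_UI_ROUTES.get(action) ?? "unknownAction";
}
```
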
**Data chain pattern (10-07):**
- Use `$('Build Progress Message').item.json` to reference data across async nodes
- Do not rely on `$json` after Telegram API calls (response overwrites data)

**Dynamic input pattern (10-09):**
- Use `$input.item.json` for nodes with multiple predecessors
- Matching sub-workflow returns both `action` (routing label) and `actionType` (user's requested action)

## Accumulated Decisions

| Phase | Decision | Rationale |
|-------|----------|-----------|
| 10-05 | Use placeholder workflow ID for logs sub-workflow | ID assigned by n8n on import |
| 10-05 | Retain Parse Logs Command in main workflow | Handles error cases before sub-workflow call |
| 10-06 | Remove old batch inline path | Migrated to bexec: callback format, uses sub-workflow |
| 10-06 | Defer aggressive modularization to 10.1 | Core goals achieved, deeper work needs separate phase |
| 10-07 | Timestamp on logs refresh | Prevents "message not modified" error, shows freshness |
| 10-07 | Fuzzy matching in logs sub-workflow | Simpler than duplicating Docker query infrastructure |
| 10.1-01 | Realistic target 115-125 nodes (not 50-80) | 58 Telegram response nodes locked to main workflow |
| 10.1-01 | Wave 2: Batch UI + Container List extraction | Highest-value domains with clear boundaries |
| 10.1-02 | Partial batch UI extraction (UI only, not loop) | Batch execution loop cannot be in sub-workflow due to n8n limitations |
| 10.1-02 | Action-based sub-workflow routing | Sub-workflow returns action field, main routes to Telegram handlers |
| 10.1-03 | Minimal net node reduction due to integration overhead | Removed 10 nodes but added 9 integration nodes; value is complexity reduction |
| 10.1-04 | Return confirm_update action to main workflow | Update flow tightly integrated with existing update sub-workflow |
| 10.1-04 | Call n8n-actions.json for stop execution | Reuse existing action execution instead of duplicating Docker API calls |
| 10.1-06 | Downstream nodes reference original parse nodes for action type | Sub-workflow doesn't carry user's requested action (stop/start) through return data |
| 10.1-06 | Text-mode status needs keyboard strip + messageId routing | Pre-existing bug exposed by testing; text commands have no message to edit |
| 10.1-06 | Batch text needs Prepare Batch Execution transform | Sub-workflow returns matchedContainers/batch_matched, downstream expects allMatched/stop |
| 10.1-07 | No further Code node extraction viable | 2 candidates yield net-negative extraction (-50% efficiency) |
| 10.1-07 | 168 nodes is near-minimal (structural minimum: 166) | Evidence-based analysis of all 168 nodes by category |
| 10.1-07 | 115-125 target was unrealistic | Based on incomplete extraction overhead analysis |
| 10.1-08 | Status code checks before message-based fallback | Explicit HTTP response handling before message parsing |
| 10.1-08 | HTTP 304 treated as success | Docker API returns 304 for already-in-state, better UX than error |
| 10.1-09 | /list command as alias for status | Status command already provides list functionality; alias simpler than duplication |
| 10.1-09 | Dynamic predecessor reference pattern | Use $input.item.json for nodes with multiple incoming paths |
- [Phase 10.2-03]: n8n workflow static data does NOT persist between executions (critical platform limitation)
- [Phase 10.2-03]: Ring buffer + debug commands architecture non-functional due to static data limitation
- [Phase 10.2-03]: Stripped all static-data-dependent features, kept correlation IDs + structured error returns
- [Phase 10.2-02]: Correlation ID uses timestamp + random string (no UUID dependency)
- [Phase 10.2-02]: Use $input.item.json.correlationId pattern for Prepare Input nodes
- [Phase 10.2-04]: Fixed connection keys to use node names per n8n resolution protocol
- [Phase 10.2-04]: Accepted debug/errors routing behavior as minor (commands removed, no real users)
- [Phase 10.2-04]: Final state 168 nodes (includes 2 correlation ID generators, 2 orphans removed)
- [Phase 11-01]: Use base36 BigInt encoding for bitmaps (supports 50+ containers, max ~20 bytes callback size)
- [Phase 11-01]: Retain old batch parsers for graceful migration of in-flight messages (<1 minute window)
- [Quick 1-1]: Removed 6 orphan callback nodes (no incoming connections after Phase 10 modularization)
- [Quick 1-1]: Achieved structural minimum of 166 nodes (per Phase 10.1-07 analysis)
- [Phase 12-01]: Document Unraid badge limitation instead of programmatic fix (Unraid API integration adds complexity for cosmetic issue)
- [Phase 13-01]: Remove DEPLOYMENT_GUIDE.md instead of updating (outdated Phase 10-05 content, fully superseded by ARCHITECTURE.md)
- [Phase 13-01]: Separate Configuration from Installation in README (installation should be linear and action-only)

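The correlation ID decision (10.2-02: timestamp + random string, no UUID dependency) could look roughly like the sketch below. The exact format, the `-` separator, and the name `makeCorrelationId` are assumptions; the workflow's generator nodes may differ.

```javascript
// Hypothetical correlation ID generator: base-36 timestamp plus a short random suffix.
// `now` and `random` are injectable only so the function can be tested deterministically.
function makeCorrelationId(now = Date.now(), random = Math.random) {
  const timePart = now.toString(36);
  // Math.random().toString(36) looks like "0.k3j9x..."; drop the "0." prefix,
  // keep at most 6 characters, and pad in case the fraction was short.
  const randomPart = random().toString(36).slice(2, 8).padEnd(6, "0");
  return `${timePart}-${randomPart}`;
}
```

A Prepare Input node would then only need to copy the value forward, for example via `$input.item.json.correlationId`.
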
## Phase 10.1 Progress

| Plan | Description | Status |
|------|-------------|--------|
| 10.1-01 | Foundation and Domain Analysis | Complete |
| 10.1-02 | Batch UI Sub-workflow (Wave 2) | Complete |
| 10.1-03 | Container Status Sub-workflow (Wave 2) | Complete |
| 10.1-04 | Confirmation Sub-workflow (Wave 3) | Complete |
| 10.1-05 | Integration Verification | Complete |
| 10.1-06 | Matching Sub-workflow Extraction | Complete |
| 10.1-07 | Code Classification + Contract Documentation | Complete |
| 10.1-08 | UAT Gap Closure: Container Action Status Codes | Complete |
| 10.1-09 | UAT Gap Closure: Data Flow Fixes | Complete |

**Node count progress:**
- Start: 192 nodes
- After 10.1-02: 179 nodes (-13)
- After 10.1-03: 178 nodes (-1)
- After 10.1-04: 168 nodes (-10)
- After 10.1-06: 168 nodes (net 0: -12 extracted, +9 integration, +3 fix nodes)
- Final: 168 nodes (structural minimum: 166, gap: 2 non-viable candidates)

**Extraction complete:**
- Batch UI: -13 nodes (16 nodes in sub-workflow)
- Container Status: -1 net (11 nodes in sub-workflow, complexity reduction)
- Confirmation: -10 nodes (16 nodes in sub-workflow)
- Matching: net 0 (23 nodes in sub-workflow, complexity reduction)
- Total reduction: 24 nodes (192 -> 168, -12.5%)

## Phase 10.1 Sub-workflows

All 7 sub-workflows deployed and operational:
- n8n-update.json -- `7AvTzLtKXM2hZTio92_mC`
- n8n-actions.json -- `fYSZS5PkH0VSEaT5`
- n8n-logs.json -- `oE7aO2GhbksXDEIw`
- n8n-batch-ui.json -- `ZJhnGzJT26UUmW45`
- n8n-status.json -- `lqpg2CqesnKE2RJQ`
- n8n-confirmation.json -- `fZ1hu8eiovkCk08G`
- n8n-matching.json -- `kL4BoI8ITSP9Oxek`

## Phase 10.2 Progress

| Plan | Description | Status |
|------|-------------|--------|
| 10.2-01 | Error Ring Buffer Foundation and Hidden Debug Commands | Complete (infrastructure later removed) |
| 10.2-02 | Wire Error Logging to Main Workflow | Complete (error logging removed, correlation IDs kept) |
| 10.2-03 | Add Debug Tracing to Sub-workflow Boundaries | Complete (scope reduced due to static data limitation) |
| 10.2-04 | Gap Closure: Correlation ID Wiring | Complete (UAT gaps 1-3 closed) |

**Critical Finding:**
- **n8n workflow static data does NOT persist between executions** (execution-scoped, not workflow-scoped)
- Ring buffer + debug command architecture non-functional due to this limitation
- All static-data-dependent features stripped in Plan 03 cleanup

**Achievements (10.2-01):** [REMOVED in 10.2-03 cleanup]
- Ring buffer infrastructure (non-functional - static data doesn't persist)
- 4 hidden debug commands (removed)
- Log Error and Log Trace utility nodes (removed)

**Achievements (10.2-02):** [PARTIALLY RETAINED]
- Structured error returns in all 7 sub-workflows (KEPT - success/error fields)
- Correlation ID generation for text and callback paths (KEPT - 2 nodes)
- 19 Prepare Input nodes modified to pass correlationId (KEPT)
- Error detection IF nodes (REMOVED - depended on static data logging)

**Final State (10.2-04):**
- Main workflow: 168 nodes (includes 2 correlation ID generators, 2 orphans removed)
- Correlation ID infrastructure wired and functional (text + callback paths)
- Correlation IDs flow to all sub-workflows via Prepare Input nodes
- Structured error returns in all sub-workflows (enables better error handling)
- All static-data-dependent features removed cleanly
- UAT gaps 1-3 closed (correlation ID wiring), gap 4 accepted as minor
- No regression to bot functionality

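A rough sketch of what a structured return carrying a correlation ID might look like. Only the `success`, `error`, and `correlationId` field names come from the notes above; the helper names `okResult` and `errorResult` and the extra `data` payload are assumptions for illustration.

```javascript
// Hypothetical success envelope returned by a sub-workflow.
function okResult(correlationId, data = {}) {
  return { success: true, correlationId, ...data };
}

// Hypothetical error envelope: normalizes Error objects and plain strings to a message.
function errorResult(correlationId, error) {
  const message = error instanceof Error ? error.message : String(error);
  return { success: false, correlationId, error: message };
}
```

With this shape the caller can branch on `success` alone and still trace the failing request through `correlationId`.
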
## Phase 11 Progress

| Plan | Description | Status |
|------|-------------|--------|
| 11-01 | Bitmap encoding for batch selection | Complete |
| 11-02 | Update All button with confirmation | Complete |

**Achievements (11-01):**
- Bitmap-encoded batch selection keeps callback data within Telegram's 64-byte callback limit
- Callback size no longer grows with container names (max ~20 bytes for 50+ containers)
- Base36 BigInt encoding: `b:0:1a3:5` vs old CSV `batch:toggle:0:plex,sonarr:jellyfin`
- Graceful migration: old parsers retained as fallback for in-flight messages
- Batch stop confirmation works with bitmap via resolution flow
- n8n-batch-ui.json: 17 nodes (16 + 1 Fetch Containers For Exec)
- n8n-workflow.json: 166 nodes (structural minimum achieved)

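The bitmap idea can be sketched as follows: each container's position in the list is one bit, and the resulting BigInt is written in base 36. The function names are hypothetical, and the sketch covers only the bitmap field, not the full `b:...` callback layout, which is not specified here.

```javascript
// Hypothetical encoder: turn selected list indices into a base-36 bitmap string.
function encodeSelection(selectedIndices) {
  let bitmap = 0n;
  for (const index of selectedIndices) {
    bitmap |= 1n << BigInt(index); // set the bit for this container position
  }
  return bitmap.toString(36);
}

// Hypothetical decoder: recover the selected indices from the base-36 string.
function decodeSelection(encoded) {
  // BigInt has no base-36 parser, so accumulate one digit at a time.
  let bitmap = 0n;
  for (const ch of encoded) {
    bitmap = bitmap * 36n + BigInt(parseInt(ch, 36));
  }
  const indices = [];
  for (let i = 0; bitmap > 0n; i++, bitmap >>= 1n) {
    if (bitmap & 1n) indices.push(i);
  }
  return indices;
}
```

Because the encoded length depends on the highest selected index rather than on container names, 50 containers fit in at most 10 base-36 characters.
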
## Phase 12 Progress

| Plan | Description | Status |
|------|-------------|--------|
| 12-01 | Documentation audit (ENV-01, ENV-02, DEBT-01, DEBT-02, UNR-01) | Complete |
| 12-02 | Deferred UAT: BATCH-04 + BATCH-05 (9 bug fixes) | Complete |

**Achievements (12-02):**
- BATCH-04 (text "update all") passed end-to-end UAT
- BATCH-05 (inline keyboard "Update All :latest") passed end-to-end UAT
- 9 bugs discovered and fixed during UAT (data chains, format mismatches, infra exclusion)
- Infrastructure container exclusion added (n8n, socket-proxy) -- prevents bot self-destruction
- Batch responseMode added to update sub-workflow -- suppresses per-container Telegram messages
- Dynamic edit/send endpoint for confirmation (editMessageText for keyboard, sendMessage for text)
- All v1.2 requirements now closed (12/12)

**Achievements (12-01):**
- README updated to document docker-socket-proxy architecture (not direct socket mount)
- Clarified TELEGRAM_BOT_TOKEN requires both n8n credential AND environment variable
- Clarified user ID is hardcoded in IF nodes (no TELEGRAM_USERID env var)
- Documented all 8 workflow files (main + 7 sub-workflows) in installation section
- Added missing commands to usage table: `update all` and `/list` alias
- Verified DEBT-02 is fixed: single --max-time 600 flag, no duplicates
- Documented Unraid update badge limitation (UNR-01) with root cause and workaround
- Closed 4 requirements: ENV-01, ENV-02, DEBT-01, DEBT-02
- Resolved UNR-01 as documented limitation (not a fix, but closed)

## Phase 13 Progress

| Plan | Description | Status |
|------|-------------|--------|
| 13-01 | README overhaul with architecture, configuration, troubleshooting | Complete |

**Achievements (13-01):**
- README expanded from 139 to 264 lines with dedicated Architecture, Configuration, and Troubleshooting sections
- Documented batch selection workflow (toggle checkmarks, multi-select UI in inline keyboard)
- Documented "Update All :latest" button location and usage in inline keyboard
- Separated configuration from installation (installation now linear and action-only)
- Added 5 common troubleshooting scenarios with fixes
- Removed outdated DEPLOYMENT_GUIDE.md (Phase 10-05, 3 sub-workflows, 199 nodes)
- Consolidated to single technical reference: ARCHITECTURE.md (725 lines, 7 sub-workflows, 166 nodes)
- All v1.2 features now documented (batch ops, update all, inline keyboard, 7 sub-workflows)

## Quick Tasks Completed

| Task | Description | Status | Date | Node Impact |
|------|-------------|--------|------|-------------|
| quick-1-1 | Remove orphan callback node chain | Complete | 2026-02-08 | 172→166 nodes |

## Next Step

Phase 13 complete (documentation overhaul). v1.2 milestone 100% COMPLETE. All requirements closed.

## Session Continuity

Last session: 2026-02-08
Stopped at: Completed 13-01 (README overhaul: architecture/config/troubleshooting, removed DEPLOYMENT_GUIDE.md)
Resume file: None

---
*Auto-maintained by GSD workflow*