docs: rename DEPLOY-SUBWORKFLOWS.md to ARCHITECTURE.md and rewrite

Restructured as a proper technical architecture document:
- Added Observability section (correlation IDs, structured errors, debugging)
- Reorganized into logical flow: overview, request flow, contracts, internals
- Removed stale rollback/backup references
- Updated all references in README, CLAUDE.md, PROJECT.md, STATE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lucas Berger
2026-02-08 19:15:56 -05:00
parent e7eadd088c
commit 328442554c
6 changed files with 428 additions and 732 deletions
@@ -0,0 +1,421 @@
# Architecture
Technical reference for the Unraid Docker Manager workflow system.
## System Overview
The bot is an n8n workflow that receives Telegram messages, routes them through authentication and keyword matching, dispatches to domain-specific sub-workflows, and sends responses back to the user.
```
Telegram
  |
  v
n8n-workflow.json (166 nodes)
  |-- Auth check (Telegram user ID)
  |-- Correlation ID generation
  |-- Text path: Keyword Router -> Parse Command -> Match Container -> Execute
  |-- Callback path: Parse Callback Data -> Route -> Execute
  |
  |-- n8n-update.json        (34 nodes)  Container image pull + recreate
  |-- n8n-actions.json       (11 nodes)  Start / stop / restart
  |-- n8n-logs.json          (9 nodes)   Log retrieval + formatting
  |-- n8n-batch-ui.json      (17 nodes)  Batch selection keyboard UI
  |-- n8n-status.json        (11 nodes)  Container list + status display
  |-- n8n-confirmation.json  (16 nodes)  Confirmation dialogs
  |-- n8n-matching.json      (23 nodes)  Container name matching
  |
  v
docker-socket-proxy (tecnativa/docker-socket-proxy)
  |
  v
Docker Engine
```
**Total:** 287 nodes (166 main + 121 across 7 sub-workflows)
## Workflow Files
| File | n8n ID | Purpose | Nodes |
|------|--------|---------|-------|
| n8n-workflow.json | `HmiXBlJefBRPMS0m4iNYc` | Main orchestrator | 166 |
| n8n-update.json | `7AvTzLtKXM2hZTio92_mC` | Container update (pull, recreate, cleanup) | 34 |
| n8n-actions.json | `fYSZS5PkH0VSEaT5` | Container start/stop/restart | 11 |
| n8n-logs.json | `oE7aO2GhbksXDEIw` | Container log retrieval | 9 |
| n8n-batch-ui.json | `ZJhnGzJT26UUmW45` | Batch selection keyboard | 17 |
| n8n-status.json | `lqpg2CqesnKE2RJQ` | Container list and status | 11 |
| n8n-confirmation.json | `fZ1hu8eiovkCk08G` | Confirmation dialogs | 16 |
| n8n-matching.json | `kL4BoI8ITSP9Oxek` | Container name matching | 23 |
## Request Flow
### Text Commands
1. Telegram Trigger receives message
2. Auth IF node checks user ID
3. Correlation ID generator creates a unique request trace ID (`timestamp-randomString`)
4. Keyword Router (Switch node) matches command keyword
5. Parse Command (Code node) extracts parameters
6. Matching sub-workflow resolves container name to Docker ID
7. Domain sub-workflow executes the operation
8. Result routed to Telegram response node
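As a rough illustration of steps 4 and 5, the sketch below splits a message into a keyword and its arguments. The function name `parseCommand`, the keyword list (assumed from the commands mentioned in this document), the lowercasing of input, and the returned shape are all assumptions for illustration; the actual Keyword Router and Parse Command nodes may differ.

```javascript
// Hypothetical sketch of keyword routing plus parameter extraction.
// The keyword list is assumed from commands mentioned in this document.
const KEYWORDS = ["status", "start", "stop", "restart", "update", "logs"];

function parseCommand(text) {
  // Normalize whitespace and case, then split into keyword + arguments.
  const [keyword = "", ...args] = text.trim().toLowerCase().split(/\s+/);
  if (!KEYWORDS.includes(keyword)) {
    return { keyword: null, args: [] }; // unknown keyword: nothing to route
  }
  return { keyword, args };
}
```

Under these assumptions, a message with more than one argument (for example `stop plex sonarr`) would be a candidate for the batch path, while a single argument goes to the matching sub-workflow.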
### Callback (Inline Keyboard)
1. Telegram Trigger receives callback query
2. Auth IF node checks user ID
3. Correlation ID generator creates a unique request trace ID
4. Parse Callback Data (Code node, 441 lines) decodes callback string
5. Route Callback (Switch node) dispatches by prefix
6. Domain sub-workflow executes the operation
7. Result routed to Telegram response node (editMessageText)
## Observability
### Correlation IDs
Every request gets a unique correlation ID generated at the entry point of the main workflow. This ID flows through all sub-workflow calls via Prepare Input nodes, enabling request tracing across workflow boundaries in the n8n execution log.
**How it works:**
- Two generator nodes in the main workflow: one for the text path, one for the callback path
- Format: `timestamp-randomString` (no external dependencies)
- All 19 Prepare Input nodes include `correlationId: $json.correlationId` in their output
- Sub-workflows receive the ID as an input field and can reference it in their logs
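A minimal sketch of a generator that produces the `timestamp-randomString` format without external dependencies is shown here. The function name, the use of epoch seconds, and the six-character base36 suffix are assumptions chosen to resemble the `1707400000-abc123` example in this document; the real generator nodes may use different units or lengths.

```javascript
// Hypothetical generator; the actual node code is not shown in this document.
function generateCorrelationId(now = Date.now()) {
  // Epoch seconds (assumed), to resemble the "1707400000-abc123" example.
  const timestamp = Math.floor(now / 1000);
  // Six base36 characters; padEnd covers the rare short random string.
  const random = Math.random().toString(36).slice(2, 8).padEnd(6, "0");
  return `${timestamp}-${random}`;
}

// In an n8n Code node, the ID would typically be attached to the item, e.g.:
//   return [{ json: { ...$json, correlationId: generateCorrelationId() } }];
```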
**Where to find it:** Open any execution in the n8n UI, inspect the output of a Prepare Input node — the `correlationId` field traces back to the original user request.
**Limitations:** Correlation IDs are only visible in the n8n execution log. There is no persistent storage or user-facing output. n8n's workflow static data is execution-scoped (not workflow-scoped), so ring buffers and cross-execution logging are not possible on this platform.
### Structured Error Returns
All 7 sub-workflows return structured error objects on failure:
```json
{
  "success": false,
  "error": "Container not found: foo",
  "correlationId": "1707400000-abc123"
}
```
This gives the main workflow one consistent error shape to route on and to format into user-facing error messages.
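The following sketch shows one way a sub-workflow could build this error object and how the main workflow could turn it into a message. Both function names and the `"Done"` fallback text are hypothetical; only the `success`, `error`, and `correlationId` fields come from the example above.

```javascript
// Hypothetical helper that mirrors the error shape documented above.
function buildError(message, correlationId) {
  return { success: false, error: message, correlationId };
}

// Hypothetical formatter on the main workflow side.
function formatResult(result) {
  if (result && result.success === false) {
    return `Error: ${result.error}`;
  }
  // Fall back to a generic message when the sub-workflow supplied none.
  return result?.message ?? "Done";
}
```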
### Debugging a Request
1. Open the n8n execution list for the main workflow
2. Find the execution by timestamp or Telegram message content
3. Check the Prepare Input node output for the `correlationId`
4. Search sub-workflow executions for the same `correlationId`
5. Trace the full request path: main workflow -> sub-workflow -> Docker API -> response
## Sub-workflow Contracts
Each sub-workflow has a defined input/output contract. The main workflow communicates with sub-workflows through Prepare Input (Code) nodes that build the input object, and Route Result (Code) nodes that interpret the response.
### n8n-update.json (Container Update)
**Input:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| containerId | string | yes* | Docker container ID (empty string to resolve by name) |
| containerName | string | yes | Container display name |
| chatId | number | yes | Telegram chat ID |
| messageId | number | yes | Telegram message ID (0 for text mode) |
| responseMode | string | yes | `"text"`, `"inline"`, or `"batch"` |
| correlationId | string | no | Request trace ID |
*containerId can be empty — the sub-workflow resolves by name via its Resolve Container ID node.
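A Prepare Input node for this contract might look roughly like the sketch below. The function name and the shape of the incoming `item` are assumptions; the field names, the empty-string `containerId`, and the `0` message ID for text mode come from the table above.

```javascript
// Hypothetical builder for the n8n-update.json input contract.
const RESPONSE_MODES = ["text", "inline", "batch"];

function prepareUpdateInput(item, responseMode) {
  if (!RESPONSE_MODES.includes(responseMode)) {
    throw new Error(`Unsupported responseMode: ${responseMode}`);
  }
  return {
    containerId: item.containerId ?? "", // empty string: sub-workflow resolves by name
    containerName: item.containerName,
    chatId: item.chatId,
    messageId: item.messageId ?? 0,      // 0 for text mode
    responseMode,
    correlationId: item.correlationId,   // optional trace ID, passed through unchanged
  };
}
```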
**Output:**
| Outcome | Fields |
|---------|--------|
| Updated | `success: true, updated: true, message, oldDigest, newDigest` |
| No update needed | `success: true, updated: false, message` |
| Error | `success: false, updated: false, message` |
**Callers:** Execute Text Update, Execute Callback Update, Execute Batch Update
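To illustrate how a Route Result node could distinguish the three outcomes in the output table, here is a small sketch. The function name and the returned route labels are hypothetical; only the `success` and `updated` flags are taken from the contract.

```javascript
// Hypothetical router over the documented update outcomes.
function routeUpdateResult(result) {
  if (!result.success) return "error";            // success: false, updated: false
  return result.updated ? "updated" : "no_update"; // success: true in both cases
}
```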
---
### n8n-actions.json (Container Actions)
**Input:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| containerId | string | yes | Docker container ID |
| containerName | string | yes | Container display name |
| action | string | yes | `"start"`, `"stop"`, or `"restart"` |
| chatId | number | yes | Telegram chat ID |
| messageId | number | yes | Telegram message ID (0 for text mode) |
| responseMode | string | yes | `"text"`, `"inline"`, or `"batch"` |
| correlationId | string | no | Request trace ID |
**Output:** `success, message, action, containerName, containerId, chatId, messageId, responseMode`
HTTP status codes are checked before message content: 204 = success, 304 = already in state (treated as success), others = error.
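The status-code check described above can be sketched as a small mapping function. The function name and the `alreadyInState` flag are illustrative assumptions; the 204 and 304 handling follows the rule stated in this section.

```javascript
// Hypothetical mapping from a Docker API status code to an action result.
function interpretActionStatus(statusCode) {
  if (statusCode === 204) return { success: true, alreadyInState: false };
  if (statusCode === 304) return { success: true, alreadyInState: true }; // treated as success
  return { success: false, alreadyInState: false }; // any other code is an error
}
```

Checking the code first means a 304 with an empty body still maps to a success result, without inspecting the message content.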
**Callers:** Execute Container Action, Execute Inline Action, Execute Batch Action Sub-workflow
---
### n8n-logs.json (Container Logs)
**Input:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| containerName | string | yes* | Container name for lookup |
| containerId | string | no | Docker container ID (optional) |
| lineCount | number | no | Lines to retrieve (default: 50, max: 1000) |
| chatId | number | yes | Telegram chat ID |
| messageId | number | no | Telegram message ID (default: 0) |
| responseMode | string | no | `"text"` or `"inline"` (default: "text") |
| correlationId | string | no | Request trace ID |
*Either containerId or containerName is required.
**Output:** `success: true, message, containerName, lineCount`
Errors (container not found, Docker error) throw exceptions handled by n8n's error system.
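The `lineCount` default and cap from the input table could be applied with a helper like the one below. The function name is hypothetical, and treating missing, non-numeric, or non-positive values as the default is an assumption rather than documented behavior.

```javascript
// Hypothetical normalizer for the lineCount input (default 50, max 1000).
const DEFAULT_LINES = 50;
const MAX_LINES = 1000;

function normalizeLineCount(value) {
  const n = Number.parseInt(value, 10);
  // Assumption: anything missing, non-numeric, or below 1 falls back to the default.
  if (!Number.isFinite(n) || n < 1) return DEFAULT_LINES;
  return Math.min(n, MAX_LINES);
}
```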
**Callers:** Execute Text Logs, Execute Inline Logs
---
### n8n-batch-ui.json (Batch Selection UI)
**Input:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| chatId | number | yes | Telegram chat ID |
| messageId | number | yes | Telegram message ID |
| callbackData | string | yes | Raw callback data string |
| queryId | string | yes | Telegram callback query ID |
| action | string | yes | `"mode"`, `"toggle"`, `"nav"`, `"exec"`, `"clear"`, `"cancel"` |
| batchPage | number | no | Current page number (default: 0) |
| selectedCsv | string | no | Comma-separated selected container names |
| toggleName | string | no | Container name being toggled |
| batchAction | string | no | Action for batch execution (stop/restart/update) |
| correlationId | string | no | Request trace ID |
**Output:**
| action | Description |
|--------|-------------|
| `"keyboard"` | Rendered selection keyboard with checkmarks |
| `"execute"` | User confirmed — includes containerNames and batchAction |
| `"cancel"` | User cancelled batch selection |
Batch selection uses bitmap encoding (base36 BigInt) to fit container selections within Telegram's 64-byte callback data limit.
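The bitmap idea can be sketched as follows: each selected container index sets one bit in a BigInt, which is then serialized in base36. The function names and the choice of bit 0 for index 0 are assumptions; the actual bit ordering used by the workflow is not specified here.

```javascript
// Hypothetical encoder: set bit i for every selected container index i.
function encodeSelection(indices) {
  let bitmap = 0n;
  for (const i of indices) {
    if (!Number.isInteger(i) || i < 0) throw new Error(`Invalid index: ${i}`);
    bitmap |= 1n << BigInt(i);
  }
  return bitmap.toString(36);
}

// Hypothetical decoder: BigInt has no base36 parser, so accumulate digit by digit.
function decodeSelection(encoded) {
  let bitmap = 0n;
  for (const ch of encoded) {
    const digit = Number.parseInt(ch, 36);
    if (Number.isNaN(digit)) throw new Error(`Invalid base36 character: ${ch}`);
    bitmap = bitmap * 36n + BigInt(digit);
  }
  const indices = [];
  for (let i = 0; bitmap > 0n; i++, bitmap >>= 1n) {
    if ((bitmap & 1n) === 1n) indices.push(i);
  }
  return indices;
}
```

With this scheme the encoded length grows with the highest selected index rather than with the length of the container names, which is what keeps the payload small.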
**Callers:** Execute Batch UI
---
### n8n-status.json (Container Status/List)
**Input:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| chatId | number | yes | Telegram chat ID |
| messageId | number | yes | Telegram message ID |
| action | string | yes | `"list"`, `"status"`, `"paginate"` |
| containerId | string | no | Docker container ID (for status lookup) |
| containerName | string | no | Container name (for status lookup) |
| page | number | no | Page number for pagination (default: 0) |
| queryId | string | no | Telegram callback query ID |
| searchTerm | string | no | Container name search term |
| correlationId | string | no | Request trace ID |
**Output:**
| action | Description |
|--------|-------------|
| `"list"` | Container list with pagination keyboard |
| `"status"` | Single container detail with action buttons |
| `"paginate"` | Updated page of container list |
The container list keyboard includes an "Update All :latest" button after the navigation row.
**Callers:** Execute Container Status, Execute Select Status, Execute Paginate Status, Execute Batch Cancel Status
---
### n8n-confirmation.json (Confirmation Dialogs)
**Input:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| chatId | number | yes | Telegram chat ID |
| messageId | number | yes | Telegram message ID |
| action | string | yes | `"show_stop"`, `"show_update"`, `"confirm"`, `"cancel"`, `"expired"` |
| containerId | string | no | Docker container ID |
| containerName | string | yes | Container display name |
| confirmAction | string | no | `"stop"` or `"update"` (for confirm action) |
| confirmationToken | string | no | Timestamp token for 30-second expiry check |
| expired | boolean | no | Whether confirmation has expired |
| responseMode | string | no | `"inline"` (default) |
| correlationId | string | no | Request trace ID |
This sub-workflow internally calls n8n-actions.json for confirmed stop actions.
**Output:**
| action | Description |
|--------|-------------|
| `"show_stop"` | Stop confirmation dialog rendered |
| `"show_update"` | Update confirmation dialog rendered |
| `"confirm_stop_result"` | Stop executed, result returned |
| `"confirm_update"` | Update confirmed, containerId/name returned for update sub-workflow |
| `"cancel"` | Confirmation cancelled |
| `"expired"` | Confirmation token expired (30-second timeout) |
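The 30-second expiry check on `confirmationToken` might be implemented along these lines. This is a sketch under stated assumptions: the token is taken to be an epoch timestamp in milliseconds, malformed tokens are treated as expired, and the function and constant names are hypothetical.

```javascript
// Hypothetical expiry check; assumes the token is an epoch-millisecond timestamp.
const CONFIRMATION_TTL_MS = 30 * 1000;

function isConfirmationExpired(confirmationToken, now = Date.now()) {
  const issuedAt = Number(confirmationToken);
  if (!Number.isFinite(issuedAt)) return true; // assumption: malformed token counts as expired
  return now - issuedAt > CONFIRMATION_TTL_MS;
}
```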
**Callers:** Execute Confirmation (fed by 3 Prepare Input nodes: Prepare Confirm Input, Prepare Show Stop Input, Prepare Show Update Input)
---
### n8n-matching.json (Container Matching/Disambiguation)
**Input:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| action | string | yes | `"match_action"`, `"match_update"`, or `"match_batch"` |
| containerList | string | yes | Raw Docker API JSON output (container list) |
| searchTerm | string | yes | Container name query to match |
| selectedContainers | string | no | Comma-separated names for batch matching |
| chatId | number | yes | Telegram chat ID |
| messageId | number | yes | Telegram message ID |
| correlationId | string | no | Request trace ID |
Matching priority: exact match > single substring match > multiple matches (disambiguation) > no match (suggestion keyboard).
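The priority order above can be sketched as a single function over the parsed container list. The `Id` and `Names` fields (with a leading slash on each name) follow the Docker Engine container list response; the function name and case-insensitive comparison are assumptions, and the suggestion keyboard for the no-match case is omitted.

```javascript
// Hypothetical matcher implementing: exact > single substring > multiple > none.
function matchContainer(containers, searchTerm) {
  const term = searchTerm.trim().toLowerCase();
  const named = containers.map((c) => ({
    containerId: c.Id,
    containerName: c.Names[0].replace(/^\//, ""), // Docker prefixes names with "/"
  }));

  const exact = named.find((c) => c.containerName.toLowerCase() === term);
  if (exact) return { action: "matched", ...exact };

  const partial = named.filter((c) => c.containerName.toLowerCase().includes(term));
  if (partial.length === 1) return { action: "matched", ...partial[0] };
  if (partial.length > 1) return { action: "multiple", matches: partial };
  return { action: "no_match" };
}
```

Since `containerList` arrives as a raw JSON string, a caller would pass `JSON.parse(containerList)` as the first argument. Checking for an exact match first means `plex` resolves directly even when `plex-meta` also exists.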
**Output (action match):**
| action | Description |
|--------|-------------|
| `"matched"` | Single container matched — includes containerId, containerName |
| `"multiple"` | Multiple matches — includes matches array for disambiguation |
| `"no_match"` | No match found |
| `"suggestion"` | Suggestion keyboard with close matches |
| `"error"` | Matching error |
**Output (update match):** Same as action match with `_update` suffix on action values.
**Output (batch match):**
| action | Description |
|--------|-------------|
| `"batch_matched"` | All names resolved — includes matchedContainers array |
| `"disambiguation"` | Some names ambiguous — disambiguation keyboard |
| `"not_found"` | Some names not found |
**Callers:** Execute Action Match, Execute Update Match, Execute Batch Match
## Execute Workflow Node Map
17 Execute Workflow nodes in the main workflow dispatch to 7 sub-workflows:
| Target | Execute Nodes |
|--------|---------------|
| n8n-update.json | Execute Text Update, Execute Callback Update, Execute Batch Update |
| n8n-actions.json | Execute Container Action, Execute Inline Action, Execute Batch Action Sub-workflow |
| n8n-logs.json | Execute Text Logs, Execute Inline Logs |
| n8n-batch-ui.json | Execute Batch UI |
| n8n-status.json | Execute Container Status, Execute Select Status, Execute Paginate Status, Execute Batch Cancel Status |
| n8n-confirmation.json | Execute Confirmation |
| n8n-matching.json | Execute Action Match, Execute Update Match, Execute Batch Match |
## Main Workflow Internals
### Node Breakdown (166 nodes)
| Category | Count | Purpose |
|----------|-------|---------|
| Code | 60 | Command parsing, input preparation, result routing, response building, batch orchestration |
| HTTP Request | 40 | Docker API and Telegram API calls |
| Telegram | 23 | User-facing response nodes |
| Execute Workflow | 17 | Sub-workflow dispatch |
| Switch | 13 | Keyword Router, Route Callback, result routing |
| If | 8 | Auth checks, batch completion, expiry, status routing |
| Execute Command | 6 | Docker CLI (container list, exec) |
| Telegram Trigger | 1 | Entry point |
### Code Node Categories
The 60 Code nodes break down into orchestration categories. All are infrastructure — none contain extractable domain logic:
| Category | Count | Role |
|----------|-------|------|
| prepare-input | 27 | Build input objects for sub-workflow calls |
| route-result | 12 | Interpret sub-workflow responses for routing |
| build-response | 8 | Build Telegram messages and keyboards |
| orchestration | 6 | Batch loop control and state management |
| parse-command | 5 | Parse text commands into structured data |
| domain-logic | 2 | Legacy extraction candidates (extracting them would be net-negative) |
### Callback Data Encoding
Telegram limits callback data to 64 bytes. The system uses two encoding schemes:
**Colon-delimited** for single operations: `s:containerId` (status), `stop:containerId` (action), `cfm:stop:containerId:token` (confirmation)
**Bitmap encoding** for batch selection: `b:0:1a3:5`, where the `1a3` segment is a base36-encoded BigInt whose set bits mark the selected container indices. Supports 50+ containers within the 64-byte limit.
Legacy parsers (`batch:toggle:`, `batch:nav:`, `batch:exec:`) are retained for graceful migration of in-flight messages.
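As an illustration of the colon-delimited scheme and the 64-byte limit, the sketch below parses the three example strings from this section and checks payload size in bytes. The function names, the returned object shapes, and the inclusion of `start` and `restart` as action prefixes are assumptions; the actual Parse Callback Data node handles many more cases.

```javascript
// Hypothetical size guard: Telegram counts bytes, not characters.
const MAX_CALLBACK_BYTES = 64;

function assertCallbackFits(data) {
  const bytes = new TextEncoder().encode(data).length;
  if (bytes > MAX_CALLBACK_BYTES) {
    throw new Error(`Callback data is ${bytes} bytes (limit ${MAX_CALLBACK_BYTES})`);
  }
  return data;
}

// Hypothetical parser for the single-operation formats shown above.
const ACTION_PREFIXES = ["start", "stop", "restart"]; // only "stop" appears in this document

function parseSingleCallback(data) {
  const parts = data.split(":");
  const prefix = parts[0];
  if (prefix === "s") return { kind: "status", containerId: parts[1] };
  if (prefix === "cfm") {
    return { kind: "confirm", action: parts[1], containerId: parts[2], token: parts[3] };
  }
  if (ACTION_PREFIXES.includes(prefix)) {
    return { kind: "action", action: prefix, containerId: parts[1] };
  }
  return { kind: "unknown", raw: data };
}
```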
## Deployment
### Redeploying After Changes
1. Import the modified sub-workflow JSON into n8n
2. If main workflow changed, re-import n8n-workflow.json
3. Activate the workflow
Workflow IDs are stable — n8n preserves them across re-imports of the same workflow.
### Execute Workflow Node Format
All Execute Workflow nodes use typeVersion 1.2:
```json
"workflowId": { "__rl": true, "mode": "list", "value": "<id>" }
```
### Testing Checklist
After deployment, verify:
- [ ] `status` — Shows container list with pagination
- [ ] Tap container — Shows detail with action buttons
- [ ] `stop <name>` — Confirmation dialog, confirm executes stop
- [ ] `update <name>` — Confirmation dialog, confirm pulls image + recreates
- [ ] `restart <name>` — Immediate restart
- [ ] `logs <name>` — Shows last 50 lines
- [ ] `stop plex sonarr` — Batch selection keyboard
- [ ] Select multiple, execute — Batch processes all selected
- [ ] `update all` — Lists :latest containers, confirm updates all
- [ ] Update All button in keyboard — Same flow via inline keyboard
## Known Limitations
### Unraid Update Badges
After the bot updates a container, Unraid's Docker tab may show "apply update" on the next check. The bot uses the Docker API directly, while Unraid tracks containers through its XML template system, so Unraid does not know the container was updated externally.
**Resolution:** Click "Apply Update" in Unraid. It completes instantly since the image is already cached.
**Why not automated:** Clearing the badge would require calling Unraid's internal web API (authentication, template parsing) for a cosmetic issue that takes one click.
### n8n Static Data
n8n workflow static data (`$getWorkflowStaticData('global')`) is execution-scoped, not workflow-scoped. Data written in one execution is not available in the next. This prevents persistent cross-execution features like error ring buffers or debug command history.
### Orphan Nodes
3 legacy Code nodes remain in the main workflow (Build Action Command, Build Immediate Action Command, Prepare Cancel Return). They are unreachable dead code from pre-modularization inline action paths. They have no incoming connections and do not affect functionality.