docs(10.2): create phase plan

2026-02-08 12:32:35 -05:00
parent 0ef36ab4c8
commit e9a84794b9
4 changed files with 737 additions and 5 deletions
@@ -0,0 +1,242 @@
+---
+phase: 10.2-better-logging-and-log-management
+plan: 02
+type: execute
+wave: 2
+depends_on:
+  - "10.2-01"
+files_modified:
+  - n8n-workflow.json
+  - n8n-actions.json
+  - n8n-update.json
+  - n8n-logs.json
+  - n8n-batch-ui.json
+  - n8n-status.json
+  - n8n-confirmation.json
+  - n8n-matching.json
+autonomous: true
+
+must_haves:
+  truths:
+    - "All 7 sub-workflows return structured error objects when failures occur"
+    - "Errors from sub-workflow failures are captured in the main workflow's ring buffer"
+    - "User sees friendly error messages inline (summary + cause) when actions fail"
+    - "Error entries include sub-workflow name, node, HTTP code, and raw response"
+    - "Correlation IDs propagate from main workflow through sub-workflow calls"
+  artifacts:
+    - path: "n8n-actions.json"
+      provides: "Structured error return on Docker API failure"
+      contains: "success"
+    - path: "n8n-update.json"
+      provides: "Structured error return on update failure"
+      contains: "success"
+    - path: "n8n-logs.json"
+      provides: "Structured error return on log retrieval failure"
+      contains: "success"
+    - path: "n8n-batch-ui.json"
+      provides: "Structured error return on batch UI failure"
+      contains: "success"
+    - path: "n8n-status.json"
+      provides: "Structured error return on status query failure"
+      contains: "success"
+    - path: "n8n-confirmation.json"
+      provides: "Structured error return on confirmation failure"
+      contains: "success"
+    - path: "n8n-matching.json"
+      provides: "Structured error return on matching failure"
+      contains: "success"
+    - path: "n8n-workflow.json"
+      provides: "Error capture after each Execute Workflow node, correlation ID generation"
+      contains: "Log Error"
+  key_links:
+    - from: "Execute Container Action (main)"
+      to: "Log Error"
+      via: "IF check on success field; the success === false branch routes to Log Error"
+      pattern: "success.*false"
+    - from: "sub-workflow error return"
+      to: "main workflow ring buffer"
+      via: "structured return value with success:false and error object"
+      pattern: "success.*false.*error"
+---
+
+<objective>
+Wire error propagation from all 7 sub-workflows to the main workflow's centralized error ring buffer.
+
+Purpose: Ensure every sub-workflow failure is captured automatically in the ring buffer with full diagnostic context (workflow name, node, HTTP code, raw response, sub-workflow I/O boundaries). Users see friendly error messages; Claude gets queryable diagnostic data via the /errors command.
+
+Output: All 8 workflow JSON files modified. Sub-workflows return structured errors. Main workflow captures errors via Log Error node from Plan 01.
+</objective>
+
+<execution_context>
+@/home/luc/.claude/get-shit-done/workflows/execute-plan.md
+@/home/luc/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/STATE.md
+@.planning/ROADMAP.md
+@n8n-workflow.json
+@n8n-actions.json
+@n8n-update.json
+@n8n-logs.json
+@n8n-batch-ui.json
+@n8n-status.json
+@n8n-confirmation.json
+@n8n-matching.json
+@DEPLOY-SUBWORKFLOWS.md
+@.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md
+@.planning/phases/10.2-better-logging-and-log-management/10.2-RESEARCH.md
+@.planning/phases/10.2-better-logging-and-log-management/10.2-01-SUMMARY.md
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Add structured error returns to all 7 sub-workflows</name>
+  <files>n8n-actions.json, n8n-update.json, n8n-logs.json, n8n-batch-ui.json, n8n-status.json, n8n-confirmation.json, n8n-matching.json</files>
+  <action>
+For each of the 7 sub-workflows, audit the existing error handling paths and ensure they return a standardized error object. The goal is NOT to change how errors are currently handled (many sub-workflows already have error paths), but to AUGMENT the return data with a consistent structure that the main workflow can detect and log.
+
+**Standard error return format** (add to existing error paths):
+```javascript
+// Shape of a failure return; <angle-bracket> placeholders are filled in per node
+const errorReturn = {
+  success: false,
+  action: "<existing-action-value>",  // Preserve existing action field for routing
+  error: {
+    workflow: "<sub-workflow-name>",
+    node: "<node-that-failed>",
+    message: "<human-readable-error>",
+    httpCode: null,                   // HTTP status code (number), or null for non-HTTP failures
+    rawResponse: "<truncated-raw-response>"
+  }
+  // ...plus all existing return fields, unchanged, for backward compatibility
+};
+```
+
+**Critical rule: PRESERVE BACKWARD COMPATIBILITY.** Existing return fields (action, text, chatId, messageId, keyboard, etc.) MUST remain unchanged. The `success` and `error` fields are ADDITIONS to the existing return objects.
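A minimal sketch of how an error-path Code node could build this shape without touching existing fields. `buildErrorReturn`, `truncate`, and the 2000-character `RAW_RESPONSE_LIMIT` are hypothetical names and values, not code that exists in the sub-workflows today.

```javascript
// Hypothetical helper: merge the standard error fields into an existing
// return object while leaving every original field in place.
const RAW_RESPONSE_LIMIT = 2000; // assumed truncation length for rawResponse

function truncate(value, limit = RAW_RESPONSE_LIMIT) {
  if (value == null) return '';
  // Stringify non-string responses (e.g. parsed Docker API JSON) before truncating
  const text = typeof value === 'string' ? value : JSON.stringify(value);
  return text.length > limit ? `${text.slice(0, limit)}...[truncated]` : text;
}

function buildErrorReturn(existing, { workflow, node, message, httpCode = null, rawResponse = '' }) {
  return {
    ...existing, // action, text, chatId, messageId, keyboard, ... stay as they were
    success: false,
    error: {
      workflow,
      node,
      message,
      httpCode: Number.isInteger(httpCode) ? httpCode : null,
      rawResponse: truncate(rawResponse),
    },
  };
}
```

Inside a Code node running once per item, this would end in something like `return { json: buildErrorReturn($input.item.json, { workflow: 'n8n-actions', node: 'Format Action Result', message, httpCode: statusCode, rawResponse }) };`, where `message`, `statusCode`, and `rawResponse` are assumed to be values the node already computes. Spreading `existing` first keeps every other field intact; if an existing return already uses an `error` key, it would be overwritten and should be checked during the audit.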
+
+**Per sub-workflow audit and modifications:**
+
+1. **n8n-actions.json** (11 nodes): Already has error handling in Format Action Result nodes. Add `success: true` to success paths and `success: false` + `error` object to failure paths in the Format Action Result Code node. The existing `statusCode` checks (304, 404, 500+) should populate `error.httpCode`.
+
+2. **n8n-update.json** (34 nodes): Has multiple error paths (pull error, create error, start error). Each error Code node (Format Pull Error, etc.) already returns `success: false`. Ensure each also includes an `error` object with `{ workflow: 'n8n-update', node, message, httpCode, rawResponse }`.
+
+3. **n8n-logs.json** (9 nodes): Has error handling for container not found and log retrieval failures. Add `success` field to all return paths and `error` object to failure paths.
+
+4. **n8n-batch-ui.json** (16 nodes): Has error handling for invalid state. Add `success` field to all return paths and an `error` object to the invalid-state failure path.
+
+5. **n8n-status.json** (11 nodes): Has error handling for Docker query failures. Add `success` field to all return paths and an `error` object to the Docker query failure paths.
+
+6. **n8n-confirmation.json** (16 nodes): Has error paths for expired/invalid tokens. Add `success` field to all return paths. Note: expired/cancel are NOT errors -- they are expected flows and return `success: true`. Only actual failures (Docker API errors in the stop execution path) return `success: false` with an `error` object.
+
+7. **n8n-matching.json** (23 nodes): Has `no_match` and `error` action returns. `no_match` is NOT an error and returns `success: true`. Only the `error` action path should return `success: false` with error details.
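The split between expected flows and real failures in items 6 and 7 can be expressed as a small classifier. This is a sketch under assumptions: the action names (`no_match`, `expired`, `cancel`, `error`) are taken loosely from the notes above and may not match the exact values in the JSON, and treating HTTP 4xx/5xx statuses as Docker API failures is an assumed rule. `isFailure` and `withSuccessFlag` are hypothetical.

```javascript
// Hypothetical classifier: decides whether a sub-workflow outcome should
// carry success:false. Action names are assumed from the plan text.
const EXPECTED_NON_ERRORS = new Set(['no_match', 'expired', 'cancel']);

function isFailure({ action, statusCode }) {
  if (EXPECTED_NON_ERRORS.has(action)) return false; // expected user-facing flows
  if (action === 'error') return true;               // explicit error action (n8n-matching)
  // Assumed rule: Docker API failures surface as 4xx/5xx status codes (e.g. stop path)
  return Number.isInteger(statusCode) && statusCode >= 400;
}

function withSuccessFlag(result) {
  // Adds the flag without altering any existing field
  return { ...result, success: !isFailure(result) };
}
```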
+
+**Also add `correlationId` pass-through:** Each sub-workflow already accepts input parameters. Add `correlationId` to the expected fields of the "When executed by another workflow" trigger node. The field is optional in both directions: n8n handles extra fields gracefully, and when a caller does not pass it, the error paths fall back to an empty string. In error return paths, include `correlationId: $('When executed by another workflow').item.json.correlationId || ''` so the main workflow can correlate errors.
+
+**Implementation approach:** Read each sub-workflow JSON, identify Code nodes on error paths, modify their jsCode to include the standardized fields. Do NOT add new nodes to sub-workflows -- modify existing Code node outputs.
+  </action>
+  <verify>
+    - For each sub-workflow JSON, parse and verify:
+      1. At least one Code node contains `success: false` and `error:` in its jsCode
+      2. At least one Code node contains `success: true` in its jsCode
+      3. Error objects include `workflow:` field matching the sub-workflow name
+      4. `correlationId` appears in error return paths
+    - Spot-check n8n-actions.json and n8n-update.json Code nodes in detail since they have the most complex error paths
+  </verify>
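The four checks above could be scripted roughly as follows. This sketch assumes the usual n8n export layout (a top-level `nodes` array, Code nodes with `type: 'n8n-nodes-base.code'` and their source in `parameters.jsCode`); `verifySubWorkflow` and `escapeRegExp` are hypothetical helpers, and the regex matching is a text-level spot check, not a parse of the JavaScript.

```javascript
// Hypothetical verifier for the four Task 1 checks on one parsed sub-workflow.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function verifySubWorkflow(workflow, workflowName) {
  const sources = (workflow.nodes || [])
    .filter((node) => node.type === 'n8n-nodes-base.code')
    .map((node) => (node.parameters && node.parameters.jsCode) || '');

  // Code nodes that look like error returns: success:false plus an error field
  const errorSources = sources.filter((src) => /success:\s*false/.test(src) && /\berror:/.test(src));
  const workflowField = new RegExp(`workflow:\\s*['"\`]${escapeRegExp(workflowName)}['"\`]`);

  const problems = [];
  if (errorSources.length === 0) problems.push('no Code node returns success:false with an error object');
  if (!sources.some((src) => /success:\s*true/.test(src))) problems.push('no Code node returns success:true');
  if (!errorSources.some((src) => workflowField.test(src))) problems.push(`no error object names workflow '${workflowName}'`);
  if (!errorSources.some((src) => src.includes('correlationId'))) problems.push('no error path includes correlationId');
  return problems; // empty array: all four checks passed
}
```

Run it per file, e.g. `verifySubWorkflow(JSON.parse(fs.readFileSync('n8n-update.json', 'utf8')), 'n8n-update')`. It cannot tell whether a `success: false` return sits on a real error path, so the manual spot-check of n8n-actions.json and n8n-update.json still applies.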
+  <done>
+    All 7 sub-workflows return `success: true/false` on all paths. Failure paths include standardized `error` object with workflow name, node, message, httpCode, and rawResponse. Existing return fields preserved for backward compatibility. correlationId passed through on error returns.
+  </done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Add correlation ID generation and error capture to main workflow Execute Workflow paths</name>
+  <files>n8n-workflow.json</files>
+  <action>
+This task wires the error capture infrastructure in the main workflow.
+
+**Part A: Correlation ID generation**
+
+Add a new Code node: **Generate Correlation ID** (id: `code-generate-correlation-id`). Place it between the IF Authenticated node's true output and the Keyword Router, so every authenticated request gets a correlation ID.
+
+Implementation:
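+```javascript
+// Requires Code node mode "Run Once for Each Item" ($input.item is only available there)
+// Millisecond timestamp plus up to 9 random base-36 characters
+const correlationId = `${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;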
+return {
+  json: {
+    ...$input.item.json,
+    correlationId
+  }
+};
+```
+
+Note: Use timestamp + random string instead of UUID (avoids `require('uuid')` dependency issues in n8n Code nodes). This generates sufficiently unique IDs for a single-user bot.
+
+Wire: IF Authenticated (true) -> Generate Correlation ID -> Keyword Router. This requires updating the connection from IF Authenticated to Keyword Router.
+
+Similarly, add correlation ID generation for the callback path. Add a new Code node: **Generate Callback Correlation ID** (id: `code-generate-callback-correlation-id`) between IF Callback Authenticated (true) and Parse Callback Data. Same implementation.
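Rewiring is a change to the `connections` map of the exported JSON. A sketch, assuming n8n's usual export layout where `connections[source].main[outputIndex]` holds `{ node, type, index }` links keyed by node name, and that output 0 of an IF node is its true branch; `insertBetween` is a hypothetical helper.

```javascript
// Hypothetical helper: splice newNode into the link sourceName[outputIndex] -> targetName.
// Assumes connections[source].main[outputIndex] is an array of { node, type, index }.
function insertBetween(workflow, sourceName, outputIndex, targetName, newNode) {
  const links = workflow.connections?.[sourceName]?.main?.[outputIndex];
  const pos = Array.isArray(links) ? links.findIndex((link) => link.node === targetName) : -1;
  if (pos === -1) {
    throw new Error(`${sourceName} output ${outputIndex} is not connected to ${targetName}`);
  }
  const original = links[pos];
  // source -> new node (a fresh Code node has a single input, index 0)
  links[pos] = { node: newNode.name, type: 'main', index: 0 };
  // new node -> original target, keeping the target input index it used before
  workflow.connections[newNode.name] = {
    main: [[{ node: targetName, type: 'main', index: original.index }]],
  };
  workflow.nodes.push(newNode);
  return workflow;
}
```

For the text path this would be `insertBetween(workflow, 'IF Authenticated', 0, 'Keyword Router', generateCorrelationIdNode)`, and `('IF Callback Authenticated', 0, 'Parse Callback Data', ...)` for the callback path, where `generateCorrelationIdNode` is a hypothetical node object with the id and name given above. Because links reference names, each new node name must be unique in the workflow.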
+
+**Part B: Pass correlation ID to sub-workflow calls**
+
+For each Execute Workflow node in the main workflow (there are ~17 of them per DEPLOY-SUBWORKFLOWS.md), ensure the `correlationId` field is passed as an input parameter. Since most Prepare Input Code nodes already construct the input object, add `correlationId: $('Generate Correlation ID').item.json.correlationId` (or `$('Generate Callback Correlation ID').item.json.correlationId` for callback-path nodes) to each Prepare Input node's return object.
+
+**Important data chain note:** Use `$input.item.json.correlationId` pattern for nodes with multiple predecessors (per 10.1-09 decision). For nodes with a single predecessor chain back to the correlation ID generation, reference the specific node.
+
+**Part C: Error capture after Execute Workflow nodes**
+
+For the highest-value Execute Workflow nodes (the ones most likely to fail), add error detection that routes to the Log Error node from Plan 01. The pattern:
+
+After each Execute Workflow node, the existing result-handling Code node or IF node should check the `success` field. If `success === false`, route a branch to Log Error.
+
+**Priority targets** (modify these first -- they handle Docker API calls that actually fail):
+1. After Execute Container Action (single text command path)
+2. After Execute Inline Action (callback action path)
+3. After Execute Text Update (text update path)
+4. After Execute Callback Update (callback update path)
+5. After Execute Text Logs (text logs path)
+6. After Execute Inline Logs (callback logs path)
+
+For each target:
+- If there's already a result-handling Code node after the Execute Workflow node, modify it to check `success === false` and, on the false path, route to Log Error with appropriate fields
+- If the existing flow doesn't have branching, add an IF node (Check {X} Success) after the Execute Workflow node that checks `{{ $json.success }}` equals `false`. The IF node's true output (the sub-workflow reported a failure) goes to Log Error; its false output continues the existing flow.
+- Log Error receives: `{ correlationId, workflow, node, operation, userMessage, errorMessage, httpCode, rawResponse, contextData, chatId, text }`
+- After Log Error, the flow continues to the existing Telegram error response (Log Error passes through data)
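Where an existing result-handling Code node absorbs the check, its failure branch could map the sub-workflow result onto the Log Error fields listed above. A sketch only: `toLogErrorInput`, the `context` argument (carrying `operation`, `chatId`, `text`, and `correlationId` from the main workflow), the `userMessage` wording, and the `contextData` contents are assumptions. It returns `null` for non-failures so the caller keeps the existing flow.

```javascript
// Hypothetical failure mapping for a result-handling Code node after an
// Execute Workflow node. The strict `=== false` test means results from
// sub-workflows that do not return `success` yet stay on the existing path.
function toLogErrorInput(result, context) {
  if (result.success !== false) return null; // not a failure: continue existing flow
  const err = result.error || {};
  const message = err.message || 'unknown error';
  return {
    correlationId: result.correlationId || context.correlationId || '',
    workflow: err.workflow || 'unknown',
    node: err.node || 'unknown',
    operation: context.operation,
    // Assumed "summary: cause" wording for the inline Telegram message
    userMessage: `${context.operation} failed: ${message}`,
    errorMessage: message,
    httpCode: err.httpCode ?? null,
    rawResponse: err.rawResponse || '',
    contextData: { action: result.action }, // assumed: minimal context
    chatId: context.chatId,
    text: context.text,
  };
}
```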
+
+**Minimize new nodes:** Where possible, modify existing result-handling Code nodes to include the error check rather than adding new IF nodes. Only add IF nodes where the existing flow has no branching capability.
+
+Estimated new nodes: 2 (correlation ID generators) + 0-4 (IF nodes for error detection, depending on existing flow structure). Target: +2 to +6 new nodes.
+  </action>
+  <verify>
+    - Parse `n8n-workflow.json` to verify:
+      1. Generate Correlation ID node exists and is wired between IF Authenticated and Keyword Router
+      2. Generate Callback Correlation ID node exists and is wired in callback path
+      3. At least 4 Prepare Input nodes include `correlationId` in their return objects
+      4. Log Error node (from Plan 01) has at least 2 incoming connections
+      5. Node count is within expected range (174-178)
+    - Verify connection integrity: no broken paths, all existing flows still connected
+  </verify>
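Checks 1, 2, and 4 reduce to questions about incoming links in the `connections` map. A sketch, assuming n8n's usual export layout, where `connections[source].main[outputIndex]` is an array of `{ node, type, index }` links keyed by node name; `incomingSources` and `isWiredThrough` are hypothetical helpers.

```javascript
// Hypothetical wiring checks over the main workflow export.
function incomingSources(workflow, targetName) {
  const sources = [];
  for (const [source, outputsByType] of Object.entries(workflow.connections || {})) {
    for (const links of outputsByType.main || []) {
      for (const link of links || []) {
        if (link.node === targetName) sources.push(source);
      }
    }
  }
  return sources; // one entry per incoming link
}

// True when `from` links to `via` and `via` links to `to`
function isWiredThrough(workflow, from, via, to) {
  return incomingSources(workflow, via).includes(from) &&
         incomingSources(workflow, to).includes(via);
}
```

For example, `incomingSources(wf, 'Log Error').length >= 2` covers check 4, and `isWiredThrough(wf, 'IF Authenticated', 'Generate Correlation ID', 'Keyword Router')` covers check 1. `isWiredThrough` does not confirm which IF output is used or that the old direct IF Authenticated -> Keyword Router link is gone; `!incomingSources(wf, 'Keyword Router').includes('IF Authenticated')` adds the latter.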
+  <done>
+    Every authenticated request gets a correlation ID. Correlation IDs propagate to sub-workflow calls. At least 6 Execute Workflow result paths check for `success === false` and route to Log Error. Error entries appear in ring buffer with correlation IDs, sub-workflow names, and diagnostic data.
+  </done>
+</task>
+
+</tasks>
+
+<verification>
+1. All 7 sub-workflow JSON files contain `success: true/false` returns
+2. Main workflow has correlation ID generation on both text and callback paths
+3. Log Error node has incoming connections from error detection paths
+4. No broken connections in any workflow file (validate JSON structure)
+5. Existing functionality preserved (action routing, Telegram responses unchanged)
+6. `python3` JSON parse of all 8 files succeeds without errors
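Items 4 and 6 can also be checked from Node.js. A sketch, assuming n8n's export layout (a `nodes` array and a name-keyed `connections` map); `findBrokenConnections` is hypothetical, and it only catches links to or from nodes that do not exist, not routing that is wired to the wrong place.

```javascript
// Hypothetical integrity check over an n8n workflow export.
// Connections are keyed by node name, so every source key and every link
// target must match the name of a node in workflow.nodes.
function findBrokenConnections(workflow) {
  const names = new Set((workflow.nodes || []).map((node) => node.name));
  const broken = [];
  for (const [source, outputsByType] of Object.entries(workflow.connections || {})) {
    if (!names.has(source)) broken.push(`unknown source node: ${source}`);
    // outputsByType is usually { main: [[link, ...], ...] }; other connection types are scanned too
    for (const outputs of Object.values(outputsByType)) {
      for (const links of outputs || []) {
        for (const link of links || []) {
          if (!names.has(link.node)) broken.push(`${source} -> unknown target: ${link.node}`);
        }
      }
    }
  }
  return broken; // empty array: no dangling links
}
```

Loading each of the 8 files with `JSON.parse(fs.readFileSync(file, 'utf8'))` throws on malformed JSON, which mirrors item 6; an empty array from `findBrokenConnections` covers the dangling-link part of item 4.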
+</verification>
+
+<success_criteria>
+- Sub-workflow errors automatically captured in ring buffer with full diagnostic context
+- /errors command shows real errors from Docker API failures
+- Correlation IDs trace a single user request across main + sub-workflow boundaries
+- No regression to existing bot functionality (all action/update/status/logs flows work)
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/10.2-better-logging-and-log-management/10.2-02-SUMMARY.md`
+</output>