unraid-docker-manager/.planning/phases/10.2-better-logging-and-log-management/10.2-02-PLAN.md
2026-02-08 18:56:44 -05:00

---
phase: 10.2-better-logging-and-log-management
plan: 02
type: execute
wave: 2
depends_on:
  - 10.2-01
files_modified:
  - n8n-workflow.json
  - n8n-actions.json
  - n8n-update.json
  - n8n-logs.json
  - n8n-batch-ui.json
  - n8n-status.json
  - n8n-confirmation.json
  - n8n-matching.json
autonomous: true
must_haves:
  truths:
    - "All 7 sub-workflows return structured error objects when failures occur"
    - "Errors from sub-workflow failures are captured in the main workflow's ring buffer"
    - "User sees friendly error messages inline (summary + cause) when actions fail"
    - "Error entries include sub-workflow name, node, HTTP code, and raw response"
    - "Correlation IDs propagate from main workflow through sub-workflow calls"
  artifacts:
    - path: n8n-actions.json
      provides: "Structured error return on Docker API failure"
      contains: "success"
    - path: n8n-update.json
      provides: "Structured error return on update failure"
      contains: "success"
    - path: n8n-logs.json
      provides: "Structured error return on log retrieval failure"
      contains: "success"
    - path: n8n-batch-ui.json
      provides: "Structured error return on batch UI failure"
      contains: "success"
    - path: n8n-status.json
      provides: "Structured error return on status query failure"
      contains: "success"
    - path: n8n-confirmation.json
      provides: "Structured error return on confirmation failure"
      contains: "success"
    - path: n8n-matching.json
      provides: "Structured error return on matching failure"
      contains: "success"
    - path: n8n-workflow.json
      provides: "Error capture after each Execute Workflow node, correlation ID generation"
      contains: "Log Error"
  key_links:
    - from: "Execute Container Action (main)"
      to: "Log Error"
      via: "IF check on success field, false path to Log Error"
      pattern: "success.*false"
    - from: "sub-workflow error return"
      to: "main workflow ring buffer"
      via: "structured return value with success:false and error object"
      pattern: "success.*false.*error"
---

Wire error propagation from all 7 sub-workflows to the main workflow's centralized error ring buffer.

Purpose: Ensure every sub-workflow failure is automatically captured in the ring buffer with full diagnostic context (workflow name, node, HTTP code, raw response, sub-workflow I/O boundaries). Users see friendly error messages; Claude gets queryable diagnostic data via the /errors command.

Output: All 8 workflow JSON files modified. Sub-workflows return structured errors. Main workflow captures errors via Log Error node from Plan 01.

<execution_context> @/home/luc/.claude/get-shit-done/workflows/execute-plan.md @/home/luc/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/STATE.md @.planning/ROADMAP.md @n8n-workflow.json @n8n-actions.json @n8n-update.json @n8n-logs.json @n8n-batch-ui.json @n8n-status.json @n8n-confirmation.json @n8n-matching.json @DEPLOY-SUBWORKFLOWS.md @.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md @.planning/phases/10.2-better-logging-and-log-management/10.2-RESEARCH.md @.planning/phases/10.2-better-logging-and-log-management/10.2-01-SUMMARY.md

Task 1: Add structured error returns to all 7 sub-workflows

Files: n8n-actions.json, n8n-update.json, n8n-logs.json, n8n-batch-ui.json, n8n-status.json, n8n-confirmation.json, n8n-matching.json

For each of the 7 sub-workflows, audit the existing error handling paths and ensure they return a standardized error object. The goal is NOT to change how errors are currently handled (many sub-workflows already have error paths), but to AUGMENT the return data with a consistent structure that the main workflow can detect and log.

Standard error return format (add to existing error paths):

{
  success: false,
  action: "<existing-action-value>",  // Preserve existing action field for routing
  error: {
    workflow: "<sub-workflow-name>",
    node: "<node-that-failed>",
    message: "<human-readable-error>",
    httpCode: <http-status-or-null>,
    rawResponse: "<truncated-raw-response>"
  },
  // ... preserve all existing return fields for backward compatibility
}

Critical rule: PRESERVE BACKWARD COMPATIBILITY. Existing return fields (action, text, chatId, messageId, keyboard, etc.) MUST remain unchanged. The success and error fields are ADDITIONS to the existing return objects.
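As a rough illustration of the backward-compatibility rule, the sketch below merges the standard fields into an existing return object without touching anything already there. The helper name `withError`, the 500-character truncation limit, and the sample values are assumptions for illustration; the plan does not specify them.

```javascript
// Hypothetical helper (not part of the existing workflows): adds the standard
// success/error fields to an existing return object.
const MAX_RAW_LENGTH = 500; // assumed truncation limit; the plan does not name one

function withError(existing, { workflow, node, message, httpCode = null, rawResponse = '' }) {
  // Normalize the raw response to a string so it can be truncated safely.
  const raw = rawResponse == null
    ? ''
    : typeof rawResponse === 'string' ? rawResponse : JSON.stringify(rawResponse);
  return {
    ...existing, // action, text, chatId, messageId, keyboard, etc. stay unchanged
    success: false,
    error: {
      workflow,
      node,
      message,
      httpCode,
      rawResponse: raw.slice(0, MAX_RAW_LENGTH),
    },
  };
}

// Sample values are illustrative only.
const existingReturn = { action: 'stop', chatId: 42, text: 'Failed to stop plex' };
const augmented = withError(existingReturn, {
  workflow: 'n8n-actions',
  node: 'Format Action Result',
  message: 'Container not found',
  httpCode: 404,
  rawResponse: '{"message":"No such container: plex"}',
});
console.log(augmented);
```

Inside a Code node, the same spread-then-add pattern would be applied directly to the object the node already returns, so existing routing on `action` keeps working.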

Per sub-workflow audit and modifications:

  1. n8n-actions.json (11 nodes): Already has error handling in Format Action Result nodes. Add success: true to success paths and success: false + error object to failure paths in the Format Action Result Code node. The existing statusCode checks (304, 404, 500+) should populate error.httpCode.

  2. n8n-update.json (34 nodes): Has multiple error paths (pull error, create error, start error). Each error Code node (Format Pull Error, etc.) already returns success: false. Ensure each also includes an error object with { workflow: 'n8n-update', node, message, httpCode, rawResponse }.

  3. n8n-logs.json (9 nodes): Has error handling for container not found and log retrieval failures. Add success field to all return paths and error object to failure paths.

  4. n8n-batch-ui.json (16 nodes): Has error handling for invalid state. Add success field to all return paths.

  5. n8n-status.json (11 nodes): Has error handling for Docker query failures. Add success field to return paths.

  6. n8n-confirmation.json (16 nodes): Has error paths for expired/invalid tokens. Add success field to return paths. Note: expired/cancel are NOT errors -- they are expected flows. Only add success: false for actual failures (Docker API errors in the stop execution path).

  7. n8n-matching.json (23 nodes): Has no_match and error action returns. no_match is NOT an error. Only the error action path should include success: false with error details.
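For item 1, the status-code handling could look roughly like the sketch below. The function name `classifyStatus` and the message strings are assumptions, and treating 304 as a failure path simply follows the wording of item 1; the existing Format Action Result node may word or route these cases differently.

```javascript
// Hypothetical classifier for a Docker API status code in the actions sub-workflow.
// Message strings are illustrative; the real node has its own wording.
function classifyStatus(statusCode) {
  if (statusCode === 304) {
    return { success: false, httpCode: 304, message: 'Container already in requested state' };
  }
  if (statusCode === 404) {
    return { success: false, httpCode: 404, message: 'Container not found' };
  }
  if (statusCode >= 500) {
    return { success: false, httpCode: statusCode, message: 'Docker API server error' };
  }
  // Anything else is treated as a success path in this sketch.
  return { success: true, httpCode: statusCode };
}

console.log(classifyStatus(204));
console.log(classifyStatus(404));
```

The returned `httpCode` and `message` would then feed the `error` object on failure paths, while success paths only gain `success: true`.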

Also add correlationId pass-through: Each sub-workflow already accepts input parameters. Add correlationId to the "When executed by another workflow" trigger node's expected fields (the field is optional: it is simply absent when a caller does not pass it, and n8n handles extra fields gracefully). In error return paths, include correlationId: $('When executed by another workflow').item.json.correlationId || '' so the main workflow can correlate errors.
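The pass-through with its empty-string fallback can be sketched outside n8n as a plain function. Here `triggerJson` stands in for `$('When executed by another workflow').item.json`, and the function name `withCorrelationId` is an assumption.

```javascript
// triggerJson stands in for the trigger node's item JSON inside n8n.
function withCorrelationId(triggerJson, errorReturn) {
  return {
    ...errorReturn,
    // Fall back to '' when the caller did not pass a correlation ID.
    correlationId: triggerJson.correlationId || '',
  };
}

const withId = withCorrelationId({ correlationId: '1700000000000-abc' }, { success: false });
const withoutId = withCorrelationId({}, { success: false });
console.log(withId, withoutId);
```

The fallback keeps older callers working: a sub-workflow invoked without a correlation ID still returns a well-formed error object.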

Implementation approach: Read each sub-workflow JSON, identify Code nodes on error paths, and modify their jsCode to include the standardized fields. Do NOT add new nodes to sub-workflows -- modify existing Code node outputs.

Verify:

  • For each sub-workflow JSON, parse and verify:
      1. At least one Code node contains success: false and error: in its jsCode
      2. At least one Code node contains success: true in its jsCode
      3. Error objects include workflow: field matching the sub-workflow name
      4. correlationId appears in error return paths
  • Spot-check n8n-actions.json and n8n-update.json Code nodes in detail since they have the most complex error paths

Done: All 7 sub-workflows return success: true/false on all paths. Failure paths include standardized error object with workflow name, node, message, httpCode, and rawResponse. Existing return fields preserved for backward compatibility. correlationId passed through on error returns.
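The Task 1 checks could be automated along the lines of the sketch below. It assumes the exported workflow JSON has a top-level `nodes` array whose Code nodes use type `n8n-nodes-base.code` and keep their source in `parameters.jsCode`. The function name `auditSubWorkflow` and the sample node `Pull Image` are hypothetical.

```javascript
// Hypothetical audit of one parsed sub-workflow export against the four checks.
function auditSubWorkflow(workflow, expectedName) {
  // Collect the JavaScript source of every Code node.
  const sources = (workflow.nodes || [])
    .filter((n) => n.type === 'n8n-nodes-base.code')
    .map((n) => (n.parameters && n.parameters.jsCode) || '');
  const failureSources = sources.filter((src) => /success:\s*false/.test(src));
  return {
    hasFailurePath: failureSources.some((src) => /error:/.test(src)),
    hasSuccessPath: sources.some((src) => /success:\s*true/.test(src)),
    namesWorkflow: sources.some((src) =>
      new RegExp(`workflow:\\s*['"]${expectedName}['"]`).test(src)),
    passesCorrelationId: failureSources.some((src) => /correlationId/.test(src)),
  };
}

// Minimal in-memory sample; real input would come from JSON.parse of the file.
const sampleSubWorkflow = {
  nodes: [
    {
      name: 'Format Pull Error',
      type: 'n8n-nodes-base.code',
      parameters: {
        jsCode: "return { json: { success: false, correlationId: '', error: { workflow: 'n8n-update', node: 'Pull Image' } } };",
      },
    },
    {
      name: 'Format Success',
      type: 'n8n-nodes-base.code',
      parameters: { jsCode: 'return { json: { success: true } };' },
    },
  ],
};
const audit = auditSubWorkflow(sampleSubWorkflow, 'n8n-update');
console.log(audit);
```

These are string-level checks only, so they confirm the fields are present in the source but not that every path actually returns them; the spot-check of n8n-actions.json and n8n-update.json remains a manual step.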

Task 2: Add correlation ID generation and error capture to main workflow Execute Workflow paths

Files: n8n-workflow.json

This task wires the error capture infrastructure in the main workflow.

Part A: Correlation ID generation

Add a new Code node: Generate Correlation ID (id: code-generate-correlation-id). Place it between the IF Authenticated node's true output and the Keyword Router, so every authenticated request gets a correlation ID.

Implementation:

// slice(2, 11) takes up to 9 base-36 characters; it replaces the deprecated substr(2, 9)
const correlationId = `${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
return {
  json: {
    ...$input.item.json,
    correlationId
  }
};

Note: Use timestamp + random string instead of UUID (avoids require('uuid') dependency issues in n8n Code nodes). This generates sufficiently unique IDs for a single-user bot.
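To check the ID format outside n8n, the generator can be written as a standalone function. This is a sketch: the function name and the injectable `now` and `rand` parameters are assumptions added so the output is reproducible, and `slice(2, 11)` is used as the non-deprecated equivalent of `substr(2, 9)`.

```javascript
// Standalone version of the generator; now and rand are injectable for testing.
function generateCorrelationId(now = Date.now(), rand = Math.random()) {
  // Base-36 rendering of the random number, minus the leading "0.".
  return `${now}-${rand.toString(36).slice(2, 11)}`;
}

const liveId = generateCorrelationId();
const fixedId = generateCorrelationId(1700000000000, 0.5); // "1700000000000-i"
console.log(liveId, fixedId);
```

The random part can be shorter than nine characters when the random number has a short base-36 representation (0.5 renders as `0.i`), so anything parsing these IDs should not assume a fixed length.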

Wire: IF Authenticated (true) -> Generate Correlation ID -> Keyword Router. This requires updating the connection from IF Authenticated to Keyword Router.

Similarly, add correlation ID generation for the callback path. Add a new Code node: Generate Callback Correlation ID (id: code-generate-callback-correlation-id) between IF Callback Authenticated (true) and Parse Callback Data. Same implementation.

Part B: Pass correlation ID to sub-workflow calls

For each Execute Workflow node in the main workflow (there are ~17 of them per DEPLOY-SUBWORKFLOWS.md), ensure the correlationId field is passed as an input parameter. Since most Prepare Input Code nodes already construct the input object, add correlationId: $('Generate Correlation ID').item.json.correlationId (or $('Generate Callback Correlation ID').item.json.correlationId for callback-path nodes) to each Prepare Input node's return object.

Important data chain note: Use $input.item.json.correlationId pattern for nodes with multiple predecessors (per 10.1-09 decision). For nodes with a single predecessor chain back to the correlation ID generation, reference the specific node.
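A Prepare Input node following this guidance might be sketched as below. Here `inputJson` stands in for `$input.item.json` and `generatorJson` for the generator node's item JSON. Combining the two lookups into one fallback chain is an assumption of this sketch, not something the plan prescribes, and the field names `containerName` and `action` are illustrative.

```javascript
// Hypothetical Prepare Input logic: build the sub-workflow input and attach the ID.
function prepareInput(inputJson, generatorJson, fields) {
  // Prefer the ID carried on the incoming item (multi-predecessor case),
  // then the generator node's value (single-predecessor case).
  const correlationId = inputJson.correlationId || generatorJson.correlationId || '';
  return { ...fields, correlationId };
}

const fromItem = prepareInput({ correlationId: 'a-1' }, { correlationId: 'b-2' }, { containerName: 'plex', action: 'stop' });
const fromGenerator = prepareInput({}, { correlationId: 'b-2' }, { containerName: 'plex', action: 'stop' });
console.log(fromItem, fromGenerator);
```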

Part C: Error capture after Execute Workflow nodes

For the highest-value Execute Workflow nodes (the ones most likely to fail), add error detection that routes to the Log Error node from Plan 01. The pattern:

After each Execute Workflow node, the existing result-handling Code node or IF node should check the success field. If success === false, route a branch to Log Error.

Priority targets (modify these first -- they handle Docker API calls that actually fail):

  1. After Execute Container Action (single text command path)
  2. After Execute Inline Action (callback action path)
  3. After Execute Text Update (text update path)
  4. After Execute Callback Update (callback update path)
  5. After Execute Text Logs (text logs path)
  6. After Execute Inline Logs (callback logs path)

For each target:

  • If there's already a result-handling Code node after the Execute Workflow node, modify it to check success === false and, on the false path, route to Log Error with appropriate fields
  • If the existing flow doesn't have branching, add an IF node (Check {X} Success) after the Execute Workflow node that tests whether {{ $json.success }} is false. The failure branch (success === false) goes to Log Error; the other branch continues the existing flow.
  • Log Error receives: { correlationId, workflow, node, operation, userMessage, errorMessage, httpCode, rawResponse, contextData, chatId, text }
  • After Log Error, the flow continues to the existing Telegram error response (Log Error passes through data)
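The branching and field mapping described above can be sketched as two plain functions. The names `toLogErrorInput` and `routeResult`, the shape of the `ctx` argument, and the "summary: cause" format of `userMessage` are assumptions; the actual Log Error node from Plan 01 defines what it really expects.

```javascript
// Hypothetical mapping from a sub-workflow result to the Log Error input fields.
function toLogErrorInput(result, ctx) {
  const err = result.error || {};
  return {
    correlationId: result.correlationId || ctx.correlationId || '',
    workflow: err.workflow || 'unknown',
    node: err.node || 'unknown',
    operation: ctx.operation,
    // Friendly inline message: summary plus cause (format assumed).
    userMessage: `${ctx.summary}: ${err.message || 'unknown error'}`,
    errorMessage: err.message || '',
    httpCode: err.httpCode ?? null,
    rawResponse: err.rawResponse || '',
    contextData: ctx.contextData || {},
    chatId: result.chatId,
    text: result.text,
  };
}

// Only an explicit success === false is treated as a failure.
function routeResult(result, ctx) {
  return result.success === false
    ? { branch: 'logError', payload: toLogErrorInput(result, ctx) }
    : { branch: 'continue', payload: result };
}

const failed = routeResult(
  {
    success: false, action: 'stop', chatId: 42, text: 'Failed to stop plex',
    correlationId: 'abc',
    error: { workflow: 'n8n-actions', node: 'Format Action Result', message: 'Container not found', httpCode: 404, rawResponse: '{}' },
  },
  { operation: 'stop', summary: 'Could not stop plex' },
);
const succeeded = routeResult({ success: true, action: 'stop' }, { operation: 'stop', summary: '' });
console.log(failed.branch, succeeded.branch);
```

Checking strictly for `success === false` means results from paths that do not yet set the field continue down the existing flow, which keeps the change backward compatible.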

Minimize new nodes: Where possible, modify existing result-handling Code nodes to include the error check rather than adding new IF nodes. Only add IF nodes where the existing flow has no branching capability.

Estimated new nodes: 2 (correlation ID generators) + 0-4 (IF nodes for error detection, depending on existing flow structure). Target: +2 to +6 new nodes.

Verify:

  • Parse n8n-workflow.json to verify:
      1. Generate Correlation ID node exists and is wired between IF Authenticated and Keyword Router
      2. Generate Callback Correlation ID node exists and is wired in callback path
      3. At least 4 Prepare Input nodes include correlationId in their return objects
      4. Log Error node (from Plan 01) has at least 2 incoming connections
      5. Node count is within expected range (174-178)
  • Verify connection integrity: no broken paths, all existing flows still connected

Done: Every authenticated request gets a correlation ID. Correlation IDs propagate to sub-workflow calls. At least 6 Execute Workflow result paths check for success === false and route to Log Error. Error entries appear in ring buffer with correlation IDs, sub-workflow names, and diagnostic data.
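The wiring checks for Task 2 could be scripted roughly as follows. The sketch assumes the export's `connections` object is keyed by source node name, with `main` holding one array of `{ node, type, index }` targets per output, and that an IF node's true output is index 0. The helper names and the `Check ... Success` node names in the sample are hypothetical.

```javascript
// Names of the nodes reached from one output of a source node.
function targetsOf(workflow, from, outputIndex = 0) {
  const outputs = (workflow.connections[from] || {}).main || [];
  return (outputs[outputIndex] || []).map((c) => c.node);
}

// Number of connections that end at the given node, across the whole workflow.
function incomingCount(workflow, target) {
  let count = 0;
  for (const byType of Object.values(workflow.connections)) {
    for (const output of byType.main || []) {
      for (const c of output || []) {
        if (c.node === target) count += 1;
      }
    }
  }
  return count;
}

// Minimal in-memory sample; real input would come from JSON.parse of n8n-workflow.json.
const sampleMain = {
  connections: {
    'IF Authenticated': { main: [[{ node: 'Generate Correlation ID', type: 'main', index: 0 }], []] },
    'Generate Correlation ID': { main: [[{ node: 'Keyword Router', type: 'main', index: 0 }]] },
    'Check Action Success': { main: [[{ node: 'Log Error', type: 'main', index: 0 }], []] },
    'Check Update Success': { main: [[{ node: 'Log Error', type: 'main', index: 0 }], []] },
  },
};
const wiredIn = targetsOf(sampleMain, 'IF Authenticated', 0).includes('Generate Correlation ID');
const wiredOut = targetsOf(sampleMain, 'Generate Correlation ID').includes('Keyword Router');
const logErrorInputs = incomingCount(sampleMain, 'Log Error');
console.log(wiredIn, wiredOut, logErrorInputs);
```

The node-count check (174-178) would be a simple length test on the `nodes` array of the same parsed file.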

  1. All 7 sub-workflow JSON files contain `success: true/false` returns
  2. Main workflow has correlation ID generation on both text and callback paths
  3. Log Error node has incoming connections from error detection paths
  4. No broken connections in any workflow file (validate JSON structure)
  5. Existing functionality preserved (action routing, Telegram responses unchanged)
  6. `python3` JSON parse of all 8 files succeeds without errors

<success_criteria>

  • Sub-workflow errors automatically captured in ring buffer with full diagnostic context
  • /errors command shows real errors from Docker API failures
  • Correlation IDs trace a single user request across main + sub-workflow boundaries
  • No regression to existing bot functionality (all action/update/status/logs flows work)

</success_criteria>

After completion, create `.planning/phases/10.2-better-logging-and-log-management/10.2-02-SUMMARY.md`