---
phase: 10.2-better-logging-and-log-management
plan: 02
type: execute
wave: 2
depends_on:
  - "10.2-01"
files_modified:
  - n8n-workflow.json
  - n8n-actions.json
  - n8n-update.json
  - n8n-logs.json
  - n8n-batch-ui.json
  - n8n-status.json
  - n8n-confirmation.json
  - n8n-matching.json
autonomous: true

must_haves:
  truths:
    - "All 7 sub-workflows return structured error objects when failures occur"
    - "Errors from sub-workflow failures are captured in the main workflow's ring buffer"
    - "User sees friendly error messages inline (summary + cause) when actions fail"
    - "Error entries include sub-workflow name, node, HTTP code, and raw response"
    - "Correlation IDs propagate from main workflow through sub-workflow calls"
  artifacts:
    - path: "n8n-actions.json"
      provides: "Structured error return on Docker API failure"
      contains: "success"
    - path: "n8n-update.json"
      provides: "Structured error return on update failure"
      contains: "success"
    - path: "n8n-logs.json"
      provides: "Structured error return on log retrieval failure"
      contains: "success"
    - path: "n8n-batch-ui.json"
      provides: "Structured error return on batch UI failure"
      contains: "success"
    - path: "n8n-status.json"
      provides: "Structured error return on status query failure"
      contains: "success"
    - path: "n8n-confirmation.json"
      provides: "Structured error return on confirmation failure"
      contains: "success"
    - path: "n8n-matching.json"
      provides: "Structured error return on matching failure"
      contains: "success"
    - path: "n8n-workflow.json"
      provides: "Error capture after each Execute Workflow node, correlation ID generation"
      contains: "Log Error"
  key_links:
    - from: "Execute Container Action (main)"
      to: "Log Error"
      via: "IF check on success field, false path to Log Error"
      pattern: "success.*false"
    - from: "sub-workflow error return"
      to: "main workflow ring buffer"
      via: "structured return value with success:false and error object"
      pattern: "success.*false.*error"
---

<objective>
Wire error propagation from all 7 sub-workflows to the main workflow's centralized error ring buffer.

Purpose: Capture every sub-workflow failure in the ring buffer automatically, with full diagnostic context (workflow name, node, HTTP code, raw response, sub-workflow I/O boundaries). Users see friendly error messages; Claude gets queryable diagnostic data via the /errors command.

Output: All 8 workflow JSON files modified. Sub-workflows return structured errors. Main workflow captures errors via Log Error node from Plan 01.
</objective>

<execution_context>
@/home/luc/.claude/get-shit-done/workflows/execute-plan.md
@/home/luc/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/STATE.md
@.planning/ROADMAP.md
@n8n-workflow.json
@n8n-actions.json
@n8n-update.json
@n8n-logs.json
@n8n-batch-ui.json
@n8n-status.json
@n8n-confirmation.json
@n8n-matching.json
@DEPLOY-SUBWORKFLOWS.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-RESEARCH.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-01-SUMMARY.md
</context>

<tasks>

<task type="auto">
  <name>Task 1: Add structured error returns to all 7 sub-workflows</name>
  <files>n8n-actions.json, n8n-update.json, n8n-logs.json, n8n-batch-ui.json, n8n-status.json, n8n-confirmation.json, n8n-matching.json</files>
  <action>
For each of the 7 sub-workflows, audit the existing error handling paths and ensure they return a standardized error object. The goal is NOT to change how errors are currently handled (many sub-workflows already have error paths), but to AUGMENT the return data with a consistent structure that the main workflow can detect and log.

**Standard error return format** (add to existing error paths):
```javascript
{
  success: false,
  action: "<existing-action-value>",  // Preserve existing action field for routing
  error: {
    workflow: "<sub-workflow-name>",
    node: "<node-that-failed>",
    message: "<human-readable-error>",
    httpCode: <http-status-or-null>,
    rawResponse: "<truncated-raw-response>"
  },
  // ... preserve all existing return fields for backward compatibility
}
```

**Critical rule: PRESERVE BACKWARD COMPATIBILITY.** Existing return fields (action, text, chatId, messageId, keyboard, etc.) MUST remain unchanged. The `success` and `error` fields are ADDITIONS to the existing return objects.
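
A minimal sketch of how an existing error-path Code node could augment its return object without dropping fields. The helper names (`truncate`, `withError`), the 500-character limit, and the sample `stop` action are assumptions for illustration; the plan only specifies that `rawResponse` is truncated.

```javascript
// Hypothetical helpers: names and the 500-char limit are assumptions, not part of the plan.
const RAW_RESPONSE_LIMIT = 500;

function truncate(value, limit = RAW_RESPONSE_LIMIT) {
  if (value == null) return '';
  // Non-string responses (e.g. parsed JSON bodies) are serialized first.
  const text = typeof value === 'string' ? value : JSON.stringify(value);
  return text.length > limit ? text.slice(0, limit) + '...' : text;
}

function withError(existing, { workflow, node, message, httpCode = null, rawResponse = '' }) {
  return {
    ...existing, // action, text, chatId, messageId, keyboard, etc. stay unchanged
    success: false,
    error: { workflow, node, message, httpCode, rawResponse: truncate(rawResponse) },
  };
}

// Example: a Docker 404 surfaced by the actions sub-workflow (sample values are made up).
const result = withError(
  { action: 'stop', chatId: 123, text: 'Container not found' },
  {
    workflow: 'n8n-actions',
    node: 'Format Action Result',
    message: 'Container not found',
    httpCode: 404,
    rawResponse: { message: 'No such container: web' },
  }
);
```

Spreading `existing` first keeps every current field in place, so routing in the main workflow that relies on `action` or `text` is unaffected.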

**Per sub-workflow audit and modifications:**

1. **n8n-actions.json** (11 nodes): Already has error handling in Format Action Result nodes. Add `success: true` to success paths and `success: false` + `error` object to failure paths in the Format Action Result Code node. The existing `statusCode` checks (304, 404, 500+) should populate `error.httpCode`.

2. **n8n-update.json** (34 nodes): Has multiple error paths (pull error, create error, start error). Each error Code node (Format Pull Error, etc.) already returns `success: false`. Ensure each also includes an `error` object with `{ workflow: 'n8n-update', node, message, httpCode, rawResponse }`.

3. **n8n-logs.json** (9 nodes): Has error handling for container not found and log retrieval failures. Add `success` field to all return paths and `error` object to failure paths.

4. **n8n-batch-ui.json** (16 nodes): Has error handling for invalid state. Add `success` field to all return paths.

5. **n8n-status.json** (11 nodes): Has error handling for Docker query failures. Add `success` field to return paths.

6. **n8n-confirmation.json** (16 nodes): Has error paths for expired/invalid tokens. Add `success` field to return paths. Note: expired/cancel are NOT errors -- they are expected flows. Only add `success: false` for actual failures (Docker API errors in the stop execution path).

7. **n8n-matching.json** (23 nodes): Has `no_match` and `error` action returns. `no_match` is NOT an error. Only the `error` action path should include `success: false` with error details.
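
The distinction between expected flows and real failures (items 6 and 7) could look roughly like this for n8n-matching. This is a sketch only: `tagMatchingResult` is a hypothetical name, and the `text`, `statusCode`, and `rawResponse` fields on the incoming result are assumed, not confirmed by the workflow JSON.

```javascript
// Sketch: only the `error` action is a failure; `no_match` is an expected outcome.
function tagMatchingResult(result, failedNode) {
  if (result.action !== 'error') {
    return { ...result, success: true };
  }
  return {
    ...result,
    success: false,
    error: {
      workflow: 'n8n-matching',
      node: failedNode,
      message: result.text || 'Matching failed',   // assumed: text carries the readable error
      httpCode: result.statusCode ?? null,          // assumed field name
      rawResponse: String(result.rawResponse ?? ''),
    },
  };
}
```

The same shape would apply to n8n-confirmation, where expired and cancel results keep `success: true` and only Docker API failures in the stop execution path are tagged as errors.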

**Also add `correlationId` pass-through:** Each sub-workflow already accepts input parameters. Add `correlationId` to the expected fields of the "When executed by another workflow" trigger node. Callers that do not send it keep working, because the error paths fall back to an empty string and n8n handles extra fields gracefully. In error return paths, include `correlationId: $('When executed by another workflow').item.json.correlationId || ''` so the main workflow can correlate errors.

**Implementation approach:** Read each sub-workflow JSON, identify Code nodes on error paths, modify their jsCode to include the standardized fields. Do NOT add new nodes to sub-workflows -- modify existing Code node outputs.
  </action>
  <verify>
    - For each sub-workflow JSON, parse and verify:
      1. At least one Code node contains `success: false` and `error:` in its jsCode
      2. At least one Code node contains `success: true` in its jsCode
      3. Error objects include `workflow:` field matching the sub-workflow name
      4. `correlationId` appears in error return paths
    - Spot-check n8n-actions.json and n8n-update.json Code nodes in detail since they have the most complex error paths
  </verify>
  <done>
    All 7 sub-workflows return `success: true/false` on all paths. Failure paths include standardized `error` object with workflow name, node, message, httpCode, and rawResponse. Existing return fields preserved for backward compatibility. correlationId passed through on error returns.
  </done>
</task>

<task type="auto">
  <name>Task 2: Add correlation ID generation and error capture to main workflow Execute Workflow paths</name>
  <files>n8n-workflow.json</files>
  <action>
This task wires the error capture infrastructure in the main workflow.

**Part A: Correlation ID generation**

Add a new Code node: **Generate Correlation ID** (id: `code-generate-correlation-id`). Place it between the IF Authenticated node's true output and the Keyword Router, so every authenticated request gets a correlation ID.

Implementation:
```javascript
const correlationId = `${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
return {
  json: {
    ...$input.item.json,
    correlationId
  }
};
```

Note: Use timestamp + random string instead of UUID (avoids `require('uuid')` dependency issues in n8n Code nodes). This generates sufficiently unique IDs for a single-user bot.

Wire: IF Authenticated (true) -> Generate Correlation ID -> Keyword Router. This requires updating the connection from IF Authenticated to Keyword Router.

Similarly, add correlation ID generation for the callback path. Add a new Code node: **Generate Callback Correlation ID** (id: `code-generate-callback-correlation-id`) between IF Callback Authenticated (true) and Parse Callback Data. Same implementation.

**Part B: Pass correlation ID to sub-workflow calls**

For each Execute Workflow node in the main workflow (there are ~17 of them per DEPLOY-SUBWORKFLOWS.md), ensure the `correlationId` field is passed as an input parameter. Since most Prepare Input Code nodes already construct the input object, add `correlationId: $('Generate Correlation ID').item.json.correlationId` (or `$('Generate Callback Correlation ID').item.json.correlationId` for callback-path nodes) to each Prepare Input node's return object.

**Important data chain note:** Use `$input.item.json.correlationId` pattern for nodes with multiple predecessors (per 10.1-09 decision). For nodes with a single predecessor chain back to the correlation ID generation, reference the specific node.
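
As an illustration of Part B, a Prepare Input Code node could merge the correlation ID into whatever it already builds. This sketch is written as a plain function so it runs outside n8n; `prepareSubWorkflowInput` and the `containerId`/`action` fields are hypothetical. Inside a Code node, `inputJson` would be `$input.item.json` or the referenced generator node's `item.json`.

```javascript
// Sketch: add correlationId to an existing Prepare Input payload.
function prepareSubWorkflowInput(inputJson, fields) {
  return {
    json: {
      ...fields,
      // Fall back to an empty string so sub-workflows never see undefined.
      correlationId: inputJson.correlationId || '',
    },
  };
}

const prepared = prepareSubWorkflowInput(
  { correlationId: '1700000000000-abc123def', chatId: 42 },
  { containerId: 'web', action: 'restart', chatId: 42 }
);
```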

**Part C: Error capture after Execute Workflow nodes**

For the highest-value Execute Workflow nodes (the ones most likely to fail), add error detection that routes to the Log Error node from Plan 01. The pattern:

After each Execute Workflow node, the existing result-handling Code node or IF node should check the `success` field. If `success === false`, route a branch to Log Error.

**Priority targets** (modify these first -- they handle Docker API calls that actually fail):
1. After Execute Container Action (single text command path)
2. After Execute Inline Action (callback action path)
3. After Execute Text Update (text update path)
4. After Execute Callback Update (callback update path)
5. After Execute Text Logs (text logs path)
6. After Execute Inline Logs (callback logs path)

For each target:
- If there's already a result-handling Code node after the Execute Workflow node, modify it to check `success === false` and, on the false path, route to Log Error with appropriate fields
- If the existing flow doesn't have branching, add an IF node (Check {X} Success) after the Execute Workflow node that tests whether `{{ $json.success }}` equals `false`. When it does (the failure case), route to Log Error; otherwise continue the existing flow.
- Log Error receives: `{ correlationId, workflow, node, operation, userMessage, errorMessage, httpCode, rawResponse, contextData, chatId, text }`
- After Log Error, the flow continues to the existing Telegram error response (Log Error passes through data)
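
One way a result-handling Code node could shape the Log Error input is sketched below. `toLogErrorInput` and the `context` argument (with `operation`, `summary`, `chatId`, `contextData`) are hypothetical; the "summary: cause" form of `userMessage` follows the must-have for friendly inline messages, and reusing `result.text` for `text` is an assumption.

```javascript
// Sketch: map a failed sub-workflow result onto the fields Log Error receives.
function toLogErrorInput(result, context) {
  const err = result.error || {};
  const userMessage = `${context.summary}: ${err.message || 'unknown cause'}`;
  return {
    correlationId: result.correlationId || context.correlationId || '',
    workflow: err.workflow || 'unknown',
    node: err.node || 'unknown',
    operation: context.operation,
    userMessage,
    errorMessage: err.message || '',
    httpCode: err.httpCode ?? null,
    rawResponse: err.rawResponse || '',
    contextData: context.contextData || {},
    chatId: context.chatId,
    text: result.text || userMessage, // assumed: existing text wins when present
  };
}

// Only failed results are mapped; successful ones continue the existing flow.
function needsLogging(result) {
  return result.success === false;
}
```

Checking `success === false` strictly means results from sub-workflows that do not yet set the field are treated as successes, which keeps the existing behavior during rollout.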

**Minimize new nodes:** Where possible, modify existing result-handling Code nodes to include the error check rather than adding new IF nodes. Only add IF nodes where the existing flow has no branching capability.

Estimated new nodes: 2 (correlation ID generators) + 0-4 (IF nodes for error detection, depending on existing flow structure). Target: +2 to +6 new nodes.
  </action>
  <verify>
    - Parse `n8n-workflow.json` to verify:
      1. Generate Correlation ID node exists and is wired between IF Authenticated and Keyword Router
      2. Generate Callback Correlation ID node exists and is wired in callback path
      3. At least 4 Prepare Input nodes include `correlationId` in their return objects
      4. Log Error node (from Plan 01) has at least 2 incoming connections
      5. Node count is within expected range (174-178)
    - Verify connection integrity: no broken paths, all existing flows still connected
  </verify>
  <done>
    Every authenticated request gets a correlation ID. Correlation IDs propagate to sub-workflow calls. At least 6 Execute Workflow result paths check for `success === false` and route to Log Error. Error entries appear in ring buffer with correlation IDs, sub-workflow names, and diagnostic data.
  </done>
</task>

</tasks>

<verification>
1. All 7 sub-workflow JSON files contain `success: true/false` returns
2. Main workflow has correlation ID generation on both text and callback paths
3. Log Error node has incoming connections from error detection paths
4. No broken connections in any workflow file (validate JSON structure)
5. Existing functionality preserved (action routing, Telegram responses unchanged)
6. `python3` JSON parse of all 8 files succeeds without errors
</verification>

<success_criteria>
- Sub-workflow errors automatically captured in ring buffer with full diagnostic context
- /errors command shows real errors from Docker API failures
- Correlation IDs trace a single user request across main + sub-workflow boundaries
- No regression to existing bot functionality (all action/update/status/logs flows work)
</success_criteria>

<output>
After completion, create `.planning/phases/10.2-better-logging-and-log-management/10.2-02-SUMMARY.md`
</output>