docs(10.2): create phase plan
@@ -0,0 +1,242 @@
---
phase: 10.2-better-logging-and-log-management
plan: 02
type: execute
wave: 2
depends_on:
  - "10.2-01"
files_modified:
  - n8n-workflow.json
  - n8n-actions.json
  - n8n-update.json
  - n8n-logs.json
  - n8n-batch-ui.json
  - n8n-status.json
  - n8n-confirmation.json
  - n8n-matching.json
autonomous: true

must_haves:
  truths:
    - "All 7 sub-workflows return structured error objects when failures occur"
    - "Errors from sub-workflow failures are captured in the main workflow's ring buffer"
    - "User sees friendly error messages inline (summary + cause) when actions fail"
    - "Error entries include sub-workflow name, node, HTTP code, and raw response"
    - "Correlation IDs propagate from main workflow through sub-workflow calls"
  artifacts:
    - path: "n8n-actions.json"
      provides: "Structured error return on Docker API failure"
      contains: "success"
    - path: "n8n-update.json"
      provides: "Structured error return on update failure"
      contains: "success"
    - path: "n8n-logs.json"
      provides: "Structured error return on log retrieval failure"
      contains: "success"
    - path: "n8n-batch-ui.json"
      provides: "Structured error return on batch UI failure"
      contains: "success"
    - path: "n8n-status.json"
      provides: "Structured error return on status query failure"
      contains: "success"
    - path: "n8n-confirmation.json"
      provides: "Structured error return on confirmation failure"
      contains: "success"
    - path: "n8n-matching.json"
      provides: "Structured error return on matching failure"
      contains: "success"
    - path: "n8n-workflow.json"
      provides: "Error capture after each Execute Workflow node, correlation ID generation"
      contains: "Log Error"
  key_links:
    - from: "Execute Container Action (main)"
      to: "Log Error"
      via: "IF check on success field, false path to Log Error"
      pattern: "success.*false"
    - from: "sub-workflow error return"
      to: "main workflow ring buffer"
      via: "structured return value with success:false and error object"
      pattern: "success.*false.*error"
---

<objective>
Wire error propagation from all 7 sub-workflows to the main workflow's centralized error ring buffer.

Purpose: Make every sub-workflow failure automatically captured with full diagnostic context (workflow name, node, HTTP code, raw response, sub-workflow I/O boundaries) in the ring buffer. Users see friendly error messages; Claude gets queryable diagnostic data via /errors command.

Output: All 8 workflow JSON files modified. Sub-workflows return structured errors. Main workflow captures errors via Log Error node from Plan 01.
</objective>

<execution_context>
@/home/luc/.claude/get-shit-done/workflows/execute-plan.md
@/home/luc/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/STATE.md
@.planning/ROADMAP.md
@n8n-workflow.json
@n8n-actions.json
@n8n-update.json
@n8n-logs.json
@n8n-batch-ui.json
@n8n-status.json
@n8n-confirmation.json
@n8n-matching.json
@DEPLOY-SUBWORKFLOWS.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-RESEARCH.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-01-SUMMARY.md
</context>

<tasks>

<task type="auto">
<name>Task 1: Add structured error returns to all 7 sub-workflows</name>
<files>n8n-actions.json, n8n-update.json, n8n-logs.json, n8n-batch-ui.json, n8n-status.json, n8n-confirmation.json, n8n-matching.json</files>
<action>
For each of the 7 sub-workflows, audit the existing error handling paths and ensure they return a standardized error object. The goal is NOT to change how errors are currently handled (many sub-workflows already have error paths), but to AUGMENT the return data with a consistent structure that the main workflow can detect and log.

**Standard error return format** (add to existing error paths):
```javascript
{
  success: false,
  action: "<existing-action-value>", // Preserve existing action field for routing
  error: {
    workflow: "<sub-workflow-name>",
    node: "<node-that-failed>",
    message: "<human-readable-error>",
    httpCode: <http-status-or-null>,
    rawResponse: "<truncated-raw-response>"
  },
  // ... preserve all existing return fields for backward compatibility
}
```

**Critical rule: PRESERVE BACKWARD COMPATIBILITY.** Existing return fields (action, text, chatId, messageId, keyboard, etc.) MUST remain unchanged. The `success` and `error` fields are ADDITIONS to the existing return objects.
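
One way to keep the augmentation uniform across Code nodes is a small helper that spreads the existing return object first and then adds the two new fields. This is only a sketch: `withError` and the 500-character truncation limit are hypothetical and do not exist in any of the sub-workflows today.

```javascript
// Hypothetical helper (not present in the existing workflows).
// Spreads the existing return object first so action, text, chatId,
// messageId, keyboard, etc. survive unchanged, then adds success/error.
const MAX_RAW_LENGTH = 500; // assumed truncation limit

function withError(existing, { workflow, node, message, httpCode = null, rawResponse = '' }) {
  // Non-string responses (objects, numbers) are serialized before truncation.
  const raw = typeof rawResponse === 'string'
    ? rawResponse
    : (JSON.stringify(rawResponse) ?? '');
  return {
    ...existing,
    success: false,
    error: {
      workflow,
      node,
      message,
      httpCode,
      rawResponse: raw.slice(0, MAX_RAW_LENGTH),
    },
  };
}

// Example: a 404 from the Docker API on a stop request.
const result = withError(
  { action: 'stop', chatId: 42, text: 'Failed to stop container' },
  {
    workflow: 'n8n-actions',
    node: 'Format Action Result',
    message: 'Container not found',
    httpCode: 404,
    rawResponse: '{"message":"No such container"}',
  }
);
```

Because the spread comes first, a stale `success` or `error` key on the existing object is overwritten rather than preserved, which is the intended behaviour on a failure path.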

**Per sub-workflow audit and modifications:**

1. **n8n-actions.json** (11 nodes): Already has error handling in Format Action Result nodes. Add `success: true` to success paths and `success: false` + `error` object to failure paths in the Format Action Result Code node. The existing `statusCode` checks (304, 404, 500+) should populate `error.httpCode`.

2. **n8n-update.json** (34 nodes): Has multiple error paths (pull error, create error, start error). Each error Code node (Format Pull Error, etc.) already returns `success: false`. Ensure each also includes an `error` object with `{ workflow: 'n8n-update', node, message, httpCode, rawResponse }`.

3. **n8n-logs.json** (9 nodes): Has error handling for container not found and log retrieval failures. Add `success` field to all return paths and `error` object to failure paths.

4. **n8n-batch-ui.json** (16 nodes): Has error handling for invalid state. Add `success` field to all return paths.

5. **n8n-status.json** (11 nodes): Has error handling for Docker query failures. Add `success` field to return paths.

6. **n8n-confirmation.json** (16 nodes): Has error paths for expired/invalid tokens. Add `success` field to return paths. Note: expired/cancel are NOT errors -- they are expected flows. Only add `success: false` for actual failures (Docker API errors in the stop execution path).

7. **n8n-matching.json** (23 nodes): Has `no_match` and `error` action returns. `no_match` is NOT an error. Only the `error` action path should include `success: false` with error details.

**Also add `correlationId` pass-through:** Each sub-workflow already accepts input parameters. Add `correlationId` to the "When executed by another workflow" trigger node's expected fields (the field is optional: n8n handles extra input fields gracefully, and a missing value falls back to an empty string in the expression that follows). In error return paths, include `correlationId: $('When executed by another workflow').item.json.correlationId || ''` so the main workflow can correlate errors.

**Implementation approach:** Read each sub-workflow JSON, identify Code nodes on error paths, and modify their jsCode to include the standardized fields. Do NOT add new nodes to sub-workflows -- modify existing Code node outputs.
</action>
<verify>
- For each sub-workflow JSON, parse and verify:
  1. At least one Code node contains `success: false` and `error:` in its jsCode
  2. At least one Code node contains `success: true` in its jsCode
  3. Error objects include `workflow:` field matching the sub-workflow name
  4. `correlationId` appears in error return paths
- Spot-check n8n-actions.json and n8n-update.json Code nodes in detail since they have the most complex error paths
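
The four checks above could be scripted roughly as follows. This is a sketch that assumes Code nodes are stored with type `n8n-nodes-base.code` and keep their source in `parameters.jsCode`; the function name `checkSubWorkflow` and the sample data are hypothetical, and the string matching is deliberately loose, so a passing report does not replace the manual spot-check.

```javascript
// Sketch of the verify step: run the four textual checks against one
// parsed sub-workflow object. checkSubWorkflow is a hypothetical name.
function checkSubWorkflow(workflow, expectedName) {
  const sources = (workflow.nodes || [])
    .filter((n) => n.type === 'n8n-nodes-base.code')
    .map((n) => (n.parameters && n.parameters.jsCode) || '');

  // Code nodes that look like failure paths: success:false plus an error object.
  const errorSources = sources.filter(
    (src) => /success:\s*false/.test(src) && /error:/.test(src)
  );

  return {
    hasFailurePath: errorSources.length > 0,
    hasSuccessPath: sources.some((src) => /success:\s*true/.test(src)),
    namesWorkflow: errorSources.some(
      (src) =>
        src.includes(`workflow: '${expectedName}'`) ||
        src.includes(`workflow: "${expectedName}"`)
    ),
    passesCorrelationId: errorSources.some((src) => src.includes('correlationId')),
  };
}

// Made-up sample data standing in for a parsed n8n-actions.json.
const sample = {
  nodes: [
    {
      name: 'Format Action Result',
      type: 'n8n-nodes-base.code',
      parameters: {
        jsCode:
          "return { json: { success: false, error: { workflow: 'n8n-actions' }, correlationId: '' } };",
      },
    },
    {
      name: 'Format Success',
      type: 'n8n-nodes-base.code',
      parameters: { jsCode: 'return { json: { success: true } };' },
    },
  ],
};

const report = checkSubWorkflow(sample, 'n8n-actions');
```

For a real file, the workflow object would come from `JSON.parse(fs.readFileSync('n8n-actions.json', 'utf8'))`.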
</verify>
<done>
All 7 sub-workflows return `success: true/false` on all paths. Failure paths include standardized `error` object with workflow name, node, message, httpCode, and rawResponse. Existing return fields preserved for backward compatibility. correlationId passed through on error returns.
</done>
</task>

<task type="auto">
<name>Task 2: Add correlation ID generation and error capture to main workflow Execute Workflow paths</name>
<files>n8n-workflow.json</files>
<action>
This task wires the error capture infrastructure in the main workflow.

**Part A: Correlation ID generation**

Add a new Code node: **Generate Correlation ID** (id: `code-generate-correlation-id`). Place it between the IF Authenticated node's true output and the Keyword Router, so every authenticated request gets a correlation ID.

Implementation:
```javascript
// Timestamp plus a random base-36 suffix; unique enough for a single-user bot
const correlationId = `${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
return {
  json: {
    ...$input.item.json,
    correlationId
  }
};
```

Note: Use timestamp + random string instead of UUID (avoids `require('uuid')` dependency issues in n8n Code nodes). This generates sufficiently unique IDs for a single-user bot.

Wire: IF Authenticated (true) -> Generate Correlation ID -> Keyword Router. This requires updating the connection from IF Authenticated to Keyword Router.
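
The rewiring can be sketched against the `connections` object of the workflow JSON, assuming the usual n8n layout: a map keyed by source node name whose `main` array holds one list of `{ node, type, index }` targets per output, with output 0 being the IF node's true branch. `insertBetween` is a hypothetical helper, not an n8n API.

```javascript
// Hypothetical helper: splice `inserted` into the link from -> to on the
// given output, mutating and returning the connections map.
function insertBetween(connections, from, outputIndex, to, inserted) {
  const outputs = (connections[from] && connections[from].main) || [];
  const targets = outputs[outputIndex] || [];
  const link = targets.find((t) => t.node === to);
  if (!link) {
    throw new Error(`No connection ${from}[${outputIndex}] -> ${to}`);
  }
  // The new node takes over the old outgoing link (same target input index).
  connections[inserted] = {
    main: [[{ node: to, type: 'main', index: link.index }]],
  };
  // The source now feeds the new node's first input instead.
  link.node = inserted;
  link.index = 0;
  return connections;
}

// Made-up excerpt of the main workflow's connections; 'Send Unauthorized'
// is an invented name for the false-branch target.
const connections = {
  'IF Authenticated': {
    main: [
      [{ node: 'Keyword Router', type: 'main', index: 0 }], // true output
      [{ node: 'Send Unauthorized', type: 'main', index: 0 }], // false output
    ],
  },
};

insertBetween(connections, 'IF Authenticated', 0, 'Keyword Router', 'Generate Correlation ID');
```

The helper overwrites any existing `connections[inserted]` entry, so it is only suitable for a node that has just been added and has no outgoing links yet. The new node object itself still has to be appended to the workflow's `nodes` array separately.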

Similarly, add correlation ID generation for the callback path. Add a new Code node: **Generate Callback Correlation ID** (id: `code-generate-callback-correlation-id`) between IF Callback Authenticated (true) and Parse Callback Data. Same implementation.

**Part B: Pass correlation ID to sub-workflow calls**

For each Execute Workflow node in the main workflow (there are ~17 of them per DEPLOY-SUBWORKFLOWS.md), ensure the `correlationId` field is passed as an input parameter. Since most Prepare Input Code nodes already construct the input object, add `correlationId: $('Generate Correlation ID').item.json.correlationId` (or `$('Generate Callback Correlation ID').item.json.correlationId` for callback-path nodes) to each Prepare Input node's return object.

**Important data chain note:** Use the `$input.item.json.correlationId` pattern for nodes with multiple predecessors (per 10.1-09 decision). For nodes with a single predecessor chain back to the correlation ID generation, reference the specific node.

**Part C: Error capture after Execute Workflow nodes**

For the highest-value Execute Workflow nodes (the ones most likely to fail), add error detection that routes to the Log Error node from Plan 01. The pattern:

After each Execute Workflow node, the existing result-handling Code node or IF node should check the `success` field. If `success === false`, route a branch to Log Error.

**Priority targets** (modify these first -- they handle Docker API calls that actually fail):
1. After Execute Container Action (single text command path)
2. After Execute Inline Action (callback action path)
3. After Execute Text Update (text update path)
4. After Execute Callback Update (callback update path)
5. After Execute Text Logs (text logs path)
6. After Execute Inline Logs (callback logs path)

For each target:
- If there's already a result-handling Code node after the Execute Workflow node, modify it to check `success === false` and, on the failure path, route to Log Error with appropriate fields
- If the existing flow doesn't have branching, add an IF node (Check {X} Success) after the Execute Workflow node that tests `{{ $json.success }}`. When `success` is `false`, the item goes to Log Error; otherwise it continues the existing flow.
- Log Error receives: `{ correlationId, workflow, node, operation, userMessage, errorMessage, httpCode, rawResponse, contextData, chatId, text }`
- After Log Error, the flow continues to the existing Telegram error response (Log Error passes through data)
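
A result-handling Code node could build the Log Error input from a failed sub-workflow result along these lines. The field list comes from the bullet above, but the mapping itself is an assumption: which sub-workflow field feeds `operation`, `userMessage`, and `text` is not specified in this plan, and `toLogErrorInput` is a hypothetical name.

```javascript
// Hypothetical mapping from a sub-workflow result with success:false to
// the Log Error input shape. The source of each field is an assumption.
function toLogErrorInput(result, context = {}) {
  const err = result.error || {};
  return {
    correlationId: result.correlationId || context.correlationId || '',
    workflow: err.workflow || 'unknown',
    node: err.node || 'unknown',
    operation: context.operation || result.action || '',
    userMessage: result.text || 'Something went wrong.', // assumed fallback text
    errorMessage: err.message || '',
    httpCode: err.httpCode ?? null,
    rawResponse: err.rawResponse || '',
    contextData: context.contextData || {},
    chatId: result.chatId ?? context.chatId ?? null,
    text: result.text || '',
  };
}

// Made-up failed result from the actions sub-workflow.
const payload = toLogErrorInput(
  {
    success: false,
    action: 'stop',
    chatId: 42,
    text: 'Could not stop plex',
    correlationId: 'abc',
    error: {
      workflow: 'n8n-actions',
      node: 'Format Action Result',
      message: 'No such container',
      httpCode: 404,
      rawResponse: '{}',
    },
  },
  { operation: 'container-action', contextData: { container: 'plex' } }
);
```

Every field has a fallback, so a result that carries only `success: false` still produces a loggable entry instead of throwing inside the Code node.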

**Minimize new nodes:** Where possible, modify existing result-handling Code nodes to include the error check rather than adding new IF nodes. Only add IF nodes where the existing flow has no branching capability.

Estimated new nodes: 2 (correlation ID generators) + 0-4 (IF nodes for error detection, depending on existing flow structure). Target: +2 to +6 new nodes.
</action>
<verify>
- Parse `n8n-workflow.json` to verify:
  1. Generate Correlation ID node exists and is wired between IF Authenticated and Keyword Router
  2. Generate Callback Correlation ID node exists and is wired in callback path
  3. At least 4 Prepare Input nodes include `correlationId` in their return objects
  4. Log Error node (from Plan 01) has at least 2 incoming connections
  5. Node count is within expected range (174-178)
- Verify connection integrity: no broken paths, all existing flows still connected
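
The connection-related checks could be scripted as in the sketch below, under the same assumption about the `connections` layout (source node name, then connection type, then one target list per output). `listLinks` and `checkMainWorkflow` are hypothetical names and the sample nodes are made up. A link is reported as dangling when either end names a node missing from `nodes`; this catches renamed or deleted nodes but not every kind of broken path.

```javascript
// Flatten the connections map into a list of { from, to } links.
function listLinks(workflow) {
  const links = [];
  for (const [from, byType] of Object.entries(workflow.connections || {})) {
    for (const outputs of Object.values(byType)) {
      for (const targets of outputs) {
        // An output with nothing attached may be stored as null or [].
        for (const t of targets || []) {
          links.push({ from, to: t.node });
        }
      }
    }
  }
  return links;
}

// Summarize node count, Log Error fan-in, and links to unknown nodes.
function checkMainWorkflow(workflow) {
  const nodes = workflow.nodes || [];
  const names = new Set(nodes.map((n) => n.name));
  const links = listLinks(workflow);
  return {
    nodeCount: nodes.length,
    logErrorIncoming: links.filter((l) => l.to === 'Log Error').length,
    dangling: links.filter((l) => !names.has(l.from) || !names.has(l.to)),
  };
}

// Made-up sample: two IF nodes whose second output feeds Log Error.
const sampleWorkflow = {
  nodes: [
    { name: 'Check Action Success' },
    { name: 'Check Update Success' },
    { name: 'Log Error' },
    { name: 'Send Error Message' },
  ],
  connections: {
    'Check Action Success': { main: [[], [{ node: 'Log Error', type: 'main', index: 0 }]] },
    'Check Update Success': { main: [null, [{ node: 'Log Error', type: 'main', index: 0 }]] },
    'Log Error': { main: [[{ node: 'Send Error Message', type: 'main', index: 0 }]] },
  },
};

const summary = checkMainWorkflow(sampleWorkflow);
```

Against the real file, `summary.nodeCount` would be compared with the 174-178 range and `summary.logErrorIncoming` with the minimum of 2 stated above.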
</verify>
<done>
Every authenticated request gets a correlation ID. Correlation IDs propagate to sub-workflow calls. At least 6 Execute Workflow result paths check for `success === false` and route to Log Error. Error entries appear in ring buffer with correlation IDs, sub-workflow names, and diagnostic data.
</done>
</task>

</tasks>

<verification>
1. All 7 sub-workflow JSON files contain `success: true/false` returns
2. Main workflow has correlation ID generation on both text and callback paths
3. Log Error node has incoming connections from error detection paths
4. No broken connections in any workflow file (validate JSON structure)
5. Existing functionality preserved (action routing, Telegram responses unchanged)
6. `python3` JSON parse of all 8 files succeeds without errors
</verification>

<success_criteria>
- Sub-workflow errors automatically captured in ring buffer with full diagnostic context
- /errors command shows real errors from Docker API failures
- Correlation IDs trace a single user request across main + sub-workflow boundaries
- No regression to existing bot functionality (all action/update/status/logs flows work)
</success_criteria>

<output>
After completion, create `.planning/phases/10.2-better-logging-and-log-management/10.2-02-SUMMARY.md`
</output>