| phase | plan | type | wave | depends_on | files_modified | autonomous | must_haves |
|---|---|---|---|---|---|---|---|
| 10.2-better-logging-and-log-management | 02 | execute | 2 | | | true | |
Purpose: Make every sub-workflow failure automatically captured with full diagnostic context (workflow name, node, HTTP code, raw response, sub-workflow I/O boundaries) in the ring buffer. Users see friendly error messages; Claude gets queryable diagnostic data via /errors command.
Output: All 8 workflow JSON files modified. Sub-workflows return structured errors. Main workflow captures errors via Log Error node from Plan 01.
<execution_context> @/home/luc/.claude/get-shit-done/workflows/execute-plan.md @/home/luc/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/STATE.md @.planning/ROADMAP.md @n8n-workflow.json @n8n-actions.json @n8n-update.json @n8n-logs.json @n8n-batch-ui.json @n8n-status.json @n8n-confirmation.json @n8n-matching.json @DEPLOY-SUBWORKFLOWS.md @.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md @.planning/phases/10.2-better-logging-and-log-management/10.2-RESEARCH.md @.planning/phases/10.2-better-logging-and-log-management/10.2-01-SUMMARY.md
Task 1: Add structured error returns to all 7 sub-workflows
n8n-actions.json, n8n-update.json, n8n-logs.json, n8n-batch-ui.json, n8n-status.json, n8n-confirmation.json, n8n-matching.json
For each of the 7 sub-workflows, audit the existing error handling paths and ensure they return a standardized error object. The goal is NOT to change how errors are currently handled (many sub-workflows already have error paths), but to AUGMENT the return data with a consistent structure that the main workflow can detect and log.
Standard error return format (add to existing error paths):
{
  success: false,
  action: "<existing-action-value>", // Preserve existing action field for routing
  error: {
    workflow: "<sub-workflow-name>",
    node: "<node-that-failed>",
    message: "<human-readable-error>",
    httpCode: <http-status-or-null>,
    rawResponse: "<truncated-raw-response>"
  },
  // ... preserve all existing return fields for backward compatibility
}
Critical rule: PRESERVE BACKWARD COMPATIBILITY. Existing return fields (action, text, chatId, messageId, keyboard, etc.) MUST remain unchanged. The success and error fields are ADDITIONS to the existing return objects.
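The additive rule above can be sketched as a small helper that spreads the existing return object first and only then adds the new fields. This is a minimal sketch, not the plan's implementation: the helper name `withStructuredError` and the 500-character cap on `rawResponse` are assumptions made for illustration.

```javascript
// Hypothetical helper: name and truncation limit are illustrative assumptions.
const MAX_RAW_LENGTH = 500;

function withStructuredError(existing, { workflow, node, message, httpCode = null, rawResponse = '' }) {
  // Raw responses may arrive as objects; normalise to a string before truncating.
  const raw = typeof rawResponse === 'string' ? rawResponse : JSON.stringify(rawResponse);
  return {
    ...existing, // action, text, chatId, messageId, keyboard, ... stay untouched
    success: false,
    error: {
      workflow,
      node,
      message,
      httpCode,
      rawResponse: (raw ?? '').slice(0, MAX_RAW_LENGTH),
    },
  };
}

const result = withStructuredError(
  { action: 'error', chatId: 123, text: 'Could not stop the container.' },
  {
    workflow: 'n8n-actions',
    node: 'Format Action Result',
    message: 'Docker API returned 500',
    httpCode: 500,
    rawResponse: 'x'.repeat(2000),
  }
);
```

Because the existing fields are spread first and none of them are named `success` or `error` in the legacy shape, callers that only read `action`, `text`, or `chatId` see the same values as before.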
Per sub-workflow audit and modifications:
- n8n-actions.json (11 nodes): Already has error handling in Format Action Result nodes. Add `success: true` to success paths and `success: false` + `error` object to failure paths in the Format Action Result Code node. The existing `statusCode` checks (304, 404, 500+) should populate `error.httpCode`.
- n8n-update.json (34 nodes): Has multiple error paths (pull error, create error, start error). Each error Code node (Format Pull Error, etc.) already returns `success: false`. Ensure each also includes an `error` object with `{ workflow: 'n8n-update', node, message, httpCode, rawResponse }`.
- n8n-logs.json (9 nodes): Has error handling for container not found and log retrieval failures. Add `success` field to all return paths and `error` object to failure paths.
- n8n-batch-ui.json (16 nodes): Has error handling for invalid state. Add `success` field to all return paths.
- n8n-status.json (11 nodes): Has error handling for Docker query failures. Add `success` field to return paths.
- n8n-confirmation.json (16 nodes): Has error paths for expired/invalid tokens. Add `success` field to return paths. Note: expired/cancel are NOT errors -- they are expected flows. Only add `success: false` for actual failures (Docker API errors in the stop execution path).
- n8n-matching.json (23 nodes): Has `no_match` and `error` action returns. `no_match` is NOT an error. Only the `error` action path should include `success: false` with error details.
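The distinction between expected flows and real failures in the last two items could look like the sketch below for n8n-matching. `markOutcome` is a hypothetical helper and `Match Container` is a made-up node name; only the `no_match` and `error` action values come from the plan.

```javascript
// Sketch only: markOutcome is a hypothetical helper, not an existing node.
function markOutcome(result, errorDetails) {
  if (result.action === 'error') {
    return { ...result, success: false, error: errorDetails };
  }
  // no_match (and any other expected flow) is a normal outcome, not a failure.
  return { ...result, success: true };
}

const noMatch = markOutcome({ action: 'no_match', text: 'No container matched.' }, null);
const failed = markOutcome(
  { action: 'error', text: 'Something went wrong.' },
  {
    workflow: 'n8n-matching',
    node: 'Match Container', // hypothetical node name
    message: 'Docker query failed',
    httpCode: null,
    rawResponse: '',
  }
);
```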
Also add correlationId pass-through: Each sub-workflow already accepts input parameters. Add correlationId to the "When executed by another workflow" trigger node's expected fields (it will be passed but ignored if not present -- n8n handles extra fields gracefully). In error return paths, include correlationId: $('When executed by another workflow').item.json.correlationId || '' so the main workflow can correlate errors.
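Outside n8n, the pass-through with its empty-string fallback can be approximated as follows. Here `triggerJson` stands in for `$('When executed by another workflow').item.json`, and `attachCorrelationId` is a hypothetical name used only for this sketch.

```javascript
// triggerJson is a stand-in for the trigger node's item JSON inside n8n.
function attachCorrelationId(errorResult, triggerJson) {
  return { ...errorResult, correlationId: triggerJson.correlationId || '' };
}

// A caller that already passes the ID, and an older caller that does not.
const withId = attachCorrelationId({ success: false }, { correlationId: '1700000000000-abc123def' });
const withoutId = attachCorrelationId({ success: false }, { containerName: 'nginx' });
```

The fallback means a sub-workflow invoked without a `correlationId` still returns a string field, so downstream code does not need to guard against `undefined`.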
Implementation approach: Read each sub-workflow JSON, identify Code nodes on error paths, modify their jsCode to include the standardized fields. Do NOT add new nodes to sub-workflows -- modify existing Code node outputs.
- For each sub-workflow JSON, parse and verify:
1. At least one Code node contains success: false and error: in its jsCode
2. At least one Code node contains success: true in its jsCode
3. Error objects include workflow: field matching the sub-workflow name
4. correlationId appears in error return paths
- Spot-check n8n-actions.json and n8n-update.json Code nodes in detail since they have the most complex error paths
All 7 sub-workflows return success: true/false on all paths. Failure paths include standardized error object with workflow name, node, message, httpCode, and rawResponse. Existing return fields preserved for backward compatibility. correlationId passed through on error returns.
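The four verification checks for Task 1 could be automated with a small script along these lines. It is a sketch that assumes the usual n8n export shape, where each Code node has type `n8n-nodes-base.code` and keeps its source in `parameters.jsCode`; the function name and the sample node `Format Update Success` are illustrative.

```javascript
// Assumes the n8n export shape: { nodes: [{ name, type, parameters: { jsCode } }] }.
function auditSubWorkflow(workflow, workflowName) {
  const codeSources = (workflow.nodes || [])
    .filter((n) => n.type === 'n8n-nodes-base.code')
    .map((n) => (n.parameters && n.parameters.jsCode) || '');

  // Code nodes that look like failure paths: both `success: false` and `error:`.
  const errorSources = codeSources.filter(
    (src) => /success:\s*false/.test(src) && /error:/.test(src)
  );

  // Escape the name so it can be embedded safely in a regular expression.
  const escaped = workflowName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const namePattern = new RegExp(`workflow:\\s*['"]${escaped}['"]`);

  return {
    hasFailurePath: errorSources.length > 0,
    hasSuccessPath: codeSources.some((src) => /success:\s*true/.test(src)),
    namesWorkflow: errorSources.some((src) => namePattern.test(src)),
    passesCorrelationId: errorSources.some((src) => src.includes('correlationId')),
  };
}

// Tiny in-memory stand-in for a parsed n8n-update.json.
const sample = {
  nodes: [
    {
      name: 'Format Pull Error',
      type: 'n8n-nodes-base.code',
      parameters: {
        jsCode: "return { json: { success: false, correlationId: '', error: { workflow: 'n8n-update' } } };",
      },
    },
    {
      name: 'Format Update Success', // hypothetical node name
      type: 'n8n-nodes-base.code',
      parameters: { jsCode: 'return { json: { success: true } };' },
    },
  ],
};

const report = auditSubWorkflow(sample, 'n8n-update');
```

For a real file, the `sample` object would be replaced by the parsed JSON, for example `JSON.parse(fs.readFileSync('n8n-update.json', 'utf8'))`. String matching like this only confirms that the fields are present somewhere in the source, so the detailed spot-checks of n8n-actions.json and n8n-update.json are still needed.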
Part A: Correlation ID generation
Add a new Code node: Generate Correlation ID (id: code-generate-correlation-id). Place it between the IF Authenticated node's true output and the Keyword Router, so every authenticated request gets a correlation ID.
Implementation:
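// Timestamp plus a base-36 random suffix of up to 9 characters.
// slice(2, 11) selects the same characters as the deprecated substr(2, 9).
const correlationId = `${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
return {
  json: {
    ...$input.item.json,
    correlationId
  }
};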
Note: Use timestamp + random string instead of UUID (avoids require('uuid') dependency issues in n8n Code nodes). This generates sufficiently unique IDs for a single-user bot.
Wire: IF Authenticated (true) -> Generate Correlation ID -> Keyword Router. This requires updating the connection from IF Authenticated to Keyword Router.
Similarly, add correlation ID generation for the callback path. Add a new Code node: Generate Callback Correlation ID (id: code-generate-callback-correlation-id) between IF Callback Authenticated (true) and Parse Callback Data. Same implementation.
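The rewiring for both generators amounts to the same edit on the workflow's `connections` object. The sketch below assumes the n8n export layout, in which `connections[source].main[outputIndex]` is a list of `{ node, type, index }` links and output 0 of an IF node is its true branch; `insertBetween` is a hypothetical helper, and the false-branch target is invented for the example.

```javascript
// Splice newNode into the link source[outputIndex] -> target (mutates connections).
function insertBetween(connections, source, outputIndex, target, newNode) {
  const outputs = (connections[source] && connections[source].main[outputIndex]) || [];
  const link = outputs.find((c) => c.node === target);
  if (!link) {
    throw new Error(`${source} output ${outputIndex} is not wired to ${target}`);
  }
  const targetInput = link.index;
  link.node = newNode; // source -> newNode
  link.index = 0;
  // newNode -> target, keeping the input index the target was originally fed on
  connections[newNode] = { main: [[{ node: target, type: 'main', index: targetInput }]] };
  return connections;
}

const connections = {
  'IF Authenticated': {
    main: [
      [{ node: 'Keyword Router', type: 'main', index: 0 }], // true branch
      [{ node: 'Send Unauthorized', type: 'main', index: 0 }], // hypothetical false branch
    ],
  },
};

insertBetween(connections, 'IF Authenticated', 0, 'Keyword Router', 'Generate Correlation ID');
```

The new node must also be appended to the workflow's `nodes` array with the id given above; this sketch covers only the connection change.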
Part B: Pass correlation ID to sub-workflow calls
For each Execute Workflow node in the main workflow (there are ~17 of them per DEPLOY-SUBWORKFLOWS.md), ensure the correlationId field is passed as an input parameter. Since most Prepare Input Code nodes already construct the input object, add correlationId: $('Generate Correlation ID').item.json.correlationId (or $('Generate Callback Correlation ID').item.json.correlationId for callback-path nodes) to each Prepare Input node's return object.
Important data chain note: Use $input.item.json.correlationId pattern for nodes with multiple predecessors (per 10.1-09 decision). For nodes with a single predecessor chain back to the correlation ID generation, reference the specific node.
Part C: Error capture after Execute Workflow nodes
For the highest-value Execute Workflow nodes (the ones most likely to fail), add error detection that routes to the Log Error node from Plan 01. The pattern:
After each Execute Workflow node, the existing result-handling Code node or IF node should check the success field. If success === false, route a branch to Log Error.
Priority targets (modify these first -- they handle Docker API calls that actually fail):
- After Execute Container Action (single text command path)
- After Execute Inline Action (callback action path)
- After Execute Text Update (text update path)
- After Execute Callback Update (callback update path)
- After Execute Text Logs (text logs path)
- After Execute Inline Logs (callback logs path)
For each target:
- If there's already a result-handling Code node after the Execute Workflow node, modify it to check `success === false` and, when that check matches, route to Log Error with the appropriate fields
- If the existing flow doesn't have branching, add an IF node (Check {X} Success) after the Execute Workflow node that checks whether `{{ $json.success }}` equals `false`. The failure case (`success` is `false`) goes to Log Error; every other case continues the existing flow.
- Log Error receives: `{ correlationId, workflow, node, operation, userMessage, errorMessage, httpCode, rawResponse, contextData, chatId, text }`
- After Log Error, the flow continues to the existing Telegram error response (Log Error passes through data)
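A result-handling Code node that implements this check might map the sub-workflow return onto the Log Error fields roughly as below. This is a sketch under assumptions: the function names are hypothetical, and filling both `userMessage` and `text` from the sub-workflow's `text` field, as well as the contents of `contextData`, are guesses about the intended mapping.

```javascript
// Hypothetical mapping from a sub-workflow result to the Log Error input.
function toLogErrorPayload(result, { operation, chatId }) {
  const err = result.error || {};
  return {
    correlationId: result.correlationId || '',
    workflow: err.workflow || 'unknown',
    node: err.node || 'unknown',
    operation,
    userMessage: result.text || '', // assumption: friendly text shown to the user
    errorMessage: err.message || '',
    httpCode: err.httpCode ?? null,
    rawResponse: err.rawResponse || '',
    contextData: { action: result.action }, // assumption: minimal context
    chatId: result.chatId ?? chatId,
    text: result.text || '',
  };
}

// Strict comparison: results without a success field keep the existing path.
function routeResult(result, context) {
  return result.success === false
    ? { route: 'logError', payload: toLogErrorPayload(result, context) }
    : { route: 'continue', payload: result };
}

const failure = routeResult(
  {
    success: false,
    action: 'error',
    text: 'Could not restart the container.',
    chatId: 42,
    correlationId: '1700000000000-abc123def',
    error: { workflow: 'n8n-actions', node: 'Format Action Result', message: 'HTTP 500', httpCode: 500, rawResponse: '{}' },
  },
  { operation: 'restart', chatId: 42 }
);
const legacy = routeResult({ action: 'done', text: 'ok' }, { operation: 'restart', chatId: 42 });
```

Checking `success === false` rather than a falsy value keeps any sub-workflow path that has not yet been given a `success` field on the existing flow.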
Minimize new nodes: Where possible, modify existing result-handling Code nodes to include the error check rather than adding new IF nodes. Only add IF nodes where the existing flow has no branching capability.
Estimated new nodes: 2 (correlation ID generators) + 0-4 (IF nodes for error detection, depending on existing flow structure). Target: +2 to +6 new nodes.
- Parse n8n-workflow.json to verify:
1. Generate Correlation ID node exists and is wired between IF Authenticated and Keyword Router
2. Generate Callback Correlation ID node exists and is wired in callback path
3. At least 4 Prepare Input nodes include correlationId in their return objects
4. Log Error node (from Plan 01) has at least 2 incoming connections
5. Node count is within expected range (174-178)
- Verify connection integrity: no broken paths, all existing flows still connected
Every authenticated request gets a correlation ID. Correlation IDs propagate to sub-workflow calls. At least 6 Execute Workflow result paths check for success === false and route to Log Error. Error entries appear in ring buffer with correlation IDs, sub-workflow names, and diagnostic data.
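Some of the Task 2 checks (generator wiring, Log Error fan-in, node count) lend themselves to a scripted check over the parsed n8n-workflow.json. The following is a minimal sketch assuming the n8n `connections` layout of `{ [source]: { main: [[{ node, type, index }]] } }`; the helper names and the tiny sample graph are illustrative.

```javascript
// Count how many links across the whole workflow point at targetName.
function incomingCount(connections, targetName) {
  let count = 0;
  for (const outputs of Object.values(connections)) {
    for (const branch of outputs.main || []) {
      for (const link of branch || []) {
        if (link.node === targetName) count += 1;
      }
    }
  }
  return count;
}

// True when source -> middle and middle -> target are both direct links.
function isWiredBetween(connections, source, middle, target) {
  const reaches = (from, to) =>
    ((connections[from] && connections[from].main) || []).some((branch) =>
      (branch || []).some((link) => link.node === to)
    );
  return reaches(source, middle) && reaches(middle, target);
}

// Expected range taken from the verification list above.
function nodeCountInRange(workflow, min = 174, max = 178) {
  return workflow.nodes.length >= min && workflow.nodes.length <= max;
}

// Minimal stand-in graph; Check nodes follow the "Check {X} Success" naming.
const graph = {
  'IF Authenticated': { main: [[{ node: 'Generate Correlation ID', type: 'main', index: 0 }], []] },
  'Generate Correlation ID': { main: [[{ node: 'Keyword Router', type: 'main', index: 0 }]] },
  'Check Action Success': { main: [[{ node: 'Log Error', type: 'main', index: 0 }], []] },
  'Check Update Success': { main: [[{ node: 'Log Error', type: 'main', index: 0 }], []] },
};

const logErrorInputs = incomingCount(graph, 'Log Error');
const generatorWired = isWiredBetween(graph, 'IF Authenticated', 'Generate Correlation ID', 'Keyword Router');
```

Connection integrity beyond these checks (no broken paths, all existing flows still connected) still needs a walk through the workflow in the n8n editor.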
<success_criteria>
- Sub-workflow errors automatically captured in ring buffer with full diagnostic context
- /errors command shows real errors from Docker API failures
- Correlation IDs trace a single user request across main + sub-workflow boundaries
- No regression to existing bot functionality (all action/update/status/logs flows work)
</success_criteria>