docs(10.2): create phase plan
+11
-5
@@ -76,19 +76,25 @@ Plans:
|
|||||||
|
|
||||||
### Phase 10.2: Better Logging and Log Management (INSERTED)
|
### Phase 10.2: Better Logging and Log Management (INSERTED)
|
||||||
|
|
||||||
**Goal:** Improve logging capabilities and log management features
|
**Goal:** Add centralized error capture, execution tracing, and debugging infrastructure for programmatic issue diagnosis
|
||||||
|
|
||||||
**Dependencies:** Phase 10.1 (aggressive modularization complete)
|
**Dependencies:** Phase 10.1 (aggressive modularization complete)
|
||||||
|
|
||||||
**Requirements:** TBD
|
**Requirements:** LOG-01 (error ring buffer), LOG-02 (sub-workflow error propagation), LOG-03 (debug commands), LOG-04 (debug mode tracing)
|
||||||
|
|
||||||
**Plans:** 0 plans
|
**Plans:** 3 plans
|
||||||
|
|
||||||
Plans:
|
Plans:
|
||||||
- [ ] TBD (run /gsd:plan-phase 10.2 to break down)
|
- [ ] 10.2-01-PLAN.md -- Error ring buffer foundation + hidden Telegram debug commands
|
||||||
|
- [ ] 10.2-02-PLAN.md -- Sub-workflow error propagation + correlation ID tracking
|
||||||
|
- [ ] 10.2-03-PLAN.md -- Debug mode tracing + deployment verification
|
||||||
|
|
||||||
**Success Criteria:**
|
**Success Criteria:**
|
||||||
1. [To be defined during planning]
|
1. Errors from sub-workflow failures automatically captured in ring buffer with full diagnostic context
|
||||||
|
2. /errors, /clear-errors, /debug, /trace hidden commands work via Telegram
|
||||||
|
3. Correlation IDs trace single user requests across main + sub-workflow boundaries
|
||||||
|
4. Debug mode captures sub-workflow I/O boundary data and callback routing decisions
|
||||||
|
5. No regression to existing bot functionality after deployment
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,244 @@
|
|||||||
|
---
|
||||||
|
phase: 10.2-better-logging-and-log-management
|
||||||
|
plan: 01
|
||||||
|
type: execute
|
||||||
|
wave: 1
|
||||||
|
depends_on: []
|
||||||
|
files_modified:
|
||||||
|
- n8n-workflow.json
|
||||||
|
autonomous: true
|
||||||
|
|
||||||
|
must_haves:
|
||||||
|
truths:
|
||||||
|
- "Error ring buffer stores up to 50 structured error entries in workflow static data"
|
||||||
|
- "/errors command returns recent errors in human-readable format via Telegram"
|
||||||
|
- "/clear-errors command resets the error ring buffer"
|
||||||
|
- "/debug on|off|status toggles debug mode stored in static data"
|
||||||
|
- "/trace <correlationId> returns all entries matching a correlation ID"
|
||||||
|
- "Hidden commands are NOT listed in /start help menu"
|
||||||
|
artifacts:
|
||||||
|
- path: "n8n-workflow.json"
|
||||||
|
provides: "Error ring buffer infrastructure, debug toggle, hidden command routing and responses"
|
||||||
|
contains: "Process Debug Command"
|
||||||
|
key_links:
|
||||||
|
- from: "Keyword Router"
|
||||||
|
to: "Process Debug Command"
|
||||||
|
via: "switch output for /errors, /clear-errors, /debug, /trace keywords"
|
||||||
|
pattern: "keyword-errors|keyword-debug|keyword-trace|keyword-clear-errors"
|
||||||
|
- from: "Process Debug Command"
|
||||||
|
to: "Send Debug Response"
|
||||||
|
via: "formatted message output"
|
||||||
|
pattern: "chatId.*text"
|
||||||
|
---
|
||||||
|
|
||||||
|
<objective>
|
||||||
|
Build the error ring buffer foundation and hidden Telegram debug commands in the main workflow.
|
||||||
|
|
||||||
|
Purpose: Establish the centralized error/trace storage infrastructure (workflow static data with ring buffer) and the hidden command interface (/errors, /clear-errors, /debug, /trace) that Claude and the user can use for quick diagnostics. This is the foundation that Plans 02 and 03 build upon.
|
||||||
|
|
||||||
|
Output: Modified `n8n-workflow.json` with ring buffer initialization, hidden command routing, and Telegram response nodes for all 4 debug commands.
|
||||||
|
</objective>
|
||||||
|
|
||||||
|
<execution_context>
|
||||||
|
@/home/luc/.claude/get-shit-done/workflows/execute-plan.md
|
||||||
|
@/home/luc/.claude/get-shit-done/templates/summary.md
|
||||||
|
</execution_context>
|
||||||
|
|
||||||
|
<context>
|
||||||
|
@.planning/STATE.md
|
||||||
|
@.planning/ROADMAP.md
|
||||||
|
@n8n-workflow.json
|
||||||
|
@DEPLOY-SUBWORKFLOWS.md
|
||||||
|
@.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md
|
||||||
|
@.planning/phases/10.2-better-logging-and-log-management/10.2-RESEARCH.md
|
||||||
|
</context>
|
||||||
|
|
||||||
|
<tasks>
|
||||||
|
|
||||||
|
<task type="auto">
|
||||||
|
<name>Task 1: Add hidden command routing to Keyword Router and create debug command processor</name>
|
||||||
|
<files>n8n-workflow.json</files>
|
||||||
|
<action>
|
||||||
|
Modify the Keyword Router switch node (id: `switch-keyword-router`) to add 4 new keyword outputs BEFORE the fallback output. The new outputs must use `startsWith` operator (not `contains`) to avoid false matches with regular text:
|
||||||
|
|
||||||
|
1. **keyword-errors**: matches `/errors` (startsWith, case-insensitive) -> outputKey: "errors"
|
||||||
|
2. **keyword-clear-errors**: matches `/clear-errors` (startsWith, case-insensitive) -> outputKey: "clear-errors"
|
||||||
|
3. **keyword-debug**: matches `/debug` (startsWith, case-insensitive) -> outputKey: "debug"
|
||||||
|
4. **keyword-trace**: matches `/trace` (startsWith, case-insensitive) -> outputKey: "trace"
|
||||||
|
|
||||||
|
All 4 outputs route to a single new Code node: **Process Debug Command** (id: `code-process-debug-command`).
|
||||||
|
|
||||||
|
The Process Debug Command Code node implements ALL 4 commands in a single code block. It reads `$json.message.text` to determine which command was invoked, then:
|
||||||
|
|
||||||
|
**Static data structure** (initialize if missing):
|
||||||
|
```javascript
|
||||||
|
const staticData = $getWorkflowStaticData('global');
|
||||||
|
if (!staticData.errorLog) {
|
||||||
|
staticData.errorLog = {
|
||||||
|
debug: { enabled: false, executionCount: 0 },
|
||||||
|
errors: { buffer: [], nextId: 1, count: 0, lastCleared: new Date().toISOString() },
|
||||||
|
traces: { buffer: [], nextId: 1 }
|
||||||
|
};
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Command handling:**
|
||||||
|
|
||||||
|
- `/errors [N]`: Read `staticData.errorLog.errors.buffer`, take last N entries (default 5, max 50), format each as:
|
||||||
|
```
|
||||||
|
#{id} - {timestamp}
|
||||||
|
Workflow: {workflow} > {node}
|
||||||
|
{userMessage}
|
||||||
|
HTTP: {httpCode} (if present)
|
||||||
|
```
|
||||||
|
If no errors, return "No errors recorded." Include total error count and debug mode status at bottom.
|
||||||
|
|
||||||
|
- `/clear-errors`: Record the current buffer length as `{count}` before clearing, then reset `staticData.errorLog.errors.buffer = []`, reset `nextId = 1`, and update `lastCleared`. Return "Error log cleared. {count} entries removed."
|
||||||
|
|
||||||
|
- `/debug on|off|status`:
|
||||||
|
- `on`: Set `staticData.errorLog.debug.enabled = true`, reset `executionCount = 0`. Return "Debug mode enabled. Tracing sub-workflow boundaries and callback routing."
|
||||||
|
- `off`: Set `staticData.errorLog.debug.enabled = false`. Return "Debug mode disabled."
|
||||||
|
- `status` (or no argument): Return debug mode state, execution count, error buffer size, trace buffer size.
|
||||||
|
|
||||||
|
- `/trace <correlationId>`: Search both `staticData.errorLog.errors.buffer` and `staticData.errorLog.traces.buffer` for entries matching the correlationId. Format results chronologically. If no matches, return "No entries found for correlation ID: {id}".
|
||||||
|
|
||||||
|
The Code node returns `{ json: { chatId, text } }` where chatId comes from `$json.message.chat.id` and text is the formatted response (use HTML parse mode with `<pre>` for structured output).
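The command handling above could be sketched roughly as follows. This is an illustration under assumptions, not the node's actual code: `parseDebugCommand`, `formatError`, and `handleDebugCommand` are hypothetical helper names, the footer and status line formats are invented for the example, and `staticData` is passed in as a plain object so the logic runs outside n8n (inside the Code node it would come from `$getWorkflowStaticData('global')`).

```javascript
// Sketch only: helper names are hypothetical, not taken from n8n-workflow.json.

// Split "/errors 10" into { command: '/errors', arg: '10' }.
function parseDebugCommand(text) {
  const [command = '', arg = ''] = String(text || '').trim().split(/\s+/);
  return { command: command.toLowerCase(), arg };
}

// Render one error entry in the layout described for /errors.
function formatError(entry) {
  const lines = [
    `#${entry.id} - ${entry.timestamp}`,
    `Workflow: ${entry.workflow} > ${entry.node}`,
    entry.userMessage,
  ];
  if (entry.error && entry.error.httpCode) {
    lines.push(`HTTP: ${entry.error.httpCode}`);
  }
  return lines.join('\n');
}

// Returns the reply text for one hidden command; mutates staticData in place.
function handleDebugCommand(staticData, text) {
  const log = staticData.errorLog;
  const { command, arg } = parseDebugCommand(text);
  const debugState = () => (log.debug.enabled ? 'on' : 'off');

  switch (command) {
    case '/errors': {
      // Default 5 entries, clamped to the 1..50 range.
      const limit = Math.min(Math.max(parseInt(arg, 10) || 5, 1), 50);
      const recent = log.errors.buffer.slice(-limit);
      const body = recent.length
        ? recent.map(formatError).join('\n\n')
        : 'No errors recorded.';
      // Footer format is an assumption made for this sketch.
      return `${body}\n\nTotal errors: ${log.errors.count} | Debug: ${debugState()}`;
    }
    case '/clear-errors': {
      const removed = log.errors.buffer.length; // capture before the reset
      log.errors.buffer = [];
      log.errors.nextId = 1;
      log.errors.lastCleared = new Date().toISOString();
      return `Error log cleared. ${removed} entries removed.`;
    }
    case '/debug': {
      const mode = arg.toLowerCase();
      if (mode === 'on') {
        log.debug.enabled = true;
        log.debug.executionCount = 0;
        return 'Debug mode enabled. Tracing sub-workflow boundaries and callback routing.';
      }
      if (mode === 'off') {
        log.debug.enabled = false;
        return 'Debug mode disabled.';
      }
      // "status" or no argument; the line format is an assumption.
      return `Debug: ${debugState()}, executions: ${log.debug.executionCount}, ` +
        `errors: ${log.errors.buffer.length}, traces: ${log.traces.buffer.length}`;
    }
    case '/trace': {
      // ISO timestamps sort chronologically when compared as strings.
      const matches = [...log.errors.buffer, ...log.traces.buffer]
        .filter((entry) => entry.correlationId === arg)
        .sort((a, b) => a.timestamp.localeCompare(b.timestamp));
      if (matches.length === 0) return `No entries found for correlation ID: ${arg}`;
      return matches
        .map((e) => `${e.timestamp} ${e.id} ${e.workflow} > ${e.node}`)
        .join('\n');
    }
    default:
      return 'Unknown debug command.';
  }
}
```

In the Code node itself, the result would be wrapped as `{ json: { chatId, text } }`, with `text` set to the return value of `handleDebugCommand`.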
|
||||||
|
|
||||||
|
**Wire the output** of Process Debug Command to a new Telegram node: **Send Debug Response** (id: `telegram-send-debug-response`), which sends the message using `chatId` and `text` from the Code node output, with `parse_mode: HTML`. Use the standard Telegram credential (id: `I0xTTiASl7C1NZhJ`, name: "Telegram account").
|
||||||
|
|
||||||
|
**Positioning:** Place Process Debug Command at position [1120, -200] and Send Debug Response at [1340, -200] to keep them visually grouped above the existing menu path.
|
||||||
|
|
||||||
|
**Important:** Do NOT modify the Show Menu text or /start command response. These debug commands must remain hidden/unlisted.
|
||||||
|
|
||||||
|
Node count impact: +2 new nodes (Code + Telegram send).
|
||||||
|
</action>
|
||||||
|
<verify>
|
||||||
|
- Parse `n8n-workflow.json` with `python3 -c "import json; ..."` to verify:
|
||||||
|
1. Keyword Router has 4 new switch rules (errors, clear-errors, debug, trace)
|
||||||
|
2. Process Debug Command node exists with type `n8n-nodes-base.code`
|
||||||
|
3. Send Debug Response node exists with type `n8n-nodes-base.telegram`
|
||||||
|
4. Connections exist: Keyword Router -> Process Debug Command -> Send Debug Response
|
||||||
|
5. Total node count is 170 (168 + 2 new nodes)
|
||||||
|
- Verify the Code node's jsCode contains `$getWorkflowStaticData('global')` and handles all 4 commands
|
||||||
|
</verify>
|
||||||
|
<done>
|
||||||
|
Keyword Router routes /errors, /clear-errors, /debug, /trace to Process Debug Command. The code node initializes static data structure, implements all 4 commands, and outputs formatted text. Send Debug Response delivers the message via Telegram. No changes to existing Show Menu text.
|
||||||
|
</done>
|
||||||
|
</task>
|
||||||
|
|
||||||
|
<task type="auto">
|
||||||
|
<name>Task 2: Add error logging utility function and ring buffer write helper</name>
|
||||||
|
<files>n8n-workflow.json</files>
|
||||||
|
<action>
|
||||||
|
Create a new Code node: **Log Error** (id: `code-log-error`) that serves as the centralized error logging entry point. This node will be called from multiple places in the main workflow (wired in Plan 02).
|
||||||
|
|
||||||
|
The Log Error node expects input with these fields:
|
||||||
|
- `correlationId` (string, optional - falls back to execution ID)
|
||||||
|
- `workflow` (string - "main" or sub-workflow name like "n8n-actions")
|
||||||
|
- `node` (string - node that encountered the error)
|
||||||
|
- `operation` (string - what was being done, e.g., "docker.stop")
|
||||||
|
- `userMessage` (string - user-friendly error summary)
|
||||||
|
- `errorMessage` (string - technical error message)
|
||||||
|
- `errorStack` (string, optional - stack trace)
|
||||||
|
- `httpCode` (number, optional - HTTP status code)
|
||||||
|
- `rawResponse` (string, optional - raw API response, will be truncated)
|
||||||
|
- `contextData` (object, optional - additional context like containerId, subWorkflowInput/Output)
|
||||||
|
- `chatId` (number - for pass-through to downstream)
|
||||||
|
- `text` (string, optional - for pass-through to downstream)
|
||||||
|
|
||||||
|
Implementation:
|
||||||
|
```javascript
|
||||||
|
const staticData = $getWorkflowStaticData('global');
|
||||||
|
const input = $input.item.json;
|
||||||
|
|
||||||
|
// Initialize if needed
|
||||||
|
if (!staticData.errorLog) {
|
||||||
|
staticData.errorLog = {
|
||||||
|
debug: { enabled: false, executionCount: 0 },
|
||||||
|
errors: { buffer: [], nextId: 1, count: 0, lastCleared: new Date().toISOString() },
|
||||||
|
traces: { buffer: [], nextId: 1 }
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
const MAX_ERRORS = 50;
|
||||||
|
const errorEntry = {
|
||||||
|
id: `err_${String(staticData.errorLog.errors.nextId).padStart(3, '0')}`,
|
||||||
|
correlationId: input.correlationId || $execution.id,
|
||||||
|
timestamp: new Date().toISOString(),
|
||||||
|
executionId: $execution.id,
|
||||||
|
workflow: input.workflow || 'main',
|
||||||
|
node: input.node || 'unknown',
|
||||||
|
operation: input.operation || 'unknown',
|
||||||
|
userMessage: input.userMessage || input.errorMessage || 'Unknown error',
|
||||||
|
error: {
|
||||||
|
message: input.errorMessage || 'Unknown error',
|
||||||
|
stack: (input.errorStack || '').substring(0, 500),
|
||||||
|
httpCode: input.httpCode || null,
|
||||||
|
rawResponse: (input.rawResponse || '').substring(0, 1000)
|
||||||
|
},
|
||||||
|
context: input.contextData || {}
|
||||||
|
};
|
||||||
|
|
||||||
|
// Ring buffer: push and rotate
|
||||||
|
staticData.errorLog.errors.buffer.push(errorEntry);
|
||||||
|
if (staticData.errorLog.errors.buffer.length > MAX_ERRORS) {
|
||||||
|
staticData.errorLog.errors.buffer.shift();
|
||||||
|
}
|
||||||
|
staticData.errorLog.errors.nextId++;
|
||||||
|
staticData.errorLog.errors.count++;
|
||||||
|
|
||||||
|
// Pass through all input data so downstream nodes still work
|
||||||
|
return { json: { ...input, _errorLogged: true, _errorId: errorEntry.id } };
|
||||||
|
```
|
||||||
|
|
||||||
|
**Positioning:** Place Log Error at position [2600, -200] (utility area, visually separate from main flow).
|
||||||
|
|
||||||
|
Also create a **Log Trace** Code node (id: `code-log-trace`) for debug-mode trace entries. This node:
|
||||||
|
- Checks `staticData.errorLog.debug.enabled` first; if false, passes data through unchanged
|
||||||
|
- If debug is enabled, increments `executionCount` and checks auto-disable at 100
|
||||||
|
- Stores trace entry in `staticData.errorLog.traces.buffer` (ring buffer, max 50)
|
||||||
|
- Trace entry fields: `id`, `correlationId`, `timestamp`, `executionId`, `event` (string: "sub-workflow-call" | "callback-routing"), `workflow`, `node`, `data` (object with input/output/duration or callbackData/routeTaken)
|
||||||
|
- Passes through all input data unchanged
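A minimal sketch of the Log Trace behaviour listed above, under stated assumptions: `logTrace`, `MAX_TRACES`, and `AUTO_DISABLE_AFTER` are hypothetical names, the `trc_` ID prefix is invented to mirror the `err_` prefix used by Log Error, and the trace fields are assumed to arrive on the input item. `staticData` and `executionId` are passed in explicitly so the function runs outside n8n (in the Code node they would come from `$getWorkflowStaticData('global')` and `$execution.id`).

```javascript
// Sketch only: names and the trc_ prefix are assumptions, not from the plan.
const MAX_TRACES = 50;          // ring buffer capacity
const AUTO_DISABLE_AFTER = 100; // traced executions before debug mode turns itself off

// Records a trace entry when debug mode is on; always returns the input unchanged.
function logTrace(staticData, input, executionId) {
  const log = staticData.errorLog;
  if (!log || !log.debug.enabled) return input; // debug off: pure pass-through

  log.debug.executionCount++;
  if (log.debug.executionCount >= AUTO_DISABLE_AFTER) {
    // The current entry is still recorded; later calls become pass-through.
    log.debug.enabled = false;
  }

  log.traces.buffer.push({
    id: `trc_${String(log.traces.nextId).padStart(3, '0')}`,
    correlationId: input.correlationId || executionId,
    timestamp: new Date().toISOString(),
    executionId,
    event: input.event || 'sub-workflow-call', // or "callback-routing"
    workflow: input.workflow || 'main',
    node: input.node || 'unknown',
    data: input.data || {},
  });

  // Ring buffer: drop the oldest entry once capacity is exceeded.
  if (log.traces.buffer.length > MAX_TRACES) {
    log.traces.buffer.shift();
  }
  log.traces.nextId++;

  return input;
}
```

Checking the flag first keeps the node cheap when debug mode is off, which matters because it sits on every traced path.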
|
||||||
|
|
||||||
|
**Positioning:** Place Log Trace at position [2600, -400].
|
||||||
|
|
||||||
|
These two nodes are NOT connected to anything yet -- they will be wired in Plan 02. They are standalone utility Code nodes that Plan 02 and Plan 03 will reference.
|
||||||
|
|
||||||
|
Node count impact: +2 new nodes (total now 172 including Task 1's additions).
|
||||||
|
</action>
|
||||||
|
<verify>
|
||||||
|
- Parse `n8n-workflow.json` to verify:
|
||||||
|
1. Log Error node exists (id: `code-log-error`) with type `n8n-nodes-base.code`
|
||||||
|
2. Log Trace node exists (id: `code-log-trace`) with type `n8n-nodes-base.code`
|
||||||
|
3. Log Error's jsCode contains `$getWorkflowStaticData`, `MAX_ERRORS = 50`, `buffer.push`, `buffer.shift`
|
||||||
|
4. Log Trace's jsCode contains `debug.enabled` check and auto-disable at 100
|
||||||
|
5. Total node count is 172
|
||||||
|
6. Both nodes have no incoming or outgoing connections (standalone)
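The standalone check in item 6 could be scripted along these lines. This is a sketch, not part of the plan: `connectionTargets` and `isStandalone` are hypothetical helpers, and the code assumes the exported workflow stores connections keyed by source node name, with targets nested under an output type as arrays of arrays of `{ node, type, index }` objects.

```javascript
// Sketch only: helper names are hypothetical; the connection layout is an
// assumption about the exported workflow JSON.

// Flatten { main: [[target, ...], ...] } into a single list of targets.
function connectionTargets(byType) {
  return Object.values(byType || {}).flat(2).filter(Boolean);
}

// True when the node has neither outgoing nor incoming connections.
function isStandalone(workflow, nodeName) {
  const connections = workflow.connections || {};
  const hasOutgoing = connectionTargets(connections[nodeName]).length > 0;
  const hasIncoming = Object.values(connections).some((byType) =>
    connectionTargets(byType).some((target) => target.node === nodeName));
  return !hasOutgoing && !hasIncoming;
}
```

With the workflow loaded via `JSON.parse(fs.readFileSync('n8n-workflow.json', 'utf8'))`, both `isStandalone(wf, 'Log Error')` and `isStandalone(wf, 'Log Trace')` would be expected to return `true` at the end of Plan 01.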
|
||||||
|
</verify>
|
||||||
|
<done>
|
||||||
|
Log Error and Log Trace utility nodes exist in the main workflow with correct ring buffer logic, field truncation, and auto-disable. They are ready to be wired by Plan 02 and Plan 03.
|
||||||
|
</done>
|
||||||
|
</task>
|
||||||
|
|
||||||
|
</tasks>
|
||||||
|
|
||||||
|
<verification>
|
||||||
|
1. `python3 -c "import json; wf=json.load(open('n8n-workflow.json')); print(len(wf['nodes']))"` returns 172
|
||||||
|
2. Keyword Router has outputs for errors, clear-errors, debug, trace
|
||||||
|
3. All 4 hidden commands handled in Process Debug Command code
|
||||||
|
4. Log Error and Log Trace utility nodes exist and are unconnected
|
||||||
|
5. No changes to Show Menu text (commands remain hidden)
|
||||||
|
6. All new nodes use correct n8n typeVersion and credential references
|
||||||
|
</verification>
|
||||||
|
|
||||||
|
<success_criteria>
|
||||||
|
- Main workflow has 172 nodes (168 + 4 new)
|
||||||
|
- Ring buffer infrastructure initialized in workflow static data
|
||||||
|
- /errors, /clear-errors, /debug, /trace commands routed and handled
|
||||||
|
- Log Error and Log Trace utility nodes ready for wiring
|
||||||
|
- No regression to existing functionality
|
||||||
|
</success_criteria>
|
||||||
|
|
||||||
|
<output>
|
||||||
|
After completion, create `.planning/phases/10.2-better-logging-and-log-management/10.2-01-SUMMARY.md`
|
||||||
|
</output>
|
||||||
@@ -0,0 +1,242 @@
|
|||||||
|
---
|
||||||
|
phase: 10.2-better-logging-and-log-management
|
||||||
|
plan: 02
|
||||||
|
type: execute
|
||||||
|
wave: 2
|
||||||
|
depends_on:
|
||||||
|
- "10.2-01"
|
||||||
|
files_modified:
|
||||||
|
- n8n-workflow.json
|
||||||
|
- n8n-actions.json
|
||||||
|
- n8n-update.json
|
||||||
|
- n8n-logs.json
|
||||||
|
- n8n-batch-ui.json
|
||||||
|
- n8n-status.json
|
||||||
|
- n8n-confirmation.json
|
||||||
|
- n8n-matching.json
|
||||||
|
autonomous: true
|
||||||
|
|
||||||
|
must_haves:
|
||||||
|
truths:
|
||||||
|
- "All 7 sub-workflows return structured error objects when failures occur"
|
||||||
|
- "Errors from sub-workflow failures are captured in the main workflow's ring buffer"
|
||||||
|
- "User sees friendly error messages inline (summary + cause) when actions fail"
|
||||||
|
- "Error entries include sub-workflow name, node, HTTP code, and raw response"
|
||||||
|
- "Correlation IDs propagate from main workflow through sub-workflow calls"
|
||||||
|
artifacts:
|
||||||
|
- path: "n8n-actions.json"
|
||||||
|
provides: "Structured error return on Docker API failure"
|
||||||
|
contains: "success"
|
||||||
|
- path: "n8n-update.json"
|
||||||
|
provides: "Structured error return on update failure"
|
||||||
|
contains: "success"
|
||||||
|
- path: "n8n-logs.json"
|
||||||
|
provides: "Structured error return on log retrieval failure"
|
||||||
|
contains: "success"
|
||||||
|
- path: "n8n-batch-ui.json"
|
||||||
|
provides: "Structured error return on batch UI failure"
|
||||||
|
contains: "success"
|
||||||
|
- path: "n8n-status.json"
|
||||||
|
provides: "Structured error return on status query failure"
|
||||||
|
contains: "success"
|
||||||
|
- path: "n8n-confirmation.json"
|
||||||
|
provides: "Structured error return on confirmation failure"
|
||||||
|
contains: "success"
|
||||||
|
- path: "n8n-matching.json"
|
||||||
|
provides: "Structured error return on matching failure"
|
||||||
|
contains: "success"
|
||||||
|
- path: "n8n-workflow.json"
|
||||||
|
provides: "Error capture after each Execute Workflow node, correlation ID generation"
|
||||||
|
contains: "Log Error"
|
||||||
|
key_links:
|
||||||
|
- from: "Execute Container Action (main)"
|
||||||
|
to: "Log Error"
|
||||||
|
via: "IF check on success field; when success is false, route to Log Error"
|
||||||
|
pattern: "success.*false"
|
||||||
|
- from: "sub-workflow error return"
|
||||||
|
to: "main workflow ring buffer"
|
||||||
|
via: "structured return value with success:false and error object"
|
||||||
|
pattern: "success.*false.*error"
|
||||||
|
---
|
||||||
|
|
||||||
|
<objective>
|
||||||
|
Wire error propagation from all 7 sub-workflows to the main workflow's centralized error ring buffer.
|
||||||
|
|
||||||
|
Purpose: Make every sub-workflow failure automatically captured with full diagnostic context (workflow name, node, HTTP code, raw response, sub-workflow I/O boundaries) in the ring buffer. Users see friendly error messages; Claude gets queryable diagnostic data via /errors command.
|
||||||
|
|
||||||
|
Output: All 8 workflow JSON files modified. Sub-workflows return structured errors. Main workflow captures errors via Log Error node from Plan 01.
|
||||||
|
</objective>
|
||||||
|
|
||||||
|
<execution_context>
|
||||||
|
@/home/luc/.claude/get-shit-done/workflows/execute-plan.md
|
||||||
|
@/home/luc/.claude/get-shit-done/templates/summary.md
|
||||||
|
</execution_context>
|
||||||
|
|
||||||
|
<context>
|
||||||
|
@.planning/STATE.md
|
||||||
|
@.planning/ROADMAP.md
|
||||||
|
@n8n-workflow.json
|
||||||
|
@n8n-actions.json
|
||||||
|
@n8n-update.json
|
||||||
|
@n8n-logs.json
|
||||||
|
@n8n-batch-ui.json
|
||||||
|
@n8n-status.json
|
||||||
|
@n8n-confirmation.json
|
||||||
|
@n8n-matching.json
|
||||||
|
@DEPLOY-SUBWORKFLOWS.md
|
||||||
|
@.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md
|
||||||
|
@.planning/phases/10.2-better-logging-and-log-management/10.2-RESEARCH.md
|
||||||
|
@.planning/phases/10.2-better-logging-and-log-management/10.2-01-SUMMARY.md
|
||||||
|
</context>
|
||||||
|
|
||||||
|
<tasks>
|
||||||
|
|
||||||
|
<task type="auto">
|
||||||
|
<name>Task 1: Add structured error returns to all 7 sub-workflows</name>
|
||||||
|
<files>n8n-actions.json, n8n-update.json, n8n-logs.json, n8n-batch-ui.json, n8n-status.json, n8n-confirmation.json, n8n-matching.json</files>
|
||||||
|
<action>
|
||||||
|
For each of the 7 sub-workflows, audit the existing error handling paths and ensure they return a standardized error object. The goal is NOT to change how errors are currently handled (many sub-workflows already have error paths), but to AUGMENT the return data with a consistent structure that the main workflow can detect and log.
|
||||||
|
|
||||||
|
**Standard error return format** (add to existing error paths):
|
||||||
|
```javascript
|
||||||
|
{
|
||||||
|
success: false,
|
||||||
|
action: "<existing-action-value>", // Preserve existing action field for routing
|
||||||
|
error: {
|
||||||
|
workflow: "<sub-workflow-name>",
|
||||||
|
node: "<node-that-failed>",
|
||||||
|
message: "<human-readable-error>",
|
||||||
|
httpCode: <http-status-or-null>,
|
||||||
|
rawResponse: "<truncated-raw-response>"
|
||||||
|
},
|
||||||
|
// ... preserve all existing return fields for backward compatibility
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Critical rule: PRESERVE BACKWARD COMPATIBILITY.** Existing return fields (action, text, chatId, messageId, keyboard, etc.) MUST remain unchanged. The `success` and `error` fields are ADDITIONS to the existing return objects.
|
||||||
|
|
||||||
|
**Per sub-workflow audit and modifications:**
|
||||||
|
|
||||||
|
1. **n8n-actions.json** (11 nodes): Already has error handling in Format Action Result nodes. Add `success: true` to success paths and `success: false` + `error` object to failure paths in the Format Action Result Code node. The existing `statusCode` checks (304, 404, 500+) should populate `error.httpCode`.
|
||||||
|
|
||||||
|
2. **n8n-update.json** (34 nodes): Has multiple error paths (pull error, create error, start error). Each error Code node (Format Pull Error, etc.) already returns `success: false`. Ensure each also includes an `error` object with `{ workflow: 'n8n-update', node, message, httpCode, rawResponse }`.
|
||||||
|
|
||||||
|
3. **n8n-logs.json** (9 nodes): Has error handling for container not found and log retrieval failures. Add `success` field to all return paths and `error` object to failure paths.
|
||||||
|
|
||||||
|
4. **n8n-batch-ui.json** (16 nodes): Has error handling for invalid state. Add `success` field to all return paths.
|
||||||
|
|
||||||
|
5. **n8n-status.json** (11 nodes): Has error handling for Docker query failures. Add `success` field to return paths.
|
||||||
|
|
||||||
|
6. **n8n-confirmation.json** (16 nodes): Has error paths for expired/invalid tokens. Add `success` field to return paths. Note: expired/cancel are NOT errors -- they are expected flows. Only add `success: false` for actual failures (Docker API errors in the stop execution path).
|
||||||
|
|
||||||
|
7. **n8n-matching.json** (23 nodes): Has `no_match` and `error` action returns. `no_match` is NOT an error. Only the `error` action path should include `success: false` with error details.
|
||||||
|
|
||||||
|
**Also add `correlationId` pass-through:** Each sub-workflow already accepts input parameters. Add `correlationId` to the "When executed by another workflow" trigger node's expected fields. The field is optional: callers that omit it still work, and n8n handles extra fields gracefully. In error return paths, include `correlationId: $('When executed by another workflow').item.json.correlationId || ''` so the main workflow can correlate errors.
|
||||||
|
|
||||||
|
**Implementation approach:** Read each sub-workflow JSON, identify Code nodes on error paths, modify their jsCode to include the standardized fields. Do NOT add new nodes to sub-workflows -- modify existing Code node outputs.
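One way to apply the standardized fields without disturbing existing return objects is a pair of small helpers pasted into each affected Code node. The sketch below is illustrative only: `withSuccess`, `withError`, and `MAX_RAW_RESPONSE` are hypothetical names, and the 1000-character cap is an assumption chosen to match the truncation Log Error applies.

```javascript
// Sketch only: helper names and the truncation limit are assumptions.
const MAX_RAW_RESPONSE = 1000;

// Success path: keep every existing field and add the success flag.
function withSuccess(existing) {
  return { ...existing, success: true };
}

// Failure path: keep every existing field, then add success, correlationId,
// and the standardized error object.
function withError(existing, details) {
  const {
    workflow,
    node,
    message,
    httpCode = null,
    rawResponse = '',
    correlationId = '',
  } = details;
  return {
    ...existing,
    success: false,
    correlationId,
    error: {
      workflow,
      node,
      message,
      httpCode,
      rawResponse: String(rawResponse).slice(0, MAX_RAW_RESPONSE),
    },
  };
}
```

Spreading `existing` first means routing fields such as `action`, `text`, and `chatId` reach the main workflow unchanged, which is the backward-compatibility rule stated above.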
|
||||||
|
</action>
|
||||||
|
<verify>
|
||||||
|
- For each sub-workflow JSON, parse and verify:
|
||||||
|
1. At least one Code node contains `success: false` and `error:` in its jsCode
|
||||||
|
2. At least one Code node contains `success: true` in its jsCode
|
||||||
|
3. Error objects include `workflow:` field matching the sub-workflow name
|
||||||
|
4. `correlationId` appears in error return paths
|
||||||
|
- Spot-check n8n-actions.json and n8n-update.json Code nodes in detail since they have the most complex error paths
|
||||||
|
</verify>
|
||||||
|
<done>
|
||||||
|
All 7 sub-workflows return `success: true/false` on all paths. Failure paths include standardized `error` object with workflow name, node, message, httpCode, and rawResponse. Existing return fields preserved for backward compatibility. correlationId passed through on error returns.
|
||||||
|
</done>
|
||||||
|
</task>
|
||||||
|
|
||||||
|
<task type="auto">
|
||||||
|
<name>Task 2: Add correlation ID generation and error capture to main workflow Execute Workflow paths</name>
|
||||||
|
<files>n8n-workflow.json</files>
|
||||||
|
<action>
|
||||||
|
This task wires the error capture infrastructure in the main workflow.
|
||||||
|
|
||||||
|
**Part A: Correlation ID generation**
|
||||||
|
|
||||||
|
Add a new Code node: **Generate Correlation ID** (id: `code-generate-correlation-id`). Place it between the IF Authenticated node's true output and the Keyword Router, so every authenticated request gets a correlation ID.
|
||||||
|
|
||||||
|
Implementation:
|
||||||
|
```javascript
|
||||||
|
const correlationId = `${Date.now()}-${Math.random().toString(36).slice(2, 11)}`;
|
||||||
|
return {
|
||||||
|
json: {
|
||||||
|
...$input.item.json,
|
||||||
|
correlationId
|
||||||
|
}
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
Note: Use timestamp + random string instead of UUID (avoids `require('uuid')` dependency issues in n8n Code nodes). This generates sufficiently unique IDs for a single-user bot.
|
||||||
|
|
||||||
|
Wire: IF Authenticated (true) -> Generate Correlation ID -> Keyword Router. This requires updating the connection from IF Authenticated to Keyword Router.
|
||||||
|
|
||||||
|
Similarly, add correlation ID generation for the callback path. Add a new Code node: **Generate Callback Correlation ID** (id: `code-generate-callback-correlation-id`) between IF Callback Authenticated (true) and Parse Callback Data. Same implementation.
|
||||||
|
|
||||||
|
**Part B: Pass correlation ID to sub-workflow calls**
|
||||||
|
|
||||||
|
For each Execute Workflow node in the main workflow (there are ~17 of them per DEPLOY-SUBWORKFLOWS.md), ensure the `correlationId` field is passed as an input parameter. Since most Prepare Input Code nodes already construct the input object, add `correlationId: $('Generate Correlation ID').item.json.correlationId` (or `$('Generate Callback Correlation ID').item.json.correlationId` for callback-path nodes) to each Prepare Input node's return object.
|
||||||
|
|
||||||
|
**Important data chain note:** Use `$input.item.json.correlationId` pattern for nodes with multiple predecessors (per 10.1-09 decision). For nodes with a single predecessor chain back to the correlation ID generation, reference the specific node.
|
||||||
|
|
||||||
|
**Part C: Error capture after Execute Workflow nodes**
|
||||||
|
|
||||||
|
For the highest-value Execute Workflow nodes (the ones most likely to fail), add error detection that routes to the Log Error node from Plan 01. The pattern:
|
||||||
|
|
||||||
|
After each Execute Workflow node, the existing result-handling Code node or IF node should check the `success` field. If `success === false`, route a branch to Log Error.
|
||||||
|
|
||||||
|
**Priority targets** (modify these first -- they handle Docker API calls that actually fail):
|
||||||
|
1. After Execute Container Action (single text command path)
|
||||||
|
2. After Execute Inline Action (callback action path)
|
||||||
|
3. After Execute Text Update (text update path)
|
||||||
|
4. After Execute Callback Update (callback update path)
|
||||||
|
5. After Execute Text Logs (text logs path)
|
||||||
|
6. After Execute Inline Logs (callback logs path)
|
||||||
|
|
||||||
|
For each target:
|
||||||
|
- If there's already a result-handling Code node after the Execute Workflow node, modify it to check `success === false` and, when that check matches, route to Log Error with appropriate fields
|
||||||
|
- If the existing flow doesn't have branching, add an IF node (Check {X} Success) after the Execute Workflow node that tests whether `{{ $json.success }}` equals `false`. When the condition matches (the sub-workflow reported a failure), the flow goes to Log Error; otherwise it continues along the existing path.
|
||||||
|
- Log Error receives: `{ correlationId, workflow, node, operation, userMessage, errorMessage, httpCode, rawResponse, contextData, chatId, text }`
|
||||||
|
- After Log Error, the flow continues to the existing Telegram error response (Log Error passes through data)
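The mapping from a sub-workflow result to the fields Log Error expects might look like the sketch below. `toLogErrorInput` and the `ctx` parameter are hypothetical; the choice to store the whole sub-workflow result under `contextData.subWorkflowOutput` is an assumption for illustration.

```javascript
// Sketch only: toLogErrorInput and ctx are hypothetical names.
// Returns null when there is nothing to log, otherwise the Log Error input.
function toLogErrorInput(result, ctx) {
  // Only an explicit success === false counts as a failure, so returns
  // that lack the field entirely are left alone.
  if (!result || result.success !== false) return null;

  const err = result.error || {};
  return {
    correlationId: result.correlationId || ctx.correlationId,
    workflow: err.workflow || ctx.workflow,
    node: err.node || 'unknown',
    operation: ctx.operation,
    userMessage: err.message || 'Unknown error',
    errorMessage: err.message || 'Unknown error',
    httpCode: err.httpCode ?? null,
    rawResponse: err.rawResponse || '',
    contextData: { subWorkflowOutput: result },
    // Pass-through fields for the downstream Telegram error response.
    chatId: result.chatId ?? ctx.chatId,
    text: result.text,
  };
}
```

Treating a missing `success` field as "not an error" keeps paths that have not yet been migrated from filling the ring buffer with false positives.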
|
||||||
|
|
||||||
|
**Minimize new nodes:** Where possible, modify existing result-handling Code nodes to include the error check rather than adding new IF nodes. Only add IF nodes where the existing flow has no branching capability.
|
||||||
|
|
||||||
|
Estimated new nodes: 2 (correlation ID generators) + 0-4 (IF nodes for error detection, depending on existing flow structure). Target: +2 to +6 new nodes.
|
||||||
|
</action>
|
||||||
|
<verify>
|
||||||
|
- Parse `n8n-workflow.json` to verify:
|
||||||
|
1. Generate Correlation ID node exists and is wired between IF Authenticated and Keyword Router
|
||||||
|
2. Generate Callback Correlation ID node exists and is wired in callback path
|
||||||
|
3. At least 4 Prepare Input nodes include `correlationId` in their return objects
|
||||||
|
4. Log Error node (from Plan 01) has at least 2 incoming connections
|
||||||
|
5. Node count is within expected range (174-178)
|
||||||
|
- Verify connection integrity: no broken paths, all existing flows still connected
|
||||||
|
</verify>
|
||||||
|
<done>
|
||||||
|
Every authenticated request gets a correlation ID. Correlation IDs propagate to sub-workflow calls. At least 6 Execute Workflow result paths check for `success === false` and route to Log Error. Error entries appear in ring buffer with correlation IDs, sub-workflow names, and diagnostic data.
|
||||||
|
</done>
|
||||||
|
</task>
|
||||||
|
|
||||||
|
</tasks>
|
||||||
|
|
||||||
|
<verification>
|
||||||
|
1. All 7 sub-workflow JSON files contain `success: true/false` returns
|
||||||
|
2. Main workflow has correlation ID generation on both text and callback paths
|
||||||
|
3. Log Error node has incoming connections from error detection paths
|
||||||
|
4. No broken connections in any workflow file (validate JSON structure)
|
||||||
|
5. Existing functionality preserved (action routing, Telegram responses unchanged)
|
||||||
|
6. `python3` JSON parse of all 8 files succeeds without errors
|
||||||
|
</verification>
|
||||||
|
|
||||||
|
<success_criteria>
|
||||||
|
- Sub-workflow errors automatically captured in ring buffer with full diagnostic context
|
||||||
|
- /errors command shows real errors from Docker API failures
|
||||||
|
- Correlation IDs trace a single user request across main + sub-workflow boundaries
|
||||||
|
- No regression to existing bot functionality (all action/update/status/logs flows work)
|
||||||
|
</success_criteria>
|
||||||
|
|
||||||
|
<output>
|
||||||
|
After completion, create `.planning/phases/10.2-better-logging-and-log-management/10.2-02-SUMMARY.md`
|
||||||
|
</output>
|
||||||
@@ -0,0 +1,240 @@
---
phase: 10.2-better-logging-and-log-management
plan: 03
type: execute
wave: 3
depends_on:
  - "10.2-02"
files_modified:
  - n8n-workflow.json
autonomous: false

must_haves:
  truths:
    - "Debug mode captures sub-workflow I/O boundary data when enabled"
    - "Debug mode captures callback routing decisions (which switch path taken)"
    - "Debug mode auto-disables after 100 executions"
    - "/trace command returns boundary data for a specific correlation ID"
    - "All modified workflows deploy to n8n and pass basic functional test"
  artifacts:
    - path: "n8n-workflow.json"
      provides: "Debug trace capture at sub-workflow boundaries and callback routing"
      contains: "Log Trace"
  key_links:
    - from: "Prepare * Input nodes"
      to: "Log Trace"
      via: "debug mode check before sub-workflow call"
      pattern: "debug.*enabled"
    - from: "Route Callback"
      to: "Log Trace"
      via: "callback routing trace capture"
      pattern: "callback-routing"
---

<objective>
Add debug mode tracing at sub-workflow boundaries and callback routing decision points, then verify full deployment.

Purpose: Address the three specific pain points from CONTEXT.md: (1) sub-workflow data loss -- capture what data was sent/received at boundaries, (2) callback routing confusion -- trace which path callbacks take, (3) n8n API execution log parsing -- the ring buffer + /trace command make execution data queryable without manual investigation. The final deployment checkpoint ensures everything works end-to-end.

Output: Modified `n8n-workflow.json` with debug traces wired, then deployed and verified.
</objective>

<execution_context>
@/home/luc/.claude/get-shit-done/workflows/execute-plan.md
@/home/luc/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/STATE.md
@.planning/ROADMAP.md
@n8n-workflow.json
@DEPLOY-SUBWORKFLOWS.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-RESEARCH.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-01-SUMMARY.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-02-SUMMARY.md
</context>

<tasks>

<task type="auto">
<name>Task 1: Wire debug trace capture at sub-workflow boundaries and callback routing</name>
<files>n8n-workflow.json</files>
<action>
Add debug trace capture points to the main workflow. Traces are stored via the Log Trace Code node (created in Plan 01) and only activate when `staticData.errorLog.debug.enabled === true`.

**Part A: Sub-workflow boundary tracing**

For the 6 highest-traffic Execute Workflow nodes (same targets as Plan 02 error capture), add trace capture AFTER the Execute Workflow node returns. This captures what was sent to and received from the sub-workflow.

Implementation approach: Modify the existing result-handling Code nodes (the ones that already process sub-workflow output) to add a trace block at the beginning of their code:

```javascript
// Debug trace: capture sub-workflow boundary
const staticData = $getWorkflowStaticData('global');
if (staticData.errorLog?.debug?.enabled) {
  const MAX_TRACES = 50;
  if (!staticData.errorLog.traces) {
    staticData.errorLog.traces = { buffer: [], nextId: 1 };
  }

  // Auto-disable after 100 executions
  staticData.errorLog.debug.executionCount = (staticData.errorLog.debug.executionCount || 0) + 1;
  if (staticData.errorLog.debug.executionCount > 100) {
    staticData.errorLog.debug.enabled = false;
  } else {
    const traceEntry = {
      id: `trace_${String(staticData.errorLog.traces.nextId).padStart(3, '0')}`,
      correlationId: $input.item.json.correlationId || $execution.id,
      timestamp: new Date().toISOString(),
      executionId: $execution.id,
      event: 'sub-workflow-call',
      workflow: '<sub-workflow-name>',
      node: '<execute-workflow-node-name>',
      data: {
        output: {
          success: $input.item.json.success,
          action: $input.item.json.action,
          // Include key fields but NOT full payload to keep size bounded
          hasError: !!$input.item.json.error
        }
      }
    };

    staticData.errorLog.traces.buffer.push(traceEntry);
    if (staticData.errorLog.traces.buffer.length > MAX_TRACES) {
      staticData.errorLog.traces.buffer.shift();
    }
    staticData.errorLog.traces.nextId++;
  }
}

// ... rest of existing result-handling code unchanged
```

**Target result-handling nodes to modify (add trace block at top of existing jsCode):**
1. After Execute Container Action -> the result-handling node (Format Immediate Result or similar)
2. After Execute Inline Action -> its result handler
3. After Execute Text Update -> its result handler
4. After Execute Callback Update -> its result handler
5. After Execute Text Logs -> Handle Text Logs Result
6. After Execute Container Status -> its result handler (Route Status Result or similar)

For each, customize the `workflow` and `node` strings. Keep the trace data minimal: `success`, `action`, `hasError` fields only. Do NOT capture full input/output payloads (they would fill the ring buffer too quickly). Claude can get full payloads from the n8n API if needed.
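
The bounded-buffer behaviour of the trace block can be exercised outside n8n with a small sketch. The `pushTrace` helper below is hypothetical (the plan inlines this logic in each Code node rather than sharing a function), and a plain object stands in for `staticData.errorLog`:

```javascript
// Hypothetical helper: isolates the ring buffer logic of the trace block so it
// can run in plain Node.js without $getWorkflowStaticData.
const MAX_TRACES = 50;

function pushTrace(errorLog, fields) {
  if (!errorLog.traces) {
    errorLog.traces = { buffer: [], nextId: 1 };
  }
  const traces = errorLog.traces;
  const entry = {
    id: `trace_${String(traces.nextId).padStart(3, '0')}`,
    timestamp: new Date().toISOString(),
    ...fields,
  };
  traces.buffer.push(entry);
  if (traces.buffer.length > MAX_TRACES) {
    traces.buffer.shift(); // drop the oldest entry once the cap is exceeded
  }
  traces.nextId++;
  return entry;
}

// Simulate 60 traces against a stand-in for staticData.errorLog.
const errorLog = { debug: { enabled: true } };
for (let i = 0; i < 60; i++) {
  pushTrace(errorLog, {
    event: 'sub-workflow-call',
    data: { output: { success: true, hasError: false } },
  });
}
console.log(errorLog.traces.buffer.length); // 50
console.log(errorLog.traces.buffer[0].id);  // trace_011
```

Because the buffer keeps only the newest 50 entries while `nextId` keeps counting, trace IDs stay unique even after older entries have been evicted.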

**Part B: Callback routing trace**

Modify the Parse Callback Data Code node (which runs before Route Callback switch) to add a trace entry when debug mode is enabled:

```javascript
// At the top of existing Parse Callback Data code:
const staticData = $getWorkflowStaticData('global');
if (staticData.errorLog?.debug?.enabled) {
  const MAX_TRACES = 50;
  if (!staticData.errorLog.traces) {
    staticData.errorLog.traces = { buffer: [], nextId: 1 };
  }

  const traceEntry = {
    id: `trace_${String(staticData.errorLog.traces.nextId).padStart(3, '0')}`,
    correlationId: $input.item.json.correlationId || $execution.id,
    timestamp: new Date().toISOString(),
    executionId: $execution.id,
    event: 'callback-routing',
    node: 'Parse Callback Data',
    data: {
      callbackData: $json.callback_query?.data || 'unknown',
      // The route taken will be determined by Route Callback switch node
      // We capture the callback data so Claude can trace which path it took
      parsedPrefix: ($json.callback_query?.data || '').split(':')[0]
    }
  };

  staticData.errorLog.traces.buffer.push(traceEntry);
  if (staticData.errorLog.traces.buffer.length > MAX_TRACES) {
    staticData.errorLog.traces.buffer.shift();
  }
  staticData.errorLog.traces.nextId++;
}

// ... rest of existing Parse Callback Data code unchanged
```
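
To make the `parsedPrefix` field concrete, here is a minimal sketch of the same expression as a standalone function. The function name and the sample callback strings are made up for illustration; the actual callback data formats are defined elsewhere in the workflow.

```javascript
// Same expression as in the trace entry above, wrapped in a hypothetical helper.
function parsePrefix(callbackData) {
  return (callbackData || '').split(':')[0];
}

// Sample inputs are illustrative only.
console.log(parsePrefix('action:stop:nginx')); // action
console.log(parsePrefix('cancel'));            // cancel
console.log(JSON.stringify(parsePrefix(undefined))); // ""
```

Note that a missing `callback_query.data` is recorded as `'unknown'` in `callbackData` but as an empty string in `parsedPrefix`, so a trace reader should check both fields.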

**Part C: Remove standalone Log Trace node if unused**

If the standalone Log Trace node from Plan 01 is not needed (because all tracing is done inline in existing Code nodes), remove it to avoid unnecessary node count increase. The decision depends on whether inline tracing (modifying existing nodes) or a dedicated node (routing through Log Trace) is cleaner -- make the judgment during implementation.

**No new nodes needed** for this task -- all tracing is added inline to existing Code nodes. Node count should stay the same as after Plan 02 (or decrease by 1 if Log Trace standalone node is removed).
</action>
<verify>
- Parse `n8n-workflow.json` to verify:
  1. At least 4 result-handling Code nodes contain `staticData.errorLog?.debug?.enabled` check
  2. Parse Callback Data Code node contains callback-routing trace
  3. Trace entries include `event: 'sub-workflow-call'` and `event: 'callback-routing'`
  4. Auto-disable check (`executionCount > 100`) exists in trace code
  5. Node count has not increased from Plan 02 (tracing is inline, not new nodes)
- Verify that `n8n-workflow.json` is still valid JSON and that the `jsCode` of every modified Code node is valid JavaScript
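
The structural checks above could be scripted roughly as follows. This is a sketch under assumptions: it relies on the exported workflow having a top-level `nodes` array whose Code nodes use type `n8n-nodes-base.code` with their source in `parameters.jsCode`, and the function name `checkTraceWiring` plus the inline sample workflow are hypothetical.

```javascript
// Hypothetical checker for the trace wiring; operates on a parsed workflow object.
function checkTraceWiring(workflow) {
  const codeNodes = workflow.nodes.filter((n) => n.type === 'n8n-nodes-base.code');
  const withDebugCheck = codeNodes.filter((n) =>
    (n.parameters?.jsCode || '').includes('errorLog?.debug?.enabled'));
  const parseCallback = workflow.nodes.find((n) => n.name === 'Parse Callback Data');
  return {
    nodeCount: workflow.nodes.length,
    debugCheckNodes: withDebugCheck.map((n) => n.name),
    callbackTraced: (parseCallback?.parameters?.jsCode || '').includes("event: 'callback-routing'"),
    autoDisable: withDebugCheck.some((n) => n.parameters.jsCode.includes('executionCount > 100')),
  };
}

// Tiny stand-in for n8n-workflow.json, just enough to exercise the checker.
const sample = {
  nodes: [
    {
      name: 'Handle Text Logs Result',
      type: 'n8n-nodes-base.code',
      parameters: {
        jsCode: "if (staticData.errorLog?.debug?.enabled) { if (staticData.errorLog.debug.executionCount > 100) {} }",
      },
    },
    {
      name: 'Parse Callback Data',
      type: 'n8n-nodes-base.code',
      parameters: {
        jsCode: "if (staticData.errorLog?.debug?.enabled) { const t = { event: 'callback-routing' }; }",
      },
    },
    { name: 'Route Callback', type: 'n8n-nodes-base.switch', parameters: {} },
  ],
};

const report = checkTraceWiring(sample);
console.log(report);
```

Against the real file, the object would come from `JSON.parse(require('node:fs').readFileSync('n8n-workflow.json', 'utf8'))`, which also covers the JSON validity check as a side effect.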
</verify>
<done>
Debug mode traces capture sub-workflow boundary data (success/action/hasError) at 6 Execute Workflow return points and callback routing data at Parse Callback Data. Traces auto-disable after 100 executions. /trace command can query traces by correlation ID.
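
The auto-disable rule can be checked in isolation with a short simulation. The `recordExecution` helper is hypothetical; it mirrors the counter logic from the Part A trace block, with a plain object in place of `staticData.errorLog.debug`.

```javascript
// Hypothetical helper mirroring the Part A counter: returns true when a trace
// should be captured, and flips `enabled` off once the counter passes 100.
function recordExecution(debug) {
  if (!debug.enabled) {
    return false;
  }
  debug.executionCount = (debug.executionCount || 0) + 1;
  if (debug.executionCount > 100) {
    debug.enabled = false;
    return false;
  }
  return true;
}

const debug = { enabled: true };
let traced = 0;
for (let i = 0; i < 105; i++) {
  if (recordExecution(debug)) {
    traced++;
  }
}
console.log(traced, debug.enabled, debug.executionCount); // 100 false 101
```

The first 100 calls are traced; the 101st call is the one that turns debug mode off, and later calls no longer touch the counter.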
</done>
</task>

<task type="checkpoint:human-verify" gate="blocking">
<name>Task 2: Deploy all modified workflows and verify end-to-end</name>
<files>n8n-workflow.json, n8n-actions.json, n8n-update.json, n8n-logs.json, n8n-batch-ui.json, n8n-status.json, n8n-confirmation.json, n8n-matching.json</files>
<action>
Deploy all modified workflow files to n8n and run functional verification.

1. Import all 8 workflow JSON files to n8n (main + 7 sub-workflows)
2. Activate the main workflow
3. Run the verification tests described below
</action>
<verify>
1. Test hidden commands:
   - Send `/debug status` in Telegram -> should show "Debug mode: OFF"
   - Send `/errors` in Telegram -> should show "No errors recorded."
   - Send `/debug on` -> should show "Debug mode enabled..."
   - Send `/debug status` -> should show "Debug mode: ON"

2. Test normal functionality (verify no regression):
   - Send `/status` -> should show container list (existing behavior)
   - Tap a container -> should show status (existing behavior)
   - Send `/stop nonexistent-container` -> should show error AND appear in /errors
   - Send `/errors` -> should show the error entry

3. Test debug traces:
   - With debug mode ON, perform any container action
   - Send `/errors` -> should show debug status and trace count
   - (Optional) test /trace with a correlation ID from the error output

4. Test cleanup:
   - Send `/debug off` -> should disable debug mode
   - Send `/clear-errors` -> should clear error buffer
</verify>
<done>
All workflows deployed. Hidden commands respond correctly. Errors captured in ring buffer. Debug traces capture boundary data. No regression to existing bot functionality. User confirms all tests pass.
</done>
</task>

</tasks>

<verification>
1. Debug traces capture sub-workflow boundary data when debug mode is enabled
2. Callback routing traces capture callback data prefix for path diagnosis
3. Auto-disable works after 100 executions
4. Full deployment to n8n succeeds
5. No regression to existing bot commands
6. /errors shows real error data from failed operations
7. /trace returns entries for a specific correlation ID
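
For the last check, the lookup that /trace performs can be pictured as a filter over the trace buffer. The sketch below is illustrative only: the real command is implemented in Plan 01, and the `findTraces` name and sample entries are hypothetical.

```javascript
// Hypothetical lookup: return all trace entries that share a correlation ID,
// tolerating a missing or empty trace buffer.
function findTraces(errorLog, correlationId) {
  const buffer = errorLog?.traces?.buffer ?? [];
  return buffer.filter((entry) => entry.correlationId === correlationId);
}

// Made-up entries following the trace shape used in Task 1.
const sampleLog = {
  traces: {
    nextId: 4,
    buffer: [
      { id: 'trace_001', correlationId: 'req-a', event: 'callback-routing' },
      { id: 'trace_002', correlationId: 'req-a', event: 'sub-workflow-call' },
      { id: 'trace_003', correlationId: 'req-b', event: 'sub-workflow-call' },
    ],
  },
};

const matches = findTraces(sampleLog, 'req-a');
console.log(matches.map((t) => `${t.id} ${t.event}`).join('\n'));
```

A request that passed through callback routing and one sub-workflow call would yield two entries in buffer order, which is the sequence a reader needs to reconstruct the path taken.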
</verification>

<success_criteria>
- Debug mode addresses the three specific pain points: sub-workflow data loss, callback routing confusion, and hard-to-query execution data
- All workflows deploy and the bot functions correctly
- User confirms functional test passes
</success_criteria>

<output>
After completion, create `.planning/phases/10.2-better-logging-and-log-management/10.2-03-SUMMARY.md`
</output>