unraid-docker-manager/.planning/phases/10.2-better-logging-and-log-management/10.2-03-PLAN.md at main

Lucas Berger c79a3fbf87 docs(10.2): plan phase — error ring buffer, sub-workflow error propagation, debug tracing

3 plans in 3 waves:
- Wave 1: Ring buffer foundation + hidden debug commands (/errors, /debug, /trace, /clear-errors)
- Wave 2: Structured error returns in all 7 sub-workflows + correlation ID tracking
- Wave 3: Debug mode tracing at sub-workflow boundaries + deployment verification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-08 18:56:44 -05:00

10 KiB

---
phase: 10.2-better-logging-and-log-management
plan:
type: execute
wave:
depends_on: 10.2-02
files_modified: n8n-workflow.json
autonomous: false
must_haves:
  truths:
    - Debug mode captures sub-workflow I/O boundary data when enabled
    - Debug mode captures callback routing decisions (which switch path taken)
    - Debug mode auto-disables after 100 executions
    - /trace command returns boundary data for a specific correlation ID
    - All modified workflows deploy to n8n and pass basic functional test
  artifacts:
    - path: n8n-workflow.json
      provides: Debug trace capture at sub-workflow boundaries and callback routing
      contains: Log Trace
  key_links:
    - from: Result-handling Code nodes (after Execute Workflow)
      to: inline trace capture
      via: debug mode check after sub-workflow returns
      pattern: debug.*enabled
    - from: Parse Callback Data
      to: inline trace capture
      via: callback routing trace capture
      pattern: callback-routing
---

Add debug mode tracing at sub-workflow boundaries and callback routing decision points, then verify full deployment.

Purpose: Address the three specific pain points from CONTEXT.md: (1) sub-workflow data loss: capture what data was sent and received at boundaries; (2) callback routing confusion: trace which path callbacks take; (3) n8n API execution log parsing: the ring buffer and the /trace command make execution data queryable without manual investigation. A final deployment checkpoint ensures everything works end-to-end.

Output: Modified n8n-workflow.json with debug traces wired, then deployed and verified.

<execution_context> @/home/luc/.claude/get-shit-done/workflows/execute-plan.md @/home/luc/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/STATE.md @.planning/ROADMAP.md @n8n-workflow.json @DEPLOY-SUBWORKFLOWS.md @.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md @.planning/phases/10.2-better-logging-and-log-management/10.2-RESEARCH.md @.planning/phases/10.2-better-logging-and-log-management/10.2-01-SUMMARY.md @.planning/phases/10.2-better-logging-and-log-management/10.2-02-SUMMARY.md

Task 1: Wire debug trace capture at sub-workflow boundaries and callback routing

Files: n8n-workflow.json

Add debug trace capture points to the main workflow. Traces are stored via the Log Trace Code node (created in Plan 01) and only activate when `staticData.errorLog.debug.enabled === true`.

Part A: Sub-workflow boundary tracing

For the 6 highest-traffic Execute Workflow nodes (same targets as Plan 02 error capture), add trace capture AFTER the Execute Workflow node returns. This captures what was sent to and received from the sub-workflow.

Implementation approach: Modify the existing result-handling Code nodes (the ones that already process sub-workflow output) to add a trace block at the beginning of their code:

```javascript
// Debug trace: capture sub-workflow boundary
const staticData = $getWorkflowStaticData('global');
if (staticData.errorLog?.debug?.enabled) {
  const MAX_TRACES = 50;
  if (!staticData.errorLog.traces) {
    staticData.errorLog.traces = { buffer: [], nextId: 1 };
  }

  // Auto-disable after 100 executions
  staticData.errorLog.debug.executionCount = (staticData.errorLog.debug.executionCount || 0) + 1;
  if (staticData.errorLog.debug.executionCount > 100) {
    staticData.errorLog.debug.enabled = false;
  } else {
    const traceEntry = {
      id: `trace_${String(staticData.errorLog.traces.nextId).padStart(3, '0')}`,
      correlationId: $input.item.json.correlationId || $execution.id,
      timestamp: new Date().toISOString(),
      executionId: $execution.id,
      event: 'sub-workflow-call',
      workflow: '<sub-workflow-name>',
      node: '<execute-workflow-node-name>',
      data: {
        output: {
          success: $input.item.json.success,
          action: $input.item.json.action,
          // Include key fields but NOT full payload to keep size bounded
          hasError: !!$input.item.json.error
        }
      }
    };

    staticData.errorLog.traces.buffer.push(traceEntry);
    if (staticData.errorLog.traces.buffer.length > MAX_TRACES) {
      staticData.errorLog.traces.buffer.shift();
    }
    staticData.errorLog.traces.nextId++;
  }
}

// ... rest of existing result-handling code unchanged
```
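The capture-and-trim steps in the trace block above repeat in every result handler. A minimal sketch of that ring-buffer logic as a plain function follows, assuming the same `errorLog.traces` shape. `pushTrace` is a hypothetical name, and the function takes the `errorLog` object as a parameter so that it can run outside n8n, where `$getWorkflowStaticData` is not available.

```javascript
// Hypothetical helper (not part of the plan): appends a trace entry and
// keeps only the newest maxTraces entries in the buffer.
function pushTrace(errorLog, fields, maxTraces = 50) {
  if (!errorLog.traces) {
    errorLog.traces = { buffer: [], nextId: 1 };
  }
  const traces = errorLog.traces;
  const entry = {
    id: `trace_${String(traces.nextId).padStart(3, '0')}`,
    timestamp: new Date().toISOString(),
    ...fields,
  };
  traces.buffer.push(entry);
  if (traces.buffer.length > maxTraces) {
    traces.buffer.shift(); // drop the oldest entry
  }
  traces.nextId++;
  return entry;
}

// Usage: push 52 traces into a buffer bounded at 50.
const errorLog = { debug: { enabled: true } };
for (let i = 0; i < 52; i++) {
  pushTrace(errorLog, { event: 'sub-workflow-call', correlationId: `c${i}` });
}
console.log(errorLog.traces.buffer.length); // 50
console.log(errorLog.traces.buffer[0].id);  // trace_003
```

Because each Code node carries its own source, a helper like this would most likely still need to be pasted into each handler; the sketch is mainly useful for checking the trimming behaviour in isolation.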

Target result-handling nodes to modify (add trace block at top of existing jsCode):

- After Execute Container Action -> the result-handling node (Format Immediate Result or similar)
- After Execute Inline Action -> its result handler
- After Execute Text Update -> its result handler
- After Execute Callback Update -> its result handler
- After Execute Text Logs -> Handle Text Logs Result
- After Execute Container Status -> its result handler (Route Status Result or similar)

For each, customize the workflow and node strings. Keep the trace data minimal: success, action, hasError fields only. Do NOT capture full input/output payloads (they would fill the ring buffer too quickly). Claude can get full payloads from the n8n API if needed.
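One detail worth checking during implementation: the boundary trace block increments `executionCount` once per capture, so an execution that passes through two result handlers is counted twice. If the limit of 100 is meant to count distinct executions, one option is to remember the last execution ID seen. The sketch below assumes a hypothetical `lastExecutionId` field on the debug object and takes the execution ID as a parameter in place of `$execution.id`; `shouldTrace` is also a hypothetical name.

```javascript
// Hypothetical helper: returns true if a trace should be captured.
// Counts each execution ID once (for consecutive captures) and switches
// debug mode off once the count passes the limit.
function shouldTrace(debug, executionId, limit = 100) {
  if (!debug || !debug.enabled) return false;
  if (debug.lastExecutionId !== executionId) {
    debug.lastExecutionId = executionId; // assumed field, not in the original plan
    debug.executionCount = (debug.executionCount || 0) + 1;
  }
  if (debug.executionCount > limit) {
    debug.enabled = false;
    return false;
  }
  return true;
}

// Usage: two captures in the same execution are counted once.
const debug = { enabled: true };
shouldTrace(debug, 'exec-1');
shouldTrace(debug, 'exec-1');
console.log(debug.executionCount); // 1
```

This only deduplicates consecutive captures. If executions interleave, the same execution could still be counted more than once.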

Part B: Callback routing trace

Modify the Parse Callback Data Code node (which runs before Route Callback switch) to add a trace entry when debug mode is enabled:

```javascript
// At the top of existing Parse Callback Data code:
const staticData = $getWorkflowStaticData('global');
if (staticData.errorLog?.debug?.enabled) {
  const MAX_TRACES = 50;
  if (!staticData.errorLog.traces) {
    staticData.errorLog.traces = { buffer: [], nextId: 1 };
  }

  const traceEntry = {
    id: `trace_${String(staticData.errorLog.traces.nextId).padStart(3, '0')}`,
    correlationId: $input.item.json.correlationId || $execution.id,
    timestamp: new Date().toISOString(),
    executionId: $execution.id,
    event: 'callback-routing',
    node: 'Parse Callback Data',
    data: {
      callbackData: $json.callback_query?.data || 'unknown',
      // The route taken will be determined by Route Callback switch node
      // We capture the callback data so Claude can trace which path it took
      parsedPrefix: ($json.callback_query?.data || '').split(':')[0]
    }
  };

  staticData.errorLog.traces.buffer.push(traceEntry);
  if (staticData.errorLog.traces.buffer.length > MAX_TRACES) {
    staticData.errorLog.traces.buffer.shift();
  }
  staticData.errorLog.traces.nextId++;
}

// ... rest of existing Parse Callback Data code unchanged
```
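The `parsedPrefix` field is the part of the callback data before the first colon, which is what lets a reader match a trace to a Route Callback branch. Below is a small sketch of that derivation as a standalone function. `buildCallbackTrace` and the sample payload are hypothetical; the actual callback data format should be checked against the workflow.

```javascript
// Hypothetical helper mirroring the event/node/data fields of the
// callback-routing trace entry. Takes the item JSON as a parameter
// in place of n8n's $json.
function buildCallbackTrace(json) {
  const raw = json.callback_query?.data;
  return {
    event: 'callback-routing',
    node: 'Parse Callback Data',
    data: {
      callbackData: raw || 'unknown',
      parsedPrefix: (raw || '').split(':')[0],
    },
  };
}

// Usage with a made-up callback payload.
const withData = buildCallbackTrace({ callback_query: { data: 'action:stop:plex' } });
console.log(withData.data.parsedPrefix); // action

// With no callback data, callbackData is 'unknown' but parsedPrefix is ''.
const withoutData = buildCallbackTrace({});
console.log(withoutData.data.callbackData); // unknown
```

Note the asymmetry in the missing-data case: `callbackData` falls back to `'unknown'` while `parsedPrefix` is an empty string, so a query over traces should not expect the two to match.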

Part C: Remove standalone Log Trace node if unused

If the standalone Log Trace node from Plan 01 is not needed (because all tracing is done inline in existing Code nodes), remove it to avoid unnecessary node count increase. The decision depends on whether inline tracing (modifying existing nodes) or dedicated node (routing through Log Trace) was cleaner -- make the judgment during implementation.

No new nodes are needed for this task: all tracing is added inline to existing Code nodes. Node count should stay the same as after Plan 02 (or decrease by 1 if the standalone Log Trace node is removed).

Verify: Parse n8n-workflow.json and confirm that:

1. At least 4 result-handling Code nodes contain the `staticData.errorLog?.debug?.enabled` check
2. The Parse Callback Data Code node contains the callback-routing trace
3. Trace entries include `event: 'sub-workflow-call'` and `event: 'callback-routing'`
4. The auto-disable check (`executionCount > 100`) exists in the trace code
5. Node count has not increased from Plan 02 (tracing is inline, not new nodes)

Also verify that the file is still valid JSON and that the jsCode of every modified Code node is valid JavaScript.

Done: Debug mode traces capture sub-workflow boundary data (success/action/hasError) at 6 Execute Workflow return points and callback routing data at Parse Callback Data. Tracing auto-disables after 100 executions. The /trace command can query traces by correlation ID.
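The string checks in the verification list above can be scripted. The sketch below is one possible shape, assuming Code nodes have type `n8n-nodes-base.code` with their source in `parameters.jsCode`; `verifyTraceWiring` is a hypothetical name, and the sample workflow is made up so that the block runs without the real file.

```javascript
// Sketch: checks the trace wiring in a parsed n8n workflow object.
// Assumes Code nodes use type 'n8n-nodes-base.code' with source in parameters.jsCode.
function verifyTraceWiring(workflow) {
  const codeNodes = workflow.nodes.filter(
    (n) => n.type === 'n8n-nodes-base.code' && typeof n.parameters?.jsCode === 'string'
  );
  const has = (n, text) => n.parameters.jsCode.includes(text);

  // Result handlers that carry both the debug check and the boundary event.
  const boundaryNodes = codeNodes.filter(
    (n) => has(n, 'errorLog?.debug?.enabled') && has(n, "event: 'sub-workflow-call'")
  );
  const parseCallback = codeNodes.find((n) => n.name === 'Parse Callback Data');

  return {
    boundaryTraceNodes: boundaryNodes.length,
    enoughBoundaryNodes: boundaryNodes.length >= 4,
    callbackTrace: !!parseCallback && has(parseCallback, "event: 'callback-routing'"),
    autoDisable: boundaryNodes.some((n) => has(n, 'executionCount > 100')),
    nodeCount: workflow.nodes.length, // compare by hand against the Plan 02 count
  };
}

// Made-up two-node workflow for illustration only.
const sample = {
  nodes: [
    {
      name: 'Parse Callback Data',
      type: 'n8n-nodes-base.code',
      parameters: {
        jsCode: "if (staticData.errorLog?.debug?.enabled) { /* ... */ event: 'callback-routing' }",
      },
    },
    { name: 'Route Callback', type: 'n8n-nodes-base.switch', parameters: {} },
  ],
};
console.log(verifyTraceWiring(sample));
```

In practice the argument would come from `JSON.parse(fs.readFileSync('n8n-workflow.json', 'utf8'))`, which also covers the valid-JSON check because parsing throws on malformed input.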

Task 2: Deploy all modified workflows and verify end-to-end

Files: n8n-workflow.json, n8n-actions.json, n8n-update.json, n8n-logs.json, n8n-batch-ui.json, n8n-status.json, n8n-confirmation.json, n8n-matching.json

Deploy all modified workflow files to n8n and run functional verification.

- Import all 8 workflow JSON files to n8n (main + 7 sub-workflows)
- Activate the main workflow
- Run the verification tests described below

1. Test hidden commands:
  - Send /debug status in Telegram -> should show "Debug mode: OFF"
  - Send /errors in Telegram -> should show "No errors recorded."
  - Send /debug on -> should show "Debug mode enabled..."
  - Send /debug status -> should show "Debug mode: ON"
2. Test normal functionality (verify no regression):
  - Send /status -> should show container list (existing behavior)
  - Tap a container -> should show status (existing behavior)
  - Send /stop nonexistent-container -> should show error AND appear in /errors
  - Send /errors -> should show the error entry
3. Test debug traces:
  - With debug mode ON, perform any container action
  - Send /errors -> should show debug status and trace count
  - (Optional) test /trace with a correlation ID from the error output
4. Test cleanup:
  - Send /debug off -> should disable debug mode
  - Send /clear-errors -> should clear error buffer

Done: All workflows deployed. Hidden commands respond correctly. Errors captured in ring buffer. Debug traces capture boundary data. No regression to existing bot functionality. User confirms all tests pass.

1. All debug traces capture sub-workflow boundary data when debug mode enabled
2. Callback routing traces capture callback data prefix for path diagnosis
3. Auto-disable works after 100 executions
4. Full deployment to n8n succeeds
5. No regression to existing bot commands
6. /errors shows real error data from failed operations
7. /trace returns entries for a specific correlation ID
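For the /trace check, the lookup amounts to filtering the trace buffer by `correlationId`. The following is a minimal sketch of that query, assuming the `errorLog.traces.buffer` shape used in the trace blocks of Task 1. `findTraces`, `formatTraces`, and the reply wording are hypothetical; the actual /trace handler from Plan 01 is not shown in this plan.

```javascript
// Hypothetical query helper: returns all buffered traces for one correlation ID.
function findTraces(errorLog, correlationId) {
  const buffer = errorLog?.traces?.buffer || [];
  return buffer.filter((t) => t.correlationId === correlationId);
}

// Hypothetical formatter: one line per trace, suitable for a chat reply.
function formatTraces(traces) {
  if (traces.length === 0) return 'No traces found.';
  return traces
    .map((t) => `${t.id} ${t.event} ${t.node || ''}`.trim())
    .join('\n');
}

// Usage with made-up trace entries.
const sampleLog = {
  traces: {
    buffer: [
      { id: 'trace_001', correlationId: 'abc', event: 'callback-routing', node: 'Parse Callback Data' },
      { id: 'trace_002', correlationId: 'xyz', event: 'sub-workflow-call' },
    ],
  },
};
console.log(formatTraces(findTraces(sampleLog, 'abc')));
// trace_001 callback-routing Parse Callback Data
```

Since the buffer holds at most 50 entries, a linear filter is sufficient and no index by correlation ID is needed.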

<success_criteria>

- Debug mode addresses the three specific pain points: sub-workflow data loss, callback routing confusion, and the lack of queryable execution data
- All workflows deploy and the bot functions correctly
- User confirms functional test passes

</success_criteria>

After completion, create `.planning/phases/10.2-better-logging-and-log-management/10.2-03-SUMMARY.md`
