unraid-docker-manager/.planning/phases/10.2-better-logging-and-log-management/10.2-01-PLAN.md

---
phase: 10.2-better-logging-and-log-management
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - n8n-workflow.json
autonomous: true

must_haves:
  truths:
    - "Error ring buffer stores up to 50 structured error entries in workflow static data"
    - "/errors command returns recent errors in human-readable format via Telegram"
    - "/clear-errors command resets the error ring buffer"
    - "/debug on|off|status toggles debug mode stored in static data"
    - "/trace <correlationId> returns all entries matching a correlation ID"
    - "Hidden commands are NOT listed in /start help menu"
  artifacts:
    - path: "n8n-workflow.json"
      provides: "Error ring buffer infrastructure, debug toggle, hidden command routing and responses"
      contains: "Process Debug Command"
  key_links:
    - from: "Keyword Router"
      to: "Process Debug Command"
      via: "switch output for /errors, /clear-errors, /debug, /trace keywords"
      pattern: "keyword-errors|keyword-debug|keyword-trace|keyword-clear-errors"
    - from: "Process Debug Command"
      to: "Send Debug Response"
      via: "formatted message output"
      pattern: "chatId.*text"
---

<objective>
Build the error ring buffer foundation and hidden Telegram debug commands in the main workflow.

Purpose: Establish the centralized error/trace storage infrastructure (workflow static data with ring buffer) and the hidden command interface (/errors, /clear-errors, /debug, /trace) that Claude and the user can use for quick diagnostics. This is the foundation that Plans 02 and 03 build upon.

Output: Modified `n8n-workflow.json` with ring buffer initialization, hidden command routing, and Telegram response nodes for all 4 debug commands.
</objective>

<execution_context>
@/home/luc/.claude/get-shit-done/workflows/execute-plan.md
@/home/luc/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/STATE.md
@.planning/ROADMAP.md
@n8n-workflow.json
@DEPLOY-SUBWORKFLOWS.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md
@.planning/phases/10.2-better-logging-and-log-management/10.2-RESEARCH.md
</context>

<tasks>

<task type="auto">
  <name>Task 1: Add hidden command routing to Keyword Router and create debug command processor</name>
  <files>n8n-workflow.json</files>
  <action>
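Modify the Keyword Router switch node (id: `switch-keyword-router`) to add 4 new keyword outputs BEFORE the fallback output. The new rules must use the `startsWith` operator (not `contains`) so that ordinary text that merely mentions a command does not match. The switch sends each message to the first rule that matches. Check whether any earlier `contains` rule (for example, one matching `status`) would capture `/debug status` first. If one would, place the new rules ahead of it: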

1. **keyword-errors**: matches `/errors` (startsWith, case-insensitive) -> outputKey: "errors"
2. **keyword-clear-errors**: matches `/clear-errors` (startsWith, case-insensitive) -> outputKey: "clear-errors"
3. **keyword-debug**: matches `/debug` (startsWith, case-insensitive) -> outputKey: "debug"
4. **keyword-trace**: matches `/trace` (startsWith, case-insensitive) -> outputKey: "trace"
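The helper below is a plain-JavaScript sketch of how these four rules should behave. It is not the Switch node's JSON configuration. It applies a case-insensitive `startsWith` to the trimmed message text, and uses `/clear-errors` as the full prefix so that an unrelated `/clear...` command is not captured. `matchDebugRoute` and `DEBUG_ROUTES` are hypothetical names. The trimming step assumes the router sees the raw message text.

```javascript
// Hypothetical mirror of the intended Keyword Router rules:
// case-insensitive startsWith on the trimmed Telegram message text.
const DEBUG_ROUTES = [
  { prefix: '/errors', outputKey: 'errors' },
  { prefix: '/clear-errors', outputKey: 'clear-errors' },
  { prefix: '/debug', outputKey: 'debug' },
  { prefix: '/trace', outputKey: 'trace' },
];

function matchDebugRoute(text) {
  const normalized = String(text || '').trim().toLowerCase();
  for (const route of DEBUG_ROUTES) {
    if (normalized.startsWith(route.prefix)) return route.outputKey;
  }
  return null; // no debug rule matched: existing rules / fallback output apply
}
```

With these rules, text such as `show me /errors` falls through to the existing routing. Only messages that begin with one of the four commands reach Process Debug Command.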

All 4 outputs route to a single new Code node: **Process Debug Command** (id: `code-process-debug-command`).

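The Process Debug Command Code node implements ALL 4 commands in a single code block. It runs in **Run Once for Each Item** mode, so `$json` refers to the incoming Telegram update. It reads `$json.message.text` to determine which command was invoked (and any argument), then: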

**Static data structure** (initialize if missing):
```javascript
const staticData = $getWorkflowStaticData('global');
if (!staticData.errorLog) {
  staticData.errorLog = {
    debug: { enabled: false, executionCount: 0 },
    errors: { buffer: [], nextId: 1, count: 0, lastCleared: new Date().toISOString() },
    traces: { buffer: [], nextId: 1 }
  };
}
```

**Command handling:**

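- `/errors [N]`: Read `staticData.errorLog.errors.buffer`, take the last N entries (N defaults to 5 and is capped at 50), and format each as follows. `{httpCode}` comes from the entry's `error.httpCode` field, which Log Error writes: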
  ```
  #{id} - {timestamp}
  Workflow: {workflow} > {node}
  {userMessage}
  HTTP: {httpCode} (if present)
  ```
  If the buffer is empty, return "No errors recorded." In either case, end the message with the total error count (`errors.count`) and the current debug mode status.

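- `/clear-errors`: Record the current buffer length, then reset `staticData.errorLog.errors.buffer = []`, reset `nextId = 1`, and update `lastCleared`. Return "Error log cleared. {N} entries removed.", where N is the buffer length before the reset. Do not use the cumulative `count` here, because it does not measure what was removed.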

- `/debug on|off|status`:
  - `on`: Set `staticData.errorLog.debug.enabled = true`, reset `executionCount = 0`. Return "Debug mode enabled. Tracing sub-workflow boundaries and callback routing."
  - `off`: Set `staticData.errorLog.debug.enabled = false`. Return "Debug mode disabled."
  - `status` (or no argument): Return debug mode state, execution count, error buffer size, trace buffer size.

- `/trace <correlationId>`: Search both `staticData.errorLog.errors.buffer` and `staticData.errorLog.traces.buffer` for entries whose `correlationId` matches. List the results in chronological order (oldest first, by `timestamp`). If nothing matches, return "No entries found for correlation ID: {id}".
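The four handlers above can be sketched as plain functions over the `errorLog` object, which keeps them testable outside n8n. This is a sketch, not required code. All function names are hypothetical, and the 5/50 limits follow the spec above. Two behaviors are assumptions: an unrecognized `/debug` argument is treated as `status`, and an empty `/trace` argument returns a usage hint. Inside the Code node, `log` would be `$getWorkflowStaticData('global').errorLog` after the initialization block, and `messageText` would be `$json.message.text`.

```javascript
// Hypothetical sketch of Process Debug Command's dispatch, returning plain text.
const MAX_ENTRIES = 50;

function formatErrors(log, arg) {
  const parsed = parseInt(arg, 10);
  const n = Number.isNaN(parsed) ? 5 : Math.min(Math.max(parsed, 1), MAX_ENTRIES);
  const buffer = log.errors.buffer;
  const footer = `Total errors: ${log.errors.count} | Debug: ${log.debug.enabled ? 'on' : 'off'}`;
  if (buffer.length === 0) return `No errors recorded.\n\n${footer}`;
  const blocks = buffer.slice(-n).map((e) => {
    const lines = [`#${e.id} - ${e.timestamp}`, `Workflow: ${e.workflow} > ${e.node}`, e.userMessage];
    if (e.error && e.error.httpCode) lines.push(`HTTP: ${e.error.httpCode}`);
    return lines.join('\n');
  });
  return `${blocks.join('\n\n')}\n\n${footer}`;
}

function clearErrors(log) {
  const removed = log.errors.buffer.length; // measured before the reset
  log.errors.buffer = [];
  log.errors.nextId = 1;
  log.errors.lastCleared = new Date().toISOString();
  return `Error log cleared. ${removed} entries removed.`;
}

function debugCommand(log, arg) {
  const mode = (arg || 'status').toLowerCase();
  if (mode === 'on') {
    log.debug.enabled = true;
    log.debug.executionCount = 0;
    return 'Debug mode enabled. Tracing sub-workflow boundaries and callback routing.';
  }
  if (mode === 'off') {
    log.debug.enabled = false;
    return 'Debug mode disabled.';
  }
  // 'status', no argument, or anything unrecognized: report without changing state
  return [
    `Debug mode: ${log.debug.enabled ? 'on' : 'off'}`,
    `Execution count: ${log.debug.executionCount}`,
    `Error buffer: ${log.errors.buffer.length}/${MAX_ENTRIES}`,
    `Trace buffer: ${log.traces.buffer.length}/${MAX_ENTRIES}`,
  ].join('\n');
}

function traceCommand(log, correlationId) {
  if (!correlationId) return 'Usage: /trace <correlationId>';
  const matches = [...log.errors.buffer, ...log.traces.buffer]
    .filter((e) => String(e.correlationId) === correlationId)
    // ISO-8601 UTC timestamps of equal width sort chronologically as plain strings
    .sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0));
  if (matches.length === 0) return `No entries found for correlation ID: ${correlationId}`;
  return matches
    .map((e) => `${e.timestamp} ${e.id} ${e.workflow} > ${e.node}: ${e.event || e.userMessage}`)
    .join('\n');
}

function handleDebugCommand(log, messageText) {
  const [rawCommand = '', arg = ''] = String(messageText || '').trim().split(/\s+/);
  const command = rawCommand.toLowerCase(); // the argument keeps its case (correlation IDs)
  if (command.startsWith('/clear-errors')) return clearErrors(log);
  if (command.startsWith('/errors')) return formatErrors(log, arg);
  if (command.startsWith('/debug')) return debugCommand(log, arg);
  if (command.startsWith('/trace')) return traceCommand(log, arg);
  return 'Unknown debug command.';
}
```

The handlers mutate `log` in place. That is what lets changes made by `/clear-errors` and `/debug` persist through n8n's workflow static data.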

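The Code node returns `{ json: { chatId, text } }`, where `chatId` comes from `$json.message.chat.id` and `text` is the formatted response. Use HTML parse mode and wrap structured output in `<pre>`. Error messages and raw responses can contain `&`, `<` and `>`, and Telegram rejects HTML-mode messages that contain these characters unescaped. Escape them in every interpolated value. Also keep the final text within Telegram's 4096-character message limit.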
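The response builder below is a sketch that assumes the handlers produce plain text. Telegram's HTML mode requires `<`, `>` and `&` outside tags to be written as entities, and message text is limited to 4096 characters. `buildResponse`, `escapeHtml`, and the truncation marker are hypothetical. Measuring the escaped string is conservative, because Telegram counts characters after entity parsing.

```javascript
// Hypothetical response builder: escape, wrap in <pre>, stay under Telegram's limit.
const TELEGRAM_MAX_TEXT = 4096;

function escapeHtml(s) {
  return String(s).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function buildResponse(chatId, plainText) {
  const open = '<pre>';
  const close = '</pre>';
  const room = TELEGRAM_MAX_TEXT - open.length - close.length;
  let body = escapeHtml(plainText);
  if (body.length > room) {
    const marker = '\n...(truncated)';
    body = body.slice(0, room - marker.length);
    // Drop a trailing half-cut entity such as "&am" so the HTML stays valid
    body = body.replace(/&[a-z]*$/, '') + marker;
  }
  return { json: { chatId, text: `${open}${body}${close}` } };
}
```

In the Code node, this would end with `return buildResponse($json.message.chat.id, text);`.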

**Wire the output** of Process Debug Command to a new Telegram node: **Send Debug Response** (id: `telegram-send-debug-response`), which sends the message using `chatId` and `text` from the Code node output, with `parse_mode: HTML`. Use the standard Telegram credential (id: `I0xTTiASl7C1NZhJ`, name: "Telegram account").

**Positioning:** Place Process Debug Command at position [1120, -200] and Send Debug Response at [1340, -200] to keep them visually grouped above the existing menu path.

**Important:** Do NOT modify the Show Menu text or /start command response. These debug commands must remain hidden/unlisted.

Node count impact: +2 new nodes (Code + Telegram send).
  </action>
  <verify>
    - Parse `n8n-workflow.json` with `python3 -c "import json; ..."` to verify:
      1. Keyword Router has 4 new switch rules (errors, clear-errors, debug, trace)
      2. Process Debug Command node exists with type `n8n-nodes-base.code`
      3. Send Debug Response node exists with type `n8n-nodes-base.telegram`
      4. Connections exist: Keyword Router -> Process Debug Command -> Send Debug Response
      5. Total node count is 170 (168 + 2 new nodes)
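    - Verify the Code node's jsCode contains `$getWorkflowStaticData('global')` and handles all 4 commands. n8n does not persist static data during manual test executions, so testing the commands end to end requires the active workflow to be triggered from Telegram.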
  </verify>
  <done>
    Keyword Router routes /errors, /clear-errors, /debug, /trace to Process Debug Command. The code node initializes static data structure, implements all 4 commands, and outputs formatted text. Send Debug Response delivers the message via Telegram. No changes to existing Show Menu text.
  </done>
</task>

<task type="auto">
  <name>Task 2: Add error logging utility function and ring buffer write helper</name>
  <files>n8n-workflow.json</files>
  <action>
Create a new Code node: **Log Error** (id: `code-log-error`) that serves as the centralized error logging entry point. This node will be called from multiple places in the main workflow (wired in Plan 02).

The Log Error node expects input with these fields:
- `correlationId` (string, optional - falls back to execution ID)
- `workflow` (string - "main" or sub-workflow name like "n8n-actions")
- `node` (string - node that encountered the error)
- `operation` (string - what was being done, e.g., "docker.stop")
- `userMessage` (string - user-friendly error summary)
- `errorMessage` (string - technical error message)
- `errorStack` (string, optional - stack trace)
- `httpCode` (number, optional - HTTP status code)
- `rawResponse` (string, optional - raw API response, will be truncated)
- `contextData` (object, optional - additional context like containerId, subWorkflowInput/Output)
- `chatId` (number - for pass-through to downstream)
- `text` (string, optional - for pass-through to downstream)
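An input item that satisfies this contract might look like the object below. Every value in it is hypothetical. The `missingFields` helper (also hypothetical) lists the fields not marked optional above that are absent or empty. A caller could run it before routing into Log Error, although Log Error itself falls back to defaults for most of these fields.

```javascript
// Illustrative Log Error input item; all values are hypothetical.
const exampleLogErrorInput = {
  correlationId: 'cb_1718000000_42',
  workflow: 'n8n-actions',
  node: 'Stop Container',
  operation: 'docker.stop',
  userMessage: 'Could not stop plex: Docker API returned 500',
  errorMessage: 'Request failed with status code 500',
  httpCode: 500,
  rawResponse: '{"message":"driver failed"}',
  contextData: { containerId: 'abc123' },
  chatId: 123456789,
};

// Fields that the contract above does not mark as optional
const REQUIRED_FIELDS = ['workflow', 'node', 'operation', 'userMessage', 'errorMessage', 'chatId'];

function missingFields(input) {
  return REQUIRED_FIELDS.filter((f) => input[f] === undefined || input[f] === null || input[f] === '');
}
```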

Implementation:
```javascript
// Code node mode: "Run Once for Each Item" ($input.item is only defined in that mode)
const staticData = $getWorkflowStaticData('global');
const input = $input.item.json;

// Initialize if needed
if (!staticData.errorLog) {
  staticData.errorLog = {
    debug: { enabled: false, executionCount: 0 },
    errors: { buffer: [], nextId: 1, count: 0, lastCleared: new Date().toISOString() },
    traces: { buffer: [], nextId: 1 }
  };
}

const MAX_ERRORS = 50;
const errorEntry = {
  id: `err_${String(staticData.errorLog.errors.nextId).padStart(3, '0')}`,
  correlationId: input.correlationId || $execution.id,
  timestamp: new Date().toISOString(),
  executionId: $execution.id,
  workflow: input.workflow || 'main',
  node: input.node || 'unknown',
  operation: input.operation || 'unknown',
  userMessage: input.userMessage || input.errorMessage || 'Unknown error',
  error: {
    message: input.errorMessage || 'Unknown error',
    stack: (input.errorStack || '').substring(0, 500),
    httpCode: input.httpCode || null,
    rawResponse: (input.rawResponse || '').substring(0, 1000)
  },
  context: input.contextData || {}
};

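// Ring buffer: push, then drop the oldest entries until within MAX_ERRORS
// (a loop also trims correctly if MAX_ERRORS is ever lowered)
staticData.errorLog.errors.buffer.push(errorEntry);
while (staticData.errorLog.errors.buffer.length > MAX_ERRORS) {
  staticData.errorLog.errors.buffer.shift();
}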
staticData.errorLog.errors.nextId++;
staticData.errorLog.errors.count++;

// Pass through all input data so downstream nodes still work
return { json: { ...input, _errorLogged: true, _errorId: errorEntry.id } };
```
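The rotation step can be tested on its own. This sketch pulls the push-and-trim logic out into a hypothetical `pushToRing` function and feeds it 55 entries with a cap of 50:

```javascript
// Push-then-trim rotation, isolated as a pure function (hypothetical name)
function pushToRing(buffer, entry, max) {
  buffer.push(entry);
  while (buffer.length > max) buffer.shift(); // oldest entries leave from the front
  return buffer;
}

const ring = [];
for (let i = 1; i <= 55; i++) {
  pushToRing(ring, { id: `err_${String(i).padStart(3, '0')}` }, 50);
}
console.log(ring.length, ring[0].id, ring[ring.length - 1].id); // 50 err_006 err_055
```

`Array.prototype.shift` is O(n), which is negligible at a buffer size of 50. It also keeps the stored array in oldest-first order, so `/errors` can take the tail directly.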

**Positioning:** Place Log Error at position [2600, -200] (utility area, visually separate from main flow).

Also create a **Log Trace** Code node (id: `code-log-trace`) for debug-mode trace entries. This node:
- Checks `staticData.errorLog.debug.enabled` first; if false, passes data through unchanged
- If debug is enabled, increments `executionCount`. Once the count reaches 100, it sets `debug.enabled = false`, so tracing switches itself off (auto-disable)
- Stores trace entry in `staticData.errorLog.traces.buffer` (ring buffer, max 50)
- Trace entry fields: `id`, `correlationId`, `timestamp`, `executionId`, `event` (string: "sub-workflow-call" | "callback-routing"), `workflow`, `node`, `data` (object with input/output/duration or callbackData/routeTaken)
- Passes through all input data unchanged
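The sketch below shows one possible shape for the Log Trace logic. It is written against the `errorLog` object so it can run outside n8n. Three details are assumptions, not part of the spec above: the `trc_` id prefix, the `traceData` input field that feeds `data`, and the default `event` value. In the Code node (Run Once for Each Item), it would run after the same initialization block as Log Error, roughly as `return { json: logTrace($getWorkflowStaticData('global').errorLog, $input.item.json, $execution.id) };`.

```javascript
// Hypothetical sketch of Log Trace: debug-gated ring buffer with auto-disable.
const MAX_TRACES = 50;
const AUTO_DISABLE_AT = 100;

function logTrace(errorLog, input, executionId, now = new Date()) {
  const debug = errorLog.debug;
  if (!debug.enabled) return input; // debug off: pass through unchanged

  debug.executionCount += 1;
  const traces = errorLog.traces;
  traces.buffer.push({
    id: `trc_${String(traces.nextId).padStart(3, '0')}`, // hypothetical id prefix
    correlationId: input.correlationId || executionId,
    timestamp: now.toISOString(),
    executionId,
    event: input.event || 'sub-workflow-call', // or 'callback-routing'
    workflow: input.workflow || 'main',
    node: input.node || 'unknown',
    data: input.traceData || {}, // hypothetical field: input/output/duration or callbackData/routeTaken
  });
  while (traces.buffer.length > MAX_TRACES) traces.buffer.shift();
  traces.nextId += 1;

  // Auto-disable so a forgotten "/debug on" cannot keep tracing indefinitely
  if (debug.executionCount >= AUTO_DISABLE_AT) debug.enabled = false;
  return input;
}
```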

**Positioning:** Place Log Trace at position [2600, -400].

These two nodes are NOT connected to anything yet; Plan 02 will wire them in. Until then, they are standalone utility Code nodes that Plan 02 and Plan 03 will reference.

Node count impact: +2 new nodes (total now 172 including Task 1's additions).
  </action>
  <verify>
    - Parse `n8n-workflow.json` to verify:
      1. Log Error node exists (id: `code-log-error`) with type `n8n-nodes-base.code`
      2. Log Trace node exists (id: `code-log-trace`) with type `n8n-nodes-base.code`
      3. Log Error's jsCode contains `$getWorkflowStaticData`, `MAX_ERRORS = 50`, `buffer.push`, `buffer.shift`
      4. Log Trace's jsCode contains `debug.enabled` check and auto-disable at 100
      5. Total node count is 172
      6. Both nodes have no incoming or outgoing connections (standalone)
  </verify>
  <done>
    Log Error and Log Trace utility nodes exist in the main workflow with correct ring buffer logic, field truncation, and auto-disable. They are ready to be wired by Plan 02 and Plan 03.
  </done>
</task>

</tasks>

<verification>
1. `python3 -c "import json; wf=json.load(open('n8n-workflow.json')); print(len(wf['nodes']))"` returns 172
2. Keyword Router has outputs for errors, clear-errors, debug, trace
3. All 4 hidden commands handled in Process Debug Command code
4. Log Error and Log Trace utility nodes exist and are unconnected
5. No changes to Show Menu text (commands remain hidden)
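6. All new nodes use the same `typeVersion` as the existing Code and Telegram nodes in the workflow, and Send Debug Response references the existing Telegram credential (id: `I0xTTiASl7C1NZhJ`)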
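The structural checks above can also be run as a small Node.js function. This sketch assumes the standard n8n export layout: a `nodes` array, plus a `connections` object keyed by source node name whose `main` outputs are arrays of `{ node, type, index }`. It also assumes the node names and ids given in this plan. `checkPlan01` is a hypothetical name.

```javascript
// Hypothetical post-edit check for Plan 01 over the parsed n8n-workflow.json.
// Assumes connections[sourceName].main = [[{ node, type, index }, ...], ...] (one array per output).
function checkPlan01(wf) {
  const problems = [];
  const byId = new Map(wf.nodes.map((n) => [n.id, n]));
  const expectedTypes = {
    'code-process-debug-command': 'n8n-nodes-base.code',
    'telegram-send-debug-response': 'n8n-nodes-base.telegram',
    'code-log-error': 'n8n-nodes-base.code',
    'code-log-trace': 'n8n-nodes-base.code',
  };
  for (const [id, type] of Object.entries(expectedTypes)) {
    const node = byId.get(id);
    if (!node) problems.push(`missing node ${id}`);
    else if (node.type !== type) problems.push(`${id} has type ${node.type}, expected ${type}`);
  }
  if (wf.nodes.length !== 172) problems.push(`node count is ${wf.nodes.length}, expected 172`);

  // Names of the nodes fed by any "main" output of the named source node
  const targetsOf = (name) =>
    ((wf.connections[name] || {}).main || []).flat().filter(Boolean).map((c) => c.node);
  const nameOf = (id) => (byId.get(id) || {}).name;

  if (!targetsOf(nameOf('switch-keyword-router')).includes('Process Debug Command')) {
    problems.push('Keyword Router is not wired to Process Debug Command');
  }
  if (!targetsOf('Process Debug Command').includes('Send Debug Response')) {
    problems.push('Process Debug Command is not wired to Send Debug Response');
  }
  for (const utility of ['Log Error', 'Log Trace']) {
    const hasIncoming = Object.keys(wf.connections).some((src) => targetsOf(src).includes(utility));
    if (hasIncoming || targetsOf(utility).length > 0) problems.push(`${utility} should be unconnected`);
  }
  return problems;
}
```

Call it as `checkPlan01(JSON.parse(require('fs').readFileSync('n8n-workflow.json', 'utf8')))`. An empty array means every check passed.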
</verification>

<success_criteria>
- Main workflow has 172 nodes (168 + 4 new)
- Ring buffer infrastructure initialized in workflow static data
- /errors, /clear-errors, /debug, /trace commands routed and handled
- Log Error and Log Trace utility nodes ready for wiring
- No regression to existing functionality
</success_criteria>

<output>
After completion, create `.planning/phases/10.2-better-logging-and-log-management/10.2-01-SUMMARY.md`
</output>