diff --git a/.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md b/.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md
new file mode 100644
index 0000000..d115514
--- /dev/null
+++ b/.planning/phases/10.2-better-logging-and-log-management/10.2-CONTEXT.md
@@ -0,0 +1,75 @@
+# Phase 10.2: Better Logging & Log Management - Context
+
+**Gathered:** 2026-02-08
+**Status:** Ready for planning
+
+
+## Phase Boundary
+
+Improve operational visibility into the bot's own execution. Add centralized error capture, execution tracing, and debugging infrastructure so that issues (sub-workflow data loss, callback routing confusion, Docker API failures) can be diagnosed programmatically rather than through manual investigation of n8n execution logs.
+
+This is NOT about container log viewing (the /logs command) — it's about the bot's internal execution logging.
+
+
+
+
+## Implementation Decisions
+
+### Error capture & reporting
+- Errors display inline to the user as summary + cause (e.g., "Failed to stop nginx: Docker API returned 404 (container not found)")
+- Full diagnostic data (sub-workflow name, node, raw response, stack trace) captured in central error store for Claude's use
+- Only report errors on user-triggered actions — no proactive/unsolicited error notifications
+- Error store uses ring buffer: last 50 errors, auto-rotated
+- Manual clear command also available (/clear-errors or similar, hidden/unlisted)
+
+### Execution traceability
+- All sub-workflows report errors back to main workflow for centralized storage
+- Trace data designed for programmatic access — Claude can query it during debugging sessions
+- Hidden/unlisted Telegram commands for quick error checks (e.g., /errors to see recent errors)
+- File-based access also available for deep investigation during debugging sessions
+
+### Log output & storage
+- Error/trace data stored in n8n workflow static data (main workflow)
+- Centralized in main workflow — sub-workflows report back, main stores
+- Auto-rotate (ring buffer, 50 entries) + manual clear command
+- Both Telegram commands (quick checks) and file/API access (deep investigation)
+
+### Debug mode
+- Debug mode is for Claude's use during debugging — not user-facing
+- Must address three specific pain points:
+  1. **Sub-workflow data loss** — capture what data was sent to and received from each sub-workflow at boundaries
+  2. **Callback routing confusion** — trace which path a callback took through routing logic
+  3. **n8n API execution log parsing** — make execution data easily queryable without manual workflow investigation
+
+### Claude's Discretion
+- Trace format and structure (timeline vs. data snapshots vs. both)
+- Whether to trace all executions or only errors (overhead vs. usefulness)
+- Structured entries vs. simple log lines (what enables best debugging)
+- Debug toggle mechanism (global toggle, per-request, or always-on for errors)
+- Log level granularity (on/off vs. error/warn/info)
+- What specific debug data to capture (raw API responses, sub-workflow I/O, timing)
+- Telegram command naming and exact interface
+
+
+
+
+## Specific Ideas
+
+- "I want you to be more easily able to track down issues when they occur" — the driving goal is Claude's ability to programmatically diagnose issues
+- Past pain points: sub-workflow boundary data disappearing, callback routing taking unexpected paths, difficulty parsing n8n execution API responses
+- "These logs would resolve these issues" — the logging infrastructure should make the three pain points immediately queryable
+- Error commands should be hidden/unlisted (developer/debug tools, not part of normal command set)
+
+
+
+
+## Deferred Ideas
+
+None — discussion stayed within phase scope
+
+
+
+---
+
+*Phase: 10.2-better-logging-and-log-management*
+*Context gathered: 2026-02-08*
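
Review note: the 50-entry ring buffer in the "Error capture & reporting" and "Log output & storage" decisions could be sketched roughly as follows. This is a minimal illustration, not the planned implementation. The names `ErrorEntry`, `ErrorStore`, `recordError`, `recentErrors`, and `clearErrors` are hypothetical, and the entry fields are assumptions based on the diagnostic data listed in the decisions. In an n8n Code node, the `store` argument would presumably be the object returned by `$getWorkflowStaticData('global')` in the main workflow.

```typescript
// Hypothetical shape of one stored error; all field names are assumptions.
interface ErrorEntry {
  timestamp: string;
  workflow: string; // sub-workflow that reported the error
  node: string;     // node where the failure occurred
  summary: string;  // short "summary + cause" text shown inline to the user
  raw?: unknown;    // raw response or stack trace kept for later debugging
}

// Stand-in for the main workflow's static data object.
interface ErrorStore {
  errors?: ErrorEntry[];
}

const MAX_ERRORS = 50; // buffer size taken from the decisions in the document

// Append an entry and drop the oldest ones once the cap is exceeded.
function recordError(store: ErrorStore, entry: ErrorEntry): void {
  const errors = store.errors ?? [];
  errors.push(entry);
  if (errors.length > MAX_ERRORS) {
    errors.splice(0, errors.length - MAX_ERRORS);
  }
  store.errors = errors;
}

// Return the newest `count` entries, oldest first (e.g. for an /errors command).
function recentErrors(store: ErrorStore, count = 5): ErrorEntry[] {
  return (store.errors ?? []).slice(-count);
}

// Manual clear (e.g. for a hidden /clear-errors command).
function clearErrors(store: ErrorStore): void {
  store.errors = [];
}
```

Trimming on every write keeps the store bounded without a separate rotation step, which fits the "auto-rotated" wording. Whether entries should be structured objects like this or simple log lines is left to Claude's discretion in the document.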