From 4c09d619430f124012eca6459a6a24b506a63609 Mon Sep 17 00:00:00 2001 From: Lucas Berger Date: Sat, 31 Jan 2026 20:55:29 -0500 Subject: [PATCH] docs(05): research phase domain Phase 5: Polish & Deploy - Standard stack identified (n8n, Telegram Bot API, Docker) - Architecture patterns documented (Switch routing, persistent keyboards, error workflows) - Pitfalls catalogued (credential leaks, testing limitations, configuration issues) - Code examples for keyword routing, persistent menus, error handling Co-Authored-By: Claude Opus 4.5 --- .../phases/05-polish-deploy/05-RESEARCH.md | 549 ++++++++++++++++++ 1 file changed, 549 insertions(+) create mode 100644 .planning/phases/05-polish-deploy/05-RESEARCH.md diff --git a/.planning/phases/05-polish-deploy/05-RESEARCH.md b/.planning/phases/05-polish-deploy/05-RESEARCH.md new file mode 100644 index 0000000..e4581e4 --- /dev/null +++ b/.planning/phases/05-polish-deploy/05-RESEARCH.md @@ -0,0 +1,549 @@ +# Phase 5: Polish & Deploy - Research + +**Researched:** 2026-01-31 +**Domain:** Production deployment with n8n workflow polishing, Telegram bot UX, and deployment packaging +**Confidence:** HIGH + +## Summary + +Phase 5 focuses on production-ready deployment requiring four main areas: removing NLU/Claude nodes and replacing with keyword routing, implementing Telegram persistent menu buttons for discoverability, hardening error handling with minimal user-facing messages, and packaging the workflow for deployment with proper credential handling. + +The standard approach for n8n production workflows emphasizes testing in non-production environments first, using n8n's built-in credentials system for sensitive data, implementing centralized error handling with the Error Trigger node, and exporting workflow JSON to version control while ensuring credentials are never hardcoded. 
For Telegram bots, the persistent menu pattern uses ReplyKeyboardMarkup with is_persistent=true to keep command buttons always visible, while inline keyboards handle dynamic interactions like container selection. + +Based on user decisions from CONTEXT.md, the implementation will use n8n's Switch node for keyword matching (replacing Claude nodes entirely), ReplyKeyboardMarkup for the persistent menu with grouped commands, the n8n credentials system for the Telegram user ID, and minimal error messages following the "Failed to X" pattern with infrastructure-specific messages only for Docker socket errors. + +**Primary recommendation:** Use n8n Switch node with string "contains" operators for keyword routing, set up persistent Telegram menu with ReplyKeyboardMarkup, move sensitive values to n8n credentials before exporting workflow JSON, and create root-level README with step-by-step deployment instructions. + +## Standard Stack + +The established tools for this deployment phase: + +### Core +| Library | Version | Purpose | Why Standard | +|---------|---------|---------|--------------| +| n8n | Current stable | Workflow orchestration and credential management | Already deployed on Unraid, handles webhook security | +| Telegram Bot API | 6.4+ | Persistent menu buttons and inline keyboards | Native support for is_persistent parameter added in Bot API 6.4 | +| Docker API | Host version | Container management via Unix socket | Standard on Unraid installations | + +### Supporting +| Library | Version | Purpose | When to Use | +|---------|---------|---------|-------------| +| n8n Error Trigger | Built-in | Centralized error workflow | Production error handling and monitoring | +| n8n HTTP Request node | Built-in | Telegram API calls for keyboards | When native Telegram node has limitations | +| Git | Any | Version control for workflow JSON | Workflow versioning and rollback capability | + +### Alternatives Considered +| Instead of | Could Use | Tradeoff | 
+|------------|-----------|----------| +| Switch node routing | IF node cascade | Switch handles multiple routes cleaner, IF requires nested structure | +| ReplyKeyboardMarkup | InlineKeyboardMarkup | Reply keyboards persist but take keyboard space, inline are per-message | +| n8n credentials | Environment variables | n8n CE blocks env var access in expressions (known limitation) | + +**Installation:** +```bash +# No additional packages needed - using built-in n8n nodes +# Workflow will be imported via n8n UI or CLI +``` + +## Architecture Patterns + +### Recommended Workflow Structure +``` +Telegram Trigger +├── Route Update Type (Switch: message vs callback_query) +│ ├── [message path] +│ │ └── Auth Check (IF) +│ │ └── Keyword Router (Switch: contains operations) +│ │ ├── status → Container Status flow +│ │ ├── start → Container Action flow +│ │ ├── stop → Container Action flow +│ │ ├── restart → Container Action flow +│ │ ├── update → Container Action flow +│ │ ├── logs → Logs flow +│ │ └── [fallback] → Show Menu +│ └── [callback_query path] +│ └── Auth Check (IF) +│ └── [existing callback handlers] +└── Error Trigger Workflow (separate) + └── Log + Notify +``` + +### Pattern 1: Keyword Routing with Switch Node +**What:** Replace NLU intent parsing with simple keyword matching using Switch node with multiple "contains" rules +**When to use:** User input routing for command-based bots where keywords are predictable +**Example:** +```json +{ + "parameters": { + "rules": { + "values": [ + { + "conditions": { + "conditions": [ + { + "leftValue": "={{ $json.message.text.toLowerCase() }}", + "rightValue": "status", + "operator": { + "type": "string", + "operation": "contains" + } + } + ] + }, + "renameOutput": true, + "outputKey": "status" + } + ] + }, + "options": { + "fallbackOutput": "extra" + } + }, + "type": "n8n-nodes-base.switch" +} +``` +**Source:** [n8n Switch node documentation](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.switch/) + 
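The routing in Pattern 1 can be sketched outside n8n to reason about rule order. This is a minimal sketch, not n8n's implementation: it assumes rules are evaluated top to bottom and the first match wins, and `routeKeyword` and `RULES` are hypothetical names. Under that assumption a "contains" rule for `start` would also match `restart`, so the sketch checks `restart` first.

```javascript
// Hypothetical sketch of first-match keyword routing (assumed semantics,
// not taken from the n8n source). Order matters: "restart" contains "start",
// so "restart" must come before "start" in the rule list.
const RULES = ["status", "restart", "start", "stop", "update", "logs"];

function routeKeyword(text) {
  // Normalize so "Status", "STATUS" and "status" all match the same rule.
  const normalized = String(text ?? "").toLowerCase();
  // Return the first keyword contained in the message, if any.
  const match = RULES.find((keyword) => normalized.includes(keyword));
  // Unmatched input goes to a fallback route instead of being dropped.
  return match ?? "fallback";
}
```

For example, `routeKeyword("Restart plex")` returns `"restart"`, and unrecognized input such as `"asdfgh"` returns `"fallback"`, mirroring the Switch node's fallback output.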
+### Pattern 2: Persistent Telegram Menu Button +**What:** Use ReplyKeyboardMarkup with is_persistent=true to display command buttons that remain visible when keyboard is hidden +**When to use:** When users need constant access to core commands without remembering keywords +**Example:** +```json +{ + "chat_id": "{{ $json.chatId }}", + "text": "Welcome! Use buttons below:", + "reply_markup": { + "keyboard": [ + [{"text": "📊 Status"}], + [{"text": "▶️ Start"}, {"text": "⏹️ Stop"}], + [{"text": "🔄 Restart"}, {"text": "⬆️ Update"}], + [{"text": "📜 Logs"}] + ], + "is_persistent": true, + "resize_keyboard": true + } +} +``` +**Source:** [Telegram Bot API - Persistent Menu](https://core.telegram.org/bots/api) + +### Pattern 3: Credential References in n8n +**What:** Store sensitive values in n8n credentials system and reference them in workflow expressions +**When to use:** Any hardcoded sensitive data (user IDs, tokens, API keys) before exporting workflow +**Example:** +```javascript +// In n8n IF node condition - checking authorized user +// Instead of: $json.message.from.id === 123456789 +// Use credential reference: +$json.message.from.id === parseInt($credentials.telegramAuth.userId) +``` +**Source:** [n8n Credentials Documentation](https://docs.n8n.io/credentials/) + +### Pattern 4: Centralized Error Workflow +**What:** Create separate workflow with Error Trigger node that catches failures from all workflows +**When to use:** Production deployments requiring error monitoring and graceful failure handling +**Example:** +``` +Error Workflow: +[Error Trigger] + → [Code: Format Error Details] + → [Telegram: Notify Admin "Cannot connect to Docker"] + → [HTTP: Log to monitoring service] +``` +**Source:** [n8n Error Handling](https://docs.n8n.io/flow-logic/error-handling/) + +### Anti-Patterns to Avoid +- **Hardcoding credentials in workflow nodes** - Export will expose sensitive data, use n8n credentials system instead +- **Complex regex in Switch conditions** - Simple 
"contains" operations are sufficient for keyword matching, regex adds complexity +- **Verbose error messages to end users** - Expose internal state and overwhelm users; keep messages terse +- **Editing production workflows directly** - Test changes in duplicate workflow first to prevent breaking live bot +- **Using "Save Execution Progress"** - Debug feature causes excessive database writes in production (3000+ writes/day for 30-node workflow running 100x/day) + +## Don't Hand-Roll + +Problems that look simple but have existing solutions: + +| Problem | Don't Build | Use Instead | Why | +|---------|-------------|-------------|-----| +| Secure credential storage | Custom encryption or env vars | n8n credentials system | Built-in AES256 encryption, credential sharing, OAuth support | +| Error tracking | Manual logging nodes | Error Trigger workflow | Automatic error capture, centralized handling, no manual wiring | +| Telegram keyboard rendering | String concatenation | Telegram reply_markup object | Proper escaping, layout control, persistent menu support | +| Workflow versioning | Manual JSON backups | Git with workflow export | Diff tracking, rollback capability, team collaboration | +| User authorization | Custom auth logic | n8n IF node + credentials | Simple, tested, integrates with credential system | +| Keyword matching | Custom parser code | Switch node "contains" | Native n8n, no code maintenance, visual debugging | +| Retry logic for API calls | Custom retry code | n8n HTTP Request retry options | Exponential backoff, jitter, configurable attempts built-in | + +**Key insight:** n8n provides production-grade features (credentials, error handling, retry logic) that seem simple to replicate but have edge cases around encryption keys, error propagation, and failure recovery. Using built-in capabilities ensures upgrades don't break custom solutions. 
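The hardcoded-credentials anti-pattern above can be checked mechanically before an export is committed. A minimal sketch, assuming the exported workflow has already been parsed from JSON; `findLongDigitRuns` is a hypothetical helper, and the 8-digit threshold is an assumption chosen to catch numeric IDs such as Telegram user IDs. It will also flag harmless values (timestamps, long version IDs), so hits need manual review.

```javascript
// Hypothetical pre-export check: walk a parsed workflow object and collect
// the paths of leaf values that contain a run of 8 or more digits.
function findLongDigitRuns(value, path = "$", hits = []) {
  if (value !== null && typeof value === "object") {
    // Objects and arrays: recurse into every entry (array keys are indices).
    for (const [key, child] of Object.entries(value)) {
      findLongDigitRuns(child, `${path}.${key}`, hits);
    }
  } else if (/\d{8,}/.test(String(value))) {
    // Leaf value (string, number, etc.) that looks like a hardcoded ID.
    hits.push(path);
  }
  return hits;
}
```

Calling it on `{ nodes: [{ parameters: { rightValue: 123456789 } }] }` reports the path `$.nodes.0.parameters.rightValue`; an empty result means no long digit runs were found, not that the export is free of secrets.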
+ +## Common Pitfalls + +### Pitfall 1: Credentials Leak in Exported Workflow +**What goes wrong:** Hardcoded user IDs, API keys, or tokens remain in workflow JSON when exported, exposing sensitive data when sharing or committing to Git. +**Why it happens:** n8n CE blocks environment variable access in expressions, leading developers to hardcode values directly in nodes. +**How to avoid:** +- Create custom credential type in n8n with required fields (e.g., "Telegram Auth" with userId field) +- Reference credential in expressions: `$credentials.telegramAuth.userId` +- Before export, verify no hardcoded IDs with: `grep -E '[0-9]{8,}' workflow.json` +**Warning signs:** grep finds large numbers in workflow JSON, credential fields in nodes show raw values instead of credential references + +### Pitfall 2: Testing Error Workflows with Manual Execution +**What goes wrong:** Error Trigger only fires on automatic workflow failures, not manual test runs. Developers think error handling works but it never triggers in production. +**Why it happens:** n8n Error Trigger is designed for production errors only, manual executions bypass error workflows. +**How to avoid:** +- Use "Stop and Error" node in main workflow to force failures +- Test by triggering workflow via webhook/Telegram (automatic execution) +- Verify error workflow with intentional Docker socket disconnect +**Warning signs:** Error workflow never shows execution history, production failures go unhandled + +### Pitfall 3: Switch Node Fallback Misconfiguration +**What goes wrong:** Setting fallback to "none" silently drops messages that don't match any rules. Users send commands but get no response. +**Why it happens:** Default fallback is "none" - messages that don't match any routing rule disappear without executing downstream nodes. 
+**How to avoid:** +- Set Switch node fallback to "extra" output +- Connect fallback output to "Show Menu" or "Unknown command" response +- Test with unrecognized input: "asdfgh" should get helpful response +**Warning signs:** Some user messages disappear without response, execution history shows Switch node with no output paths taken + +### Pitfall 4: Case-Sensitive Keyword Matching +**What goes wrong:** User types "Status" (capitalized) but Switch rule checks for lowercase "status", command not recognized. +**Why it happens:** Switch node conditions are case-sensitive by default. +**How to avoid:** +- Normalize input: `$json.message.text.toLowerCase()` in leftValue expression +- Set Switch node "Ignore Case" option to true +- Test with various capitalizations: "status", "Status", "STATUS" +**Warning signs:** Same command works sometimes but not others based on capitalization + +### Pitfall 5: Persistent Keyboard Overwrites +**What goes wrong:** Every response includes full keyboard definition, causing Telegram to re-render unnecessarily and creating visual flickering. +**Why it happens:** Setting reply_markup on every message instead of only on initial welcome or menu request. +**How to avoid:** +- Send keyboard only on /start command, unknown input, or explicit menu request +- Normal responses omit reply_markup parameter (preserves existing keyboard) +- Use `reply_markup: {"remove_keyboard": true}` only when intentionally hiding keyboard +**Warning signs:** Keyboard flickers on every bot response, excessive data in Telegram messages + +### Pitfall 6: Workflow Export Without Encryption Key +**What goes wrong:** Workflow imported on different n8n instance can't decrypt credentials, all authenticated nodes fail. +**Why it happens:** n8n uses N8N_ENCRYPTION_KEY for credential encryption; different instances have different keys. 
+**How to avoid:** +- Document in README: credentials must be recreated on target n8n instance +- Export workflow, manually create credentials on new instance +- Never copy encryption key between environments (security risk) +- Use external secrets manager (Vault, AWS Secrets Manager) for team environments +**Warning signs:** Imported workflow shows credentials as "missing" or nodes fail with auth errors + +### Pitfall 7: Inline Keyboard Callback Data Limits +**What goes wrong:** Callback data exceeds Telegram's 64-byte limit, inline buttons fail silently. +**Why it happens:** Encoding full container names or multiple parameters in callback_data without length validation. +**How to avoid:** +- Use short encoding: single-char action codes (s/t/r/x for start/stop/restart/update) +- Validate callback_data length in UTF-8 bytes, not characters: `callback_data.length <= 64` is only sufficient when the data is ASCII-only +- Batch limit already addressed (4 containers max) +**Warning signs:** Inline buttons don't respond when clicked, no callback_query received + +### Pitfall 8: Docker Socket Permission Errors After Deployment +**What goes wrong:** n8n container can execute curl commands but gets "permission denied" on /var/run/docker.sock. +**Why it happens:** n8n runs as node user (UID 1000) without docker group membership. 
+**How to avoid:** +- n8n container must use `--group-add 281` (docker group on Unraid) +- Document in deployment README as required Docker run flag +- Test with: `docker exec n8n curl --unix-socket /var/run/docker.sock http://localhost/containers/json` +**Warning signs:** "Cannot connect to Docker" messages, curl permission denied errors + +## Code Examples + +Verified patterns from official sources: + +### Keyword Router Switch Node +```json +{ + "parameters": { + "rules": { + "values": [ + { + "id": "match-status", + "conditions": { + "options": { + "caseSensitive": false + }, + "conditions": [ + { + "leftValue": "={{ $json.message.text }}", + "rightValue": "status", + "operator": { + "type": "string", + "operation": "contains" + } + } + ] + }, + "renameOutput": true, + "outputKey": "status" + }, + { + "id": "match-start", + "conditions": { + "conditions": [ + { + "leftValue": "={{ $json.message.text.toLowerCase() }}", + "rightValue": "start", + "operator": { + "type": "string", + "operation": "contains" + } + } + ] + }, + "renameOutput": true, + "outputKey": "start" + } + ] + }, + "options": { + "fallbackOutput": "extra" + } + }, + "name": "Keyword Router", + "type": "n8n-nodes-base.switch" +} +``` +**Source:** [n8n Switch node docs](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.switch/) + +### Persistent Menu with HTTP Request Node +```javascript +// In n8n HTTP Request node sending Telegram message +// URL: https://api.telegram.org/bot{{ $credentials.telegramApi.token }}/sendMessage +// Method: POST +// Body (JSON): +{ + "chat_id": "={{ $json.message.chat.id }}", + "text": "Use buttons below or type commands:", + "parse_mode": "HTML", + "reply_markup": { + "keyboard": [ + [{"text": "📊 Status"}], + [{"text": "▶️ Start"}, {"text": "⏹️ Stop"}], + [{"text": "🔄 Restart"}, {"text": "⬆️ Update"}], + [{"text": "📜 Logs"}] + ], + "is_persistent": true, + "resize_keyboard": true, + "one_time_keyboard": false + } +} +``` +**Source:** [Telegram Bot API - 
ReplyKeyboardMarkup](https://core.telegram.org/bots/api) + +### Error Handler Workflow +```json +{ + "name": "Docker Bot Error Handler", + "nodes": [ + { + "parameters": {}, + "name": "Error Trigger", + "type": "n8n-nodes-base.errorTrigger", + "position": [240, 300] + }, + { + "parameters": { + "jsCode": "// Format error for user notification\nconst error = $json.error;\nconst workflow = $json.workflow;\n\n// Check for Docker socket errors\nif (error.message && error.message.includes('docker.sock')) {\n return {\n userMessage: 'Cannot connect to Docker',\n adminMessage: `Docker socket error in ${workflow.name}: ${error.message}`\n };\n}\n\n// Generic infrastructure error\nreturn {\n userMessage: 'Something went wrong',\n adminMessage: `Error in ${workflow.name} at node ${error.node.name}: ${error.message}`\n};" + }, + "name": "Format Error", + "type": "n8n-nodes-base.code", + "position": [440, 300] + }, + { + "parameters": { + "chatId": "={{ $credentials.telegramAuth.userId }}", + "text": "={{ $json.userMessage }}", + "additionalFields": { + "parse_mode": "HTML" + } + }, + "name": "Notify User", + "type": "n8n-nodes-base.telegram", + "position": [640, 300], + "credentials": { + "telegramApi": { + "id": "telegram-credential", + "name": "Telegram API" + } + } + } + ] +} +``` +**Source:** [n8n Error Trigger documentation](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.errortrigger/) + +### Credential Reference Pattern +```javascript +// In n8n IF node - check authorized user +// Instead of hardcoding: $json.message.from.id === 123456789 +// Create credential type "Telegram Auth" with field "userId" +// Then reference in condition: + +// Condition leftValue: +$json.message.from.id + +// Condition rightValue (using credential): +={{ parseInt($credentials.telegramAuth.userId) }} + +// operator: equals (number type) +``` +**Source:** [n8n Credentials Library](https://docs.n8n.io/credentials/) + +### Deployment README Template +```markdown +# Docker 
Manager Bot - Deployment Guide + +## Prerequisites + +- Unraid server with Docker enabled +- n8n container running on Unraid +- Telegram Bot Token (from @BotFather) +- Your Telegram User ID (from @userinfobot) + +## Installation Steps + +### 1. Create n8n Credentials + +In n8n UI, create two credentials: + +**Telegram API:** +- Type: Telegram API +- Name: `Telegram API` +- Access Token: `` + +**Telegram Auth:** +- Type: Generic Credential Type → HTTP Header Auth +- Name: `Telegram Auth` +- Add custom field: `userId` = `` + +### 2. Import Workflow + +1. Copy `n8n-workflow.json` to your server +2. In n8n UI: Workflows → Import from File +3. Select `n8n-workflow.json` +4. Map credentials when prompted: + - `Telegram API` → your Telegram API credential + - `Telegram Auth` → your Telegram Auth credential + +### 3. Configure n8n Container + +Ensure n8n container has Docker socket access: + +```bash +docker run -d \ + --name n8n \ + --group-add 281 \ + -v /var/run/docker.sock:/var/run/docker.sock \ + -v /path/to/curl:/usr/bin/curl:ro \ + n8nio/n8n +``` + +**Required:** +- `--group-add 281` - Docker group for socket access +- Socket mount: `/var/run/docker.sock` +- Static curl binary mount + +### 4. Activate Workflow + +1. Open imported workflow in n8n +2. Click "Active" toggle in top-right +3. Test by messaging your bot: "status" + +## Usage + +Send commands via Telegram: +- **status** - View container status +- **start ** - Start container +- **stop ** - Stop container +- **restart ** - Restart container +- **update ** - Pull latest image and restart +- **logs ** - View recent logs + +Or use persistent menu buttons for common actions. 
+ +## Troubleshooting + +**Bot doesn't respond:** +- Check workflow is Active +- Verify Telegram credentials are correct +- Check n8n execution logs + +**"Cannot connect to Docker":** +- Verify `--group-add 281` in n8n container +- Check docker.sock mount exists +- Test: `docker exec n8n curl --unix-socket /var/run/docker.sock http://localhost/containers/json` + +**Credentials missing after import:** +- Credentials are not exported with workflow +- Recreate credentials in n8n UI +- Re-map in workflow settings +``` +**Source:** [README Best Practices](https://github.com/jehna/readme-best-practices) + +## State of the Art + +| Old Approach | Current Approach | When Changed | Impact | +|--------------|------------------|--------------|--------| +| Claude API for NLU | Keyword matching with Switch node | 2026-01-31 | Removes external API dependency, faster response, no API costs | +| Commands menu | Persistent ReplyKeyboardMarkup | Telegram Bot API 6.4 | Menu always visible, better UX for non-technical users | +| Hardcoded user ID | n8n credentials system | Project start | Allows sharing workflow without exposing sensitive data | +| Manual workflow backup | Git version control | Industry standard | Enables rollback, change tracking, team collaboration | +| Ad-hoc error handling | Error Trigger workflow | n8n v0.x | Centralized error management, consistent user experience | + +**Deprecated/outdated:** +- **Custom keyboard on each message**: Use is_persistent instead - avoids re-rendering and flickering +- **Environment variables in n8n CE**: Use credentials system - env vars blocked in expressions +- **"Save Execution Progress" in production**: Disable - causes excessive database writes (known performance issue) +- **IF node cascades for routing**: Use Switch node - cleaner multiple-output routing + +## Open Questions + +Things that couldn't be fully resolved: + +1. 
**Exact menu button layout UX** + - What we know: Telegram supports grouped buttons (arrays within keyboard array), emojis render correctly + - What's unclear: Optimal grouping for 6 commands (Status + 5 actions) - user preference on rows vs columns + - Recommendation: Start with CONTEXT.md structure (Status solo, Actions in pairs), iterate based on user feedback during testing + +2. **Retry buttons on retriable errors** + - What we know: Telegram inline keyboards can include retry buttons that re-trigger callback with same parameters + - What's unclear: Whether retry UX adds value vs just asking user to tap action again + - Recommendation: Mark as Claude's discretion in CONTEXT.md - implement if time permits, not critical for v1.0 + +3. **README location** + - What we know: Root README is standard for project entry point, docs/ folder separates documentation from code + - What's unclear: This is n8n workflow (JSON) not code - root vs docs/ both valid + - Recommendation: Use root README.md (marked as Claude's discretion) - single-file deployment guide, no docs/ needed for single-workflow project + +## Sources + +### Primary (HIGH confidence) +- [n8n Switch node documentation](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.switch/) - Keyword routing patterns +- [n8n Error Handling documentation](https://docs.n8n.io/flow-logic/error-handling/) - Error Trigger workflow setup +- [n8n Credentials Library](https://docs.n8n.io/credentials/) - Credential system and references +- [n8n Workflow Export/Import](https://docs.n8n.io/workflows/export-import/) - Export best practices and sensitive data handling +- [Telegram Bot API](https://core.telegram.org/bots/api) - ReplyKeyboardMarkup and is_persistent parameter + +### Secondary (MEDIUM confidence) +- [n8n Credential Hygiene (Medium, Jan 2026)](https://medium.com/@bhagyarana80/n8n-credential-hygiene-for-self-hosted-reality-cfa90ef1a114) - Credential best practices verified with official docs +- [7 Common 
n8n Workflow Mistakes (Medium, Jan 2026)](https://medium.com/@juanm.acebal/7-common-n8n-workflow-mistakes-that-can-break-your-automations-9638903fb076) - Pitfalls cross-referenced with n8n documentation +- [n8n Workflow Testing (Medium, Jan 2026)](https://medium.com/@Modexa/n8n-workflow-testing-without-the-panic-deploy-7376586a8b43) - Testing practices verified with community discussions +- [Seven n8n Workflow Best Practices for 2026](https://michaelitoback.com/n8n-workflow-best-practices/) - Current best practices aggregated from multiple sources +- [README Best Practices](https://github.com/jehna/readme-best-practices) - README structure template + +### Tertiary (LOW confidence) +- [n8n Telegram Bot Templates](https://n8n.io/workflows/) - Example workflows for pattern reference, not authoritative for best practices +- Various n8n Community Forum discussions - Real-world issues but not official guidance + +## Metadata + +**Confidence breakdown:** +- Standard stack: HIGH - Official n8n and Telegram Bot API documentation verified +- Architecture patterns: HIGH - Direct verification with official docs and existing workflow structure +- Pitfalls: MEDIUM - Mix of official documentation (Error Trigger) and community-reported issues (verified where possible) +- Code examples: HIGH - All examples based on official API documentation and n8n node schemas + +**Research date:** 2026-01-31 +**Valid until:** 2026-02-28 (30 days) - n8n stable platform, Telegram Bot API unlikely to change core features