Pitfalls Research: v1.1
Project: Unraid Docker Manager
Milestone: v1.1 - n8n Integration & Polish
Researched: 2026-02-02
Confidence: MEDIUM-HIGH (verified with official docs where possible)
Context
This research identifies pitfalls specific to adding these features to an existing working system:
- n8n API access (programmatic workflow read/update/test/logs)
- Docker socket proxy (security hardening)
- Telegram inline keyboards (UX improvements)
- Unraid update sync (clear "update available" badge)
Risk focus: Breaking existing functionality while adding new features.
n8n API Access Pitfalls
| Pitfall | Warning Signs | Prevention | Phase |
|---|---|---|---|
| API key with full access | API key created without scopes; all workflows accessible | Enterprise: use scoped API keys (read-only for Claude Code initially). Non-enterprise: accept risk, rotate keys every 6-12 months | API Setup |
| Missing X-N8N-API-KEY header | 401 Unauthorized errors on all API calls | Store API key in Claude Code MCP config; always send as X-N8N-API-KEY header, not Bearer token | API Setup |
| Workflow ID mismatch after import | API calls return 404; workflow actions fail | Workflow IDs change on import; query /api/v1/workflows first to get current IDs, don't hardcode | API Setup |
| Editing active workflow via API | Production workflow changes unexpectedly; users see partial updates | n8n 2.0: Save vs Publish are separate actions. Use API to read only; manual publish via UI | API Setup |
| N8N_BLOCK_ENV_ACCESS_IN_NODE default | Code nodes can't access env vars; returns undefined | n8n 2.0+ blocks env vars by default. Use credentials system instead, or explicitly set N8N_BLOCK_ENV_ACCESS_IN_NODE=false | API Setup |
| API not enabled on instance | Connection refused on /api/v1 endpoints | Self-hosted: API is available by default. Cloud trial: API not available. Verify with curl http://localhost:5678/api/v1/workflows | API Setup |
| Rate limiting on rapid API calls | 429 errors when reading workflow repeatedly | Add delay between API calls (1-2 seconds); use caching for workflow data that doesn't change frequently | API Usage |
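The header and workflow-ID rows above can be sketched as two small helpers. This is a minimal illustration, not the project's code: `buildN8nRequest` and `findWorkflowId` are hypothetical names, the base URL and key are placeholders, and the list response is assumed to have the shape `{ data: [{ id, name }] }`.

```javascript
// Build request options for the n8n public API.
// The key goes in the X-N8N-API-KEY header, not in "Authorization: Bearer ...".
function buildN8nRequest(baseUrl, apiKey, path) {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/api/v1" + path,
    headers: {
      "X-N8N-API-KEY": apiKey,
      Accept: "application/json",
    },
  };
}

// Look up the current workflow ID by name instead of hardcoding it.
// Assumption: listResponse looks like { data: [{ id, name, ... }] }.
function findWorkflowId(listResponse, workflowName) {
  const workflows = (listResponse && listResponse.data) || [];
  const match = workflows.find((w) => w.name === workflowName);
  return match ? match.id : null;
}
```

Resolving the ID at call time means a re-imported workflow keeps working without editing any stored configuration.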
Docker Socket Security Pitfalls
| Pitfall | Warning Signs | Prevention | Phase |
|---|---|---|---|
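| POST enabled unnecessarily | Container can create/delete containers; security scan flags | Set POST=0 when only read access is needed; most read operations work with GET alone. This project's start/stop/restart actions require POST=1 (see the minimum configuration below the table), so keep every section that is not needed at 0 | Socket Proxy |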
| Using --privileged unnecessarily | Security audit fails; container has excessive permissions | Remove --privileged flag; Tecnativa proxy works without it on standard Docker | Socket Proxy |
| Outdated socket proxy image | Using latest tag which is 3+ years old | Pin to specific version: tecnativa/docker-socket-proxy:0.2.0 or use linuxserver/socket-proxy | Socket Proxy |
| Proxy port exposed publicly | Port 2375 accessible from network; security scan fails | Never expose proxy port; run on internal Docker network only | Socket Proxy |
| Insufficient permissions for n8n | "Permission denied" or empty responses from Docker API | Enable minimum required: CONTAINERS=1, ALLOW_START=1, ALLOW_STOP=1, ALLOW_RESTARTS=1 for actions | Socket Proxy |
| Breaking existing curl commands | Existing workflow fails after adding proxy; commands timeout | Socket proxy uses TCP, not Unix socket. Update curl commands: curl http://socket-proxy:2375/... instead of --unix-socket | Socket Proxy |
| Network isolation breaks connectivity | n8n can't reach proxy; "connection refused" errors | Both containers must be on same Docker network; verify with docker network inspect | Socket Proxy |
| Permissions too restrictive | Container list works but start/stop fails | Must explicitly enable action endpoints: ALLOW_START=1, ALLOW_STOP=1, ALLOW_RESTARTS=1 (separate from CONTAINERS=1) | Socket Proxy |
| Missing INFO or VERSION permissions | Some Docker API calls fail unexpectedly | VERSION=1 and PING=1 are enabled by default; may need INFO=1 for system queries | Socket Proxy |
Minimum safe configuration for this project:
environment:
- CONTAINERS=1 # Read container info
- ALLOW_START=1 # Start containers
- ALLOW_STOP=1 # Stop containers
- ALLOW_RESTARTS=1 # Restart containers
- IMAGES=1 # Pull images (for updates)
- POST=1 # Required for start/stop/restart actions
- NETWORKS=0 # Not needed
- VOLUMES=0 # Not needed
- BUILD=0 # Not needed
- COMMIT=0 # Not needed
- CONFIGS=0 # Not needed
- SECRETS=0 # Security critical - keep disabled
- EXEC=0 # Security critical - keep disabled
- AUTH=0 # Security critical - keep disabled
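With that configuration, the requests n8n sends through the proxy reduce to one container listing and three action endpoints. The sketch below builds those requests. `dockerActionRequest` and `listContainersRequest` are hypothetical helper names, and the host `socket-proxy:2375` is taken from the table above and may differ in your setup.

```javascript
// Assumed proxy address (from the pitfalls table); adjust to your network.
const PROXY_BASE = "http://socket-proxy:2375";
const ALLOWED_ACTIONS = ["start", "stop", "restart"];

// Build the Docker Engine API request for a container action.
// These are POST requests, which is why the configuration sets POST=1.
function dockerActionRequest(action, containerId) {
  if (!ALLOWED_ACTIONS.includes(action)) {
    throw new Error("Unsupported action: " + action);
  }
  // Reject anything that is not a plain container ID or name.
  if (!/^[a-zA-Z0-9][a-zA-Z0-9_.-]*$/.test(containerId)) {
    throw new Error("Invalid container reference: " + containerId);
  }
  return {
    method: "POST",
    url: PROXY_BASE + "/containers/" + containerId + "/" + action,
  };
}

// Listing containers is a GET and only needs CONTAINERS=1.
function listContainersRequest() {
  return { method: "GET", url: PROXY_BASE + "/containers/json?all=true" };
}
```

Validating the action and container reference before building the URL keeps user-supplied text from Telegram out of the request path.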
Sources:
- Tecnativa docker-socket-proxy
- LinuxServer socket-proxy
- Docker Community Forums - Socket Proxy Security
Telegram Keyboard Pitfalls
| Pitfall | Warning Signs | Prevention | Phase |
|---|---|---|---|
| Native node rejects dynamic keyboards | Error: "The value '...' is not supported!" | Use HTTP Request node for inline keyboards instead of native Telegram node; this is a known n8n limitation | Keyboards |
| callback_data exceeds 64 bytes | Buttons don't respond; no callback_query received; 400 BUTTON_DATA_INVALID | Use short codes: s:plex not start_container:plex-media-server. Hash long names to 8-char IDs | Keyboards |
| Callback auth path missing | Keyboard clicks ignored; no response to button press | Existing workflow already handles callback_query (lines 56-74 in workflow). Ensure new keyboards use same auth flow | Keyboards |
| Multiple additional fields ignored | Button has both callback_data and URL; only URL works | n8n Telegram node limitation - can't use both. Choose one per button: either action (callback) or link (URL) | Keyboards |
| Keyboard flickers on every message | Visual glitches; keyboard re-renders constantly | Send reply_markup only on /start or menu requests; omit from action responses (keyboard persists) | Keyboards |
| Inline vs Reply keyboard confusion | Wrong keyboard type appears; buttons don't trigger callbacks | Inline keyboards (InlineKeyboardMarkup) for callbacks; Reply keyboards (ReplyKeyboardMarkup) for persistent menus. Use inline for container actions | Keyboards |
| answerCallbackQuery not called | "Loading..." spinner persists after button click; Telegram shows timeout | Must call answerCallbackQuery within 10 seconds of receiving callback_query, even if just to acknowledge | Keyboards |
| Button layout exceeds limits | Buttons don't appear; API error | Bot API 7.0: max 100 buttons total per message. For container lists, paginate or limit to 8-10 buttons | Keyboards |
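The last row suggests paginating container lists. A minimal sketch of that idea follows, assuming each container object has `name` and `id` fields. The `c:` and `p:` callback prefixes and the function name `buildContainerPage` are hypothetical, chosen only for illustration.

```javascript
// Build one page of an inline keyboard for a list of containers.
// pageSize = 8 follows the "8-10 buttons" guidance in the table above.
function buildContainerPage(containers, page, pageSize = 8, perRow = 2) {
  const totalPages = Math.max(1, Math.ceil(containers.length / pageSize));
  // Clamp the requested page into the valid range.
  const current = Math.min(Math.max(page, 0), totalPages - 1);
  const slice = containers.slice(current * pageSize, (current + 1) * pageSize);

  const rows = [];
  for (let i = 0; i < slice.length; i += perRow) {
    rows.push(
      slice.slice(i, i + perRow).map((c) => ({
        text: c.name,
        callback_data: "c:" + c.id.slice(0, 8), // hypothetical "select container" code
      }))
    );
  }

  // Navigation row, only when there is somewhere to go.
  const nav = [];
  if (current > 0) nav.push({ text: "Prev", callback_data: "p:" + (current - 1) });
  if (current < totalPages - 1) nav.push({ text: "Next", callback_data: "p:" + (current + 1) });
  if (nav.length > 0) rows.push(nav);

  return { inline_keyboard: rows };
}
```

The returned object can be sent as the `reply_markup` field of a `sendMessage` request made from an HTTP Request node.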
Recommended keyboard structure for container actions:
// Short callback_data pattern: action:container_short_id
// Example: "s:abc123" for start, "x:abc123" for stop
// containerId is assumed to hold the full Docker container ID
const replyMarkup = {
  inline_keyboard: [
    [
      { text: "Start", callback_data: "s:" + containerId.slice(0, 8) },
      { text: "Stop", callback_data: "x:" + containerId.slice(0, 8) },
    ],
    [
      { text: "Restart", callback_data: "r:" + containerId.slice(0, 8) },
      { text: "Logs", callback_data: "l:" + containerId.slice(0, 8) },
    ],
  ],
};
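Because the 64-byte limit counts bytes rather than characters, it may be worth checking the encoded length when the callback string is built. The sketch below does that with `TextEncoder`; `buildCallbackData` and `callbackBytes` are hypothetical names, and the action codes mirror the pattern shown above.

```javascript
const MAX_CALLBACK_BYTES = 64; // Telegram limit for callback_data
const ACTION_CODES = { start: "s", stop: "x", restart: "r", logs: "l" };

// Size of a string in UTF-8 bytes (non-ASCII characters take more than one).
function callbackBytes(data) {
  return new TextEncoder().encode(data).length;
}

// Build "code:shortId" and fail loudly if it would exceed the limit.
function buildCallbackData(action, containerId) {
  if (!Object.prototype.hasOwnProperty.call(ACTION_CODES, action)) {
    throw new Error("Unknown action: " + action);
  }
  const data = ACTION_CODES[action] + ":" + containerId.slice(0, 8);
  if (callbackBytes(data) > MAX_CALLBACK_BYTES) {
    throw new Error("callback_data too long: " + callbackBytes(data) + " bytes");
  }
  return data;
}
```

With an 8-character ID prefix the result is always 10 bytes, so the length check mainly guards against later changes to the format.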
Sources:
- n8n GitHub Issue #19955 - Inline Keyboard Expression
- n8n Telegram Callback Operations
- Telegram Bot API - InlineKeyboardButton
Unraid Integration Pitfalls
| Pitfall | Warning Signs | Prevention | Phase |
|---|---|---|---|
| Update badge persists after bot update | Unraid UI shows "update available" after container updated via bot | Delete /var/lib/docker/unraid-update-status.json to force recheck; or trigger Unraid's check mechanism | Unraid Sync |
| unraid-update-status.json format unknown | Attempted to modify file directly; broke Unraid Docker tab | File format is undocumented. Safest approach: delete file and let Unraid regenerate. Don't modify directly | Unraid Sync |
| Unraid only checks for new updates | Badge never clears; only sees new updates, not cleared updates | This is known Unraid behavior. Deletion of status file is current workaround per Unraid forums | Unraid Sync |
| Race condition on status file | Status file deleted but badge still shows; file regenerated too fast | Wait for Unraid's update check interval, or manually trigger "Check for Updates" from Unraid UI after deletion | Unraid Sync |
| Bot can't access Unraid filesystem | Permission denied when accessing /var/lib/docker/ | n8n container needs additional volume mount: /var/lib/docker:/var/lib/docker or execute via SSH | Unraid Sync |
| Breaking Unraid's Docker management | Unraid Docker tab shows errors; containers appear in wrong state | Never modify Unraid's internal files (in /boot/config/docker or /var/lib/docker); the only exception is deleting unraid-update-status.json | Unraid Sync |
Unraid sync approach (safest):
- After the bot successfully updates a container
- Execute: rm -f /var/lib/docker/unraid-update-status.json
- Unraid will regenerate the file on the next "Check for Updates" or automatically
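The steps above can be sketched as a small guard that only produces the delete command after a successful update. The shape of `updateResult` (`{ success: true }`) and the name `unraidSyncCommand` are assumptions for illustration; the returned string would be passed to whatever runs shell commands in the workflow.

```javascript
const STATUS_FILE = "/var/lib/docker/unraid-update-status.json";

// Return the cleanup command only when the update succeeded; otherwise null,
// so a failed update never touches Unraid's status file.
function unraidSyncCommand(updateResult) {
  if (!updateResult || updateResult.success !== true) {
    return null;
  }
  // -f makes the command succeed even if the file is already gone.
  return "rm -f " + STATUS_FILE;
}
```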
Sources:
- Unraid Forums - Update notification regression
- Unraid Forums - Update badge persists
- Unraid Forums - Containers show update available incorrectly
Integration Pitfalls (Breaking Existing Functionality)
| Pitfall | Warning Signs | Prevention | Phase |
|---|---|---|---|
| Socket proxy breaks existing curl | All Docker commands fail after adding proxy | Existing workflow uses --unix-socket. Migrate curl commands to use proxy TCP endpoint: http://socket-proxy:2375 | Socket Proxy |
| Auth flow bypassed on new paths | New keyboard handlers skip user ID check; anyone can click buttons | Existing workflow has auth at lines 92-122 and 126-155. Copy same pattern for any new callback handlers | All |
| Workflow test vs production mismatch | Works in test mode; fails when activated | Test with actual Telegram messages, not just manual execution. Production triggers differ from manual runs | All |
| n8n 2.0 upgrade breaks workflow | After n8n update, workflow stops working; nodes missing | n8n 2.0 has breaking changes: Execute Command disabled by default, Start node removed, env vars blocked. Check migration guide before upgrading | All |
| Credential reference breaks after import | Imported workflow can't decrypt credentials; all nodes fail | n8n uses N8N_ENCRYPTION_KEY. After import, must recreate credentials manually in n8n UI | All |
| HTTP Request node vs Execute Command | HTTP Request can't reach Docker socket; timeout errors | HTTP Request node doesn't support Unix sockets. Keep using Execute Command with curl for Docker API (or migrate to TCP proxy) | Socket Proxy |
| Parallel execution race conditions | Two button clicks cause conflicting container states | Add debounce logic: ignore rapid duplicate callbacks within 2-3 seconds. Store last action timestamp | Keyboards |
| Error workflow doesn't fire | Errors occur but no notification; silent failures | Error Trigger only fires on automatic executions, not manual test runs. Test by triggering via Telegram with intentional failure | All |
| Save vs Publish confusion (n8n 2.0) | Edited workflow but production still uses old version | n8n 2.0 separates Save (preserves edits) from Publish (updates production). Must explicitly publish changes | All |
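The debounce logic mentioned for parallel executions can be sketched as follows. This is an in-memory illustration only: `createDebouncer` is a hypothetical name, and since n8n executions do not share memory, a real workflow would need to keep the timestamps in some persistent store (that wiring is not shown).

```javascript
// Returns a function that reports whether a callback should be processed.
// Duplicate (userId, callbackData) pairs inside windowMs are ignored.
// The clock is injectable so the behavior can be tested deterministically.
function createDebouncer(windowMs = 3000, now = () => Date.now()) {
  const lastAccepted = new Map(); // key -> timestamp of last accepted action

  return function shouldProcess(userId, callbackData) {
    const key = userId + "|" + callbackData;
    const current = now();
    const previous = lastAccepted.get(key);
    if (previous !== undefined && current - previous < windowMs) {
      return false; // rapid duplicate: drop it
    }
    lastAccepted.set(key, current);
    return true;
  };
}
```

Only accepted actions update the stored timestamp, so a burst of clicks cannot extend the window indefinitely.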
Pre-migration checklist:
- Export current workflow JSON as backup
- Document current curl commands and endpoints
- Test each existing command works after changes
- Verify auth flow applies to new handlers
- Test error handling triggers correctly
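For the auth item in the checklist, one point worth illustrating is that messages and button presses carry the user ID in different places in a Telegram update. The sketch below is not the existing workflow's auth logic (which is not reproduced in this document); `extractUserId`, `isAuthorized`, and the allow-list are hypothetical.

```javascript
// Pull the sender's user ID from either update type.
// Button presses arrive as callback_query; text arrives as message.
function extractUserId(update) {
  if (update && update.callback_query && update.callback_query.from) {
    return update.callback_query.from.id;
  }
  if (update && update.message && update.message.from) {
    return update.message.from.id;
  }
  return null;
}

// Apply the same allow-list check to every entry point.
function isAuthorized(update, allowedUserIds) {
  const userId = extractUserId(update);
  return userId !== null && allowedUserIds.includes(userId);
}
```

Routing both update types through one check makes it harder for a new callback handler to skip authorization by accident.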
Sources:
- n8n v2.0 Breaking Changes
- n8n Manual vs Production Executions
- n8n Community - Test vs Production Behavior
Summary: Top 5 Risks
Ranked by likelihood x impact for this specific milestone:
1. Socket Proxy Breaks Existing Commands (HIGH likelihood, HIGH impact)
Why: Current workflow uses --unix-socket flag. Socket proxy uses TCP. All existing functionality breaks if not migrated correctly.
Prevention:
- Add socket proxy container first (don't remove direct socket yet)
- Update curl commands one-by-one to use proxy
- Test each command works via proxy
- Only then remove direct socket mount
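To make the one-by-one migration concrete, the sketch below rewrites a socket-based curl command into its proxy equivalent. It assumes the existing commands use `--unix-socket <path>` with an `http://localhost/...` URL, which may not match every command in the workflow; `migrateCurlCommand` is a hypothetical helper for reviewing changes, not a replacement for testing each command.

```javascript
// Rewrite a curl command from Unix socket access to the TCP proxy.
// Assumption: the original URL host is "localhost".
function migrateCurlCommand(command, proxyBase = "http://socket-proxy:2375") {
  return command
    // Drop the flag together with its socket path argument.
    .replace(/\s*--unix-socket\s+\S+/, "")
    // Point the request at the proxy instead of the dummy localhost host.
    .replace(/http:\/\/localhost(?=[\/\s"']|$)/, proxyBase);
}

// Example input and the command it produces:
//   curl -s --unix-socket /var/run/docker.sock http://localhost/containers/json
//   curl -s http://socket-proxy:2375/containers/json
```

The HTTP method, path, and remaining flags are left untouched, so each migrated command can be compared against its original before the direct socket mount is removed.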
2. Native Telegram Node Rejects Dynamic Keyboards (HIGH likelihood, MEDIUM impact)
Why: n8n's native Telegram node has a known bug (Issue #19955) where it rejects array expressions for inline keyboards.
Prevention: Use HTTP Request node to call Telegram API directly for any dynamic keyboard generation. Keep native node for simple text responses only.
3. Unraid Update Badge Never Clears (HIGH likelihood, LOW impact)
Why: Unraid doesn't check for "no longer outdated" containers - only new updates. Documented behavior, not a bug.
Prevention: Delete /var/lib/docker/unraid-update-status.json after successful bot update. Requires additional volume mount or SSH access.
4. n8n 2.0 Breaking Changes on Upgrade (MEDIUM likelihood, HIGH impact)
Why: n8n 2.0 (released Dec 2025) has multiple breaking changes: Execute Command disabled by default, env vars blocked, Save/Publish separation.
Prevention:
- Check current n8n version before starting
- If upgrading, run Migration Report first (Settings > Migration Report)
- Don't upgrade n8n during this milestone unless necessary
5. callback_data Exceeds 64 Bytes (MEDIUM likelihood, MEDIUM impact)
Why: Container names can be long (e.g., linuxserver-plex-media-server). Adding action prefix easily exceeds 64 bytes.
Prevention: Use short action codes (s:, x:, r:, l:) and container ID prefix (8 chars) instead of full names. Map back via lookup.
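The "map back via lookup" step could look like the sketch below. It assumes the container list comes from the Docker API's container listing, where each entry has an `Id` field, and it refuses to act when the short prefix is ambiguous. `resolveCallback` is a hypothetical name.

```javascript
const ACTIONS_BY_CODE = { s: "start", x: "stop", r: "restart", l: "logs" };

// Turn "r:abc12345" back into { action, containerId } using the current
// container list. Returns null for unknown codes, missing IDs, no match,
// or more than one match.
function resolveCallback(callbackData, containers) {
  const separator = callbackData.indexOf(":");
  if (separator < 1) return null;
  const code = callbackData.slice(0, separator);
  const shortId = callbackData.slice(separator + 1);
  if (!shortId || !Object.prototype.hasOwnProperty.call(ACTIONS_BY_CODE, code)) {
    return null;
  }
  const matches = containers.filter((c) => c.Id.startsWith(shortId));
  if (matches.length !== 1) return null; // not found or ambiguous prefix
  return { action: ACTIONS_BY_CODE[code], containerId: matches[0].Id };
}
```

Returning null on an ambiguous prefix is a deliberately conservative choice: it is safer to ask the user to retry than to restart the wrong container.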
Phase Assignment Summary
| Phase | Pitfalls to Address |
|---|---|
| API Setup | API key scoping, header format, workflow ID discovery, env var blocking |
| Socket Proxy | Proxy configuration, permission settings, curl command migration, network setup |
| Keyboards | HTTP Request node for keyboards, callback_data limits, answerCallbackQuery |
| Unraid Sync | Update status file deletion, volume mount for access |
| All Phases | Auth flow consistency, test vs production, error workflow testing |
Confidence Assessment
| Area | Confidence | Rationale |
|---|---|---|
| n8n API | HIGH | Official docs verified, known breaking changes documented |
| Docker Socket Proxy | HIGH | Official Tecnativa docs, community best practices verified |
| Telegram Keyboards | MEDIUM-HIGH | n8n GitHub issues confirm limitations, Telegram API docs verified |
| Unraid Integration | MEDIUM | Forum posts describe workaround, but file format undocumented |
| Integration Risks | MEDIUM | Based on existing v1.0 codebase analysis and general patterns |
Research date: 2026-02-02
Valid until: 2026-03-02 (30 days - n8n and Telegram APIs stable)