Lucas Berger 811030cee4 docs: complete v1.1 research (4 researchers + synthesis)
Files:
- STACK.md: Socket proxy, n8n API, Telegram keyboards
- FEATURES.md: Table stakes, differentiators, MVP scope
- ARCHITECTURE.md: Integration points, data flow changes
- PITFALLS.md: Top 5 risks with prevention strategies
- SUMMARY.md: Executive summary, build order, confidence

Key findings:
- Stack: LinuxServer socket-proxy, HTTP Request nodes for keyboards
- Architecture: TCP curl migration (~15 nodes), new callback routes
- Critical pitfall: Socket proxy breaks existing curl commands

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 22:09:06 -05:00


Pitfalls Research: v1.1

Project: Unraid Docker Manager
Milestone: v1.1 - n8n Integration & Polish
Researched: 2026-02-02
Confidence: MEDIUM-HIGH (verified with official docs where possible)

Context

This research identifies pitfalls specific to adding these features to an existing working system:

  • n8n API access (programmatic workflow read/update/test/logs)
  • Docker socket proxy (security hardening)
  • Telegram inline keyboards (UX improvements)
  • Unraid update sync (clear "update available" badge)

Risk focus: Breaking existing functionality while adding new features.


n8n API Access Pitfalls

| Pitfall | Warning Signs | Prevention | Phase |
|---|---|---|---|
| API key with full access | API key created without scopes; all workflows accessible | Enterprise: use scoped API keys (read-only for Claude Code initially). Non-enterprise: accept risk, rotate keys every 6-12 months | API Setup |
| Missing `X-N8N-API-KEY` header | 401 Unauthorized errors on all API calls | Store API key in Claude Code MCP config; always send as `X-N8N-API-KEY` header, not Bearer token | API Setup |
| Workflow ID mismatch after import | API calls return 404; workflow actions fail | Workflow IDs change on import; query `/api/v1/workflows` first to get current IDs, don't hardcode | API Setup |
| Editing active workflow via API | Production workflow changes unexpectedly; users see partial updates | n8n 2.0: Save vs Publish are separate actions. Use API to read only; manual publish via UI | API Setup |
| `N8N_BLOCK_ENV_ACCESS_IN_NODE` default | Code nodes can't access env vars; returns undefined | n8n 2.0+ blocks env vars by default. Use credentials system instead, or explicitly set `N8N_BLOCK_ENV_ACCESS_IN_NODE=false` | API Setup |
| API not enabled on instance | Connection refused on `/api/v1` endpoints | Self-hosted: API is available by default. Cloud trial: API not available. Verify with `curl http://localhost:5678/api/v1/workflows` | API Setup |
| Rate limiting on rapid API calls | 429 errors when reading workflow repeatedly | Add delay between API calls (1-2 seconds); use caching for workflow data that doesn't change frequently | API Usage |
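
The header and workflow-ID rows above can be sketched as a small lookup helper. The function names are hypothetical, and the `{ data: [...] }` shape of the list response is an assumption to verify against the running instance. Only the `X-N8N-API-KEY` header and the `/api/v1/workflows` path come from the table.

```javascript
// Hypothetical helpers; only the header name and endpoint path come from the table above.
function n8nRequestOptions(baseUrl, apiKey, path) {
  return {
    url: new URL(path, baseUrl).toString(),
    // The key goes in X-N8N-API-KEY, not in an Authorization: Bearer header.
    headers: { "X-N8N-API-KEY": apiKey, Accept: "application/json" },
  };
}

// Assumes the list response wraps workflows in a `data` array of { id, name } objects.
function findWorkflowId(listResponse, workflowName) {
  const match = (listResponse.data ?? []).find((wf) => wf.name === workflowName);
  return match ? match.id : null;
}

// Not called here; needs Node 18+ for the global fetch.
async function lookupWorkflowId(baseUrl, apiKey, workflowName) {
  const { url, headers } = n8nRequestOptions(baseUrl, apiKey, "/api/v1/workflows");
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`n8n API returned ${res.status}`);
  return findWorkflowId(await res.json(), workflowName);
}
```

Resolving the ID by name on each run avoids hardcoding a value that changes on import.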



Docker Socket Security Pitfalls

| Pitfall | Warning Signs | Prevention | Phase |
|---|---|---|---|
| POST enabled more broadly than needed | Container can create/delete containers; security scan flags | POST is disabled by default in both the Tecnativa and LinuxServer proxies; leave `POST=0` when only read operations are needed. This project sets `POST=1` (see the minimum safe configuration below), so limit exposure through the per-section flags instead | Socket Proxy |
| Using `--privileged` unnecessarily | Security audit fails; container has excessive permissions | Remove `--privileged` flag; Tecnativa proxy works without it on standard Docker | Socket Proxy |
| Outdated socket proxy image | Using `latest` tag which is 3+ years old | Pin to specific version: `tecnativa/docker-socket-proxy:0.2.0` or use `linuxserver/socket-proxy` | Socket Proxy |
| Proxy port exposed publicly | Port 2375 accessible from network; security scan fails | Never expose proxy port; run on internal Docker network only | Socket Proxy |
| Insufficient permissions for n8n | "Permission denied" or empty responses from Docker API | Enable minimum required: `CONTAINERS=1`, `ALLOW_START=1`, `ALLOW_STOP=1`, `ALLOW_RESTARTS=1` for actions | Socket Proxy |
| Breaking existing curl commands | Existing workflow fails after adding proxy; commands timeout | Socket proxy uses TCP, not Unix socket. Update curl commands: `curl http://socket-proxy:2375/...` instead of `--unix-socket` | Socket Proxy |
| Network isolation breaks connectivity | n8n can't reach proxy; "connection refused" errors | Both containers must be on same Docker network; verify with `docker network inspect` | Socket Proxy |
| Permissions too restrictive | Container list works but start/stop fails | Must explicitly enable action endpoints: `ALLOW_START=1`, `ALLOW_STOP=1`, `ALLOW_RESTARTS=1` (separate from `CONTAINERS=1`) | Socket Proxy |
| Missing INFO or VERSION permissions | Some Docker API calls fail unexpectedly | `VERSION=1` and `PING=1` are enabled by default; may need `INFO=1` for system queries | Socket Proxy |

Minimum safe configuration for this project:

environment:
  - CONTAINERS=1      # Read container info
  - ALLOW_START=1     # Start containers
  - ALLOW_STOP=1      # Stop containers
  - ALLOW_RESTARTS=1  # Restart containers
  - IMAGES=1          # Pull images (for updates)
  - POST=1            # Required for start/stop/restart actions
  - NETWORKS=0        # Not needed
  - VOLUMES=0         # Not needed
  - BUILD=0           # Not needed
  - COMMIT=0          # Not needed
  - CONFIGS=0         # Not needed
  - SECRETS=0         # Security critical - keep disabled
  - EXEC=0            # Security critical - keep disabled
  - AUTH=0            # Security critical - keep disabled
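
To show which Docker API calls this configuration is meant to permit, here is a minimal sketch of the requests n8n would send through the proxy. The helper names are hypothetical, and the `socket-proxy:2375` host is the endpoint used elsewhere in this document. The paths follow the Docker Engine API (`/containers/json`, `/containers/{id}/start`, and so on).

```javascript
// Host and port follow the "socket-proxy:2375" endpoint used elsewhere in this document.
const PROXY_BASE = "http://socket-proxy:2375";
const ACTIONS = new Set(["start", "stop", "restart"]);

// Listing containers is a GET, covered by CONTAINERS=1.
function containerListRequest() {
  return { method: "GET", url: `${PROXY_BASE}/containers/json?all=true` };
}

// Actions are POST requests, which is why the ALLOW_* flags matter here.
function containerActionRequest(action, containerId) {
  if (!ACTIONS.has(action)) {
    throw new Error(`Unsupported action: ${action}`);
  }
  return {
    method: "POST",
    url: `${PROXY_BASE}/containers/${encodeURIComponent(containerId)}/${action}`,
  };
}
```

Restricting the action set in code as well as in the proxy gives two independent checks before anything reaches the Docker daemon.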



Telegram Keyboard Pitfalls

| Pitfall | Warning Signs | Prevention | Phase |
|---|---|---|---|
| Native node rejects dynamic keyboards | Error: "The value '...' is not supported!" | Use HTTP Request node for inline keyboards instead of native Telegram node; this is a known n8n limitation | Keyboards |
| `callback_data` exceeds 64 bytes | Buttons don't respond; no `callback_query` received; 400 `BUTTON_DATA_INVALID` | Use short codes: `s:plex` not `start_container:plex-media-server`. Hash long names to 8-char IDs | Keyboards |
| Callback auth path missing | Keyboard clicks ignored; no response to button press | Existing workflow already handles `callback_query` (lines 56-74 in workflow). Ensure new keyboards use same auth flow | Keyboards |
| Multiple additional fields ignored | Button has both `callback_data` and URL; only URL works | n8n Telegram node limitation: can't use both. Choose one per button: either action (callback) or link (URL) | Keyboards |
| Keyboard flickers on every message | Visual glitches; keyboard re-renders constantly | Send `reply_markup` only on `/start` or menu requests; omit from action responses (keyboard persists) | Keyboards |
| Inline vs Reply keyboard confusion | Wrong keyboard type appears; buttons don't trigger callbacks | Inline keyboards (`InlineKeyboardMarkup`) for callbacks; Reply keyboards (`ReplyKeyboardMarkup`) for persistent menus. Use inline for container actions | Keyboards |
| `answerCallbackQuery` not called | "Loading..." spinner persists after button click; Telegram shows timeout | Must call `answerCallbackQuery` within 10 seconds of receiving `callback_query`, even if just to acknowledge | Keyboards |
| Button layout exceeds limits | Buttons don't appear; API error | Bot API 7.0: max 100 buttons total per message. For container lists, paginate or limit to 8-10 buttons | Keyboards |
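
The pagination advice in the last row can be sketched as follows. The page size of 8, the `c:` (select container) and `p:` (go to page) codes, and the `{ name, shortId }` input shape are all illustrative assumptions, not part of the existing workflow.

```javascript
// Clamp the requested page and return the slice of items that belongs on it.
function paginate(items, page, pageSize = 8) {
  const totalPages = Math.max(1, Math.ceil(items.length / pageSize));
  const current = Math.min(Math.max(page, 0), totalPages - 1);
  const start = current * pageSize;
  return { current, totalPages, items: items.slice(start, start + pageSize) };
}

// containers: array of { name, shortId } objects (hypothetical shape).
function containerListKeyboard(containers, page) {
  const { current, totalPages, items } = paginate(containers, page);
  // One button per row; "c:<shortId>" keeps callback_data short.
  const rows = items.map((c) => [{ text: c.name, callback_data: `c:${c.shortId}` }]);
  const nav = [];
  if (current > 0) nav.push({ text: "Prev", callback_data: `p:${current - 1}` });
  if (current < totalPages - 1) nav.push({ text: "Next", callback_data: `p:${current + 1}` });
  if (nav.length > 0) rows.push(nav);
  return { inline_keyboard: rows };
}
```

With 8 container rows plus one navigation row, each message stays well under the button limit regardless of how many containers exist.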

Recommended keyboard structure for container actions:

// Short callback_data pattern: action:container_short_id
// Example: "s:abc12345" for start, "x:abc12345" for stop
// containerId is the full Docker container ID; only its first 8 characters are used
const shortId = containerId.slice(0, 8);
const replyMarkup = {
  inline_keyboard: [
    [
      { text: "Start", callback_data: "s:" + shortId },
      { text: "Stop", callback_data: "x:" + shortId }
    ],
    [
      { text: "Restart", callback_data: "r:" + shortId },
      { text: "Logs", callback_data: "l:" + shortId }
    ]
  ]
};
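
A minimal sketch of building and parsing these codes with an explicit size check. The function names are hypothetical. Telegram's 64-byte limit is measured in bytes, so the check uses `Buffer.byteLength` instead of string length.

```javascript
const MAX_CALLBACK_BYTES = 64;

// Build "<actionCode>:<first 8 chars of container ID>" and reject oversized values.
function buildCallbackData(actionCode, containerId) {
  const data = `${actionCode}:${containerId.slice(0, 8)}`;
  const size = Buffer.byteLength(data, "utf8");
  if (size > MAX_CALLBACK_BYTES) {
    throw new Error(`callback_data is ${size} bytes, limit is ${MAX_CALLBACK_BYTES}`);
  }
  return data;
}

// Split on the first colon; returns null when no action code is present.
function parseCallbackData(data) {
  const sep = data.indexOf(":");
  if (sep < 1) return null;
  return { actionCode: data.slice(0, sep), shortId: data.slice(sep + 1) };
}
```

Failing loudly at build time is easier to debug than a button that silently never produces a `callback_query`.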



Unraid Integration Pitfalls

| Pitfall | Warning Signs | Prevention | Phase |
|---|---|---|---|
| Update badge persists after bot update | Unraid UI shows "update available" after container updated via bot | Delete `/var/lib/docker/unraid-update-status.json` to force recheck; or trigger Unraid's check mechanism | Unraid Sync |
| `unraid-update-status.json` format unknown | Attempted to modify file directly; broke Unraid Docker tab | File format is undocumented. Safest approach: delete file and let Unraid regenerate. Don't modify directly | Unraid Sync |
| Unraid only checks for new updates | Badge never clears; only sees new updates, not cleared updates | This is known Unraid behavior. Deletion of status file is current workaround per Unraid forums | Unraid Sync |
| Race condition on status file | Status file deleted but badge still shows; file regenerated too fast | Wait for Unraid's update check interval, or manually trigger "Check for Updates" from Unraid UI after deletion | Unraid Sync |
| Bot can't access Unraid filesystem | Permission denied when accessing `/var/lib/docker/` | n8n container needs additional volume mount: `/var/lib/docker:/var/lib/docker` or execute via SSH | Unraid Sync |
| Breaking Unraid's Docker management | Unraid Docker tab shows errors; containers appear in wrong state | Never modify Unraid's internal files (in `/boot/config/docker` or `/var/lib/docker`) except `update-status.json` deletion | Unraid Sync |

Unraid sync approach (safest):

  1. Wait until the bot has successfully updated the container
  2. Execute: rm -f /var/lib/docker/unraid-update-status.json
  3. Unraid regenerates the file on the next "Check for Updates", whether triggered manually or automatically
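
The steps above can be sketched in Node as a guarded delete. The function name is hypothetical, and the sketch assumes the code runs somewhere the status file is actually reachable (for example via the extra volume mount mentioned in the table).

```javascript
const fs = require("node:fs");

// Path named in this document; it must be reachable from where this code runs.
const STATUS_FILE = "/var/lib/docker/unraid-update-status.json";

// Delete the status file only after a successful update; never edit its contents.
function clearUpdateStatus(updateSucceeded, statusFile = STATUS_FILE) {
  if (!updateSucceeded) return false;
  // force: true mirrors `rm -f`: a missing file is not treated as an error.
  fs.rmSync(statusFile, { force: true });
  return true;
}
```

Tying the delete to the update result keeps a failed update from clearing a badge that is still accurate.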



Integration Pitfalls (Breaking Existing Functionality)

| Pitfall | Warning Signs | Prevention | Phase |
|---|---|---|---|
| Socket proxy breaks existing curl | All Docker commands fail after adding proxy | Existing workflow uses `--unix-socket`. Migrate curl commands to use proxy TCP endpoint: `http://socket-proxy:2375` | Socket Proxy |
| Auth flow bypassed on new paths | New keyboard handlers skip user ID check; anyone can click buttons | Existing workflow has auth at lines 92-122 and 126-155. Copy same pattern for any new callback handlers | All |
| Workflow test vs production mismatch | Works in test mode; fails when activated | Test with actual Telegram messages, not just manual execution. Production triggers differ from manual runs | All |
| n8n 2.0 upgrade breaks workflow | After n8n update, workflow stops working; nodes missing | n8n 2.0 has breaking changes: Execute Command disabled by default, Start node removed, env vars blocked. Check migration guide before upgrading | All |
| Credential reference breaks after import | Imported workflow can't decrypt credentials; all nodes fail | n8n uses `N8N_ENCRYPTION_KEY`. After import, must recreate credentials manually in n8n UI | All |
| HTTP Request node vs Execute Command | HTTP Request can't reach Docker socket; timeout errors | HTTP Request node doesn't support Unix sockets. Keep using Execute Command with curl for Docker API (or migrate to TCP proxy) | Socket Proxy |
| Parallel execution race conditions | Two button clicks cause conflicting container states | Add debounce logic: ignore rapid duplicate callbacks within 2-3 seconds. Store last action timestamp | Keyboards |
| Error workflow doesn't fire | Errors occur but no notification; silent failures | Error Trigger only fires on automatic executions, not manual test runs. Test by triggering via Telegram with intentional failure | All |
| Save vs Publish confusion (n8n 2.0) | Edited workflow but production still uses old version | n8n 2.0 separates Save (preserves edits) from Publish (updates production). Must explicitly publish changes | All |
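
The debounce row above can be sketched as a small timestamp store. The names and the 3-second default are illustrative. In n8n the map would have to live somewhere that persists across executions, which this sketch does not address.

```javascript
// Returns a function that accepts a callback only if the same user has not sent
// the same callback_data within the last `windowMs` milliseconds.
function createCallbackDebouncer(windowMs = 3000, now = Date.now) {
  const lastAccepted = new Map();
  return function shouldProcess(userId, callbackData) {
    const key = `${userId}|${callbackData}`;
    const t = now();
    const previous = lastAccepted.get(key);
    if (previous !== undefined && t - previous < windowMs) {
      return false; // rapid duplicate, ignore it
    }
    lastAccepted.set(key, t);
    return true;
  };
}
```

The clock is injected so the window can be tested without real delays. Only accepted callbacks update the timestamp, so a burst of clicks cannot extend the window indefinitely.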

Pre-migration checklist:

  • Export current workflow JSON as backup
  • Document current curl commands and endpoints
  • Test each existing command works after changes
  • Verify auth flow applies to new handlers
  • Test error handling triggers correctly
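
For the auth item in the checklist, a minimal sketch of a check that covers both plain messages and button presses. The function names and the allow-list parameter are hypothetical; the existing workflow's own auth nodes remain the reference. The field paths follow the Telegram Update object (`message.from.id`, `callback_query.from.id`).

```javascript
// Extract the sender's user ID from either update type; null if neither is present.
function senderId(update) {
  return update.callback_query?.from?.id ?? update.message?.from?.id ?? null;
}

// allowedUserIds: array of numeric Telegram user IDs permitted to use the bot.
function isAuthorized(update, allowedUserIds) {
  const id = senderId(update);
  return id !== null && allowedUserIds.includes(id);
}
```

Routing every new callback handler through the same function makes it harder for a new path to skip the user ID check.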



Summary: Top 5 Risks

Ranked by likelihood × impact for this specific milestone:

1. Socket Proxy Breaks Existing Commands (HIGH likelihood, HIGH impact)

Why: Current workflow uses --unix-socket flag. Socket proxy uses TCP. All existing functionality breaks if not migrated correctly.

Prevention:

  1. Add socket proxy container first (don't remove direct socket yet)
  2. Update curl commands one-by-one to use proxy
  3. Test each command works via proxy
  4. Only then remove direct socket mount
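
Step 2 can be sketched as a string rewrite for inventorying the existing commands. The function name is hypothetical, and it assumes the current commands follow the common `curl --unix-socket <path> http://localhost/...` form. Commands that differ from that pattern need manual review.

```javascript
// Rewrite a Unix-socket curl command to target the TCP proxy instead.
function migrateCurlToProxy(command, proxyBase = "http://socket-proxy:2375") {
  const socketFlag = /\s*--unix-socket\s+\S+/;
  if (!socketFlag.test(command)) {
    return command; // already migrated, or not a Docker socket call
  }
  return command
    .replace(socketFlag, "")
    // Swap the placeholder host that curl required alongside --unix-socket.
    .replace(/https?:\/\/localhost(?=[\/\s"']|$)/, proxyBase);
}

// Example input and the rewritten command it produces.
const before = "curl -s --unix-socket /var/run/docker.sock http://localhost/containers/json";
console.log(migrateCurlToProxy(before));
// prints: curl -s http://socket-proxy:2375/containers/json
```

Running this over the exported workflow JSON gives a before/after list to test one command at a time, as the steps above recommend.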

2. Native Telegram Node Rejects Dynamic Keyboards (HIGH likelihood, MEDIUM impact)

Why: n8n's native Telegram node has a known bug (Issue #19955) where it rejects array expressions for inline keyboards.

Prevention: Use HTTP Request node to call Telegram API directly for any dynamic keyboard generation. Keep native node for simple text responses only.
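
A minimal sketch of the request bodies an HTTP Request node would send to the Telegram Bot API. The helper names are hypothetical. `sendMessage` and `answerCallbackQuery`, along with their `chat_id`, `reply_markup`, and `callback_query_id` fields, are Bot API names. The bot token would come from n8n credentials, not from the workflow itself.

```javascript
// Bot API methods are called at https://api.telegram.org/bot<token>/<method>.
function telegramMethodUrl(botToken, method) {
  return `https://api.telegram.org/bot${botToken}/${method}`;
}

// JSON body for sending a message with a dynamically built inline keyboard.
function sendKeyboardRequest(botToken, chatId, text, inlineKeyboard) {
  return {
    method: "POST",
    url: telegramMethodUrl(botToken, "sendMessage"),
    body: { chat_id: chatId, text, reply_markup: { inline_keyboard: inlineKeyboard } },
  };
}

// JSON body for acknowledging a button press so the client stops its spinner.
function answerCallbackRequest(botToken, callbackQueryId, text) {
  return {
    method: "POST",
    url: telegramMethodUrl(botToken, "answerCallbackQuery"),
    body: { callback_query_id: callbackQueryId, text },
  };
}
```

Because the keyboard is passed as a plain array inside a JSON body, it never goes through the native node's field validation.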

3. Unraid Update Badge Never Clears (HIGH likelihood, LOW impact)

Why: Unraid doesn't check for "no longer outdated" containers - only new updates. Documented behavior, not a bug.

Prevention: Delete /var/lib/docker/unraid-update-status.json after successful bot update. Requires additional volume mount or SSH access.

4. n8n 2.0 Breaking Changes on Upgrade (MEDIUM likelihood, HIGH impact)

Why: n8n 2.0 (released Dec 2025) has multiple breaking changes: Execute Command disabled by default, env vars blocked, Save/Publish separation.

Prevention:

  1. Check current n8n version before starting
  2. If upgrading, run Migration Report first (Settings > Migration Report)
  3. Don't upgrade n8n during this milestone unless necessary

5. callback_data Exceeds 64 Bytes (MEDIUM likelihood, MEDIUM impact)

Why: Container names can be long (e.g., linuxserver-plex-media-server). Adding action prefix easily exceeds 64 bytes.

Prevention: Use short action codes (s:, x:, r:, l:) and container ID prefix (8 chars) instead of full names. Map back via lookup.
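
The "map back via lookup" step can be sketched against the container list returned by the Docker API. The function name is hypothetical. The `Id` and `Names` fields are those of the `/containers/json` response, where names carry a leading slash.

```javascript
// Resolve an 8-char ID prefix from callback_data to exactly one container.
function resolveContainer(shortId, containers) {
  const matches = containers.filter((c) => c.Id.startsWith(shortId));
  if (matches.length !== 1) {
    return null; // unknown prefix, or ambiguous between several containers
  }
  const [match] = matches;
  return { id: match.Id, name: (match.Names[0] ?? "").replace(/^\//, "") };
}
```

Returning null on an ambiguous prefix avoids acting on the wrong container if two IDs ever share their first 8 characters.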


Phase Assignment Summary

| Phase | Pitfalls to Address |
|---|---|
| API Setup | API key scoping, header format, workflow ID discovery, env var blocking |
| Socket Proxy | Proxy configuration, permission settings, curl command migration, network setup |
| Keyboards | HTTP Request node for keyboards, callback_data limits, answerCallbackQuery |
| Unraid Sync | Update status file deletion, volume mount for access |
| All Phases | Auth flow consistency, test vs production, error workflow testing |

Confidence Assessment

| Area | Confidence | Rationale |
|---|---|---|
| n8n API | HIGH | Official docs verified, known breaking changes documented |
| Docker Socket Proxy | HIGH | Official Tecnativa docs, community best practices verified |
| Telegram Keyboards | MEDIUM-HIGH | n8n GitHub issues confirm limitations, Telegram API docs verified |
| Unraid Integration | MEDIUM | Forum posts describe workaround, but file format undocumented |
| Integration Risks | MEDIUM | Based on existing v1.0 codebase analysis and general patterns |

Research date: 2026-02-02
Valid until: 2026-03-02 (about 30 days, assuming the n8n and Telegram APIs remain stable)