Lucas Berger 811030cee4 docs: complete v1.1 research (4 researchers + synthesis)

Files:
- STACK.md: Socket proxy, n8n API, Telegram keyboards
- FEATURES.md: Table stakes, differentiators, MVP scope
- ARCHITECTURE.md: Integration points, data flow changes
- PITFALLS.md: Top 5 risks with prevention strategies
- SUMMARY.md: Executive summary, build order, confidence

Key findings:
- Stack: LinuxServer socket-proxy, HTTP Request nodes for keyboards
- Architecture: TCP curl migration (~15 nodes), new callback routes
- Critical pitfall: Socket proxy breaks existing curl commands

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-02 22:09:06 -05:00

Pitfalls Research: v1.1

Project: Unraid Docker Manager
Milestone: v1.1 - n8n Integration & Polish
Researched: 2026-02-02
Confidence: MEDIUM-HIGH (verified with official docs where possible)

Context

This research identifies pitfalls specific to adding these features to an existing working system:

- n8n API access (programmatic workflow read/update/test/logs)
- Docker socket proxy (security hardening)
- Telegram inline keyboards (UX improvements)
- Unraid update sync (clear "update available" badge)

Risk focus: Breaking existing functionality while adding new features.

n8n API Access Pitfalls

Pitfall	Warning Signs	Prevention	Phase
API key with full access	API key created without scopes; all workflows accessible	Enterprise: use scoped API keys (read-only for Claude Code initially). Non-enterprise: accept risk, rotate keys every 6-12 months	API Setup
Missing X-N8N-API-KEY header	401 Unauthorized errors on all API calls	Store API key in Claude Code MCP config; always send as `X-N8N-API-KEY` header, not Bearer token	API Setup
Workflow ID mismatch after import	API calls return 404; workflow actions fail	Workflow IDs change on import; query `/api/v1/workflows` first to get current IDs, don't hardcode	API Setup
Editing active workflow via API	Production workflow changes unexpectedly; users see partial updates	n8n 2.0: Save vs Publish are separate actions. Use API to read only; manual publish via UI	API Setup
N8N_BLOCK_ENV_ACCESS_IN_NODE default	Code nodes can't access env vars; returns undefined	n8n 2.0+ blocks env vars by default. Use credentials system instead, or explicitly set `N8N_BLOCK_ENV_ACCESS_IN_NODE=false`	API Setup
API not enabled on instance	Connection refused on /api/v1 endpoints	Self-hosted: API is available by default. Cloud trial: API not available. Verify with `curl -H "X-N8N-API-KEY: $N8N_API_KEY" http://localhost:5678/api/v1/workflows` (the key goes in the header, as in the row above)	API Setup
Rate limiting on rapid API calls	429 errors when reading workflow repeatedly	Add delay between API calls (1-2 seconds); use caching for workflow data that doesn't change frequently	API Usage
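
The header and workflow ID rows above can be sketched as follows. This is a minimal illustration, not the project's actual client: the helper names are hypothetical, and it assumes the list endpoint returns an object with a `data` array of workflows that each carry `id` and `name`. Pagination of long workflow lists is not handled.

```javascript
// Sketch: send the key as X-N8N-API-KEY (not as a Bearer token) and
// resolve workflow IDs by name instead of hardcoding them.
// Assumption: GET /api/v1/workflows responds with { data: [{ id, name, ... }] }.

function buildN8nRequest(baseUrl, apiKey, path) {
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: baseUrl.replace(/\/+$/, "") + "/api/v1" + path,
    headers: { "X-N8N-API-KEY": apiKey, Accept: "application/json" },
  };
}

function findWorkflowId(listResponse, workflowName) {
  const matches = (listResponse.data || []).filter((w) => w.name === workflowName);
  if (matches.length !== 1) {
    throw new Error(
      `Expected exactly one workflow named "${workflowName}", found ${matches.length}`
    );
  }
  return matches[0].id;
}

// Network wrapper (requires a runtime with global fetch, e.g. Node 18+).
async function getWorkflowIdByName(baseUrl, apiKey, workflowName) {
  const { url, headers } = buildN8nRequest(baseUrl, apiKey, "/workflows");
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`n8n API returned ${res.status}`);
  return findWorkflowId(await res.json(), workflowName);
}
```

Failing loudly on zero or multiple name matches is a deliberate choice here: after an import, a silently wrong ID is harder to diagnose than an explicit error.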

Docker Socket Security Pitfalls

Pitfall	Warning Signs	Prevention	Phase
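POST enabled without need	Container can create/delete containers; security scan flags	Keep `POST=0` when only reads are needed (most read operations work with GET only). This project sets `POST=1` for start/stop/restart, so every other write-capable section must be disabled explicitly	Socket Proxy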
Using `--privileged` unnecessarily	Security audit fails; container has excessive permissions	Remove `--privileged` flag; Tecnativa proxy works without it on standard Docker	Socket Proxy
Outdated socket proxy image	Using `latest` tag which is 3+ years old	Pin to specific version: `tecnativa/docker-socket-proxy:0.2.0` or use `linuxserver/socket-proxy`	Socket Proxy
Proxy port exposed publicly	Port 2375 accessible from network; security scan fails	Never expose proxy port; run on internal Docker network only	Socket Proxy
Insufficient permissions for n8n	"Permission denied" or empty responses from Docker API	Enable minimum required: `CONTAINERS=1`, `ALLOW_START=1`, `ALLOW_STOP=1`, `ALLOW_RESTARTS=1` for actions	Socket Proxy
Breaking existing curl commands	Existing workflow fails after adding proxy; commands timeout	Socket proxy uses TCP, not Unix socket. Update curl commands: `curl http://socket-proxy:2375/...` instead of `--unix-socket`	Socket Proxy
Network isolation breaks connectivity	n8n can't reach proxy; "connection refused" errors	Both containers must be on same Docker network; verify with `docker network inspect`	Socket Proxy
Permissions too restrictive	Container list works but start/stop fails	Must explicitly enable action endpoints: `ALLOW_START=1`, `ALLOW_STOP=1`, `ALLOW_RESTARTS=1` (separate from `CONTAINERS=1`)	Socket Proxy
Missing INFO or VERSION permissions	Some Docker API calls fail unexpectedly	`VERSION=1` and `PING=1` are enabled by default; may need `INFO=1` for system queries	Socket Proxy

Minimum safe configuration for this project:

environment:
  - CONTAINERS=1      # Read container info
  - ALLOW_START=1     # Start containers
  - ALLOW_STOP=1      # Stop containers
  - ALLOW_RESTARTS=1  # Restart containers
  - IMAGES=1          # Pull images (for updates)
  - POST=1            # Required for start/stop/restart actions
  - NETWORKS=0        # Not needed
  - VOLUMES=0         # Not needed
  - BUILD=0           # Not needed
  - COMMIT=0          # Not needed
  - CONFIGS=0         # Not needed
  - SECRETS=0         # Security critical - keep disabled
  - EXEC=0            # Security critical - keep disabled
  - AUTH=0            # Security critical - keep disabled
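
One way to keep this configuration from drifting is a small audit of the environment entries before deployment. The sketch below is illustrative only: `parseEnvList` and `auditProxyEnv` are hypothetical helpers, and the required and forbidden sets simply mirror the list above rather than any rule published by the proxy image.

```javascript
// Flags this project needs enabled, and flags that must stay disabled,
// copied from the minimum safe configuration above.
const REQUIRED_ON = ["CONTAINERS", "ALLOW_START", "ALLOW_STOP", "ALLOW_RESTARTS", "IMAGES", "POST"];
const MUST_BE_OFF = ["SECRETS", "EXEC", "AUTH"];

// Turn compose-style entries ("KEY=VALUE") into a plain object.
function parseEnvList(entries) {
  const env = {};
  for (const entry of entries) {
    const idx = entry.indexOf("=");
    if (idx > 0) env[entry.slice(0, idx).trim()] = entry.slice(idx + 1).trim();
  }
  return env;
}

// Return a list of human-readable problems; an empty list means the audit passed.
function auditProxyEnv(env) {
  const problems = [];
  for (const name of REQUIRED_ON) {
    if (String(env[name]) !== "1") problems.push(`${name} should be 1`);
  }
  for (const name of MUST_BE_OFF) {
    // Require an explicit 0 so the result does not depend on image defaults.
    if (String(env[name]) !== "0") problems.push(`${name} should be 0`);
  }
  return problems;
}
```

For example, `auditProxyEnv(parseEnvList(["EXEC=1"]))` reports every missing required flag plus the enabled `EXEC`.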

Telegram Keyboard Pitfalls

Pitfall	Warning Signs	Prevention	Phase
Native node rejects dynamic keyboards	Error: "The value '...' is not supported!"	Use HTTP Request node for inline keyboards instead of native Telegram node; this is a known n8n limitation	Keyboards
callback_data exceeds 64 bytes	Buttons don't respond; no callback_query received; 400 BUTTON_DATA_INVALID	Use short codes: `s:plex` not `start_container:plex-media-server`. Hash long names to 8-char IDs	Keyboards
Callback auth path missing	Keyboard clicks ignored; no response to button press	Existing workflow already handles callback_query (lines 56-74 in workflow). Ensure new keyboards use the same auth flow	Keyboards
Multiple additional fields ignored	Button has both callback_data and URL; only URL works	n8n Telegram node limitation - can't use both. Choose one per button: either action (callback) or link (URL)	Keyboards
Keyboard flickers on every message	Visual glitches; keyboard re-renders constantly	Send `reply_markup` only on /start or menu requests; omit from action responses (keyboard persists)	Keyboards
Inline vs Reply keyboard confusion	Wrong keyboard type appears; buttons don't trigger callbacks	Inline keyboards (InlineKeyboardMarkup) for callbacks; Reply keyboards (ReplyKeyboardMarkup) for persistent menus. Use inline for container actions	Keyboards
answerCallbackQuery not called	"Loading..." spinner persists after button click; Telegram shows timeout	Must call `answerCallbackQuery` within 10 seconds of receiving callback_query, even if just to acknowledge	Keyboards
Button layout exceeds limits	Buttons don't appear; API error	Bot API 7.0: max 100 buttons total per message. For container lists, paginate or limit to 8-10 buttons	Keyboards

Recommended keyboard structure for container actions:

// Short callback_data pattern: action:container_short_id
// Example: "s:abc123" for start, "x:abc123" for stop
// containerId is the full Docker container ID, supplied by the preceding node
const shortId = containerId.slice(0, 8);
const replyMarkup = {
  inline_keyboard: [
    [
      { text: "Start", callback_data: "s:" + shortId },
      { text: "Stop", callback_data: "x:" + shortId }
    ],
    [
      { text: "Restart", callback_data: "r:" + shortId },
      { text: "Logs", callback_data: "l:" + shortId }
    ]
  ]
};
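
The short-code pattern needs a matching encoder and decoder so that the callback handler can recover the action. The following is a minimal sketch under the assumptions already used above (action codes `s`, `x`, `r`, `l` and an 8-character container ID prefix); the function names are hypothetical.

```javascript
const MAX_CALLBACK_BYTES = 64; // Telegram's limit for callback_data
const ACTIONS = new Map([
  ["s", "start"],
  ["x", "stop"],
  ["r", "restart"],
  ["l", "logs"],
]);

function encodeCallback(actionCode, containerId) {
  if (!ACTIONS.has(actionCode)) throw new Error(`Unknown action code: ${actionCode}`);
  const data = `${actionCode}:${containerId.slice(0, 8)}`;
  // With an 8-character prefix this cannot trigger; it guards against the
  // pattern being extended later (for example with page numbers or names).
  if (Buffer.byteLength(data, "utf8") > MAX_CALLBACK_BYTES) {
    throw new Error(`callback_data exceeds ${MAX_CALLBACK_BYTES} bytes`);
  }
  return data;
}

// Returns { action, shortId } or null when the payload is not in the expected form.
function decodeCallback(data) {
  const [code, shortId] = String(data).split(":");
  if (!ACTIONS.has(code) || !shortId) return null;
  return { action: ACTIONS.get(code), shortId };
}
```

The limit is measured in bytes rather than characters, which is why the check uses `Buffer.byteLength` instead of `data.length`.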

Unraid Integration Pitfalls

Pitfall	Warning Signs	Prevention	Phase
Update badge persists after bot update	Unraid UI shows "update available" after container updated via bot	Delete `/var/lib/docker/unraid-update-status.json` to force recheck; or trigger Unraid's check mechanism	Unraid Sync
unraid-update-status.json format unknown	Attempted to modify file directly; broke Unraid Docker tab	File format is undocumented. Safest approach: delete file and let Unraid regenerate. Don't modify directly	Unraid Sync
Unraid only checks for new updates	Badge never clears; only sees new updates, not cleared updates	This is known Unraid behavior. Deletion of status file is current workaround per Unraid forums	Unraid Sync
Race condition on status file	Status file deleted but badge still shows; file regenerated too fast	Wait for Unraid's update check interval, or manually trigger "Check for Updates" from Unraid UI after deletion	Unraid Sync
Bot can't access Unraid filesystem	Permission denied when accessing /var/lib/docker/	n8n container needs additional volume mount: `/var/lib/docker:/var/lib/docker` or execute via SSH	Unraid Sync
Breaking Unraid's Docker management	Unraid Docker tab shows errors; containers appear in wrong state	Never modify Unraid's internal files (in /boot/config/docker or /var/lib/docker) except update-status.json deletion	Unraid Sync

Unraid sync approach (safest):

1. After the bot successfully updates a container
2. Execute: `rm -f /var/lib/docker/unraid-update-status.json`
3. Unraid will regenerate the file on the next "Check for Updates" or automatically
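
The deletion step can be sketched in Node.js as below. It assumes the code runs somewhere the status file is actually visible (for example with the volume mount described in the table) and that built-in modules such as `fs` are permitted in that runtime; `clearUnraidUpdateStatus` is a hypothetical helper name.

```javascript
const fs = require("node:fs");

const STATUS_FILE = "/var/lib/docker/unraid-update-status.json";

// Delete the status file only after a successful update.
// Returns true if a file was actually removed, false otherwise.
function clearUnraidUpdateStatus(updateSucceeded, statusFile = STATUS_FILE) {
  if (!updateSucceeded) return false;
  const existed = fs.existsSync(statusFile);
  // force: true mirrors `rm -f`: a missing file is not an error.
  fs.rmSync(statusFile, { force: true });
  return existed;
}
```

The file is only ever deleted, never edited, in line with the advice above that its format is undocumented.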

Integration Pitfalls (Breaking Existing Functionality)

Pitfall	Warning Signs	Prevention	Phase
Socket proxy breaks existing curl	All Docker commands fail after adding proxy	Existing workflow uses `--unix-socket`. Migrate curl commands to use proxy TCP endpoint: `http://socket-proxy:2375`	Socket Proxy
Auth flow bypassed on new paths	New keyboard handlers skip user ID check; anyone can click buttons	Existing workflow has auth at lines 92-122 and 126-155. Copy same pattern for any new callback handlers	All
Workflow test vs production mismatch	Works in test mode; fails when activated	Test with actual Telegram messages, not just manual execution. Production triggers differ from manual runs	All
n8n 2.0 upgrade breaks workflow	After n8n update, workflow stops working; nodes missing	n8n 2.0 has breaking changes: Execute Command disabled by default, Start node removed, env vars blocked. Check migration guide before upgrading	All
Credential reference breaks after import	Imported workflow can't decrypt credentials; all nodes fail	n8n uses N8N_ENCRYPTION_KEY. After import, must recreate credentials manually in n8n UI	All
HTTP Request node vs Execute Command	HTTP Request can't reach Docker socket; timeout errors	HTTP Request node doesn't support Unix sockets. Keep using Execute Command with curl for Docker API (or migrate to TCP proxy)	Socket Proxy
Parallel execution race conditions	Two button clicks cause conflicting container states	Add debounce logic: ignore rapid duplicate callbacks within 2-3 seconds. Store last action timestamp	Keyboards
Error workflow doesn't fire	Errors occur but no notification; silent failures	Error Trigger only fires on automatic executions, not manual test runs. Test by triggering via Telegram with intentional failure	All
Save vs Publish confusion (n8n 2.0)	Edited workflow but production still uses old version	n8n 2.0 separates Save (preserves edits) from Publish (updates production). Must explicitly publish changes	All
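
The debounce suggestion in the parallel-execution row can be sketched as follows. This version keys on the container rather than on the exact callback, so a Start followed immediately by a Stop on the same container is also throttled; that choice, the 3-second window, and the function name are assumptions. An in-memory `Map` does not survive between separate workflow executions, so in n8n the timestamps would need to live somewhere persistent.

```javascript
const DEBOUNCE_MS = 3000; // upper end of the 2-3 second window suggested above
const lastActionAt = new Map(); // container key -> timestamp of last accepted action

// Returns true if the action should run, false if it arrives too soon
// after the previous accepted action on the same container.
function shouldProcess(containerKey, now = Date.now()) {
  const previous = lastActionAt.get(containerKey);
  if (previous !== undefined && now - previous < DEBOUNCE_MS) return false;
  lastActionAt.set(containerKey, now);
  return true;
}
```

Rejected clicks do not reset the timer, so a user who keeps tapping is not locked out indefinitely.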

Pre-migration checklist:

- Export current workflow JSON as backup
- Document current curl commands and endpoints
- Test each existing command works after changes
- Verify auth flow applies to new handlers
- Test error handling triggers correctly

Summary: Top 5 Risks

Ranked by likelihood x impact for this specific milestone:

1. Socket Proxy Breaks Existing Commands (HIGH likelihood, HIGH impact)

Why: Current workflow uses the `--unix-socket` flag. The socket proxy uses TCP. All existing functionality breaks if not migrated correctly.

Prevention:

1. Add socket proxy container first (don't remove direct socket yet)
2. Update curl commands one-by-one to use proxy
3. Test each command works via proxy
4. Only then remove direct socket mount
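
The per-command rewrite in step 2 is mechanical enough to sketch. The helper below is hypothetical and assumes the existing commands look like `curl ... --unix-socket <path> http://<any-host>/<api-path>`; the proxy address `http://socket-proxy:2375` is the one used elsewhere in this document. Commands in other shapes should be migrated by hand.

```javascript
const PROXY_BASE = "http://socket-proxy:2375";

// Rewrite a Unix-socket curl command to target the TCP socket proxy.
// Commands without --unix-socket are returned unchanged.
function migrateCurlCommand(command) {
  const socketFlag = /\s--unix-socket\s+\S+/;
  if (!socketFlag.test(command)) return command;
  return command
    .replace(socketFlag, "") // drop the flag and its socket path
    .replace(/http:\/\/[^\/\s'"]*/, PROXY_BASE); // swap the placeholder host for the proxy
}
```

Running every migrated command against the proxy before removing the direct socket mount remains necessary; this only removes the typing.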

2. Native Telegram Node Rejects Dynamic Keyboards (HIGH likelihood, MEDIUM impact)

Why: n8n's native Telegram node has a known bug (Issue #19955) where it rejects array expressions for inline keyboards.

Prevention: Use HTTP Request node to call Telegram API directly for any dynamic keyboard generation. Keep native node for simple text responses only.
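
As a rough sketch of what the HTTP Request node would send, the helpers below build the Bot API requests for `sendMessage` with a dynamically generated inline keyboard and for `answerCallbackQuery`. The helper names and the one-button-per-container layout are illustrative assumptions; only the endpoint paths and field names come from the Telegram Bot API.

```javascript
// One row per container, each with a single Restart button (illustrative layout).
function containerRows(containers) {
  return containers.map((c) => [
    { text: `Restart ${c.name}`, callback_data: "r:" + c.id.slice(0, 8) },
  ]);
}

function buildSendMessageRequest(botToken, chatId, text, buttonRows) {
  return {
    method: "POST",
    url: `https://api.telegram.org/bot${botToken}/sendMessage`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      chat_id: chatId,
      text,
      reply_markup: { inline_keyboard: buttonRows },
    }),
  };
}

// Acknowledge a button press so the client stops showing the loading state.
function buildAnswerCallbackRequest(botToken, callbackQueryId) {
  return {
    method: "POST",
    url: `https://api.telegram.org/bot${botToken}/answerCallbackQuery`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ callback_query_id: callbackQueryId }),
  };
}
```

Because the keyboard is an ordinary array by the time it reaches the request body, no node-level expression handling is involved.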

3. Unraid Update Badge Never Clears (HIGH likelihood, LOW impact)

Why: Unraid doesn't check for "no longer outdated" containers - only new updates. Documented behavior, not a bug.

Prevention: Delete `/var/lib/docker/unraid-update-status.json` after successful bot update. Requires additional volume mount or SSH access.

4. n8n 2.0 Breaking Changes on Upgrade (MEDIUM likelihood, HIGH impact)

Why: n8n 2.0 (released Dec 2025) has multiple breaking changes: Execute Command disabled by default, env vars blocked, Save/Publish separation.

Prevention:

- Check current n8n version before starting
- If upgrading, run Migration Report first (Settings > Migration Report)
- Don't upgrade n8n during this milestone unless necessary

5. callback_data Exceeds 64 Bytes (MEDIUM likelihood, MEDIUM impact)

Why: Container names can be long (e.g., linuxserver-plex-media-server). Adding action prefix easily exceeds 64 bytes.

Prevention: Use short action codes (s:, x:, r:, l:) and container ID prefix (8 chars) instead of full names. Map back via lookup.
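
The "map back via lookup" step might look like the sketch below. It assumes the handler has a fresh container list in the shape returned by Docker's container list endpoint (objects with `Id` and a `Names` array whose entries start with `/`); `resolveContainer` is a hypothetical helper.

```javascript
// Resolve an 8-character ID prefix from callback_data to a full container.
// Refuses to act when the prefix matches zero or several containers.
function resolveContainer(shortId, containers) {
  const matches = containers.filter((c) => c.Id.startsWith(shortId));
  if (matches.length === 0) return { ok: false, reason: "not_found" };
  if (matches.length > 1) return { ok: false, reason: "ambiguous" };
  const [c] = matches;
  // Docker reports names with a leading slash, e.g. "/plex".
  const name = ((c.Names && c.Names[0]) || "").replace(/^\//, "");
  return { ok: true, id: c.Id, name };
}
```

Prefix collisions are unlikely with a handful of containers, but treating them as an error is safer than acting on the first match.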

Phase Assignment Summary

Phase	Pitfalls to Address
API Setup	API key scoping, header format, workflow ID discovery, env var blocking
Socket Proxy	Proxy configuration, permission settings, curl command migration, network setup
Keyboards	HTTP Request node for keyboards, callback_data limits, answerCallbackQuery
Unraid Sync	Update status file deletion, volume mount for access
All Phases	Auth flow consistency, test vs production, error workflow testing

Confidence Assessment

Area	Confidence	Rationale
n8n API	HIGH	Official docs verified, known breaking changes documented
Docker Socket Proxy	HIGH	Official Tecnativa docs, community best practices verified
Telegram Keyboards	MEDIUM-HIGH	n8n GitHub issues confirm limitations, Telegram API docs verified
Unraid Integration	MEDIUM	Forum posts describe workaround, but file format undocumented
Integration Risks	MEDIUM	Based on existing v1.0 codebase analysis and general patterns

Research date: 2026-02-02
Valid until: 2026-03-02 (30 days - n8n and Telegram APIs stable)
