# Features Research: v1.1

**Domain:** Telegram Bot for Docker Container Management
**Researched:** 2026-02-02
**Confidence:** MEDIUM-HIGH (WebSearch verified with official docs where available)

## Telegram Inline Keyboards

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
| --- | --- | --- | --- |
| Callback button handling | Core inline keyboard functionality: buttons must trigger actions | Low | Telegram Trigger already handles `callback_query` |
| `answerCallbackQuery` response | Required by Telegram: clients show a loading animation until answered (up to 1 minute) | Low | None |
| Edit message after button press | Standard pattern: update the existing message rather than sending a new one, to reduce clutter | Low | None |
| Container action buttons | Users expect tap-to-action for start/stop/restart without typing | Medium | Existing container matching logic |
| Status view with action buttons | Show the container list with inline buttons for each container | Medium | Existing status command |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
| --- | --- | --- | --- |
| Confirmation dialogs for dangerous actions | "Are you sure?" before stop/restart/update prevents accidental actions | Low | None (edit message with Yes/No buttons) |
| Contextual button removal | Remove buttons after the action completes (prevents double-tap issues) | Low | None |
| Dynamic container list keyboards | Generate buttons based on the actual running containers | Medium | Container listing logic |
| Progress indicators via message edit | Update the message with "Updating..." then "Complete" states | Low | None |
| Pagination for many containers | "Next page" button when there are more than 8-10 containers | Medium | None |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
| --- | --- | --- |
| Reply keyboards for actions | Take over the user's keyboard space, send visible messages to the chat | Use inline keyboards attached to bot messages |
| More than 5 buttons per row | Wraps poorly on mobile/desktop, breaks muscle memory | Max 3-4 buttons per row for container actions |
| Complex `callback_data` structures | 64-byte limit, easy to exceed with JSON | Use short action codes: `start_plex`, `stop_sonarr` |
| Buttons without feedback | Users think the tap didn't work and tap again | Always call `answerCallbackQuery`, even for errors |
| Auto-refreshing keyboards | High API traffic, rate-limiting risk | Refresh on explicit user action only |

### Implementation Notes

Critical constraint: `callback_data` is limited to 64 bytes. Use short codes like `action:containername` rather than JSON structures.
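A quick guard for the 64-byte cap can be sketched as follows (the container name is a sample; `wc -c` counts bytes, which is what matters for multi-byte names):

```shell
# Sample short code in the action:container scheme.
cb="start:binhex-qbittorrentvpn"
bytes=$(printf %s "$cb" | wc -c | tr -d ' ')
if [ "$bytes" -le 64 ]; then
  echo "ok: '$cb' is $bytes bytes"
else
  echo "too long: map '$cb' to a short id instead"
fi
```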

n8n native node limitation: the Telegram node doesn't support dynamic inline keyboards well. The workaround is an HTTP Request node that calls the Telegram Bot API directly, using `sendMessage` with the `reply_markup` parameter.
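A sketch of the workaround, building the `reply_markup` JSON for such a `sendMessage` call (the container names are sample data; in the workflow they would come from the Docker list API):

```shell
#!/bin/sh
# Build one button row per container, using short action:container
# callback codes to stay under the 64-byte callback_data limit.
containers="plex sonarr radarr"   # sample data for illustration

rows=""
for c in $containers; do
  row="[{\"text\":\"$c\",\"callback_data\":\"status:$c\"}]"
  rows="${rows:+$rows,}$row"      # comma-join the rows
done
reply_markup="{\"inline_keyboard\":[$rows]}"
echo "$reply_markup"
```

The same JSON goes into the HTTP Request node's body as the `reply_markup` field of a `sendMessage` payload.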

Pattern for confirmations:

  1. User taps "Stop plex"
  2. Edit message: "Stop plex container?" with [Yes] [Cancel] buttons
  3. User taps Yes -> perform action, edit message with result, remove buttons
  4. User taps Cancel -> edit message back to original state
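Steps 2-4 above can be sketched as direct Bot API calls (the same JSON an n8n HTTP Request node would send; the chat, message, and query IDs are placeholders):

```shell
#!/bin/sh
# Hypothetical IDs for illustration; real values come from the
# callback_query update delivered by the Telegram Trigger.
BOT="https://api.telegram.org/bot${TELEGRAM_TOKEN}"

# Step 2: replace the original message with a confirmation prompt.
# editMessageText swaps both the text and the keyboard in one call.
confirm_payload='{
  "chat_id": 12345,
  "message_id": 678,
  "text": "Stop plex container?",
  "reply_markup": {"inline_keyboard": [[
    {"text": "Yes",    "callback_data": "confirm_stop:plex"},
    {"text": "Cancel", "callback_data": "cancel:plex"}
  ]]}
}'
# curl -s -X POST "$BOT/editMessageText" \
#      -H 'Content-Type: application/json' -d "$confirm_payload"

# Steps 3/4: whatever the user taps, acknowledge the tap first so the
# client's loading spinner stops, then edit the message again.
# curl -s -X POST "$BOT/answerCallbackQuery" -d 'callback_query_id=QUERY_ID'
```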



## Batch Operations

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
| --- | --- | --- | --- |
| Update multiple specified containers | Core batch use case: `update plex sonarr radarr` | Medium | Existing update logic, loop handling |
| Sequential execution | Process one at a time to avoid resource contention | Low | None |
| Per-container status feedback | "Updated plex... Updated sonarr..." progress | Low | Existing message sending |
| Error handling per container | One failure shouldn't abort the batch | Low | Try-catch per iteration |
| Final summary message | "3 updated, 1 failed: jellyfin" | Low | Accumulator pattern |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
| --- | --- | --- | --- |
| "Update all" command | Single command to update everything (with confirmation) | Medium | Container listing |
| "Update all except X" | Exclude specific containers from a batch | Medium | Exclusion pattern |
| Parallel status checks | Check which containers have updates available first | Medium | None |
| Batch operation confirmation | Show what will happen before doing it | Low | Keyboard buttons |
| Cancel mid-batch | Stop processing the remaining containers | High | State management |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
| --- | --- | --- |
| Parallel container updates | Resource contention, disk I/O saturation, network bandwidth | Sequential with progress feedback |
| Silent batch operations | User thinks the bot is frozen during a long batch | Send a progress message per container |
| Update without checking first | Wastes time on already-updated containers | Check for updates, report "3 containers have updates" |
| Auto-update on schedule | Out of scope: user might be using the system when an update causes downtime | User-initiated only; this is a reactive tool |

### Implementation Notes

Existing update flow: the current implementation pulls the image, recreates the container, and cleans up the old image. Batch mode needs to wrap this flow in a loop.

Progress pattern:

```
User: update all
Bot: Found 5 containers with updates. Update now? [Yes] [Cancel]
User: Yes
Bot: Updating plex (1/5)...
Bot: (edit) Updated plex. Updating sonarr (2/5)...
...
Bot: (edit) Batch complete: 5 updated, 0 failed.
```
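The loop behind this transcript might look like the following minimal sketch; `update_container` is a hypothetical stand-in for the existing pull/recreate/cleanup flow:

```shell
#!/bin/sh
# Sequential batch update sketch: one container at a time,
# per-container error capture, accumulator for the final summary.
update_container() { echo "updating $1"; }   # stub for illustration

containers="plex sonarr radarr"              # sample batch
total=$(echo $containers | wc -w | tr -d ' ')
ok=0; fail=0; failed=""
i=0
for c in $containers; do
  i=$((i + 1))
  # Here the workflow edits the Telegram message: "Updating $c ($i/$total)..."
  if update_container "$c"; then
    ok=$((ok + 1))
  else
    fail=$((fail + 1)); failed="$failed $c"  # one failure must not abort the batch
  fi
done
echo "Batch complete: $ok updated, $fail failed.$failed"
```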

Watchtower-style options (NOT recommended for this bot):

- Watchtower does automatic updates on a schedule
- This bot is intentionally reactive (user asks, bot does)
- Automation can cause downtime at bad times



## Development API Workflow

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
| --- | --- | --- | --- |
| API key authentication | Standard n8n API auth method | Low | n8n configuration |
| Get workflow by ID | Read the current workflow JSON | Low | n8n REST API |
| Update workflow | Push the modified workflow back | Low | n8n REST API |
| Activate/deactivate workflow | Turn a workflow on/off programmatically | Low | n8n REST API |
| Get execution list | See recent runs | Low | n8n REST API |
| Get execution details/logs | Debug failed executions | Low | n8n REST API |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
| --- | --- | --- | --- |
| Execute workflow on demand | Trigger a test run via the API | Medium | n8n REST API with test data |
| Version comparison | Diff local vs. deployed workflow | High | JSON diff tooling |
| Backup before update | Save the current version before pushing changes | Low | File system or git |
| Rollback capability | Restore the previous version on failure | Medium | Version history |
| MCP integration | Claude Code can manage workflows via MCP | High | MCP server setup |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
| --- | --- | --- |
| Direct n8n database access | Bypasses the API, can corrupt state | Use the REST API only |
| Credential exposure via API | The API returns credential IDs, not values | Never try to extract credential values |
| Auto-deploy on git push | Adds CI/CD complexity; not needed for single-user | Manual deploy via API call |
| Real-time workflow editing | The n8n UI is better for this | API for read/bulk operations only |

### Implementation Notes

n8n REST API key endpoints:

| Operation | Method | Endpoint |
| --- | --- | --- |
| List workflows | GET | `/api/v1/workflows` |
| Get workflow | GET | `/api/v1/workflows/{id}` |
| Update workflow | PUT | `/api/v1/workflows/{id}` |
| Activate | POST | `/api/v1/workflows/{id}/activate` |
| Deactivate | POST | `/api/v1/workflows/{id}/deactivate` |
| List executions | GET | `/api/v1/executions` |
| Get execution | GET | `/api/v1/executions/{id}` |
| Execute workflow | POST | `/rest/workflows/{id}/run` |

Authentication: header `X-N8N-API-KEY: your_api_key`
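The endpoints above can be wrapped in a small helper for development use. A sketch, assuming a local n8n URL and a hypothetical workflow id `42` (by default it only prints the curl command; set `DRY_RUN=` to execute):

```shell
#!/bin/sh
# Thin wrapper around the n8n REST endpoints listed above.
# N8N_URL and the workflow id are assumptions for illustration;
# N8N_API_KEY is the key created in n8n's settings.
N8N="${N8N_URL:-http://localhost:5678}"

req() {  # usage: req METHOD PATH [extra curl args...]
  method=$1; path=$2; shift 2
  cmd="curl -sf -X $method -H \"X-N8N-API-KEY: \$N8N_API_KEY\" $N8N$path $*"
  if [ -n "${DRY_RUN:-1}" ]; then echo "$cmd"; else eval "$cmd"; fi
}

req GET  /api/v1/workflows                      # list workflows
req GET  /api/v1/workflows/42                   # read one (save as a backup)
req PUT  /api/v1/workflows/42 -d @edited.json   # push edits back
req POST /api/v1/workflows/42/activate          # turn it on
```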

Workflow structure: n8n workflows are JSON documents (~3,200 lines for this bot). Key sections:

- `nodes[]` - array of workflow nodes
- `connections` - how nodes connect
- `settings` - workflow-level settings

MCP option: there's an unofficial n8n MCP server (`makafeli/n8n-workflow-builder`) that could let Claude Code manage workflows directly, but it adds complexity. The standard REST API is simpler for v1.1.



## Update Notification Sync

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
| --- | --- | --- | --- |
| Update clears the bot's "update available" state | The bot should know the container is now current | Low | Already works: re-check after update |
| Accurate update status reporting | The status command shows which containers have updates | Medium | Image digest comparison |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
| --- | --- | --- | --- |
| Sync with Unraid UI | Clear the "update available" badge in the Unraid web UI | High | Unraid API or file manipulation |
| Pre-update check | Show the version you're on and the version available | Medium | Image tag inspection |
| Update notification to user | Proactive "3 containers have updates available" message | Medium | Scheduled check, notification logic |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
| --- | --- | --- |
| Taking over Unraid notifications | Explicitly out of scope per PROJECT.md | Keep Unraid notifications; the bot is for control |
| Proactive monitoring | The bot is reactive per PROJECT.md | User checks status manually |
| Blocking Unraid auto-updates | The user may want both systems | Coexist with Unraid's own update mechanism |

### Implementation Notes

The core problem: When you update a container via the bot (or Watchtower), Unraid's web UI may still show "update available" because it has its own tracking.

Unraid update status file: `/var/lib/docker/unraid-update-status.json`

- This file tracks which containers have updates
- Deleting it forces Unraid to recheck
- A recheck can also be triggered via Settings > Docker > Check for Updates

Unraid API (v7.2+):

- GraphQL API for Docker containers
- Can query container status
- Mutations for notifications exist
- API key auth: `x-api-key` header

Practical approach for v1.1:

  1. Minimum: Document that Unraid UI may lag behind - user can click "Check for Updates" in Unraid
  2. Better: After bot update, delete /var/lib/docker/unraid-update-status.json to force Unraid recheck
  3. Best (requires Unraid 7.2+): Use Unraid GraphQL API to clear notification state
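Option 2 above can be sketched as a small script run on the Unraid host (e.g. via SSH from the n8n container); the path comes from the notes above:

```shell
#!/bin/sh
# Clear Unraid's cached update-status file after a successful bot-side
# update so the web UI rechecks on its next scan. Takes an optional
# path override (used here for illustration/testing).
clear_update_status() {
  status_file="${1:-/var/lib/docker/unraid-update-status.json}"
  if [ -f "$status_file" ]; then
    rm -f -- "$status_file"
    echo "cleared $status_file - Unraid rebuilds it on its next update check"
  else
    echo "no cached status file at $status_file - nothing to clear"
  fi
}

clear_update_status
```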

Known issue: users report that Unraid shows "update ready" even after a container has been updated. This is an Unraid bug: the periodic check only looks for new updates and never verifies that containers are now current.



## Docker Socket Security

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
| --- | --- | --- | --- |
| Remove direct socket from internet-exposed n8n | Security requirement per PROJECT.md scope | Medium | Socket proxy setup |
| Maintain all existing functionality | The bot should work identically after the security change | Medium | API compatibility |
| Container start/stop/restart/update | Core actions must still work | Low | Proxy allows these APIs |
| Container list/inspect | The status command must still work | Low | Proxy allows read APIs |
| Image pull | The update command needs this | Low | Proxy configuration |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
| --- | --- | --- | --- |
| Granular API restrictions | Only allow the APIs the bot actually uses | Low | Socket proxy env vars |
| Block dangerous APIs | Prevent exec, create, and system commands | Low | Socket proxy defaults |
| Audit logging | Log all Docker API calls through the proxy | Medium | Proxy logging config |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
| --- | --- | --- |
| Read-only socket mount (`:ro`) | Doesn't actually protect: the socket is a pipe and stays writable | Use a proper socket proxy |
| Direct socket access from an internet-facing container | Full root access if n8n is compromised | Socket proxy isolates access |
| Allowing the exec API | Enables arbitrary command execution in containers | Block exec in the proxy |
| Allowing create/network APIs | The bot doesn't need to create containers | Block creation APIs |

### Implementation Notes

Recommended: `Tecnativa/docker-socket-proxy` or `LinuxServer.io/docker-socket-proxy`.

Both provide HAProxy-based filtering of Docker API requests.

Minimal proxy configuration for this bot:

```yaml
# docker-compose.yml
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      - CONTAINERS=1      # List/inspect containers
      - IMAGES=1          # Pull images
      - POST=1            # Allow write operations
      - SERVICES=0        # Swarm services (not needed)
      - TASKS=0           # Swarm tasks (not needed)
      - NETWORKS=0        # Network management (not needed)
      - VOLUMES=0         # Volume management (not needed)
      - EXEC=0            # CRITICAL: Block exec
      - BUILD=0           # CRITICAL: Block build
      - COMMIT=0          # CRITICAL: Block commit
      - SECRETS=0         # CRITICAL: Block secrets
      - CONFIGS=0         # CRITICAL: Block configs
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - docker-proxy

  n8n:
    # ... existing config ...
    environment:
      - DOCKER_HOST=tcp://socket-proxy:2375
    networks:
      - docker-proxy
      # Plus existing networks

networks:
  docker-proxy:
    internal: true        # internal-only: the proxy is never internet-exposed
```

Key security benefits:

  1. n8n no longer has direct socket access
  2. Only whitelisted API categories are available
  3. EXEC=0 prevents arbitrary command execution
  4. Proxy is on internal network only, not internet-exposed

Migration path:

  1. Deploy socket-proxy container
  2. Update n8n to use DOCKER_HOST=tcp://socket-proxy:2375
  3. Remove direct socket mount from n8n
  4. Test all bot commands still work
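Step 4 can start from a smoke-test plan like this dry run. `socket-proxy` is the service name from the compose sketch above, and the expected codes assume Tecnativa-style behavior (403 on blocked API sections):

```shell
#!/bin/sh
# Post-migration smoke-test plan: prints the curl probe for each
# endpoint plus the status code the proxy config above should produce.
probe() {  # usage: probe EXPECTED METHOD PATH
  echo "expect $1: curl -s -o /dev/null -w '%{http_code}' -X $2 http://socket-proxy:2375$3"
}

probe 200 GET  /containers/json                 # CONTAINERS=1 allows listing
probe 200 POST /containers/plex/restart         # POST=1 allows start/stop/restart
probe 200 POST "/images/create?fromImage=plex"  # IMAGES=1 + POST=1 for pulls
probe 403 POST /containers/plex/exec            # EXEC=0 must block exec
```

Running the printed commands from inside the n8n container verifies both that allowed calls still work and that the dangerous ones are actually blocked.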



## Feature Summary Table

| Feature | Complexity | Dependencies | Priority | Notes |
| --- | --- | --- | --- | --- |
| **Inline Keyboards** | | | | |
| Basic callback handling | Low | Existing trigger | Must Have | Foundation for all buttons |
| Container action buttons | Medium | Container matching | Must Have | Core UX improvement |
| Confirmation dialogs | Low | None | Should Have | Prevents accidents |
| Dynamic keyboard generation | Medium | HTTP Request node | Must Have | n8n native node limitation workaround |
| **Batch Operations** | | | | |
| Update multiple containers | Medium | Existing update | Must Have | Sequential with progress |
| "Update all" command | Medium | Container listing | Should Have | With confirmation |
| Per-container feedback | Low | None | Must Have | Progress visibility |
| **n8n API** | | | | |
| API key setup | Low | n8n config | Must Have | Enable programmatic access |
| Read workflow | Low | REST API | Must Have | Development workflow |
| Update workflow | Low | REST API | Must Have | Development workflow |
| Activate/deactivate | Low | REST API | Should Have | Testing workflow |
| **Update Sync** | | | | |
| Delete status file | Low | SSH/exec access | Should Have | Simple Unraid sync |
| Unraid GraphQL API | High | Unraid 7.2+, API key | Nice to Have | Requires version check |
| **Security** | | | | |
| Socket proxy deployment | Medium | New container | Must Have | Security requirement |
| API restriction config | Low | Proxy env vars | Must Have | Minimize attack surface |
| Migration testing | Low | All commands | Must Have | Verify no regression |

## MVP Recommendation for v1.1

### Phase 1: Foundation (Must Have)

1. Docker socket security via proxy - security first
2. n8n API access setup - enables faster development
3. Basic inline keyboard infrastructure - callback handling

### Phase 2: UX Improvements (Should Have)

4. Container action buttons from status view
5. Confirmation dialogs for stop/update actions
6. Batch update with progress feedback

### Phase 3: Polish (Nice to Have)

7. Unraid update status sync (file deletion method)
8. "Update all" convenience command

## Confidence Assessment

| Area | Confidence | Reason |
| --- | --- | --- |
| Telegram Inline Keyboards | HIGH | Official Telegram docs + n8n docs verified |
| Batch Operations | MEDIUM-HIGH | Standard Docker patterns, well documented |
| n8n API | MEDIUM | API exists, but detailed endpoint docs required fetching |
| Unraid Update Sync | MEDIUM | Community knowledge, API docs limited |
| Docker Socket Security | HIGH | Well-documented proxy solutions |

## Gaps to Address in Phase Planning

1. **Exact n8n API endpoints** - need to verify the full endpoint list during implementation
2. **Unraid version compatibility** - the GraphQL API requires Unraid 7.2+; needs a version check
3. **n8n Telegram node workarounds** - the HTTP Request approach needs testing
4. **Socket proxy on Unraid** - deployment specifics for the Unraid environment