# Features Research: v1.1

**Domain:** Telegram Bot for Docker Container Management
**Researched:** 2026-02-02
**Confidence:** MEDIUM-HIGH (WebSearch verified with official docs where available)

## Telegram Inline Keyboards

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
| --- | --- | --- | --- |
| Callback button handling | Core inline keyboard functionality: buttons must trigger actions | Low | Telegram Trigger already handles `callback_query` |
| `answerCallbackQuery` response | Required by Telegram: clients show a loading animation until answered (up to 1 minute) | Low | None |
| Edit message after button press | Standard pattern: update the existing message rather than sending a new one, to reduce clutter | Low | None |
| Container action buttons | Users expect tap-to-action for start/stop/restart without typing | Medium | Existing container matching logic |
| Status view with action buttons | Show the container list with inline buttons for each container | Medium | Existing status command |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
| --- | --- | --- | --- |
| Confirmation dialogs for dangerous actions | "Are you sure?" before stop/restart/update prevents accidental actions | Low | None (edit message with Yes/No buttons) |
| Contextual button removal | Remove buttons after the action completes (prevents double-tap issues) | Low | None |
| Dynamic container list keyboards | Generate buttons based on the actual running containers | Medium | Container listing logic |
| Progress indicators via message edit | Update the message with "Updating..." then "Complete" states | Low | None |
| Pagination for many containers | "Next page" button when there are more than 8-10 containers | Medium | None |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
| --- | --- | --- |
| Reply keyboards for actions | Take over the user's keyboard space, send visible messages to the chat | Use inline keyboards attached to bot messages |
| More than 5 buttons per row | Wraps poorly on mobile/desktop, breaks muscle memory | Max 3-4 buttons per row for container actions |
| Complex `callback_data` structures | 64-byte limit, easy to exceed with JSON | Use short action codes: `start_plex`, `stop_sonarr` |
| Buttons without feedback | Users think the tap didn't work and tap again | Always call `answerCallbackQuery`, even for errors |
| Auto-refreshing keyboards | High API traffic, rate-limiting risk | Refresh on explicit user action only |

### Implementation Notes

Critical constraint: `callback_data` is limited to 64 bytes. Use short codes like `action:containername` rather than JSON structures.
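A quick guard for the 64-byte cap can be sketched as follows (the container name is a sample; `wc -c` counts bytes, which is what matters for multi-byte names):

```shell
# Sample short code in the action:container scheme.
cb="start:binhex-qbittorrentvpn"
bytes=$(printf %s "$cb" | wc -c | tr -d ' ')
if [ "$bytes" -le 64 ]; then
  echo "ok: '$cb' is $bytes bytes"
else
  echo "too long: map '$cb' to a short id instead"
fi
```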

n8n native node limitation: the Telegram node doesn't support dynamic inline keyboards well. The workaround is an HTTP Request node that calls the Telegram Bot API directly, using `sendMessage` with the `reply_markup` parameter.
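A sketch of the workaround, building the `reply_markup` JSON for such a `sendMessage` call (the container names are sample data; in the workflow they would come from the Docker list API):

```shell
#!/bin/sh
# Build one button row per container, using short action:container
# callback codes to stay under the 64-byte callback_data limit.
containers="plex sonarr radarr"   # sample data for illustration

rows=""
for c in $containers; do
  row="[{\"text\":\"$c\",\"callback_data\":\"status:$c\"}]"
  rows="${rows:+$rows,}$row"      # comma-join the rows
done
reply_markup="{\"inline_keyboard\":[$rows]}"
echo "$reply_markup"
```

The same JSON goes into the HTTP Request node's body as the `reply_markup` field of a `sendMessage` payload.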

Pattern for confirmations:

  1. User taps "Stop plex"
  2. Edit message: "Stop plex container?" with [Yes] [Cancel] buttons
  3. User taps Yes -> perform action, edit message with result, remove buttons
  4. User taps Cancel -> edit message back to original state
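Steps 2-4 above can be sketched as direct Bot API calls (the same JSON an n8n HTTP Request node would send; the chat, message, and query IDs are placeholders):

```shell
#!/bin/sh
# Hypothetical IDs for illustration; real values come from the
# callback_query update delivered by the Telegram Trigger.
BOT="https://api.telegram.org/bot${TELEGRAM_TOKEN}"

# Step 2: replace the original message with a confirmation prompt.
# editMessageText swaps both the text and the keyboard in one call.
confirm_payload='{
  "chat_id": 12345,
  "message_id": 678,
  "text": "Stop plex container?",
  "reply_markup": {"inline_keyboard": [[
    {"text": "Yes",    "callback_data": "confirm_stop:plex"},
    {"text": "Cancel", "callback_data": "cancel:plex"}
  ]]}
}'
# curl -s -X POST "$BOT/editMessageText" \
#      -H 'Content-Type: application/json' -d "$confirm_payload"

# Steps 3/4: whatever the user taps, acknowledge the tap first so the
# client's loading spinner stops, then edit the message again.
# curl -s -X POST "$BOT/answerCallbackQuery" -d 'callback_query_id=QUERY_ID'
```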



## Batch Operations

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
| --- | --- | --- | --- |
| Update multiple specified containers | Core batch use case: `update plex sonarr radarr` | Medium | Existing update logic, loop handling |
| Sequential execution | Process one at a time to avoid resource contention | Low | None |
| Per-container status feedback | "Updated plex... Updated sonarr..." progress | Low | Existing message sending |
| Error handling per container | One failure shouldn't abort the batch | Low | Try-catch per iteration |
| Final summary message | "3 updated, 1 failed: jellyfin" | Low | Accumulator pattern |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
| --- | --- | --- | --- |
| "Update all" command | Single command to update everything (with confirmation) | Medium | Container listing |
| "Update all except X" | Exclude specific containers from a batch | Medium | Exclusion pattern |
| Parallel status checks | Check which containers have updates available first | Medium | None |
| Batch operation confirmation | Show what will happen before doing it | Low | Keyboard buttons |
| Cancel mid-batch | Stop processing the remaining containers | High | State management |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
| --- | --- | --- |
| Parallel container updates | Resource contention, disk I/O saturation, network bandwidth | Sequential with progress feedback |
| Silent batch operations | User thinks the bot is frozen during a long batch | Send a progress message per container |
| Update without checking first | Wastes time on already-updated containers | Check for updates, report "3 containers have updates" |
| Auto-update on schedule | Out of scope: user might be using the system when an update causes downtime | User-initiated only; this is a reactive tool |

### Implementation Notes

Existing update flow: the current implementation pulls the image, recreates the container, and cleans up the old image. Batch mode needs to wrap this flow in a loop.

Progress pattern:

```
User: update all
Bot: Found 5 containers with updates. Update now? [Yes] [Cancel]
User: Yes
Bot: Updating plex (1/5)...
Bot: (edit) Updated plex. Updating sonarr (2/5)...
...
Bot: (edit) Batch complete: 5 updated, 0 failed.
```
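The loop behind this transcript might look like the following minimal sketch; `update_container` is a hypothetical stand-in for the existing pull/recreate/cleanup flow:

```shell
#!/bin/sh
# Sequential batch update sketch: one container at a time,
# per-container error capture, accumulator for the final summary.
update_container() { echo "updating $1"; }   # stub for illustration

containers="plex sonarr radarr"              # sample batch
total=$(echo $containers | wc -w | tr -d ' ')
ok=0; fail=0; failed=""
i=0
for c in $containers; do
  i=$((i + 1))
  # Here the workflow edits the Telegram message: "Updating $c ($i/$total)..."
  if update_container "$c"; then
    ok=$((ok + 1))
  else
    fail=$((fail + 1)); failed="$failed $c"  # one failure must not abort the batch
  fi
done
echo "Batch complete: $ok updated, $fail failed.$failed"
```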

Watchtower-style options (NOT recommended for this bot):

- Watchtower does automatic updates on a schedule
- This bot is intentionally reactive (user asks, bot does)
- Automation can cause downtime at bad times



## Development API Workflow

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
| --- | --- | --- | --- |
| API key authentication | Standard n8n API auth method | Low | n8n configuration |
| Get workflow by ID | Read the current workflow JSON | Low | n8n REST API |
| Update workflow | Push the modified workflow back | Low | n8n REST API |
| Activate/deactivate workflow | Turn a workflow on/off programmatically | Low | n8n REST API |
| Get execution list | See recent runs | Low | n8n REST API |
| Get execution details/logs | Debug failed executions | Low | n8n REST API |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
| --- | --- | --- | --- |
| Execute workflow on demand | Trigger a test run via the API | Medium | n8n REST API with test data |
| Version comparison | Diff local vs. deployed workflow | High | JSON diff tooling |
| Backup before update | Save the current version before pushing changes | Low | File system or git |
| Rollback capability | Restore the previous version on failure | Medium | Version history |
| MCP integration | Claude Code can manage workflows via MCP | High | MCP server setup |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
| --- | --- | --- |
| Direct n8n database access | Bypasses the API, can corrupt state | Use the REST API only |
| Credential exposure via API | The API returns credential IDs, not values | Never try to extract credential values |
| Auto-deploy on git push | Adds CI/CD complexity; not needed for single-user | Manual deploy via API call |
| Real-time workflow editing | The n8n UI is better for this | API for read/bulk operations only |

### Implementation Notes

n8n REST API key endpoints:

| Operation | Method | Endpoint |
| --- | --- | --- |
| List workflows | GET | `/api/v1/workflows` |
| Get workflow | GET | `/api/v1/workflows/{id}` |
| Update workflow | PUT | `/api/v1/workflows/{id}` |
| Activate | POST | `/api/v1/workflows/{id}/activate` |
| Deactivate | POST | `/api/v1/workflows/{id}/deactivate` |
| List executions | GET | `/api/v1/executions` |
| Get execution | GET | `/api/v1/executions/{id}` |
| Execute workflow | POST | `/rest/workflows/{id}/run` |

Authentication: header `X-N8N-API-KEY: your_api_key`
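The endpoints above can be wrapped in a small helper for development use. A sketch, assuming a local n8n URL and a hypothetical workflow id `42` (by default it only prints the curl command; set `DRY_RUN=` to execute):

```shell
#!/bin/sh
# Thin wrapper around the n8n REST endpoints listed above.
# N8N_URL and the workflow id are assumptions for illustration;
# N8N_API_KEY is the key created in n8n's settings.
N8N="${N8N_URL:-http://localhost:5678}"

req() {  # usage: req METHOD PATH [extra curl args...]
  method=$1; path=$2; shift 2
  cmd="curl -sf -X $method -H \"X-N8N-API-KEY: \$N8N_API_KEY\" $N8N$path $*"
  if [ -n "${DRY_RUN:-1}" ]; then echo "$cmd"; else eval "$cmd"; fi
}

req GET  /api/v1/workflows                      # list workflows
req GET  /api/v1/workflows/42                   # read one (save as a backup)
req PUT  /api/v1/workflows/42 -d @edited.json   # push edits back
req POST /api/v1/workflows/42/activate          # turn it on
```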

Workflow structure: n8n workflows are JSON documents (~3,200 lines for this bot). Key sections:

- `nodes[]` - array of workflow nodes
- `connections` - how nodes connect
- `settings` - workflow-level settings

MCP option: there's an unofficial n8n MCP server (`makafeli/n8n-workflow-builder`) that could let Claude Code manage workflows directly, but it adds complexity. The standard REST API is simpler for v1.1.



## Update Notification Sync

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
| --- | --- | --- | --- |
| Update clears the bot's "update available" state | The bot should know the container is now current | Low | Already works: re-check after update |
| Accurate update status reporting | The status command shows which containers have updates | Medium | Image digest comparison |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
| --- | --- | --- | --- |
| Sync with Unraid UI | Clear the "update available" badge in the Unraid web UI | High | Unraid API or file manipulation |
| Pre-update check | Show the version you're on and the version available | Medium | Image tag inspection |
| Update notification to user | Proactive "3 containers have updates available" message | Medium | Scheduled check, notification logic |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
| --- | --- | --- |
| Taking over Unraid notifications | Explicitly out of scope per PROJECT.md | Keep Unraid notifications; the bot is for control |
| Proactive monitoring | The bot is reactive per PROJECT.md | User checks status manually |
| Blocking Unraid auto-updates | The user may want both systems | Coexist with Unraid's own update mechanism |

### Implementation Notes

The core problem: When you update a container via the bot (or Watchtower), Unraid's web UI may still show "update available" because it has its own tracking.

Unraid update status file: `/var/lib/docker/unraid-update-status.json`

- This file tracks which containers have updates
- Deleting it forces Unraid to recheck
- A recheck can also be triggered via Settings > Docker > Check for Updates

Unraid API (v7.2+):

- GraphQL API for Docker containers
- Can query container status
- Mutations for notifications exist
- API key auth: `x-api-key` header

Practical approach for v1.1:

  1. Minimum: Document that Unraid UI may lag behind - user can click "Check for Updates" in Unraid
  2. Better: After bot update, delete /var/lib/docker/unraid-update-status.json to force Unraid recheck
  3. Best (requires Unraid 7.2+): Use Unraid GraphQL API to clear notification state
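Option 2 above can be sketched as a small script run on the Unraid host (e.g. via SSH from the n8n container); the path comes from the notes above:

```shell
#!/bin/sh
# Clear Unraid's cached update-status file after a successful bot-side
# update so the web UI rechecks on its next scan. Takes an optional
# path override (used here for illustration/testing).
clear_update_status() {
  status_file="${1:-/var/lib/docker/unraid-update-status.json}"
  if [ -f "$status_file" ]; then
    rm -f -- "$status_file"
    echo "cleared $status_file - Unraid rebuilds it on its next update check"
  else
    echo "no cached status file at $status_file - nothing to clear"
  fi
}

clear_update_status
```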

Known issue: users report that Unraid shows "update ready" even after a container has been updated. This is an Unraid bug: the periodic check only looks for new updates and never verifies that containers are now current.



## Docker Socket Security

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
| --- | --- | --- | --- |
| Remove direct socket from internet-exposed n8n | Security requirement per PROJECT.md scope | Medium | Socket proxy setup |
| Maintain all existing functionality | The bot should work identically after the security change | Medium | API compatibility |
| Container start/stop/restart/update | Core actions must still work | Low | Proxy allows these APIs |
| Container list/inspect | The status command must still work | Low | Proxy allows read APIs |
| Image pull | The update command needs this | Low | Proxy configuration |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
| --- | --- | --- | --- |
| Granular API restrictions | Only allow the APIs the bot actually uses | Low | Socket proxy env vars |
| Block dangerous APIs | Prevent exec, create, and system commands | Low | Socket proxy defaults |
| Audit logging | Log all Docker API calls through the proxy | Medium | Proxy logging config |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
| --- | --- | --- |
| Read-only socket mount (`:ro`) | Doesn't actually protect: the socket is a pipe and stays writable | Use a proper socket proxy |
| Direct socket access from an internet-facing container | Full root access if n8n is compromised | Socket proxy isolates access |
| Allowing the exec API | Enables arbitrary command execution in containers | Block exec in the proxy |
| Allowing create/network APIs | The bot doesn't need to create containers | Block creation APIs |

### Implementation Notes

Recommended: `Tecnativa/docker-socket-proxy` or `LinuxServer.io/docker-socket-proxy`.

Both provide HAProxy-based filtering of Docker API requests.

Minimal proxy configuration for this bot:

```yaml
# docker-compose.yml
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      - CONTAINERS=1      # List/inspect containers
      - IMAGES=1          # Pull images
      - POST=1            # Allow write operations
      - SERVICES=0        # Swarm services (not needed)
      - TASKS=0           # Swarm tasks (not needed)
      - NETWORKS=0        # Network management (not needed)
      - VOLUMES=0         # Volume management (not needed)
      - EXEC=0            # CRITICAL: Block exec
      - BUILD=0           # CRITICAL: Block build
      - COMMIT=0          # CRITICAL: Block commit
      - SECRETS=0         # CRITICAL: Block secrets
      - CONFIGS=0         # CRITICAL: Block configs
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - docker-proxy

  n8n:
    # ... existing config ...
    environment:
      - DOCKER_HOST=tcp://socket-proxy:2375
    networks:
      - docker-proxy
      # Plus existing networks

networks:
  docker-proxy:
    internal: true        # internal-only: the proxy is never internet-exposed
```

Key security benefits:

  1. n8n no longer has direct socket access
  2. Only whitelisted API categories are available
  3. EXEC=0 prevents arbitrary command execution
  4. Proxy is on internal network only, not internet-exposed

Migration path:

  1. Deploy socket-proxy container
  2. Update n8n to use DOCKER_HOST=tcp://socket-proxy:2375
  3. Remove direct socket mount from n8n
  4. Test all bot commands still work
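Step 4 can start from a smoke-test plan like this dry run. `socket-proxy` is the service name from the compose sketch above, and the expected codes assume Tecnativa-style behavior (403 on blocked API sections):

```shell
#!/bin/sh
# Post-migration smoke-test plan: prints the curl probe for each
# endpoint plus the status code the proxy config above should produce.
probe() {  # usage: probe EXPECTED METHOD PATH
  echo "expect $1: curl -s -o /dev/null -w '%{http_code}' -X $2 http://socket-proxy:2375$3"
}

probe 200 GET  /containers/json                 # CONTAINERS=1 allows listing
probe 200 POST /containers/plex/restart         # POST=1 allows start/stop/restart
probe 200 POST "/images/create?fromImage=plex"  # IMAGES=1 + POST=1 for pulls
probe 403 POST /containers/plex/exec            # EXEC=0 must block exec
```

Running the printed commands from inside the n8n container verifies both that allowed calls still work and that the dangerous ones are actually blocked.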



## Feature Summary Table

| Feature | Complexity | Dependencies | Priority | Notes |
| --- | --- | --- | --- | --- |
| **Inline Keyboards** | | | | |
| Basic callback handling | Low | Existing trigger | Must Have | Foundation for all buttons |
| Container action buttons | Medium | Container matching | Must Have | Core UX improvement |
| Confirmation dialogs | Low | None | Should Have | Prevents accidents |
| Dynamic keyboard generation | Medium | HTTP Request node | Must Have | n8n native node limitation workaround |
| **Batch Operations** | | | | |
| Update multiple containers | Medium | Existing update | Must Have | Sequential with progress |
| "Update all" command | Medium | Container listing | Should Have | With confirmation |
| Per-container feedback | Low | None | Must Have | Progress visibility |
| **n8n API** | | | | |
| API key setup | Low | n8n config | Must Have | Enable programmatic access |
| Read workflow | Low | REST API | Must Have | Development workflow |
| Update workflow | Low | REST API | Must Have | Development workflow |
| Activate/deactivate | Low | REST API | Should Have | Testing workflow |
| **Update Sync** | | | | |
| Delete status file | Low | SSH/exec access | Should Have | Simple Unraid sync |
| Unraid GraphQL API | High | Unraid 7.2+, API key | Nice to Have | Requires version check |
| **Security** | | | | |
| Socket proxy deployment | Medium | New container | Must Have | Security requirement |
| API restriction config | Low | Proxy env vars | Must Have | Minimize attack surface |
| Migration testing | Low | All commands | Must Have | Verify no regression |

## MVP Recommendation for v1.1

### Phase 1: Foundation (Must Have)

1. Docker socket security via proxy - security first
2. n8n API access setup - enables faster development
3. Basic inline keyboard infrastructure - callback handling

### Phase 2: UX Improvements (Should Have)

4. Container action buttons from status view
5. Confirmation dialogs for stop/update actions
6. Batch update with progress feedback

### Phase 3: Polish (Nice to Have)

7. Unraid update status sync (file deletion method)
8. "Update all" convenience command

## Confidence Assessment

| Area | Confidence | Reason |
| --- | --- | --- |
| Telegram Inline Keyboards | HIGH | Official Telegram docs + n8n docs verified |
| Batch Operations | MEDIUM-HIGH | Standard Docker patterns, well documented |
| n8n API | MEDIUM | API exists, but detailed endpoint docs required fetching |
| Unraid Update Sync | MEDIUM | Community knowledge, API docs limited |
| Docker Socket Security | HIGH | Well-documented proxy solutions |

## Gaps to Address in Phase Planning

1. **Exact n8n API endpoints** - need to verify the full endpoint list during implementation
2. **Unraid version compatibility** - the GraphQL API requires Unraid 7.2+; needs a version check
3. **n8n Telegram node workarounds** - the HTTP Request approach needs testing
4. **Socket proxy on Unraid** - deployment specifics for the Unraid environment