docs: complete v1.1 research (4 researchers + synthesis)

Files:
- STACK.md: Socket proxy, n8n API, Telegram keyboards
- FEATURES.md: Table stakes, differentiators, MVP scope
- ARCHITECTURE.md: Integration points, data flow changes
- PITFALLS.md: Top 5 risks with prevention strategies
- SUMMARY.md: Executive summary, build order, confidence

Key findings:
- Stack: LinuxServer socket-proxy, HTTP Request nodes for keyboards
- Architecture: TCP curl migration (~15 nodes), new callback routes
- Critical pitfall: Socket proxy breaks existing curl commands

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Lucas Berger
2026-02-02 22:09:06 -05:00
parent ff289677ab
commit 811030cee4
5 changed files with 1614 additions and 0 deletions
+381
@@ -0,0 +1,381 @@
# Features Research: v1.1
**Domain:** Telegram Bot for Docker Container Management
**Researched:** 2026-02-02
**Confidence:** MEDIUM-HIGH (WebSearch verified with official docs where available)
## Telegram Inline Keyboards
### Table Stakes
| Feature | Why Expected | Complexity | Dependencies |
|---------|--------------|------------|--------------|
| Callback button handling | Core inline keyboard functionality - buttons must trigger actions | Low | Telegram Trigger already handles callback_query |
| answerCallbackQuery response | Required by Telegram - clients show loading animation until answered (up to 1 minute) | Low | None |
| Edit message after button press | Standard pattern - update existing message rather than send new one to reduce clutter | Low | None |
| Container action buttons | Users expect tap-to-action for start/stop/restart without typing | Medium | Existing container matching logic |
| Status view with action buttons | Show container list with inline buttons for each container | Medium | Existing status command |
### Differentiators
| Feature | Value Proposition | Complexity | Dependencies |
|---------|-------------------|------------|--------------|
| Confirmation dialogs for dangerous actions | "Are you sure?" before stop/restart/update prevents accidental actions | Low | None - edit message with Yes/No buttons |
| Contextual button removal | Remove buttons after action completes (prevents double-tap issues) | Low | None |
| Dynamic container list keyboards | Generate buttons based on actual running containers | Medium | Container listing logic |
| Progress indicators via message edit | Update message with "Updating..." then "Complete" states | Low | None |
| Pagination for many containers | "Next page" button when >8-10 containers | Medium | None |
### Anti-features
| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Reply keyboards for actions | Takes over user keyboard space, sends visible messages to chat | Use inline keyboards attached to bot messages |
| More than 4 buttons per row | Buttons shrink and labels get truncated, especially on mobile | Max 3-4 buttons per row for container actions |
| Complex callback_data structures | 64-byte limit, easy to exceed with JSON | Use short action codes: `start:plex`, `stop:sonarr` |
| Buttons without feedback | Users think tap didn't work, tap again | Always answerCallbackQuery, even for errors |
| Auto-refreshing keyboards | High API traffic, rate limiting risk | Refresh on explicit user action only |
### Implementation Notes
**Critical constraint:** callback_data is limited to 64 bytes. Use short codes like `action:containername` rather than JSON structures.
**n8n native node limitation:** The Telegram node doesn't support dynamic inline keyboards well. Workaround is HTTP Request node calling Telegram Bot API directly for `sendMessage` with `reply_markup` parameter.
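
The body that the HTTP Request node posts to `https://api.telegram.org/bot<token>/sendMessage` can be built from the live container list. The following is a minimal sketch in Python that applies the guidelines above: at most 3 buttons per row and a page size of 8, plus a navigation row when more containers remain. The `select:<name>` and `page:<n>` callback codes are hypothetical.

```python
PAGE_SIZE = 8        # containers per page (pagination threshold from the table above)
BUTTONS_PER_ROW = 3  # stays within the 3-4 buttons per row guideline


def container_keyboard(names: list[str], page: int = 0) -> dict:
    """Build an InlineKeyboardMarkup dict for one page of container names."""
    start = page * PAGE_SIZE
    chunk = names[start:start + PAGE_SIZE]
    # Split the page into rows of BUTTONS_PER_ROW buttons each.
    rows = [
        [{"text": name, "callback_data": f"select:{name}"}
         for name in chunk[i:i + BUTTONS_PER_ROW]]
        for i in range(0, len(chunk), BUTTONS_PER_ROW)
    ]
    # Navigation row: Prev when not on the first page, Next when more remain.
    nav = []
    if page > 0:
        nav.append({"text": "< Prev", "callback_data": f"page:{page - 1}"})
    if start + PAGE_SIZE < len(names):
        nav.append({"text": "Next >", "callback_data": f"page:{page + 1}"})
    if nav:
        rows.append(nav)
    return {"inline_keyboard": rows}


def send_message_body(chat_id: int, text: str, names: list[str], page: int = 0) -> dict:
    """JSON body for sendMessage with the keyboard attached via reply_markup."""
    return {"chat_id": chat_id, "text": text,
            "reply_markup": container_keyboard(names, page)}
```

Very long container names could push `select:<name>` past the 64-byte callback_data limit. In that case, a length check like the one in the confirmation sketch is still needed.
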
**Pattern for confirmations:**
1. User taps "Stop plex"
2. Edit message: "Stop plex container?" with [Yes] [Cancel] buttons
3. User taps Yes -> perform action, edit message with result, remove buttons
4. User taps Cancel -> edit message back to original state
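
The confirmation pattern above can be sketched as the JSON bodies that the HTTP Request node would send to `answerCallbackQuery` and `editMessageText`. The `action:container` encoding follows the 64-byte note. The `<action>_ok` and `cancel` action codes are hypothetical. Sending an empty `inline_keyboard` removes the buttons once the action has finished. Restoring the original message on Cancel is not shown.

```python
MAX_CALLBACK_BYTES = 64  # Telegram's callback_data limit, counted in bytes


def encode_callback(action: str, container: str) -> str:
    """Pack an action and container name as 'action:container'."""
    data = f"{action}:{container}"
    size = len(data.encode("utf-8"))
    if size > MAX_CALLBACK_BYTES:
        raise ValueError(f"callback_data is {size} bytes: {data!r}")
    return data


def decode_callback(data: str) -> tuple[str, str]:
    """Split 'action:container'; Docker names cannot contain ':' so the split is unambiguous."""
    action, _, container = data.partition(":")
    return action, container


def answer_body(callback_query_id: str, text: str = "") -> dict:
    """answerCallbackQuery body; send it for every tap, including errors."""
    return {"callback_query_id": callback_query_id, "text": text}


def confirm_edit(chat_id: int, message_id: int, action: str, container: str) -> dict:
    """editMessageText body for step 2: ask for confirmation with [Yes] [Cancel]."""
    return {
        "chat_id": chat_id,
        "message_id": message_id,
        "text": f"{action.capitalize()} {container} container?",
        "reply_markup": {"inline_keyboard": [[
            {"text": "Yes", "callback_data": encode_callback(f"{action}_ok", container)},
            {"text": "Cancel", "callback_data": encode_callback("cancel", container)},
        ]]},
    }


def result_edit(chat_id: int, message_id: int, text: str) -> dict:
    """editMessageText body for step 3: show the outcome and drop the buttons."""
    return {"chat_id": chat_id, "message_id": message_id, "text": text,
            "reply_markup": {"inline_keyboard": []}}
```
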
**Sources:**
- [Telegram Bot Features](https://core.telegram.org/bots/features) (HIGH confidence)
- [Telegram Bot API Buttons](https://core.telegram.org/api/bots/buttons) (HIGH confidence)
- [n8n Telegram Callback Operations](https://docs.n8n.io/integrations/builtin/app-nodes/n8n-nodes-base.telegram/callback-operations/) (HIGH confidence)
- [n8n Community: Dynamic Inline Keyboard](https://community.n8n.io/t/dynamic-inline-keyboard-for-telegram-bot/86568) (MEDIUM confidence)
---
## Batch Operations
### Table Stakes
| Feature | Why Expected | Complexity | Dependencies |
|---------|--------------|------------|--------------|
| Update multiple specified containers | Core batch use case - `update plex sonarr radarr` | Medium | Existing update logic, loop handling |
| Sequential execution | Process one at a time to avoid resource contention | Low | None |
| Per-container status feedback | "Updated plex... Updated sonarr..." progress | Low | Existing message sending |
| Error handling per container | One failure shouldn't abort the batch | Low | Try-catch per iteration |
| Final summary message | "3 updated, 1 failed: jellyfin" | Low | Accumulator pattern |
### Differentiators
| Feature | Value Proposition | Complexity | Dependencies |
|---------|-------------------|------------|--------------|
| "Update all" command | Single command to update everything (with confirmation) | Medium | Container listing |
| "Update all except X" | Exclude specific containers from batch | Medium | Exclusion pattern |
| Parallel status checks | Check which containers have updates available first | Medium | None |
| Batch operation confirmation | Show what will happen before doing it | Low | Keyboard buttons |
| Cancel mid-batch | Stop processing remaining containers | High | State management |
### Anti-features
| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Parallel container updates | Resource contention, disk I/O saturation, network bandwidth | Sequential with progress feedback |
| Silent batch operations | User thinks bot is frozen during long batch | Send progress message per container |
| Update without checking first | Wastes time on already-updated containers | Check for updates, report "3 containers have updates" |
| Auto-update on schedule | Out of scope: an update could cause downtime while the user is using the system | User-initiated only; this is a reactive tool |
### Implementation Notes
**Existing update flow:** Current implementation pulls image, recreates container, cleans up old image. Batch needs to wrap this in a loop.
**Progress pattern:**
```
User: update all
Bot: Found 5 containers with updates. Update now? [Yes] [Cancel]
User: Yes
Bot: Updating plex (1/5)...
Bot: (edit) Updated plex. Updating sonarr (2/5)...
...
Bot: (edit) Batch complete: 5 updated, 0 failed.
```
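
The progress pattern above can be sketched in Python as a sequential loop. `update` and `report` are placeholders supplied by the caller. `update` stands in for the existing pull/recreate/cleanup flow, and `report` for a hypothetical helper that edits the progress message. Each container gets its own try/except, so one failure does not abort the batch. An accumulator collects the results for the final summary.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BatchResult:
    updated: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)

    def summary(self) -> str:
        text = f"Batch complete: {len(self.updated)} updated, {len(self.failed)} failed"
        if self.failed:
            text += ": " + ", ".join(self.failed)
        return text + "."


def run_batch(containers: list[str],
              update: Callable[[str], None],
              report: Callable[[str], None]) -> BatchResult:
    """Update containers one at a time, reporting progress after each step."""
    result = BatchResult()
    total = len(containers)
    previous = ""  # outcome of the last container, prefixed to the next progress line
    for i, name in enumerate(containers, start=1):
        report(f"{previous}Updating {name} ({i}/{total})...")
        try:
            update(name)
        except Exception as exc:  # one failure must not abort the batch
            result.failed.append(name)
            previous = f"Failed {name}: {exc}. "
        else:
            result.updated.append(name)
            previous = f"Updated {name}. "
    report(result.summary())
    return result
```
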
**Watchtower-style options (NOT recommended for this bot):**
- Watchtower does automatic updates on schedule
- This bot is intentionally reactive (user asks, bot does)
- Automation can cause downtime at bad times
**Sources:**
- [Watchtower Documentation](https://containrrr.dev/watchtower/) (HIGH confidence)
- [Docker Multi-Container Apps](https://docs.docker.com/get-started/docker-concepts/running-containers/multi-container-applications/) (HIGH confidence)
- [How to Update Docker Containers](https://phoenixnap.com/kb/update-docker-image-container) (MEDIUM confidence)
---
## Development API Workflow
### Table Stakes
| Feature | Why Expected | Complexity | Dependencies |
|---------|--------------|------------|--------------|
| API key authentication | Standard n8n API auth method | Low | n8n configuration |
| Get workflow by ID | Read current workflow JSON | Low | n8n REST API |
| Update workflow | Push modified workflow back | Low | n8n REST API |
| Activate/deactivate workflow | Turn workflow on/off programmatically | Low | n8n REST API |
| Get execution list | See recent runs | Low | n8n REST API |
| Get execution details/logs | Debug failed executions | Low | n8n REST API |
### Differentiators
| Feature | Value Proposition | Complexity | Dependencies |
|---------|-------------------|------------|--------------|
| Execute workflow on demand | Trigger test run via API | Medium | n8n REST API with test data |
| Version comparison | Diff local vs deployed workflow | High | JSON diff tooling |
| Backup before update | Save current version before pushing changes | Low | File system or git |
| Rollback capability | Restore previous version on failure | Medium | Version history |
| MCP integration | Claude Code can manage workflows via MCP | High | MCP server setup |
### Anti-features
| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Direct n8n database access | Bypasses API, can corrupt state | Use REST API only |
| Credential exposure via API | API returns credential IDs, not values | Never try to extract credential values |
| Auto-deploy on git push | Adds CI/CD complexity, not needed for single-user | Manual deploy via API call |
| Real-time workflow editing | n8n UI is better for this | API for read/bulk operations only |
### Implementation Notes
**n8n REST API key endpoints:**
| Operation | Method | Endpoint |
|-----------|--------|----------|
| List workflows | GET | `/api/v1/workflows` |
| Get workflow | GET | `/api/v1/workflows/{id}` |
| Update workflow | PUT | `/api/v1/workflows/{id}` |
| Activate | POST | `/api/v1/workflows/{id}/activate` |
| Deactivate | POST | `/api/v1/workflows/{id}/deactivate` |
| List executions | GET | `/api/v1/executions` |
| Get execution | GET | `/api/v1/executions/{id}` |
| Execute workflow | POST | `/rest/workflows/{id}/run` (internal editor API, not covered by API-key auth) |
**Authentication:** Header `X-N8N-API-KEY: your_api_key`
**Workflow structure:** n8n workflows are JSON documents (~3,200 lines for this bot). Key sections:
- `nodes[]` - Array of workflow nodes
- `connections` - How nodes connect
- `settings` - Workflow-level settings
**MCP option:** There's an unofficial n8n MCP server (makafeli/n8n-workflow-builder) that could enable Claude Code to manage workflows directly, but this adds complexity. Standard REST API is simpler for v1.1.
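
The following is a minimal sketch of the "backup before update" flow that uses only the `/api/v1` endpoints and header listed above. It rests on two assumptions. First, the base URL and file paths are hypothetical. Second, `PUT /api/v1/workflows/{id}` is assumed to accept only `name`, `nodes`, `connections` and `settings`, and to reject read-only fields such as `id` and `active` that `GET` returns. Verify both against the API reference before relying on the sketch.

```python
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Assumption: the update endpoint accepts only these top-level fields.
WRITABLE_FIELDS = ("name", "nodes", "connections", "settings")


def to_update_payload(workflow: dict) -> dict:
    """Strip a workflow JSON down to the fields the update endpoint is assumed to accept."""
    return {key: workflow[key] for key in WRITABLE_FIELDS if key in workflow}


def api_call(base_url: str, api_key: str, method: str, path: str,
             body: Optional[dict] = None) -> dict:
    """Call the n8n public API, e.g. path='/workflows/123', with API-key auth."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1{path}",
        data=data,
        method=method,
        headers={"X-N8N-API-KEY": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def deploy(base_url: str, api_key: str, workflow_id: str,
           local_file: str, backup_dir: str = "backups") -> dict:
    """Save the deployed workflow to a timestamped file, then push the local copy."""
    current = api_call(base_url, api_key, "GET", f"/workflows/{workflow_id}")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    backup = Path(backup_dir) / f"workflow-{workflow_id}-{stamp}.json"
    backup.parent.mkdir(parents=True, exist_ok=True)
    backup.write_text(json.dumps(current, indent=2), encoding="utf-8")
    local = json.loads(Path(local_file).read_text(encoding="utf-8"))
    return api_call(base_url, api_key, "PUT", f"/workflows/{workflow_id}",
                    to_update_payload(local))
```

The timestamped backup files double as a simple version history for the rollback differentiator. Rolling back is then a `deploy` of an earlier backup file.
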
**Sources:**
- [n8n API Documentation](https://docs.n8n.io/api/) (HIGH confidence)
- [n8n API Reference](https://docs.n8n.io/api/api-reference/) (HIGH confidence)
- [n8n Workflow Manager API Template](https://n8n.io/workflows/4166-n8n-workflow-manager-api/) (MEDIUM confidence)
- [Python n8n API Guide](https://martinuke0.github.io/posts/2025-12-10-a-detailed-guide-to-using-the-n8n-api-with-python/) (MEDIUM confidence)
---
## Update Notification Sync
### Table Stakes
| Feature | Why Expected | Complexity | Dependencies |
|---------|--------------|------------|--------------|
| Update clears bot's "update available" state | Bot should know container is now current | Low | Already works - re-check after update |
| Accurate update status reporting | Status command shows which have updates | Medium | Image digest comparison |
### Differentiators
| Feature | Value Proposition | Complexity | Dependencies |
|---------|-------------------|------------|--------------|
| Sync with Unraid UI | Clear "update available" badge in Unraid web UI | High | Unraid API or file manipulation |
| Pre-update check | Show what version you're on, what version available | Medium | Image tag inspection |
| Update notification to user | "3 containers have updates available" proactive message | Medium | Scheduled check, notification logic |
### Anti-features
| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Taking over Unraid notifications | Explicitly out of scope per PROJECT.md | Keep Unraid notifications, bot is for control |
| Proactive monitoring | Bot is reactive per PROJECT.md | User checks status manually |
| Blocking Unraid auto-updates | User may want both systems | Coexist with Unraid's own update mechanism |
### Implementation Notes
**The core problem:** When you update a container via the bot (or Watchtower), Unraid's web UI may still show "update available" because it has its own tracking.
**Unraid update status file:** `/var/lib/docker/unraid-update-status.json`
- This file tracks which containers have updates
- Deleting it forces Unraid to recheck
- Can also trigger recheck via: Settings > Docker > Check for Updates
**Unraid API (v7.2+):**
- GraphQL API for Docker containers
- Can query container status
- Mutations for notifications exist
- API key auth: `x-api-key` header
**Practical approach for v1.1:**
1. **Minimum:** Document that Unraid UI may lag behind - user can click "Check for Updates" in Unraid
2. **Better:** After bot update, delete `/var/lib/docker/unraid-update-status.json` to force Unraid recheck
3. **Best (requires Unraid 7.2+):** Use Unraid GraphQL API to clear notification state
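
Option 2 comes down to deleting one file. The following is a minimal Python sketch. It assumes the code runs somewhere that can see the host's `/var/lib/docker`, for example over SSH on the Unraid host. How n8n itself would reach that path is not settled in this research. A missing file is treated as success, so the step is safe to repeat after every update.

```python
from pathlib import Path

# Path from the notes above; assumes the host filesystem is visible at this location.
STATUS_FILE = Path("/var/lib/docker/unraid-update-status.json")


def force_unraid_recheck(status_file: Path = STATUS_FILE) -> bool:
    """Delete Unraid's cached update status so it rechecks.

    Returns True if a file was removed, False if it was already absent.
    """
    try:
        status_file.unlink()
    except FileNotFoundError:
        return False
    return True
```
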
**Known issue:** Users report that Unraid keeps showing "update ready" after a container has already been updated. Unraid's check only looks for new updates and does not re-verify whether a container is now current.
**Sources:**
- [Unraid API Documentation](https://docs.unraid.net/API/how-to-use-the-api/) (HIGH confidence)
- [Unraid Docker Integration DeepWiki](https://deepwiki.com/unraid/api/2.4.1-docker-integration) (MEDIUM confidence)
- [Watchtower + Unraid Discussion](https://github.com/containrrr/watchtower/discussions/1389) (MEDIUM confidence)
- [Unraid Forum: Update Badge Issues](https://forums.unraid.net/topic/157820-docker-shows-update-ready-after-updating/) (MEDIUM confidence)
---
## Docker Socket Security
### Table Stakes
| Feature | Why Expected | Complexity | Dependencies |
|---------|--------------|------------|--------------|
| Remove direct socket from internet-exposed n8n | Security requirement per PROJECT.md scope | Medium | Socket proxy setup |
| Maintain all existing functionality | Bot should work identically after security change | Medium | API compatibility |
| Container start/stop/restart/update | Core actions must still work | Low | Proxy allows these APIs |
| Container list/inspect | Status command must still work | Low | Proxy allows read APIs |
| Image pull | Update command needs this | Low | Proxy configuration |
### Differentiators
| Feature | Value Proposition | Complexity | Dependencies |
|---------|-------------------|------------|--------------|
| Granular API restrictions | Only allow APIs the bot actually uses | Low | Socket proxy env vars |
| Block dangerous APIs | Prevent exec, build, commit, and swarm/system APIs | Low | Socket proxy defaults |
| Audit logging | Log all Docker API calls through proxy | Medium | Proxy logging config |
### Anti-features
| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Read-only socket mount (:ro) | Doesn't actually protect: `:ro` only prevents changes to the socket file itself, while API requests sent through the socket still work | Use proper socket proxy |
| Direct socket access from internet-facing container | Full root access if n8n is compromised | Socket proxy isolates access |
| Allowing exec API | Enables arbitrary command execution in containers | Block exec in proxy |
| Allowing network/volume APIs | Bot doesn't manage networks or volumes | Keep `NETWORKS=0` and `VOLUMES=0`; container create must stay allowed because updates recreate the container |
### Implementation Notes
**Recommended: Tecnativa/docker-socket-proxy or LinuxServer.io/docker-socket-proxy**
Both provide HAProxy-based filtering of Docker API requests.
**Minimal proxy configuration for this bot:**
```yaml
# docker-compose.yml
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      - CONTAINERS=1 # List/inspect/start/stop containers
      - IMAGES=1     # Pull images
      - POST=1       # Allow write operations
      - SERVICES=0   # Swarm services (not needed)
      - TASKS=0      # Swarm tasks (not needed)
      - NETWORKS=0   # Network management (not needed)
      - VOLUMES=0    # Volume management (not needed)
      - EXEC=0       # CRITICAL: Block exec
      - BUILD=0      # CRITICAL: Block build
      - COMMIT=0     # CRITICAL: Block commit
      - SECRETS=0    # CRITICAL: Block secrets
      - CONFIGS=0    # CRITICAL: Block configs
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - docker-proxy
  n8n:
    # ... existing config ...
    environment:
      - DOCKER_HOST=tcp://socket-proxy:2375
    networks:
      - docker-proxy
      # Plus existing networks

networks:
  docker-proxy:
    internal: true # No outside access; only containers on this network reach the proxy
```
**Key security benefits:**
1. n8n no longer has direct socket access
2. Only whitelisted API categories are available
3. EXEC=0 prevents arbitrary command execution
4. Proxy is on internal network only, not internet-exposed
**Migration path:**
1. Deploy socket-proxy container
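2. Update n8n to use `DOCKER_HOST=tcp://socket-proxy:2375`, and rewrite the existing curl commands to call `http://socket-proxy:2375` (curl does not read `DOCKER_HOST`, so commands written against the mounted socket break once it is removed)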
3. Remove direct socket mount from n8n
4. Test all bot commands still work
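
Step 4 can be partly automated with a minimal Python smoke test. The test makes three assumptions: the Tecnativa-style settings from the compose sketch, the address `http://socket-proxy:2375`, and a proxy that answers blocked requests with HTTP 403 (HAProxy's default for a deny rule). It only sends read requests and requests that must be refused, so it never starts or stops anything. The write paths (start/stop/pull) still need a manual run of the bot commands.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Tuple

PROXY_URL = "http://socket-proxy:2375"  # assumed address, as in the compose sketch

# (method, path, expected status): allowed APIs answer 200, blocked ones get 403 from the proxy.
CHECKS: List[Tuple[str, str, int]] = [
    ("GET", "/_ping", 200),                    # proxy reachable
    ("GET", "/containers/json?all=true", 200), # status command (CONTAINERS=1)
    ("GET", "/images/json", 200),              # image access (IMAGES=1)
    ("GET", "/networks", 403),                 # NETWORKS=0
    ("POST", "/exec/nonexistent/start", 403),  # EXEC=0; Docker itself would answer 404
]


def http_status(method: str, url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, treating 4xx/5xx responses as results."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def run_checks(base_url: str,
               fetch: Callable[[str, str], int] = http_status,
               checks: List[Tuple[str, str, int]] = CHECKS) -> List[str]:
    """Return human-readable failures; an empty list means every check passed."""
    failures = []
    for method, path, expected in checks:
        got = fetch(method, base_url + path)
        if got != expected:
            failures.append(f"{method} {path}: expected {expected}, got {got}")
    return failures
```

Run `print(run_checks(PROXY_URL))` from a container attached to the `docker-proxy` network. The `fetch` parameter lets the check logic be exercised without a live proxy.
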
**Sources:**
- [Tecnativa docker-socket-proxy](https://github.com/Tecnativa/docker-socket-proxy) (HIGH confidence)
- [LinuxServer.io docker-socket-proxy](https://docs.linuxserver.io/images/docker-socket-proxy/) (HIGH confidence)
- [Docker Socket Security Guide](https://www.paulsblog.dev/how-to-secure-your-docker-environment-by-using-a-docker-socket-proxy/) (MEDIUM confidence)
---
## Feature Summary Table
| Feature | Complexity | Dependencies | Priority | Notes |
|---------|------------|--------------|----------|-------|
| **Inline Keyboards** | | | | |
| Basic callback handling | Low | Existing trigger | Must Have | Foundation for all buttons |
| Container action buttons | Medium | Container matching | Must Have | Core UX improvement |
| Confirmation dialogs | Low | None | Should Have | Prevents accidents |
| Dynamic keyboard generation | Medium | HTTP Request node | Must Have | n8n native node limitation workaround |
| **Batch Operations** | | | | |
| Update multiple containers | Medium | Existing update | Must Have | Sequential with progress |
| "Update all" command | Medium | Container listing | Should Have | With confirmation |
| Per-container feedback | Low | None | Must Have | Progress visibility |
| **n8n API** | | | | |
| API key setup | Low | n8n config | Must Have | Enable programmatic access |
| Read workflow | Low | REST API | Must Have | Development workflow |
| Update workflow | Low | REST API | Must Have | Development workflow |
| Activate/deactivate | Low | REST API | Should Have | Testing workflow |
| **Update Sync** | | | | |
| Delete status file | Low | SSH/exec access | Should Have | Simple Unraid sync |
| Unraid GraphQL API | High | Unraid 7.2+, API key | Nice to Have | Requires version check |
| **Security** | | | | |
| Socket proxy deployment | Medium | New container | Must Have | Security requirement |
| API restriction config | Low | Proxy env vars | Must Have | Minimize attack surface |
| Migration testing | Low | All commands | Must Have | Verify no regression |
## MVP Recommendation for v1.1
**Phase 1: Foundation (Must Have)**
1. Docker socket security via proxy - security first
2. n8n API access setup - enables faster development
3. Basic inline keyboard infrastructure - callback handling
**Phase 2: UX Improvements (Should Have)**
4. Container action buttons from status view
5. Confirmation dialogs for stop/update actions
6. Batch update with progress feedback
**Phase 3: Polish (Nice to Have)**
7. Unraid update status sync (file deletion method)
8. "Update all" convenience command
## Confidence Assessment
| Area | Confidence | Reason |
|------|------------|--------|
| Telegram Inline Keyboards | HIGH | Official Telegram docs + n8n docs verified |
| Batch Operations | MEDIUM-HIGH | Standard Docker patterns, well-documented |
| n8n API | MEDIUM | API exists, but endpoint details had to be fetched from the API reference directly |
| Unraid Update Sync | MEDIUM | Community knowledge, API docs limited |
| Docker Socket Security | HIGH | Well-documented proxy solutions |
## Gaps to Address in Phase Planning
1. **Exact n8n API endpoints** - Need to verify full endpoint list during implementation
2. **Unraid version compatibility** - GraphQL API requires Unraid 7.2+, need version check
3. **n8n Telegram node workarounds** - HTTP Request approach needs testing
4. **Socket proxy on Unraid** - Deployment specifics for Unraid environment