docs: complete v1.1 research (4 researchers + synthesis)
Files:
- STACK.md: Socket proxy, n8n API, Telegram keyboards
- FEATURES.md: Table stakes, differentiators, MVP scope
- ARCHITECTURE.md: Integration points, data flow changes
- PITFALLS.md: Top 5 risks with prevention strategies
- SUMMARY.md: Executive summary, build order, confidence

Key findings:
- Stack: LinuxServer socket-proxy, HTTP Request nodes for keyboards
- Architecture: TCP curl migration (~15 nodes), new callback routes
- Critical pitfall: Socket proxy breaks existing curl commands

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Features Research: v1.1

**Domain:** Telegram Bot for Docker Container Management
**Researched:** 2026-02-02
**Confidence:** MEDIUM-HIGH (WebSearch verified with official docs where available)

## Telegram Inline Keyboards

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
|---------|--------------|------------|--------------|
| Callback button handling | Core inline keyboard functionality - buttons must trigger actions | Low | Telegram Trigger already handles callback_query |
| answerCallbackQuery response | Required by Telegram - clients show a loading animation until answered (for up to 1 minute) | Low | None |
| Edit message after button press | Standard pattern - update the existing message rather than sending a new one, reducing clutter | Low | None |
| Container action buttons | Users expect tap-to-action for start/stop/restart without typing | Medium | Existing container matching logic |
| Status view with action buttons | Show container list with inline buttons for each container | Medium | Existing status command |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
|---------|-------------------|------------|--------------|
| Confirmation dialogs for dangerous actions | "Are you sure?" before stop/restart/update prevents accidental actions | Low | None - edit message with Yes/No buttons |
| Contextual button removal | Remove buttons after an action completes (prevents double-tap issues) | Low | None |
| Dynamic container list keyboards | Generate buttons based on actual running containers | Medium | Container listing logic |
| Progress indicators via message edit | Update message with "Updating..." then "Complete" states | Low | None |
| Pagination for many containers | "Next page" button when >8-10 containers | Medium | None |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Reply keyboards for actions | Takes over the user's keyboard space, sends visible messages to the chat | Use inline keyboards attached to bot messages |
| More than 5 buttons per row | Wraps poorly on mobile/desktop, breaks muscle memory | Max 3-4 buttons per row for container actions |
| Complex callback_data structures | 64-byte limit, easy to exceed with JSON | Use short action codes: `start_plex`, `stop_sonarr` |
| Buttons without feedback | Users think the tap didn't work and tap again | Always answerCallbackQuery, even for errors |
| Auto-refreshing keyboards | High API traffic, rate limiting risk | Refresh on explicit user action only |

### Implementation Notes

**Critical constraint:** callback_data is limited to 64 bytes. Use short codes like `action:containername` rather than JSON structures.

**n8n native node limitation:** The Telegram node doesn't support dynamic inline keyboards well. The workaround is an HTTP Request node calling the Telegram Bot API directly - `sendMessage` with the `reply_markup` parameter.

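As a concrete sketch of that workaround, the call an HTTP Request node would make can be expressed as a `curl` command. Token, chat id, and container name below are placeholders, and the final command is printed rather than executed so the sketch is safe to dry-run without network access:

```shell
#!/bin/sh
# Hypothetical values -- substitute your own bot token and chat id.
BOT_TOKEN="123456:ABC-EXAMPLE"
CHAT_ID="42"

# Short action codes keep callback_data well under Telegram's 64-byte limit.
KEYBOARD='{"inline_keyboard":[[{"text":"Start","callback_data":"start_plex"},{"text":"Stop","callback_data":"stop_plex"},{"text":"Restart","callback_data":"restart_plex"}]]}'

# Sanity check: each callback_data code must be 64 bytes or fewer.
for code in start_plex stop_plex restart_plex; do
  [ "${#code}" -le 64 ] || { echo "callback_data too long: $code"; exit 1; }
done

# Printed (not executed) so the sketch runs anywhere; an HTTP Request node
# would POST the same fields to the same URL.
echo curl -s "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
  --data-urlencode "chat_id=${CHAT_ID}" \
  --data-urlencode "text=plex actions:" \
  --data-urlencode "reply_markup=${KEYBOARD}"
```

The same pattern applies to `editMessageReplyMarkup` and `answerCallbackQuery`: build the JSON in a Set/Code node and pass it as a parameter to the HTTP Request node.
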
**Pattern for confirmations:**
1. User taps "Stop plex"
2. Edit message: "Stop plex container?" with [Yes] [Cancel] buttons
3. User taps Yes -> perform action, edit message with result, remove buttons
4. User taps Cancel -> edit message back to original state

**Sources:**
- [Telegram Bot Features](https://core.telegram.org/bots/features) (HIGH confidence)
- [Telegram Bot API Buttons](https://core.telegram.org/api/bots/buttons) (HIGH confidence)
- [n8n Telegram Callback Operations](https://docs.n8n.io/integrations/builtin/app-nodes/n8n-nodes-base.telegram/callback-operations/) (HIGH confidence)
- [n8n Community: Dynamic Inline Keyboard](https://community.n8n.io/t/dynamic-inline-keyboard-for-telegram-bot/86568) (MEDIUM confidence)

---

## Batch Operations

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
|---------|--------------|------------|--------------|
| Update multiple specified containers | Core batch use case - `update plex sonarr radarr` | Medium | Existing update logic, loop handling |
| Sequential execution | Process one at a time to avoid resource contention | Low | None |
| Per-container status feedback | "Updated plex... Updated sonarr..." progress | Low | Existing message sending |
| Error handling per container | One failure shouldn't abort the batch | Low | Try-catch per iteration |
| Final summary message | "3 updated, 1 failed: jellyfin" | Low | Accumulator pattern |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
|---------|-------------------|------------|--------------|
| "Update all" command | Single command to update everything (with confirmation) | Medium | Container listing |
| "Update all except X" | Exclude specific containers from a batch | Medium | Exclusion pattern |
| Parallel status checks | Check which containers have updates available first | Medium | None |
| Batch operation confirmation | Show what will happen before doing it | Low | Keyboard buttons |
| Cancel mid-batch | Stop processing remaining containers | High | State management |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Parallel container updates | Resource contention, disk I/O saturation, network bandwidth | Sequential with progress feedback |
| Silent batch operations | User thinks the bot is frozen during a long batch | Send a progress message per container |
| Update without checking first | Wastes time on already-updated containers | Check for updates, report "3 containers have updates" |
| Auto-update on schedule | Out of scope - an update could cause downtime while the user is using the system | User-initiated only; this is a reactive tool |

### Implementation Notes

**Existing update flow:** The current implementation pulls the image, recreates the container, and cleans up the old image. Batch mode needs to wrap this in a loop.

**Progress pattern:**
```
User: update all
Bot: Found 5 containers with updates. Update now? [Yes] [Cancel]
User: Yes
Bot: Updating plex (1/5)...
Bot: (edit) Updated plex. Updating sonarr (2/5)...
...
Bot: (edit) Batch complete: 5 updated, 0 failed.
```

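The sequential loop behind that pattern can be sketched as follows. `update_container` is a stand-in for the existing pull/recreate/cleanup flow; here it is a stub that fails for one container to demonstrate the per-container error handling and the accumulator-based summary:

```shell
#!/bin/sh
# Stub for the real update flow (pull image, recreate container, clean up
# old image). Fails for "jellyfin" to demonstrate error handling.
update_container() {
  [ "$1" != "jellyfin" ]
}

set -- plex sonarr radarr jellyfin
total=$#
updated=0
failed=""
i=0

for c in "$@"; do
  i=$((i + 1))
  echo "Updating $c ($i/$total)..."   # per-container progress message
  if update_container "$c"; then
    updated=$((updated + 1))
  else
    failed="$failed $c"               # one failure does not abort the batch
  fi
done

# Final summary via the accumulator pattern.
if [ -n "$failed" ]; then
  echo "Batch complete: $updated updated, failed:$failed"
else
  echo "Batch complete: $updated updated, 0 failed."
fi
# -> Batch complete: 3 updated, failed: jellyfin
```

In the workflow itself, each loop iteration maps to one pass through the existing update nodes, with the progress line sent (or edited) via Telegram instead of `echo`.
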
**Watchtower-style options (NOT recommended for this bot):**
- Watchtower does automatic updates on a schedule
- This bot is intentionally reactive (user asks, bot does)
- Automation can cause downtime at bad times

**Sources:**
- [Watchtower Documentation](https://containrrr.dev/watchtower/) (HIGH confidence)
- [Docker Multi-Container Apps](https://docs.docker.com/get-started/docker-concepts/running-containers/multi-container-applications/) (HIGH confidence)
- [How to Update Docker Containers](https://phoenixnap.com/kb/update-docker-image-container) (MEDIUM confidence)

---

## Development API Workflow

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
|---------|--------------|------------|--------------|
| API key authentication | Standard n8n API auth method | Low | n8n configuration |
| Get workflow by ID | Read current workflow JSON | Low | n8n REST API |
| Update workflow | Push modified workflow back | Low | n8n REST API |
| Activate/deactivate workflow | Turn workflow on/off programmatically | Low | n8n REST API |
| Get execution list | See recent runs | Low | n8n REST API |
| Get execution details/logs | Debug failed executions | Low | n8n REST API |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
|---------|-------------------|------------|--------------|
| Execute workflow on demand | Trigger test run via API | Medium | n8n REST API with test data |
| Version comparison | Diff local vs deployed workflow | High | JSON diff tooling |
| Backup before update | Save current version before pushing changes | Low | File system or git |
| Rollback capability | Restore previous version on failure | Medium | Version history |
| MCP integration | Claude Code can manage workflows via MCP | High | MCP server setup |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Direct n8n database access | Bypasses the API, can corrupt state | Use the REST API only |
| Credential exposure via API | API returns credential IDs, not values | Never try to extract credential values |
| Auto-deploy on git push | Adds CI/CD complexity, not needed for single-user | Manual deploy via API call |
| Real-time workflow editing | The n8n UI is better for this | API for read/bulk operations only |

### Implementation Notes

**n8n REST API key endpoints:**

| Operation | Method | Endpoint |
|-----------|--------|----------|
| List workflows | GET | `/api/v1/workflows` |
| Get workflow | GET | `/api/v1/workflows/{id}` |
| Update workflow | PUT | `/api/v1/workflows/{id}` |
| Activate | POST | `/api/v1/workflows/{id}/activate` |
| Deactivate | POST | `/api/v1/workflows/{id}/deactivate` |
| List executions | GET | `/api/v1/executions` |
| Get execution | GET | `/api/v1/executions/{id}` |
| Execute workflow | POST | `/rest/workflows/{id}/run` |

Note: the execute endpoint lives under `/rest/` (n8n's internal UI API), not the public `/api/v1` surface, so it may change between versions - verify during implementation.

**Authentication:** Header `X-N8N-API-KEY: your_api_key`

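A sketch of the backup-before-update flow built on the endpoints above. The URL, workflow id, and key are placeholders, and the `curl` commands are printed rather than executed so the sketch can be dry-run without a live n8n instance:

```shell
#!/bin/sh
# Placeholders -- substitute real values for your instance.
N8N_URL="http://localhost:5678"
N8N_API_KEY="replace-me"
WF_ID="abc123"
STAMP="$(date +%Y%m%d-%H%M%S)"

# 1. Back up the current version before pushing changes.
echo curl -s -H "X-N8N-API-KEY: ${N8N_API_KEY}" \
  "${N8N_URL}/api/v1/workflows/${WF_ID}" \
  -o "backup-${WF_ID}-${STAMP}.json"

# 2. Push the modified workflow back.
echo curl -s -X PUT \
  -H "X-N8N-API-KEY: ${N8N_API_KEY}" \
  -H "Content-Type: application/json" \
  --data @workflow.json \
  "${N8N_URL}/api/v1/workflows/${WF_ID}"
```

Because each backup is timestamped, rollback is just another PUT with an earlier `backup-*.json` as the body.
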
**Workflow structure:** n8n workflows are JSON documents (~3,200 lines for this bot). Key sections:
- `nodes[]` - Array of workflow nodes
- `connections` - How nodes connect
- `settings` - Workflow-level settings

**MCP option:** There's an unofficial n8n MCP server (makafeli/n8n-workflow-builder) that could let Claude Code manage workflows directly, but it adds complexity. The standard REST API is simpler for v1.1.

**Sources:**
- [n8n API Documentation](https://docs.n8n.io/api/) (HIGH confidence)
- [n8n API Reference](https://docs.n8n.io/api/api-reference/) (HIGH confidence)
- [n8n Workflow Manager API Template](https://n8n.io/workflows/4166-n8n-workflow-manager-api/) (MEDIUM confidence)
- [Python n8n API Guide](https://martinuke0.github.io/posts/2025-12-10-a-detailed-guide-to-using-the-n8n-api-with-python/) (MEDIUM confidence)

---

## Update Notification Sync

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
|---------|--------------|------------|--------------|
| Update clears bot's "update available" state | Bot should know the container is now current | Low | Already works - re-check after update |
| Accurate update status reporting | Status command shows which containers have updates | Medium | Image digest comparison |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
|---------|-------------------|------------|--------------|
| Sync with Unraid UI | Clear "update available" badge in the Unraid web UI | High | Unraid API or file manipulation |
| Pre-update check | Show what version you're on and what version is available | Medium | Image tag inspection |
| Update notification to user | "3 containers have updates available" proactive message | Medium | Scheduled check, notification logic |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Taking over Unraid notifications | Explicitly out of scope per PROJECT.md | Keep Unraid notifications; the bot is for control |
| Proactive monitoring | Bot is reactive per PROJECT.md | User checks status manually |
| Blocking Unraid auto-updates | User may want both systems | Coexist with Unraid's own update mechanism |

### Implementation Notes

**The core problem:** When you update a container via the bot (or Watchtower), Unraid's web UI may still show "update available" because it keeps its own tracking.

**Unraid update status file:** `/var/lib/docker/unraid-update-status.json`
- This file tracks which containers have updates
- Deleting it forces Unraid to recheck
- A recheck can also be triggered via: Settings > Docker > Check for Updates

**Unraid API (v7.2+):**
- GraphQL API for Docker containers
- Can query container status
- Mutations for notifications exist
- API key auth: `x-api-key` header

**Practical approach for v1.1:**
1. **Minimum:** Document that the Unraid UI may lag behind - the user can click "Check for Updates" in Unraid
2. **Better:** After a bot update, delete `/var/lib/docker/unraid-update-status.json` to force an Unraid recheck
3. **Best (requires Unraid 7.2+):** Use the Unraid GraphQL API to clear notification state

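Option 2 can be sketched as a small helper. The path is the status file noted above; the function takes it as an argument so the behavior can be exercised against a temp file before pointing it at the live Unraid path:

```shell
#!/bin/sh
# Remove Unraid's cached update-status file so the UI rechecks on its
# next update scan. Safe to call when the file does not exist.
clear_unraid_update_cache() {
  status_file="${1:-/var/lib/docker/unraid-update-status.json}"
  [ -f "$status_file" ] && rm -f -- "$status_file"
  return 0
}

# Dry run against a temp file instead of the live Unraid path.
tmp="$(mktemp)"
clear_unraid_update_cache "$tmp"
[ ! -f "$tmp" ] && echo "cache file cleared"
# -> cache file cleared
```

In the workflow this would run (via the bot's existing command execution path) right after a successful container update.
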
**Known issue:** Users report that Unraid shows "update ready" even after a container is updated. This is a known Unraid bug: the check only looks for new updates, not whether containers are now current.

**Sources:**
- [Unraid API Documentation](https://docs.unraid.net/API/how-to-use-the-api/) (HIGH confidence)
- [Unraid Docker Integration DeepWiki](https://deepwiki.com/unraid/api/2.4.1-docker-integration) (MEDIUM confidence)
- [Watchtower + Unraid Discussion](https://github.com/containrrr/watchtower/discussions/1389) (MEDIUM confidence)
- [Unraid Forum: Update Badge Issues](https://forums.unraid.net/topic/157820-docker-shows-update-ready-after-updating/) (MEDIUM confidence)

---

## Docker Socket Security

### Table Stakes

| Feature | Why Expected | Complexity | Dependencies |
|---------|--------------|------------|--------------|
| Remove direct socket from internet-exposed n8n | Security requirement per PROJECT.md scope | Medium | Socket proxy setup |
| Maintain all existing functionality | Bot should work identically after the security change | Medium | API compatibility |
| Container start/stop/restart/update | Core actions must still work | Low | Proxy allows these APIs |
| Container list/inspect | Status command must still work | Low | Proxy allows read APIs |
| Image pull | Update command needs this | Low | Proxy configuration |

### Differentiators

| Feature | Value Proposition | Complexity | Dependencies |
|---------|-------------------|------------|--------------|
| Granular API restrictions | Only allow APIs the bot actually uses | Low | Socket proxy env vars |
| Block dangerous APIs | Prevent exec, create, system commands | Low | Socket proxy defaults |
| Audit logging | Log all Docker API calls through the proxy | Medium | Proxy logging config |

### Anti-features

| Anti-Feature | Why Avoid | What to Do Instead |
|--------------|-----------|-------------------|
| Read-only socket mount (:ro) | Doesn't actually protect - the socket is a pipe and stays writable | Use a proper socket proxy |
| Direct socket access from internet-facing container | Full root access if n8n is compromised | Socket proxy isolates access |
| Allowing exec API | Enables arbitrary command execution in containers | Block exec in the proxy |
| Allowing create/network APIs | Bot doesn't need to create containers | Block creation APIs |

### Implementation Notes

**Recommended: Tecnativa/docker-socket-proxy or LinuxServer.io/docker-socket-proxy**

Both provide HAProxy-based filtering of Docker API requests.

**Minimal proxy configuration for this bot:**

```yaml
# docker-compose.yml
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      - CONTAINERS=1     # List/inspect containers
      - IMAGES=1         # Pull images
      - POST=1           # Allow write operations
      - ALLOW_START=1    # /containers/{id}/start
      - ALLOW_STOP=1     # /containers/{id}/stop
      - ALLOW_RESTARTS=1 # /containers/{id}/restart (plus stop/kill)
      - SERVICES=0       # Swarm services (not needed)
      - TASKS=0          # Swarm tasks (not needed)
      - NETWORKS=0       # Network management (not needed)
      - VOLUMES=0        # Volume management (not needed)
      - EXEC=0           # CRITICAL: Block exec
      - BUILD=0          # CRITICAL: Block build
      - COMMIT=0         # CRITICAL: Block commit
      - SECRETS=0        # CRITICAL: Block secrets
      - CONFIGS=0        # CRITICAL: Block configs
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - docker-proxy

  n8n:
    # ... existing config ...
    environment:
      - DOCKER_HOST=tcp://socket-proxy:2375
    networks:
      - docker-proxy
      # Plus existing networks
```

Note that in the Tecnativa proxy the start/stop/restart endpoints are gated separately from `POST=1`, hence the `ALLOW_START`/`ALLOW_STOP`/`ALLOW_RESTARTS` flags - verify the exact flag names against the proxy's README during implementation.

**Key security benefits:**
1. n8n no longer has direct socket access
2. Only whitelisted API categories are available
3. EXEC=0 prevents arbitrary command execution
4. Proxy is on an internal network only, not internet-exposed

**Migration path:**
1. Deploy the socket-proxy container
2. Update n8n to use `DOCKER_HOST=tcp://socket-proxy:2375`
3. Remove the direct socket mount from n8n
4. Test that all bot commands still work

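Step 4 can start with two spot checks against the proxy: an allowed read and a deliberately blocked exec. The commands are printed rather than executed here, since `socket-proxy:2375` only resolves from inside the `docker-proxy` network, and the expected 403 is an assumption based on HAProxy-style deny rules:

```shell
#!/bin/sh
PROXY="http://socket-proxy:2375"

# Allowed (CONTAINERS=1): list containers -- the basis of the status command.
echo curl -s "${PROXY}/containers/json"

# Blocked (EXEC=0): creating an exec instance should be denied (expect 403).
echo curl -s -o /dev/null -w '%{http_code}' -X POST \
  -H "Content-Type: application/json" \
  -d '{"Cmd":["id"]}' \
  "${PROXY}/containers/plex/exec"
```

Run the real (un-echoed) commands from a container on the `docker-proxy` network; if the first returns JSON and the second is denied, the proxy is filtering as configured.
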
**Sources:**
- [Tecnativa docker-socket-proxy](https://github.com/Tecnativa/docker-socket-proxy) (HIGH confidence)
- [LinuxServer.io docker-socket-proxy](https://docs.linuxserver.io/images/docker-socket-proxy/) (HIGH confidence)
- [Docker Socket Security Guide](https://www.paulsblog.dev/how-to-secure-your-docker-environment-by-using-a-docker-socket-proxy/) (MEDIUM confidence)

---

## Feature Summary Table

| Feature | Complexity | Dependencies | Priority | Notes |
|---------|------------|--------------|----------|-------|
| **Inline Keyboards** | | | | |
| Basic callback handling | Low | Existing trigger | Must Have | Foundation for all buttons |
| Container action buttons | Medium | Container matching | Must Have | Core UX improvement |
| Confirmation dialogs | Low | None | Should Have | Prevents accidents |
| Dynamic keyboard generation | Medium | HTTP Request node | Must Have | n8n native node limitation workaround |
| **Batch Operations** | | | | |
| Update multiple containers | Medium | Existing update | Must Have | Sequential with progress |
| "Update all" command | Medium | Container listing | Should Have | With confirmation |
| Per-container feedback | Low | None | Must Have | Progress visibility |
| **n8n API** | | | | |
| API key setup | Low | n8n config | Must Have | Enable programmatic access |
| Read workflow | Low | REST API | Must Have | Development workflow |
| Update workflow | Low | REST API | Must Have | Development workflow |
| Activate/deactivate | Low | REST API | Should Have | Testing workflow |
| **Update Sync** | | | | |
| Delete status file | Low | SSH/exec access | Should Have | Simple Unraid sync |
| Unraid GraphQL API | High | Unraid 7.2+, API key | Nice to Have | Requires version check |
| **Security** | | | | |
| Socket proxy deployment | Medium | New container | Must Have | Security requirement |
| API restriction config | Low | Proxy env vars | Must Have | Minimize attack surface |
| Migration testing | Low | All commands | Must Have | Verify no regression |

## MVP Recommendation for v1.1

**Phase 1: Foundation (Must Have)**
1. Docker socket security via proxy - security first
2. n8n API access setup - enables faster development
3. Basic inline keyboard infrastructure - callback handling

**Phase 2: UX Improvements (Should Have)**
4. Container action buttons from status view
5. Confirmation dialogs for stop/update actions
6. Batch update with progress feedback

**Phase 3: Polish (Nice to Have)**
7. Unraid update status sync (file deletion method)
8. "Update all" convenience command

## Confidence Assessment

| Area | Confidence | Reason |
|------|------------|--------|
| Telegram Inline Keyboards | HIGH | Official Telegram docs + n8n docs verified |
| Batch Operations | MEDIUM-HIGH | Standard Docker patterns, well-documented |
| n8n API | MEDIUM | API exists, but detailed endpoint docs required fetching |
| Unraid Update Sync | MEDIUM | Community knowledge, API docs limited |
| Docker Socket Security | HIGH | Well-documented proxy solutions |

## Gaps to Address in Phase Planning

1. **Exact n8n API endpoints** - Verify the full endpoint list during implementation
2. **Unraid version compatibility** - The GraphQL API requires Unraid 7.2+; add a version check
3. **n8n Telegram node workarounds** - The HTTP Request approach needs testing
4. **Socket proxy on Unraid** - Deployment specifics for the Unraid environment