diff --git a/.planning/phases/11-update-all-callback-limits/11-RESEARCH.md b/.planning/phases/11-update-all-callback-limits/11-RESEARCH.md new file mode 100644 index 0000000..3819946 --- /dev/null +++ b/.planning/phases/11-update-all-callback-limits/11-RESEARCH.md @@ -0,0 +1,505 @@ +# Phase 11: Update All & Callback Limits - Research + +**Researched:** 2026-02-08 +**Domain:** Telegram Bot API callback data optimization, n8n workflow state management +**Confidence:** HIGH + +## Summary + +Phase 11 adds "update all" functionality for :latest containers and works around Telegram's 64-byte callback_data limit, which currently restricts batch selection to only a few containers. The main workflow already has a partial "update all" implementation (text command routing, confirmation keyboard, :latest filtering) but lacks an inline keyboard entry point. The critical blocker is the batch selection keyboard's callback_data format (`batch:toggle:0::plex` = 20 bytes + CSV of selected containers), which grows linearly with the selection and reaches the 64-byte limit after a handful of selections (2-3 when container names are long). + +**Primary recommendation:** Replace the CSV-in-callback approach with server-side state storage, using n8n workflow static data to track batch selection state and reducing callback_data to a short token plus the container name (e.g., `batch:toggle:abc123:plex` where `abc123` is a session key). Add an "Update All" button to the container list keyboard that triggers the existing update-all confirmation flow. 
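The growth described in the summary can be checked with a few lines of JavaScript. This is a minimal sketch, not code from the workflow: the helper names (`byteLength`, `csvCallback`, `tokenCallback`) and the sample container names are assumptions made for illustration, and the session ID `a7f3d2` is just a placeholder.

```javascript
// Illustrative only: helper names are hypothetical, not taken from the workflow.
const LIMIT = 64; // documented callback_data maximum, in bytes

// Measure UTF-8 bytes, since the limit is in bytes rather than characters.
const byteLength = (s) => Buffer.byteLength(s, 'utf8');

// Current format: batch:toggle:{page}:{selectedCsv}:{containerName}
const csvCallback = (page, selected, name) =>
  `batch:toggle:${page}:${selected.join(',')}:${name}`;

// Proposed format: batch:toggle:{sessionId}:{containerName}
const tokenCallback = (sessionId, name) => `batch:toggle:${sessionId}:${name}`;

// Sample names (assumed); the toggle target is a further 8-char name.
const names = ['plex', 'sonarr', 'radarr', 'nzbget', 'jellyfin', 'overseerr'];
const toggleName = 'tautulli';

// One row per selection size, from 0 selected up to all six selected.
const report = Array.from({ length: names.length + 1 }, (_, i) => {
  const csvBytes = byteLength(csvCallback(0, names.slice(0, i), toggleName));
  return {
    selectedCount: i,
    csvBytes,
    csvFits: csvBytes <= LIMIT,
    tokenBytes: byteLength(tokenCallback('a7f3d2', toggleName)),
  };
});

console.table(report);
```

With these sample names the CSV format stays under the limit up to five selections and exceeds it at six, while the token format stays the same size throughout. Longer container names move the breaking point earlier.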
+ +## Standard Stack + +### Core +| Library | Version | Purpose | Why Standard | +|---------|---------|---------|--------------| +| Telegram Bot API | 7.0+ | Inline keyboard, callback queries | Official Telegram bot interface, 64-byte callback_data limit enforced | +| n8n workflow | 1.x | Orchestration, sub-workflow execution | Project's existing automation platform | +| n8n static data | n8n built-in | Workflow-scoped persistence | n8n's native state storage (execution-scoped, not global) | + +### Supporting +| Library | Version | Purpose | When to Use | +|---------|---------|---------|-------------| +| Docker API | 1.47 | Container list, image tags | Filtering :latest containers for update-all | +| JavaScript (n8n Code nodes) | ES6+ | Callback parsing, keyboard building | All workflow logic implemented in Code nodes | + +### Alternatives Considered +| Instead of | Could Use | Tradeoff | +|------------|-----------|----------| +| n8n static data | External Redis/DB | n8n static data is execution-scoped (doesn't persist between executions), but workflow executions in this bot are short-lived (sub-minute), so state only needs to survive within a single conversation flow; Redis adds infrastructure complexity | +| Callback data tokens | Protobuf + base85 | Protobuf/base85 saves ~30% space but still hits 64-byte limit with 3+ selections; token approach eliminates linear growth | +| Session tokens | Callback data compression | Compression saves bytes but doesn't solve fundamental limit; tokens cap size at ~20 bytes regardless of selection count | + +**Installation:** +No new dependencies. Changes confined to existing n8n workflow JSON files. 
+ +## Architecture Patterns + +### Current State (Problematic) +**Batch selection callback data format:** +``` +batch:toggle:{page}:{selectedCsv}:{containerName} +Example: batch:toggle:0:plex,sonarr,radarr:jellyfin + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ = 42 bytes (16 bytes overhead + 26 bytes names) +``` + +**Problem:** With 4 selected containers of 4-6 chars each (plex, sonarr, radarr, nzbget): +- Prefix overhead: `batch:toggle:0::` = 16 bytes +- Selection CSV: `plex,sonarr,radarr,nzbget` = 25 bytes +- Toggle name: `jellyfin` = 8 bytes +- **Total: 49 bytes** (leaves only 15 bytes headroom) + +With a 5th selected container (`jellyfin`, 8 chars): `plex,sonarr,radarr,nzbget,jellyfin` = 34 bytes → **58 bytes total** when toggling another 8-char name; a 6th selection of similar length pushes the total to 67 bytes (over limit) + +### Recommended Pattern: Session-Based State Storage + +``` +Callback Data Format (Independent of Selection Size): +batch:toggle:{sessionId}:{containerName} +Example: batch:toggle:a7f3d2:plex + ^^^^^^^^^^^^^^^^^^^^^^^^ = 24 bytes (same size regardless of how many containers are selected) + +State Storage (n8n Static Data, serialized as a JSON string under `_batchSessions`): +{ + "_batchSessions": { + "a7f3d2": { + "chatId": 563878771, + "page": 0, + "selected": ["plex", "sonarr", "radarr", "nzbget", "jellyfin"], + "action": "stop", + "created": 1738972800000, + "expires": 1738973100000 // 5 minutes TTL + } + } +} +``` + +**Benefits:** +- Callback data size stays at ~25 bytes for short container names (about 60% below the 64-byte ceiling) and varies only with the length of the toggled name +- Selection size is no longer bounded by callback_data +- Session cleanup prevents static data bloat + +### Pattern Implementation: Session Lifecycle + +#### 1. 
Session Creation (Batch Mode Entry) +```javascript +// Code node: "Initialize Batch Session" +const staticData = $getWorkflowStaticData('global'); +const sessions = JSON.parse(staticData._batchSessions || '{}'); + +// Generate 6-char session ID +const sessionId = Math.random().toString(36).substring(2, 8); +const now = Date.now(); + +sessions[sessionId] = { + chatId: $json.chatId, + page: 0, + selected: [], + action: $json.batchAction || 'stop', + created: now, + expires: now + 300000 // 5 minutes +}; + +// Clean expired sessions (prevent bloat) +Object.keys(sessions).forEach(id => { + if (sessions[id].expires < now) delete sessions[id]; +}); + +staticData._batchSessions = JSON.stringify(sessions); + +return { json: { sessionId, chatId: $json.chatId } }; +``` + +#### 2. Session Update (Toggle Selection) +```javascript +// Code node: "Update Batch Session" +const staticData = $getWorkflowStaticData('global'); +const sessions = JSON.parse(staticData._batchSessions || '{}'); +const sessionId = $json.sessionId; +const toggleName = $json.toggleName; + +if (!sessions[sessionId]) { + return { json: { error: 'Session expired', chatId: $json.chatId } }; +} + +const session = sessions[sessionId]; +const selected = new Set(session.selected); + +// Toggle selection +if (selected.has(toggleName)) { + selected.delete(toggleName); +} else { + selected.add(toggleName); +} + +session.selected = Array.from(selected); +staticData._batchSessions = JSON.stringify(sessions); + +return { json: { + sessionId, + selectedCount: selected.size, + selectedCsv: session.selected.join(',') +} }; +``` + +#### 3. 
Keyboard Building (Retrieve Session) +```javascript +// Code node: "Build Batch Keyboard With Session" +const staticData = $getWorkflowStaticData('global'); +const sessions = JSON.parse(staticData._batchSessions || '{}'); +const sessionId = $json.sessionId; +const session = sessions[sessionId]; + +// Guard against unknown or expired sessions before reading state +if (!session || session.expires < Date.now()) { + return { json: { error: 'Session expired', chatId: $json.chatId } }; +} + +const selectedSet = new Set(session.selected); +const page = session.page; +const navRow = []; +// displayContainers: the containers shown on the current page, +// assumed to be computed earlier in this node + +// Build keyboard with callbacks whose size does not depend on the selection +const keyboard = displayContainers.map(c => { + const isSelected = selectedSet.has(c.name); + const icon = c.state === 'running' ? '🟢' : '⚪'; + const checkmark = isSelected ? '✓ ' : ''; + return [{ + text: `${checkmark}${icon} ${c.name}`, + callback_data: `batch:toggle:${sessionId}:${c.name}` // Token + name only + }]; +}); + +// Navigation buttons also use session ID +if (page > 0) { + navRow.push({ + text: '◀️ Previous', + callback_data: `batch:nav:${sessionId}:${page - 1}` + }); +} +``` + +### Pattern 2: Update All Entry Points + +**Text Command (Already Implemented):** +``` +User: "update all" + ↓ +Keyword Router → "updateall" output + ↓ +Get All Containers For Update All (HTTP: filter :latest) + ↓ +Build Update All Confirmation (keyboard with uall:confirm:{timestamp}) + ↓ +Send confirmation message +``` + +**Inline Keyboard Entry Point (NEW):** +``` +Container List keyboard: +[🟢 plex] [🟢 sonarr] +[🟢 radarr] [⚪ nzbget] +────────────────────── +[🔄 Update All :latest] ← NEW BUTTON +[◀️ Previous] [1/2] [Next ▶️] +``` + +Callback data: `uall:start` (10 bytes, no parameters needed; the :latest containers are fetched on click) + +### Anti-Patterns to Avoid + +- **Storing entire selection in callback_data:** Hits the 64-byte limit after a handful of containers +- **Using message ID as session key:** Message IDs are not unique across chats; use generated tokens +- **Global session store without TTL:** n8n static data is not cleaned up automatically; expired sessions must be removed explicitly +- **Session lookup without expiry check:** Old sessions can cause stale state bugs + +## Don't Hand-Roll + +| Problem | Don't Build | Use 
Instead | Why | +|---------|-------------|-------------|-----| +| Callback data compression | Custom LZ4/zlib compression | Session tokens + static data | Compression can't bypass 64-byte hard limit; tokens eliminate size dependency | +| Session ID generation | Timestamp-based sequential IDs | Math.random().toString(36) | Sequential IDs leak execution count; random alphanumeric sufficient for short-lived sessions | +| Static data serialization | Custom binary format | JSON.stringify/parse | n8n static data already uses JSON internally; custom format adds complexity | +| Session cleanup | Background cron node | Inline cleanup on session access | n8n workflows don't support background tasks; cleanup-on-access prevents bloat | + +**Key insight:** Telegram's 64-byte limit is a hard constraint enforced at the API level. The only viable workarounds are: (1) reduce callback_data to fixed-size tokens, or (2) use alternative callback methods (e.g., switch_inline_query). Token-based approach is simplest and requires no architecture changes beyond state management. + +## Common Pitfalls + +### Pitfall 1: n8n Static Data Scope Confusion +**What goes wrong:** Assuming `$getWorkflowStaticData('global')` persists across workflow activations or different workflow instances + +**Why it happens:** "Global" means "workflow-scoped" (accessible to all nodes in the workflow), not "instance-global" (persists forever). From Phase 10.2 UAT: static data is execution-scoped in n8n cloud and may not persist between executions. 
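Because persistence behavior can differ between environments, it may help to exercise the session logic outside n8n first. The sketch below passes a plain object where the workflow would use the value returned by `$getWorkflowStaticData('global')`. The helper names (`loadSessions`, `saveSessions`, `createSession`, `toggleSelection`) are hypothetical, and the injected `now` argument is an assumption added to make the TTL testable. It says nothing about how n8n itself persists static data.

```javascript
// Hypothetical helpers for testing session logic outside n8n.
// `staticData` stands in for the object returned by $getWorkflowStaticData('global').
const TTL_MS = 5 * 60 * 1000; // 5-minute TTL, as proposed in this document

function loadSessions(staticData) {
  return JSON.parse(staticData._batchSessions || '{}');
}

function saveSessions(staticData, sessions) {
  // Top-level assignment of a JSON string, matching the project's convention.
  staticData._batchSessions = JSON.stringify(sessions);
}

function createSession(staticData, sessionId, chatId, now) {
  const sessions = loadSessions(staticData);
  sessions[sessionId] = {
    chatId,
    page: 0,
    selected: [],
    action: 'stop',
    created: now,
    expires: now + TTL_MS,
  };
  saveSessions(staticData, sessions);
}

function toggleSelection(staticData, sessionId, name, now) {
  const sessions = loadSessions(staticData);
  const session = sessions[sessionId];
  // Treat both missing and expired sessions as expired.
  if (!session || session.expires < now) {
    return { ok: false, reason: 'expired' };
  }
  const selected = new Set(session.selected);
  if (selected.has(name)) {
    selected.delete(name);
  } else {
    selected.add(name);
  }
  session.selected = Array.from(selected);
  saveSessions(staticData, sessions);
  return { ok: true, selected: session.selected };
}

// Simulate separate button presses that share one persisted object.
const fakeStaticData = {};
const t0 = 1738972800000;
createSession(fakeStaticData, 'a7f3d2', 563878771, t0);
const first = toggleSelection(fakeStaticData, 'a7f3d2', 'plex', t0 + 1000);
const second = toggleSelection(fakeStaticData, 'a7f3d2', 'sonarr', t0 + 2000);
const late = toggleSelection(fakeStaticData, 'a7f3d2', 'radarr', t0 + TTL_MS + 1);

console.log(first, second, late);
```

If the same sequence behaves differently inside n8n (for example, the second toggle reports an expired session), that points at static data not surviving between executions rather than at the session logic.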
+ +**How to avoid:** +- Document that sessions are conversation-scoped (survive single execution only) +- Implement TTL cleanup to prevent session bloat in long-running executions +- Test session persistence across multiple callback interactions in same execution + +**Warning signs:** +- User reports "session expired" immediately after creating batch selection +- Static data object grows unbounded with old session IDs +- Session lookups fail after workflow re-activation + +### Pitfall 2: Deep Nested Mutation of Static Data +**What goes wrong:** Modifying `staticData.sessions.abc123.selected.push('plex')` doesn't persist changes + +**Why it happens:** n8n only tracks top-level property changes for static data persistence. Deep mutations are silently lost. (From CLAUDE.md: "Deep nested mutations are silently lost. Always use JSON serialization.") + +**How to avoid:** +```javascript +// WRONG - deep mutation not persisted +staticData.sessions[sessionId].selected.push('plex'); + +// CORRECT - top-level assignment persisted +const sessions = JSON.parse(staticData._batchSessions || '{}'); +sessions[sessionId].selected.push('plex'); +staticData._batchSessions = JSON.stringify(sessions); +``` + +**Warning signs:** +- Session state reverts to initial state after toggle +- Selection list shows empty array despite successful toggles +- Debugging shows correct in-memory values but wrong persisted values + +### Pitfall 3: Callback Data URL Encoding +**What goes wrong:** Container names with spaces or special chars exceed the 64-byte limit once they are encoded + +**Why it happens:** The 64-byte limit applies to the byte length of the callback_data string that is actually sent, so any escaping applied beforehand counts against it. A percent-encoded `container name` becomes `container%20name` (+2 bytes per space), and non-ASCII characters take 2-4 bytes each in UTF-8. 
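Whatever the cause of the extra bytes, measuring the final string before it is sent makes the problem visible in code rather than as a 400 response. The guard below is a minimal sketch: `checkCallbackData` is a hypothetical helper, not part of the existing workflow, and the sample values are assumptions chosen to show the effect of encoding.

```javascript
// Hypothetical guard; not part of the existing workflow.
const MAX_CALLBACK_BYTES = 64;

// Returns the measured size and whether it is within the documented 1-64 byte range.
function checkCallbackData(data) {
  const bytes = Buffer.byteLength(data, 'utf8');
  return { data, bytes, ok: bytes >= 1 && bytes <= MAX_CALLBACK_BYTES };
}

// A short name fits comfortably.
const plain = checkCallbackData('batch:toggle:a7f3d2:plex');

// Percent-encoding a space adds 2 bytes compared with the raw name.
const raw = checkCallbackData('batch:toggle:a7f3d2:container name');
const encoded = checkCallbackData(
  `batch:toggle:a7f3d2:${encodeURIComponent('container name')}`
);

// A very long name exceeds the limit even with a session token.
const tooLong = checkCallbackData(`batch:toggle:a7f3d2:${'x'.repeat(60)}`);

console.log({ plain, raw, encoded, tooLong });
```

A check like this could run where the keyboard is built, so an oversized button is dropped or reported before the message is sent. Note that the session token does not help with a single very long container name; that case would need a separate mapping from name to short ID.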
+ +**How to avoid:** +- Normalize container names to remove leading slash (Docker returns `/plex`, store as `plex`) +- Session tokens are alphanumeric only (no encoding needed) +- Test with containers that have spaces, dashes, underscores + +**Warning signs:** +- Batch toggle works for `plex` but fails for `my-container-name-v2` +- Telegram API returns 400 Bad Request with no error details +- Callback data length looks under 64 bytes in code but fails at API + +### Pitfall 4: Update All Without Confirmation +**What goes wrong:** Adding "Update All" button that immediately triggers batch update without confirmation + +**Why it happens:** Copying pattern from batch start/stop exec buttons, which show confirmation for stop only + +**How to avoid:** +- **ALWAYS show confirmation for update-all** (updates are destructive — image pull can fail, container recreation can break state) +- Reuse existing `Build Update All Confirmation` code node (already implemented at line 2810 in main workflow) +- Add inline keyboard entry point that routes to confirmation flow, not direct execution + +**Warning signs:** +- User reports containers updated without confirmation prompt +- Update-all triggers immediately on button press +- No 30-second timeout check for update-all + +## Code Examples + +Verified patterns from existing implementation and Telegram Bot API: + +### Session-Based Batch Toggle +```javascript +// Source: n8n-batch-ui.json + Telegram Bot API docs +// Modified from current CSV-in-callback to session-based approach + +// Code node: "Handle Toggle With Session" +const triggerData = $('When executed by another workflow').item.json; +const sessionId = triggerData.sessionId; +const toggleName = triggerData.toggleName; +const chatId = triggerData.chatId; + +// Load session state +const staticData = $getWorkflowStaticData('global'); +const sessions = JSON.parse(staticData._batchSessions || '{}'); + +if (!sessions[sessionId]) { + return { + json: { + success: false, + action: 
'expired', + queryId: triggerData.queryId, + chatId: chatId, + answerText: 'Session expired (5 min timeout)', + showAlert: true + } + }; +} + +const session = sessions[sessionId]; +const selectedSet = new Set(session.selected); + +// Toggle selection +if (selectedSet.has(toggleName)) { + selectedSet.delete(toggleName); +} else { + selectedSet.add(toggleName); +} + +session.selected = Array.from(selectedSet); + +// CRITICAL: Top-level assignment for persistence +staticData._batchSessions = JSON.stringify(sessions); + +return { + json: { + success: true, + action: 'toggle_update', + sessionId: sessionId, + selectedCount: selectedSet.size, + selectedCsv: session.selected.join(','), + needsKeyboardUpdate: true + } +}; +``` + +### Update All Inline Keyboard Entry +```javascript +// Source: n8n-status.json Build Container List node +// Add "Update All" button to container list keyboard + +// Code node: "Build Container List" (modified) +// ... existing container list logic ... + +// Add Update All button row after pagination +keyboard.push([ + { + text: '🔄 Update All :latest', + callback_data: 'uall:start' // 10 bytes, triggers existing flow + } +]); + +return { + json: { + success: true, + action: 'list', + chatId: chatId, + messageId: messageId, + text: message, + reply_markup: { inline_keyboard: keyboard } + } +}; +``` + +### Session Cleanup on Access +```javascript +// Source: n8n best practices + project patterns +// Clean expired sessions every time static data is accessed + +function getSessionsWithCleanup() { + const staticData = $getWorkflowStaticData('global'); + const sessions = JSON.parse(staticData._batchSessions || '{}'); + const now = Date.now(); + let cleaned = false; + + // Remove expired sessions (5-minute TTL) + Object.keys(sessions).forEach(id => { + if (sessions[id].expires < now) { + delete sessions[id]; + cleaned = true; + } + }); + + // Persist cleanup + if (cleaned) { + staticData._batchSessions = JSON.stringify(sessions); + } + + return 
sessions; +} + +// Usage in any session-access code +const sessions = getSessionsWithCleanup(); +``` + +### Callback Data Parser Update +```javascript +// Source: n8n-workflow.json "Parse Callback Data" node (line 589) +// Add session-based batch toggle parsing + +// Existing: batch:toggle:{page}:{selectedCsv}:{containerName} +// New: batch:toggle:{sessionId}:{containerName} + +if (rawData.startsWith('batch:toggle:')) { + const parts = rawData.substring(13).split(':'); + const sessionId = parts[0]; // Changed from page number + const toggleName = parts.slice(1).join(':'); // Handle names with colons + + return { + json: { + queryId, + chatId, + messageId, + isBatchToggle: true, + sessionId: sessionId, // NEW field + toggleName: toggleName, + // Removed: batchPage, selectedCsv (now in session state) + } + }; +} + +// NEW: Update All start button +if (rawData === 'uall:start') { + return { + json: { + queryId, + chatId, + messageId, + isUpdateAllStart: true, // Routes to existing confirmation flow + } + }; +} +``` + +## State of the Art + +| Old Approach | Current Approach | When Changed | Impact | +|--------------|------------------|--------------|--------| +| CSV in callback_data | Session tokens + server state | Driven by the long-standing 64-byte callback_data limit in the Bot API | python-telegram-bot ships CallbackDataCache (arbitrary callback_data, added in v13.6) for the same purpose | +| Manual session cleanup | Inline cleanup on access | n8n lacks background tasks | Must clean on every session read to prevent bloat | +| Direct :latest image pull | Filter then confirm | Docker Hub rate limits (2020) | Always confirm batch operations to avoid wasted pulls | +| Batch exec without limit UI | Multi-select keyboard | Telegram inline keyboard UX (2018+) | Users expect checkbox-style interfaces for batch selection | + +**Deprecated/outdated:** +- **Storing full selection in callback_data**: Does not scale past the 64-byte limit; python-telegram-bot's CallbackDataCache keeps the payload server-side and sends only a short key +- 
**Unlimited batch operations without confirmation**: Docker Hub introduced pull rate limits in November 2020 (100 pulls per 6 hours for anonymous users, 200 for authenticated free accounts); always confirm before batch image pulls +- **Using message_id as state key**: Early Telegram bots used message ID for state lookup, but message IDs are not unique across chats; use chat_id + random token + +## Open Questions + +1. **What is the maximum practical selection size?** + - What we know: Session state stored in n8n static data (execution-scoped JSON) + - What's unclear: n8n static data size limits, if any + - Recommendation: Cap batch selection at 50 containers (practical UX limit for 6 per page = 9 pages), document limit in keyboard message + +2. **Should session TTL be configurable or hardcoded?** + - What we know: Telegram callback queries expire after message age threshold (unclear exact time) + - What's unclear: Optimal balance between UX (allow user to take time selecting) vs resource usage (cleanup frequency) + - Recommendation: Hardcode 5-minute TTL (matches Telegram confirmation timeout pattern already in workflow), revisit if users report timeout issues + +3. 
**Does Update All inline keyboard need "Update All (N containers)" dynamic count?** + - What we know: Text command shows count in confirmation (`Update 12 containers?`) + - What's unclear: Whether button should show live count (requires Docker API call on every list render) or static text + - Recommendation: Static text `🔄 Update All :latest` in list keyboard, dynamic count shown in confirmation message after click (reduces API calls) + +## Sources + +### Primary (HIGH confidence) +- [Telegram Bot API Official Docs](https://core.telegram.org/bots/api) - InlineKeyboardButton callback_data 1-64 bytes limit +- [Telegram Limits Reference](https://limits.tginfo.me/en) - Comprehensive Bot API limits documentation +- Project codebase: n8n-workflow.json (lines 276-296, 589-1020, 2750-3074) - Existing "update all" implementation, callback parser, :latest filtering +- Project codebase: n8n-batch-ui.json (lines 236-251) - Current CSV-in-callback approach and 64-byte limit check +- Project codebase: CLAUDE.md - n8n static data deep mutation pitfall, JSON serialization requirement + +### Secondary (MEDIUM confidence) +- [n8n getWorkflowStaticData Docs](https://docs.n8n.io/code/cookbook/builtin/get-workflow-static-data/) - Static data persistence behavior, execution scope +- [n8n Static Data Persistence GitHub Issue #17321](https://github.com/n8n-io/n8n/issues/17321) - Cloud execution-scoped behavior (may not persist between triggers) +- [python-telegram-bot CallbackDataCache](https://docs.python-telegram-bot.org/en/v21.9/telegram.ext.callbackdatacache.html) - Standard library pattern for callback_data workarounds +- [Telegram Inline Keyboard UX Guide](https://wyu-telegram.com/blogs/444/) - Best practices for multi-select interfaces + +### Tertiary (LOW confidence) +- [Medium: Telegram bot inline buttons with large data](https://medium.com/@knock.nevis/telegram-bot-inline-buttons-with-large-data-950e818c1272) - Community workarounds for 64-byte limit using Redis +- [Enhanced 
callback_data with protobuf + base85](https://seroperson.me/2025/02/05/enhanced-telegram-callback-data/) - Advanced encoding techniques (35% space savings but still hits limit with selections) + +## Metadata + +**Confidence breakdown:** +- Standard stack: HIGH - Telegram Bot API and n8n workflow are existing project dependencies with official docs +- Architecture: HIGH - Session token pattern is a standard workaround documented in python-telegram-bot and multiple sources; existing "update all" code verified in workflow JSON +- Pitfalls: HIGH - n8n static data mutation pitfall directly from project CLAUDE.md; callback_data limit enforced by Telegram API + +**Research date:** 2026-02-08 +**Valid until:** 2026-03-08 (30 days; stable domain, the 64-byte callback_data limit is long-standing)