# Phase 11: Update All & Callback Limits - Research
**Researched:** 2026-02-08
**Domain:** Telegram Bot API callback data optimization, n8n workflow state management
**Confidence:** HIGH
## Summary
Phase 11 adds "update all" functionality for :latest containers and fixes the batch selection flow, which Telegram's 64-byte callback_data limit currently restricts to a handful of containers. The main workflow already has a partial "update all" implementation (text command routing, confirmation keyboard, :latest filtering) but lacks an inline keyboard entry point. The critical blocker is the batch selection keyboard's callback_data format (`batch:toggle:0::plex` = 20 bytes + CSV of selected containers), which grows linearly with the selection and hits the 64-byte limit after only a few selections (as few as 2-3 when container names run to 15 or more characters).
**Primary recommendation:** Replace the CSV-in-callback approach with server-side state storage, using n8n workflow static data to track batch selection state. This reduces callback_data to short tokens whose size no longer depends on the selection (e.g., `batch:toggle:abc123:plex`, where `abc123` is a session key). Add an "Update All" button to the container list keyboard that triggers the existing update-all confirmation flow.
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| Telegram Bot API | 7.0+ | Inline keyboard, callback queries | Official Telegram bot interface, 64-byte callback_data limit enforced |
| n8n workflow | 1.x | Orchestration, sub-workflow execution | Project's existing automation platform |
| n8n static data | n8n built-in | Workflow-scoped persistence | n8n's native state storage (execution-scoped, not global) |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| Docker API | 1.47 | Container list, image tags | Filtering :latest containers for update-all |
| JavaScript (n8n Code nodes) | ES6+ | Callback parsing, keyboard building | All workflow logic implemented in Code nodes |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| n8n static data | External Redis/DB | n8n static data is workflow-scoped, and Phase 10.2 UAT suggests it may not persist between executions in every environment. Batch sessions are short-lived (5-minute TTL), so state only needs to survive a single conversation flow, but that flow spans one execution per button press and must be verified; Redis adds infrastructure complexity |
| Callback data tokens | Protobuf + base85 | Protobuf/base85 saves ~30% space but still hits 64-byte limit with 3+ selections; token approach eliminates linear growth |
| Session tokens | Callback data compression | Compression saves bytes but doesn't solve fundamental limit; tokens cap size at ~20 bytes regardless of selection count |
**Installation:**
No new dependencies. Changes confined to existing n8n workflow JSON files.
## Architecture Patterns
### Current State (Problematic)
**Batch selection callback data format:**
```
batch:toggle:{page}:{selectedCsv}:{containerName}
Example: batch:toggle:0:plex,sonarr,radarr:jellyfin
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ = 42 bytes (16 bytes overhead + 26 bytes names)
```
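**Problem:** With 4 selected containers averaging about 6 chars each (plex, sonarr, radarr, nzbget) and `jellyfin` as the container being toggled:
- Prefix overhead: `batch:toggle:0::` = 16 bytes
- Selection CSV: `plex,sonarr,radarr,nzbget` = 25 bytes
- Toggle name: `jellyfin` = 8 bytes
- **Total: 49 bytes** (leaves only 15 bytes headroom)

Once `jellyfin` is selected as the 5th container (8 chars), the CSV becomes `plex,sonarr,radarr,nzbget,jellyfin` = 34 bytes, so toggling another 8-char name costs 16 + 34 + 8 = **58 bytes**. Every further selection adds its name length plus a comma, so the 64-byte limit is crossed within the next one or two selections, and sooner with longer names.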
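
The growth described above can be checked with a short script. This is a minimal sketch that runs in plain Node.js, not workflow code: the helper names `legacyToggleCallback` and `byteLength` and the list of container names are illustrative assumptions, and only the callback format itself comes from this section.

```javascript
// Sketch: how large does the legacy callback_data get as the selection grows?
const CALLBACK_LIMIT_BYTES = 64; // Telegram's documented callback_data maximum

// Legacy format from this section: batch:toggle:{page}:{selectedCsv}:{containerName}
function legacyToggleCallback(page, selected, toggleName) {
  return `batch:toggle:${page}:${selected.join(',')}:${toggleName}`;
}

// The limit is in bytes, so measure the UTF-8 encoding rather than string length
function byteLength(text) {
  return Buffer.byteLength(text, 'utf8');
}

// Illustrative container names only (hypothetical, not read from the project)
const names = ['plex', 'sonarr', 'radarr', 'nzbget', 'jellyfin', 'overseerr', 'tautulli'];

// Select the first `count` names, then toggle the next one
for (let count = 0; count < names.length; count++) {
  const data = legacyToggleCallback(0, names.slice(0, count), names[count]);
  const size = byteLength(data);
  const verdict = size > CALLBACK_LIMIT_BYTES ? 'OVER LIMIT' : 'ok';
  console.log(`${count} selected, toggling ${names[count]}: ${size} bytes (${verdict})`);
}
```

Running the loop with the project's real container names shows at which selection count the current keyboard would start failing.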
### Recommended Pattern: Session-Based State Storage
```
Callback Data Format (Fixed Size):
batch:toggle:{sessionId}:{containerName}
Example: batch:toggle:a7f3d2:plex
         ^^^^^^^^^^^^^^^^^^^^^^^^ = 24 bytes (independent of selection size)
State Storage (n8n Static Data):
{
  "batchSessions": {
    "a7f3d2": {
      "chatId": 563878771,
      "page": 0,
      "selected": ["plex", "sonarr", "radarr", "nzbget", "jellyfin"],
      "action": "stop",
      "created": 1738972800000,
      "expires": 1738973100000 // 5 minutes TTL
    }
  }
}
```
**Benefits:**
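- Callback data size is 20 bytes plus the container name (~25 bytes for short names, roughly 60% below the 64-byte worst case) and no longer depends on how many containers are selected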
- Supports unlimited container selections
- Session cleanup prevents static data bloat
### Pattern Implementation: Session Lifecycle
#### 1. Session Creation (Batch Mode Entry)
```javascript
// Code node: "Initialize Batch Session"
const staticData = $getWorkflowStaticData('global');
const sessions = JSON.parse(staticData._batchSessions || '{}');
// Generate a 6-char session ID; padEnd covers the rare case where
// Math.random() produces fewer than 6 base-36 digits
const sessionId = Math.random().toString(36).substring(2, 8).padEnd(6, '0');
const now = Date.now();
sessions[sessionId] = {
  chatId: $json.chatId,
  page: 0,
  selected: [],
  action: $json.batchAction || 'stop',
  created: now,
  expires: now + 300000 // 5 minutes
};
// Clean expired sessions (prevent bloat)
Object.keys(sessions).forEach(id => {
  if (sessions[id].expires < now) delete sessions[id];
});
// Top-level assignment so that n8n persists the change
staticData._batchSessions = JSON.stringify(sessions);
return { json: { sessionId, chatId: $json.chatId } };
```
#### 2. Session Update (Toggle Selection)
```javascript
// Code node: "Update Batch Session"
const staticData = $getWorkflowStaticData('global');
const sessions = JSON.parse(staticData._batchSessions || '{}');
const sessionId = $json.sessionId;
const toggleName = $json.toggleName;
const session = sessions[sessionId];
// Reject missing sessions and sessions past their TTL
if (!session || session.expires < Date.now()) {
  return { json: { error: 'Session expired', chatId: $json.chatId } };
}
const selected = new Set(session.selected);
// Toggle selection
if (selected.has(toggleName)) {
  selected.delete(toggleName);
} else {
  selected.add(toggleName);
}
session.selected = Array.from(selected);
// Top-level assignment so that n8n persists the change
staticData._batchSessions = JSON.stringify(sessions);
return { json: {
  sessionId,
  selectedCount: selected.size,
  selectedCsv: session.selected.join(',')
} };
```
#### 3. Keyboard Building (Retrieve Session)
```javascript
// Code node: "Build Batch Keyboard With Session"
// displayContainers, page, and navRow come from the node's existing pagination logic
const staticData = $getWorkflowStaticData('global');
const sessions = JSON.parse(staticData._batchSessions || '{}');
const sessionId = $json.sessionId;
const session = sessions[sessionId];
// A missing session renders as an empty selection instead of throwing
const selectedSet = new Set(session ? session.selected : []);
// Build keyboard with fixed-size callbacks
const keyboard = displayContainers.map(c => {
  const isSelected = selectedSet.has(c.name);
  const icon = c.state === 'running' ? '🟢' : '⚪';
  const checkmark = isSelected ? '✓ ' : '';
  return [{
    text: `${checkmark}${icon} ${c.name}`,
    callback_data: `batch:toggle:${sessionId}:${c.name}` // Size independent of selection
  }];
});
// Navigation buttons also use session ID
if (page > 0) {
  navRow.push({
    text: '◀️ Previous',
    callback_data: `batch:nav:${sessionId}:${page - 1}`
  });
}
```
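
The fragment above depends on variables defined elsewhere in the node. As a self-contained illustration, the same idea can be written as a pure function that takes the container list and the session and returns the `reply_markup` object. This is a sketch under assumptions: the function name `buildBatchKeyboard`, the `{ name, state }` container shape, and the default page size of 6 (taken from the paging figure in Open Questions) are not confirmed by the existing workflow.

```javascript
// Sketch: build one page of the batch keyboard from session state.
// buildBatchKeyboard and its parameters are hypothetical names.
function buildBatchKeyboard(containers, session, sessionId, pageSize = 6) {
  const selected = new Set(session.selected);
  const page = session.page;
  const totalPages = Math.max(1, Math.ceil(containers.length / pageSize));
  const visible = containers.slice(page * pageSize, (page + 1) * pageSize);

  // One button per row; callback_data carries only the session ID and the name
  const rows = visible.map(c => {
    const checkmark = selected.has(c.name) ? '✓ ' : '';
    const icon = c.state === 'running' ? '🟢' : '⚪';
    return [{
      text: `${checkmark}${icon} ${c.name}`,
      callback_data: `batch:toggle:${sessionId}:${c.name}`
    }];
  });

  // Navigation row, only with the buttons that make sense on this page
  const navRow = [];
  if (page > 0) {
    navRow.push({ text: '◀️ Previous', callback_data: `batch:nav:${sessionId}:${page - 1}` });
  }
  if (page < totalPages - 1) {
    navRow.push({ text: 'Next ▶️', callback_data: `batch:nav:${sessionId}:${page + 1}` });
  }
  if (navRow.length > 0) rows.push(navRow);

  return { inline_keyboard: rows };
}

// Usage with made-up data: 8 containers, first page, one selected
const demoContainers = ['plex', 'sonarr', 'radarr', 'nzbget', 'jellyfin', 'a', 'b', 'c']
  .map(name => ({ name, state: 'running' }));
const demoMarkup = buildBatchKeyboard(demoContainers, { page: 0, selected: ['plex'] }, 'a7f3d2');
console.log(JSON.stringify(demoMarkup.inline_keyboard[0]));
```

Keeping the builder free of n8n globals makes it easy to test outside the workflow; inside a Code node it would be called with the session loaded from static data.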
### Pattern 2: Update All Entry Points
**Text Command (Already Implemented):**
```
User: "update all"
Keyword Router → "updateall" output
Get All Containers For Update All (HTTP: filter :latest)
Build Update All Confirmation (keyboard with uall:confirm:{timestamp})
Send confirmation message
```
**Inline Keyboard Entry Point (NEW):**
```
Container List keyboard:
[🟢 plex] [🟢 sonarr]
[🟢 radarr] [⚪ nzbget]
──────────────────────
[🔄 Update All :latest] ← NEW BUTTON
[◀️ Previous] [1/2] [Next ▶️]
```
Callback data: `uall:start` (10 bytes, no parameters needed — fetches :latest containers on click)
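
For reference, the :latest filtering that runs after the click could look like the sketch below. It assumes the container objects come from the Docker Engine API container list, where `Image` holds the image reference and `Names` holds slash-prefixed names. The helper names `usesLatestTag` and `latestContainerNames` are hypothetical, and treating an untagged image as `latest` follows Docker's default rather than anything confirmed in the existing workflow.

```javascript
// Sketch: pick the containers whose image reference resolves to the :latest tag.
function usesLatestTag(image) {
  // References pinned by digest (name@sha256:...) never float with :latest
  if (image.includes('@')) return false;
  // Only look for a tag after the last slash, so a registry port
  // such as localhost:5000/app is not mistaken for a tag
  const lastSegment = image.slice(image.lastIndexOf('/') + 1);
  const colon = lastSegment.indexOf(':');
  const tag = colon === -1 ? 'latest' : lastSegment.slice(colon + 1);
  return tag === 'latest';
}

function latestContainerNames(containers) {
  return containers
    .filter(c => usesLatestTag(c.Image))
    .map(c => c.Names[0].replace(/^\//, '')); // Docker returns "/plex"
}

// Usage with made-up container records
const sample = [
  { Names: ['/plex'], Image: 'linuxserver/plex:latest' },
  { Names: ['/db'], Image: 'postgres:16' },
  { Names: ['/app'], Image: 'localhost:5000/app' }
];
console.log(latestContainerNames(sample));
```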
### Anti-Patterns to Avoid
- **Storing entire selection in callback_data:** Hits 64-byte limit after 2-3 containers
- **Using message ID as session key:** Message ID reused across conversations; use generated tokens
- **Global session store without TTL:** n8n static data persists indefinitely; must clean expired sessions
- **Session lookup without expiry check:** Old sessions can cause stale state bugs
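
The last two anti-patterns can be avoided with a single lookup helper that every session-reading node calls. The following is a minimal sketch: `getLiveSession` and its return shape are hypothetical, and the chat ownership check is an extra safeguard assumed here, not something the existing workflow is known to do.

```javascript
// Sketch: one lookup path that always validates expiry and ownership.
function getLiveSession(sessions, sessionId, chatId, now = Date.now()) {
  // Own-property check so IDs like "constructor" cannot hit the prototype
  if (!Object.prototype.hasOwnProperty.call(sessions, sessionId)) {
    return { ok: false, reason: 'missing' };
  }
  const session = sessions[sessionId];
  if (session.expires < now) return { ok: false, reason: 'expired' };
  // A token generated for one chat should not be usable from another
  if (session.chatId !== chatId) return { ok: false, reason: 'wrong_chat' };
  return { ok: true, session };
}

// Usage with made-up data
const demoSessions = {
  a7f3d2: { chatId: 1, selected: ['plex'], expires: 2000 }
};
console.log(getLiveSession(demoSessions, 'a7f3d2', 1, 1500)); // inside the TTL
console.log(getLiveSession(demoSessions, 'a7f3d2', 1, 2500)); // past the TTL
```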
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Callback data compression | Custom LZ4/zlib compression | Session tokens + static data | Compression can't bypass 64-byte hard limit; tokens eliminate size dependency |
| Session ID generation | Timestamp-based sequential IDs | Math.random().toString(36) | Sequential IDs leak execution count; random alphanumeric sufficient for short-lived sessions |
| Static data serialization | Custom binary format | JSON.stringify/parse | n8n static data already uses JSON internally; custom format adds complexity |
| Session cleanup | Background cron node | Inline cleanup on session access | n8n workflows don't support background tasks; cleanup-on-access prevents bloat |
**Key insight:** Telegram's 64-byte limit is a hard constraint enforced at the API level. The only viable workarounds are: (1) reduce callback_data to fixed-size tokens, or (2) use alternative callback methods (e.g., switch_inline_query). Token-based approach is simplest and requires no architecture changes beyond state management.
## Common Pitfalls
### Pitfall 1: n8n Static Data Scope Confusion
**What goes wrong:** Assuming `$getWorkflowStaticData('global')` persists across workflow activations or different workflow instances
**Why it happens:** "Global" means workflow-scoped (shared by all nodes in one workflow), not instance-global or permanent. n8n saves static data only for active workflows started by a trigger or webhook, not for manual test runs, and Phase 10.2 UAT found that static data behaved as execution-scoped on n8n cloud and may not persist between executions.
**How to avoid:**
- Document that sessions are conversation-scoped: they only need to outlive one selection flow, not workflow restarts or re-activation
- Implement TTL cleanup to prevent session bloat
- Test session persistence across multiple callback interactions (each button press starts a new execution of the Telegram-triggered workflow)
**Warning signs:**
- User reports "session expired" immediately after creating batch selection
- Static data object grows unbounded with old session IDs
- Session lookups fail after workflow re-activation
### Pitfall 2: Deep Nested Mutation of Static Data
**What goes wrong:** Modifying `staticData.sessions.abc123.selected.push('plex')` doesn't persist changes
**Why it happens:** n8n only tracks top-level property changes for static data persistence. Deep mutations are silently lost. (From CLAUDE.md: "Deep nested mutations are silently lost. Always use JSON serialization.")
**How to avoid:**
```javascript
// WRONG - deep mutation not persisted
staticData.sessions[sessionId].selected.push('plex');
// CORRECT - top-level assignment persisted
const sessions = JSON.parse(staticData._batchSessions || '{}');
sessions[sessionId].selected.push('plex');
staticData._batchSessions = JSON.stringify(sessions);
```
**Warning signs:**
- Session state reverts to initial state after toggle
- Selection list shows empty array despite successful toggles
- Debugging shows correct in-memory values but wrong persisted values
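
One way to make the correct pattern hard to get wrong is to route every read and write through a pair of helpers. The sketch below uses a plain object as a stand-in for the value returned by `$getWorkflowStaticData('global')`, so it runs outside n8n; the helper names are hypothetical, and the `_batchSessions` key matches the examples in this document.

```javascript
// Sketch: serialize on every write so only a top-level property ever changes.
const SESSIONS_KEY = '_batchSessions';

function loadSessions(staticData) {
  return JSON.parse(staticData[SESSIONS_KEY] || '{}');
}

function saveSessions(staticData, sessions) {
  // Top-level assignment of a string: the kind of change n8n tracks
  staticData[SESSIONS_KEY] = JSON.stringify(sessions);
}

// Load, apply a change to one session, and write everything back
function updateSession(staticData, sessionId, mutate) {
  const sessions = loadSessions(staticData);
  if (!sessions[sessionId]) return false;
  mutate(sessions[sessionId]);
  saveSessions(staticData, sessions);
  return true;
}

// Usage: a plain object stands in for $getWorkflowStaticData('global')
const fakeStaticData = {};
saveSessions(fakeStaticData, { a7f3d2: { selected: [] } });
updateSession(fakeStaticData, 'a7f3d2', s => s.selected.push('plex'));
console.log(fakeStaticData._batchSessions);
```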
### Pitfall 3: Callback Data URL Encoding
**What goes wrong:** Container names with spaces or special chars exceed 64-byte limit after URL encoding
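**Why it happens:** The limit is measured in bytes, not characters, so anything that inflates the encoded form counts against it. If callback_data is URL-encoded on its way to Telegram, `container name` becomes `container%20name` (+2 bytes per space), and multi-byte UTF-8 characters cost more than one byte each.
**How to avoid:**
- Normalize container names to remove leading slash (Docker returns `/plex`, store as `plex`)
- Session tokens are alphanumeric only (no encoding needed)
- Test with containers whose names contain dashes, underscores, and dots (the characters Docker allows besides letters and digits)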
**Warning signs:**
- Batch toggle works for `plex` but fails for `my-container-name-v2`
- Telegram API returns 400 Bad Request with no error details
- Callback data length looks under 64 bytes in code but fails at API
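
A cheap safeguard is to measure every callback_data string in bytes before the keyboard is sent, so an oversized value fails in the Code node with a readable message. This is a minimal sketch; `assertCallbackFits` is a hypothetical helper, and it checks only the UTF-8 size against the documented 1-64 byte range.

```javascript
// Sketch: fail fast when callback_data would fall outside Telegram's 1-64 byte range.
const MAX_CALLBACK_BYTES = 64;

function callbackDataBytes(data) {
  // Byte length of the UTF-8 encoding, which can exceed data.length
  return Buffer.byteLength(data, 'utf8');
}

function assertCallbackFits(data) {
  const size = callbackDataBytes(data);
  if (size < 1 || size > MAX_CALLBACK_BYTES) {
    throw new RangeError(`callback_data is ${size} bytes, allowed range is 1-${MAX_CALLBACK_BYTES}: ${data}`);
  }
  return data;
}

// Usage: wrap the value when building a button
const button = {
  text: '🟢 plex',
  callback_data: assertCallbackFits('batch:toggle:a7f3d2:plex')
};
console.log(button.callback_data, callbackDataBytes(button.callback_data));
```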
### Pitfall 4: Update All Without Confirmation
**What goes wrong:** Adding "Update All" button that immediately triggers batch update without confirmation
**Why it happens:** Copying pattern from batch start/stop exec buttons, which show confirmation for stop only
**How to avoid:**
- **ALWAYS show confirmation for update-all** (updates are destructive — image pull can fail, container recreation can break state)
- Reuse existing `Build Update All Confirmation` code node (already implemented at line 2810 in main workflow)
- Add inline keyboard entry point that routes to confirmation flow, not direct execution
**Warning signs:**
- User reports containers updated without confirmation prompt
- Update-all triggers immediately on button press
- No 30-second timeout check for update-all
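
The timeout check mentioned in the warning signs could be sketched as follows. This assumes that the `{timestamp}` in `uall:confirm:{timestamp}` is the epoch time in milliseconds at which the confirmation keyboard was built; the existing workflow may encode it differently, and `parseUpdateAllConfirm` is a hypothetical helper name.

```javascript
// Sketch: reject update-all confirmations that are older than the timeout.
const CONFIRM_TIMEOUT_MS = 30 * 1000; // the 30-second window named above

function parseUpdateAllConfirm(rawData, now = Date.now()) {
  const match = /^uall:confirm:(\d+)$/.exec(rawData);
  if (!match) return null; // not an update-all confirmation
  const issuedAt = Number(match[1]); // ASSUMPTION: epoch milliseconds
  return {
    issuedAt,
    expired: now - issuedAt > CONFIRM_TIMEOUT_MS
  };
}

// Usage with fixed times so the outcome does not depend on the clock
console.log(parseUpdateAllConfirm('uall:confirm:1000', 11000)); // 10 s old
console.log(parseUpdateAllConfirm('uall:confirm:1000', 41000)); // 40 s old
```

An expired confirmation should answer the callback query with a short notice instead of starting the batch update.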
## Code Examples
Verified patterns from existing implementation and Telegram Bot API:
### Session-Based Batch Toggle
```javascript
// Source: n8n-batch-ui.json + Telegram Bot API docs
// Modified from current CSV-in-callback to session-based approach
// Code node: "Handle Toggle With Session"
const triggerData = $('When executed by another workflow').item.json;
const sessionId = triggerData.sessionId;
const toggleName = triggerData.toggleName;
const chatId = triggerData.chatId;
// Load session state
const staticData = $getWorkflowStaticData('global');
const sessions = JSON.parse(staticData._batchSessions || '{}');
const session = sessions[sessionId];
// Treat missing sessions and sessions past their TTL the same way
if (!session || session.expires < Date.now()) {
  return {
    json: {
      success: false,
      action: 'expired',
      queryId: triggerData.queryId,
      chatId: chatId,
      answerText: 'Session expired (5 min timeout)',
      showAlert: true
    }
  };
}
const selectedSet = new Set(session.selected);
// Toggle selection
if (selectedSet.has(toggleName)) {
  selectedSet.delete(toggleName);
} else {
  selectedSet.add(toggleName);
}
session.selected = Array.from(selectedSet);
// CRITICAL: Top-level assignment for persistence
staticData._batchSessions = JSON.stringify(sessions);
return {
  json: {
    success: true,
    action: 'toggle_update',
    sessionId: sessionId,
    selectedCount: selectedSet.size,
    selectedCsv: session.selected.join(','),
    needsKeyboardUpdate: true
  }
};
```
### Update All Inline Keyboard Entry
```javascript
// Source: n8n-status.json Build Container List node
// Add "Update All" button to container list keyboard
// Code node: "Build Container List" (modified)
// ... existing container list logic ...
// Add Update All button row after pagination
keyboard.push([
  {
    text: '🔄 Update All :latest',
    callback_data: 'uall:start' // 10 bytes, triggers existing flow
  }
]);
return {
  json: {
    success: true,
    action: 'list',
    chatId: chatId,
    messageId: messageId,
    text: message,
    reply_markup: { inline_keyboard: keyboard }
  }
};
```
### Session Cleanup on Access
```javascript
// Source: n8n best practices + project patterns
// Clean expired sessions every time static data is accessed
function getSessionsWithCleanup() {
  const staticData = $getWorkflowStaticData('global');
  const sessions = JSON.parse(staticData._batchSessions || '{}');
  const now = Date.now();
  let cleaned = false;
  // Remove expired sessions (5-minute TTL)
  Object.keys(sessions).forEach(id => {
    if (sessions[id].expires < now) {
      delete sessions[id];
      cleaned = true;
    }
  });
  // Persist cleanup
  if (cleaned) {
    staticData._batchSessions = JSON.stringify(sessions);
  }
  return sessions;
}
// Usage in any session-access code
const sessions = getSessionsWithCleanup();
```
### Callback Data Parser Update
```javascript
// Source: n8n-workflow.json "Parse Callback Data" node (line 589)
// Add session-based batch toggle parsing
// Existing: batch:toggle:{page}:{selectedCsv}:{containerName}
// New: batch:toggle:{sessionId}:{containerName}
if (rawData.startsWith('batch:toggle:')) {
  const parts = rawData.substring(13).split(':');
  const sessionId = parts[0]; // Changed from page number
  const toggleName = parts.slice(1).join(':'); // Handle names with colons
  return {
    json: {
      queryId,
      chatId,
      messageId,
      isBatchToggle: true,
      sessionId: sessionId, // NEW field
      toggleName: toggleName,
      // Removed: batchPage, selectedCsv (now in session state)
    }
  };
}
// NEW: Update All start button
if (rawData === 'uall:start') {
  return {
    json: {
      queryId,
      chatId,
      messageId,
      isUpdateAllStart: true, // Routes to existing confirmation flow
    }
  };
}
```
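
The navigation buttons use a `batch:nav:{sessionId}:{page}` format that the parser also needs to recognize. A minimal sketch of that branch, written as a standalone function so it can be tested outside n8n; `parseBatchNav` and the output field names `isBatchNav` and `batchPage` are assumptions, not names taken from the existing workflow.

```javascript
// Sketch: parse the session-based navigation callback.
function parseBatchNav(rawData) {
  // Session IDs in this document are lowercase base-36, pages are non-negative integers
  const match = /^batch:nav:([a-z0-9]+):(\d+)$/.exec(rawData);
  if (!match) return null;
  return {
    isBatchNav: true,
    sessionId: match[1],
    batchPage: Number(match[2])
  };
}

// Usage
console.log(parseBatchNav('batch:nav:a7f3d2:1'));
console.log(parseBatchNav('batch:toggle:a7f3d2:plex')); // not a nav callback
```

In the Code node, the returned fields would be merged with `queryId`, `chatId`, and `messageId` in the same way as the toggle branch.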
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| CSV in callback_data | Session tokens + server state | The 1-64 byte callback_data limit has applied since inline keyboards were added to the Bot API (2016) | Libraries like python-telegram-bot provide CallbackDataCache for server-side storage |
| Manual session cleanup | Inline cleanup on access | n8n lacks background tasks | Must clean on every session read to prevent bloat |
| Direct :latest image pull | Filter then confirm | Docker Hub rate limits (2020) | Always confirm batch operations to avoid wasted pulls |
| Batch exec without limit UI | Multi-select keyboard | Telegram inline keyboard UX (2018+) | Users expect checkbox-style interfaces for batch selection |
**Deprecated/outdated:**
- **Storing full selection in callback_data**: python-telegram-bot provides CallbackDataCache for server-side storage, so that only a short key travels in callback_data
- **Unlimited batch operations without confirmation**: Docker Hub introduced pull rate limits in November 2020 (100 pulls per 6 hours for anonymous users, 200 for authenticated free accounts), so always confirm before batch image pulls
- **Using message_id as state key**: Early Telegram bots used message ID for state lookup, but message IDs are reused across chats; use chat_id + random token
## Open Questions
1. **What is the maximum practical selection size?**
- What we know: Session state stored in n8n static data (execution-scoped JSON)
- What's unclear: n8n static data size limits, if any
- Recommendation: Cap batch selection at 50 containers (practical UX limit for 6 per page = 9 pages), document limit in keyboard message
2. **Should session TTL be configurable or hardcoded?**
- What we know: Telegram callback queries expire after message age threshold (unclear exact time)
- What's unclear: Optimal balance between UX (allow user to take time selecting) vs resource usage (cleanup frequency)
- Recommendation: Hardcode 5-minute TTL (matches Telegram confirmation timeout pattern already in workflow), revisit if users report timeout issues
3. **Does Update All inline keyboard need "Update All (N containers)" dynamic count?**
- What we know: Text command shows count in confirmation (`Update 12 containers?`)
- What's unclear: Whether button should show live count (requires Docker API call on every list render) or static text
- Recommendation: Static text `🔄 Update All :latest` in list keyboard, dynamic count shown in confirmation message after click (reduces API calls)
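
If the 50-container cap from question 1 is adopted, the toggle step is the natural place to enforce it. The following is a sketch only: `toggleWithCap` and its `rejected` flag are hypothetical, and the cap value simply mirrors the recommendation above.

```javascript
// Sketch: toggle a name in the selection, refusing additions beyond the cap.
const MAX_SELECTED = 50; // recommended cap from Open Question 1

function toggleWithCap(selected, name, max = MAX_SELECTED) {
  const set = new Set(selected);
  if (set.has(name)) {
    set.delete(name); // deselecting is always allowed
    return { selected: Array.from(set), rejected: false };
  }
  if (set.size >= max) {
    // Selection is full: leave it unchanged and let the caller show an alert
    return { selected: Array.from(set), rejected: true };
  }
  set.add(name);
  return { selected: Array.from(set), rejected: false };
}

// Usage with a small cap to keep the example readable
console.log(toggleWithCap(['plex', 'sonarr'], 'radarr', 2)); // rejected
console.log(toggleWithCap(['plex', 'sonarr'], 'plex', 2));   // deselected
```

When `rejected` is true, the callback query can be answered with an alert explaining the limit, and the keyboard stays as it is.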
## Sources
### Primary (HIGH confidence)
- [Telegram Bot API Official Docs](https://core.telegram.org/bots/api) - InlineKeyboardButton callback_data 1-64 bytes limit
- [Telegram Limits Reference](https://limits.tginfo.me/en) - Comprehensive Bot API limits documentation
- Project codebase: n8n-workflow.json (lines 276-296, 589-1020, 2750-3074) - Existing "update all" implementation, callback parser, :latest filtering
- Project codebase: n8n-batch-ui.json (lines 236-251) - Current CSV-in-callback approach and 64-byte limit check
- Project codebase: CLAUDE.md - n8n static data deep mutation pitfall, JSON serialization requirement
### Secondary (MEDIUM confidence)
- [n8n getWorkflowStaticData Docs](https://docs.n8n.io/code/cookbook/builtin/get-workflow-static-data/) - Static data persistence behavior, execution scope
- [n8n Static Data Persistence GitHub Issue #17321](https://github.com/n8n-io/n8n/issues/17321) - Cloud execution-scoped behavior (may not persist between triggers)
- [python-telegram-bot CallbackDataCache](https://docs.python-telegram-bot.org/en/v21.9/telegram.ext.callbackdatacache.html) - Standard library pattern for callback_data workarounds
- [Telegram Inline Keyboard UX Guide](https://wyu-telegram.com/blogs/444/) - Best practices for multi-select interfaces
### Tertiary (LOW confidence)
- [Medium: Telegram bot inline buttons with large data](https://medium.com/@knock.nevis/telegram-bot-inline-buttons-with-large-data-950e818c1272) - Community workarounds for 64-byte limit using Redis
- [Enhanced callback_data with protobuf + base85](https://seroperson.me/2025/02/05/enhanced-telegram-callback-data/) - Advanced encoding techniques (35% space savings but still hits limit with selections)
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH - Telegram Bot API and n8n workflow are existing project dependencies with official docs
- Architecture: HIGH - Session token pattern is standard workaround documented in python-telegram-bot and multiple sources; existing "update all" code verified in workflow JSON
- Pitfalls: HIGH - n8n static data mutation pitfall directly from project CLAUDE.md; callback_data limit enforced by Telegram API
**Research date:** 2026-02-08
**Valid until:** 2026-03-08 (30 days - stable domain, the 64-byte callback_data limit is long-standing)