Phase 11: Update All & Callback Limits - Research

Researched: 2026-02-08
Domain: Telegram Bot API callback data optimization, n8n workflow state management
Confidence: HIGH

Summary

Phase 11 adds "update all" functionality for :latest containers and works around Telegram's 64-byte callback_data limit, which currently restricts batch selection to a handful of containers. The main workflow already has a partial "update all" implementation (text command routing, confirmation keyboard, :latest filtering) but lacks an inline keyboard entry point. The critical blocker is the batch selection keyboard's callback_data format (batch:toggle:0::plex = 20 bytes with nothing selected, plus the CSV of selected containers), which grows linearly with the selection and reaches the 64-byte limit after only a few selections, sooner when container names are long.

Primary recommendation: Replace the CSV-in-callback approach with server-side state storage, using n8n workflow static data to track batch selection state. This reduces callback_data to a short token whose size does not depend on the selection (e.g., batch:toggle:abc123:plex, where abc123 is a session key). Add an "Update All" button to the container list keyboard that triggers the existing update-all confirmation flow.

Standard Stack

Core

| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| Telegram Bot API | 7.0+ | Inline keyboard, callback queries | Official Telegram bot interface, 64-byte callback_data limit enforced |
| n8n workflow | 1.x | Orchestration, sub-workflow execution | Project's existing automation platform |
| n8n static data | n8n built-in | Workflow-scoped persistence | n8n's native state storage (scoped to a single workflow, not instance-wide; see Pitfall 1 for persistence caveats) |

Supporting

| Library | Version | Purpose | When to Use |
|---|---|---|---|
| Docker API | 1.47 | Container list, image tags | Filtering :latest containers for update-all |
| JavaScript (n8n Code nodes) | ES6+ | Callback parsing, keyboard building | All workflow logic implemented in Code nodes |

Alternatives Considered

| Instead of | Could Use | Tradeoff |
|---|---|---|
| n8n static data | External Redis/DB | n8n static data lives with the workflow and needs no new infrastructure, but its persistence has caveats (see Pitfall 1); batch sessions are short-lived (5-minute TTL), so state only needs to survive a single conversation flow; Redis adds infrastructure complexity |
| Callback data tokens | Protobuf + base85 | Protobuf/base85 saves ~30% space but still hits the 64-byte limit as the selection grows; the token approach eliminates linear growth |
| Session tokens | Callback data compression | Compression saves bytes but doesn't solve the fundamental limit; tokens cap size at ~20 bytes plus the container name, regardless of selection count |

Installation: No new dependencies. Changes confined to existing n8n workflow JSON files.

Architecture Patterns

Current State (Problematic)

Batch selection callback data format:

batch:toggle:{page}:{selectedCsv}:{containerName}
Example: batch:toggle:0:plex,sonarr,radarr:jellyfin
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ = 42 bytes (16 bytes prefix and delimiters + 26 bytes of names and commas)

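Problem: With 4 short-named containers selected (plex, sonarr, radarr, nzbget) and a fifth being toggled:

  • Prefix overhead: batch:toggle:0:: = 16 bytes
  • Selection CSV: plex,sonarr,radarr,nzbget = 25 bytes
  • Toggle name: jellyfin = 8 bytes
  • Total: 49 bytes (leaves only 15 bytes of headroom)

Once jellyfin joins the selection, the CSV plex,sonarr,radarr,nzbget,jellyfin is 34 bytes, so any toggle name longer than 14 bytes pushes the total past 64. Each further selection costs its name length plus one comma, and longer container names exhaust the budget after far fewer selections.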
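
Sizes like these are easier to check mechanically than to count by hand. The following is a minimal sketch, assuming a Node.js runtime where Buffer is available; byteLength and legacyToggleData are hypothetical helper names, not functions from the existing workflow.

```javascript
// Hypothetical helpers for checking callback_data sizes; not part of the workflow.

// Telegram's limit is expressed in bytes, so measure the UTF-8 encoding.
function byteLength(text) {
  return Buffer.byteLength(text, 'utf8');
}

// Rebuilds the current format: batch:toggle:{page}:{selectedCsv}:{containerName}
function legacyToggleData(page, selected, toggleName) {
  return `batch:toggle:${page}:${selected.join(',')}:${toggleName}`;
}

const three = legacyToggleData(0, ['plex', 'sonarr', 'radarr'], 'jellyfin');
const four = legacyToggleData(0, ['plex', 'sonarr', 'radarr', 'nzbget'], 'jellyfin');

console.log(byteLength(three)); // 42
console.log(byteLength(four));  // 49
```

Every additional selected name adds its own length plus one comma to the result.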

Callback Data Format (Fixed Size):
batch:toggle:{sessionId}:{containerName}
Example: batch:toggle:a7f3d2:plex
         ^^^^^^^^^^^^^^^^^^^^^^^^ = 24 bytes (independent of selection size)

State Storage (n8n Static Data):
{
  "batchSessions": {
    "a7f3d2": {
      "chatId": 563878771,
      "page": 0,
      "selected": ["plex", "sonarr", "radarr", "nzbget", "jellyfin"],
      "action": "stop",
      "created": 1738972800000,
      "expires": 1738973100000  // 5 minutes TTL
    }
  }
}

Benefits:

  • Callback data size is 20 bytes plus the container name (24 bytes for plex), independent of how many containers are selected and roughly 60% below the 64-byte worst case
  • Supports unlimited container selections
  • Session cleanup prevents static data bloat
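
To make the size budget concrete, here is a small sketch of the token format. It assumes a 6-character session ID as in the example above; toggleData and maxNameBytes are hypothetical names used only for illustration.

```javascript
// Hypothetical helpers illustrating the session-based format:
// batch:toggle:{sessionId}:{containerName}
const LIMIT_BYTES = 64; // Telegram callback_data limit

function toggleData(sessionId, containerName) {
  return `batch:toggle:${sessionId}:${containerName}`;
}

// Bytes left for the container name once the prefix and session ID are accounted for.
function maxNameBytes(sessionId) {
  return LIMIT_BYTES - Buffer.byteLength(toggleData(sessionId, ''), 'utf8');
}

console.log(Buffer.byteLength(toggleData('a7f3d2', 'plex'), 'utf8')); // 24
console.log(maxNameBytes('a7f3d2')); // 44
```

The selection no longer travels in the callback, so only the container name length varies; names longer than the remaining budget would still need a fallback such as an index into the container list.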

Pattern Implementation: Session Lifecycle

1. Session Creation (Batch Mode Entry)

// Code node: "Initialize Batch Session"
const staticData = $getWorkflowStaticData('global');
const sessions = JSON.parse(staticData._batchSessions || '{}');

// Generate 6-char session ID (toString(36) can yield fewer digits, so pad to a fixed length)
const sessionId = Math.random().toString(36).substring(2, 8).padEnd(6, '0');
const now = Date.now();

sessions[sessionId] = {
  chatId: $json.chatId,
  page: 0,
  selected: [],
  action: $json.batchAction || 'stop',
  created: now,
  expires: now + 300000  // 5 minutes
};

// Clean expired sessions (prevent bloat)
Object.keys(sessions).forEach(id => {
  if (sessions[id].expires < now) delete sessions[id];
});

staticData._batchSessions = JSON.stringify(sessions);

return { json: { sessionId, chatId: $json.chatId } };
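
The one-line ID generator above does not check whether the ID is already in use. A sketch of a collision-aware variant follows; newSessionId, the alphabet, and the retry count are illustrative assumptions, and Math.random is not suitable where IDs must be unguessable.

```javascript
// Hypothetical helper; alphabet and retry loop are illustrative choices.
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789';

function newSessionId(existingSessions, length = 6, random = Math.random) {
  for (let attempt = 0; attempt < 10; attempt++) {
    let id = '';
    for (let i = 0; i < length; i++) {
      id += ALPHABET[Math.floor(random() * ALPHABET.length)];
    }
    // Never hand out an ID that already maps to a stored session.
    if (!Object.prototype.hasOwnProperty.call(existingSessions, id)) {
      return id;
    }
  }
  throw new Error('Could not allocate a unique session ID');
}

console.log(newSessionId({}).length); // 6
```

The random source is injectable only so the function can be tested deterministically.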

2. Session Update (Toggle Selection)

// Code node: "Update Batch Session"
const staticData = $getWorkflowStaticData('global');
const sessions = JSON.parse(staticData._batchSessions || '{}');
const sessionId = $json.sessionId;
const toggleName = $json.toggleName;

const session = sessions[sessionId];

// Treat missing and expired sessions the same way
if (!session || session.expires < Date.now()) {
  return { json: { error: 'Session expired', chatId: $json.chatId } };
}

const selected = new Set(session.selected);

// Toggle selection
if (selected.has(toggleName)) {
  selected.delete(toggleName);
} else {
  selected.add(toggleName);
}

session.selected = Array.from(selected);
staticData._batchSessions = JSON.stringify(sessions);

return { json: {
  sessionId,
  selectedCount: selected.size,
  selectedCsv: session.selected.join(',')
} };
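
The toggle logic can also be written as a pure function, which makes it testable outside n8n. This is a sketch under assumptions: toggleSelection is a hypothetical name, and the chat ownership check is an added safeguard that the workflow does not currently describe.

```javascript
// Hypothetical pure version of the toggle step.
function toggleSelection(sessions, sessionId, chatId, toggleName, now = Date.now()) {
  const session = sessions[sessionId];
  if (!session || session.expires < now) {
    return { ok: false, reason: 'expired' };
  }
  // Assumption: a session should only be usable from the chat that created it.
  if (session.chatId !== chatId) {
    return { ok: false, reason: 'wrong_chat' };
  }

  const selected = new Set(session.selected);
  // Set.delete returns false when the name was absent, in which case we add it.
  if (!selected.delete(toggleName)) {
    selected.add(toggleName);
  }
  session.selected = [...selected];
  return { ok: true, selected: session.selected };
}
```

The caller remains responsible for writing the sessions object back to static data with a top-level assignment.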

3. Keyboard Building (Retrieve Session)

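// Code node: "Build Batch Keyboard With Session"
// displayContainers, page and navRow come from the existing pagination logic (not shown)
const staticData = $getWorkflowStaticData('global');
const sessions = JSON.parse(staticData._batchSessions || '{}');
const sessionId = $json.sessionId;
const session = sessions[sessionId];

// Missing or expired session: let the caller show the "session expired" answer
if (!session || session.expires < Date.now()) {
  return { json: { error: 'Session expired', chatId: $json.chatId } };
}

const selectedSet = new Set(session.selected);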

// Build keyboard with fixed-size callbacks
const keyboard = displayContainers.map(c => {
  const isSelected = selectedSet.has(c.name);
  const icon = c.state === 'running' ? '🟢' : '⚪';
  const checkmark = isSelected ? '✓ ' : '';
  return [{
    text: `${checkmark}${icon} ${c.name}`,
    callback_data: `batch:toggle:${sessionId}:${c.name}`  // Size independent of selection count
  }];
});

// Navigation buttons also use session ID
if (page > 0) {
  navRow.push({
    text: '◀️ Previous',
    callback_data: `batch:nav:${sessionId}:${page - 1}`
  });
}
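
Since the excerpt above relies on variables defined elsewhere, a self-contained sketch of the same idea may help. It assumes 6 containers per page and container objects with name and state fields; buildBatchKeyboard is a hypothetical name.

```javascript
// Hypothetical standalone keyboard builder; page size and object shape are assumptions.
function buildBatchKeyboard(containers, session, sessionId, pageSize = 6) {
  const totalPages = Math.max(1, Math.ceil(containers.length / pageSize));
  // Clamp the stored page in case the container list shrank since the session began.
  const page = Math.min(Math.max(session.page, 0), totalPages - 1);
  const selected = new Set(session.selected);

  const rows = containers
    .slice(page * pageSize, (page + 1) * pageSize)
    .map(c => [{
      text: `${selected.has(c.name) ? '✓ ' : ''}${c.state === 'running' ? '🟢' : '⚪'} ${c.name}`,
      callback_data: `batch:toggle:${sessionId}:${c.name}`,
    }]);

  const navRow = [];
  if (page > 0) {
    navRow.push({ text: '◀️ Previous', callback_data: `batch:nav:${sessionId}:${page - 1}` });
  }
  if (page < totalPages - 1) {
    navRow.push({ text: 'Next ▶️', callback_data: `batch:nav:${sessionId}:${page + 1}` });
  }
  if (navRow.length > 0) rows.push(navRow);
  return rows;
}
```

The result can be passed as the inline_keyboard value of reply_markup.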

Pattern 2: Update All Entry Points

Text Command (Already Implemented):

User: "update all"
  ↓
Keyword Router → "updateall" output
  ↓
Get All Containers For Update All (HTTP: filter :latest)
  ↓
Build Update All Confirmation (keyboard with uall:confirm:{timestamp})
  ↓
Send confirmation message

Inline Keyboard Entry Point (NEW):

Container List keyboard:
[🟢 plex]   [🟢 sonarr]
[🟢 radarr] [⚪ nzbget]
──────────────────────
[🔄 Update All :latest] ← NEW BUTTON
[◀️ Previous] [1/2] [Next ▶️]

Callback data: uall:start (10 bytes, no parameters needed — fetches :latest containers on click)
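
The :latest filtering step could look roughly like the sketch below. It assumes the Docker container list response shape (Names and Image fields) and treats an untagged image reference as :latest; whether the existing workflow node does the same is not confirmed here. usesLatestTag and latestContainers are hypothetical names.

```javascript
// Hypothetical filter for containers that track the :latest tag.
function usesLatestTag(imageRef) {
  if (imageRef.includes('@')) return false; // pinned by digest
  const lastSlash = imageRef.lastIndexOf('/');
  const lastColon = imageRef.lastIndexOf(':');
  // A colon before the last slash belongs to a registry port, not a tag.
  if (lastColon <= lastSlash) return true; // no tag given: :latest is implied
  return imageRef.slice(lastColon + 1) === 'latest';
}

function latestContainers(containers) {
  return containers
    .filter(c => usesLatestTag(c.Image))
    .map(c => c.Names[0].replace(/^\//, '')); // Docker returns names as "/plex"
}

console.log(latestContainers([
  { Names: ['/plex'], Image: 'linuxserver/plex:latest' },
  { Names: ['/db'], Image: 'postgres:16' },
])); // [ 'plex' ]
```

Doing this filtering on click keeps the uall:start callback free of parameters.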

Anti-Patterns to Avoid

  • Storing entire selection in callback_data: Hits the 64-byte limit after only a few selections
  • Using message ID as session key: Message IDs are only unique within a chat, so the same ID can appear in different conversations; use generated tokens
  • Global session store without TTL: n8n static data persists indefinitely; must clean expired sessions
  • Session lookup without expiry check: Old sessions can cause stale state bugs

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| Callback data compression | Custom LZ4/zlib compression | Session tokens + static data | Compression can't bypass 64-byte hard limit; tokens eliminate size dependency |
| Session ID generation | Timestamp-based sequential IDs | Math.random().toString(36) | Sequential IDs leak execution count; random alphanumeric sufficient for short-lived sessions |
| Static data serialization | Custom binary format | JSON.stringify/parse | n8n static data already uses JSON internally; custom format adds complexity |
| Session cleanup | Background cron node | Inline cleanup on session access | n8n workflows don't support background tasks; cleanup-on-access prevents bloat |

Key insight: Telegram's 64-byte limit is a hard constraint enforced at the API level. The only viable workarounds are: (1) reduce callback_data to fixed-size tokens, or (2) use alternative callback methods (e.g., switch_inline_query). Token-based approach is simplest and requires no architecture changes beyond state management.

Common Pitfalls

Pitfall 1: n8n Static Data Scope Confusion

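What goes wrong: Assuming $getWorkflowStaticData('global') behaves the same in every execution mode, or that it is shared across workflows

Why it happens: "Global" means "workflow-scoped" (accessible to all nodes in one workflow), not "instance-global". Every Telegram button press starts a new workflow execution, so batch sessions only work if static data survives between executions. n8n's documentation describes static data as saved for active workflows run by a trigger or webhook, and not saved during manual test executions. Phase 10.2 UAT observed static data behaving as execution-scoped on n8n cloud, so persistence must be confirmed in the target deployment before relying on it.

How to avoid:

  • Document that sessions are conversation-scoped (they must survive the executions of one batch selection, nothing longer)
  • Implement TTL cleanup to prevent session bloat
  • Test session persistence across multiple callback interactions with the workflow active, not in manual test runs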

Warning signs:

  • User reports "session expired" immediately after creating batch selection
  • Static data object grows unbounded with old session IDs
  • Session lookups fail after workflow re-activation

Pitfall 2: Deep Nested Mutation of Static Data

What goes wrong: Modifying staticData.sessions.abc123.selected.push('plex') doesn't persist changes

Why it happens: n8n only tracks top-level property changes for static data persistence. Deep mutations are silently lost. (From CLAUDE.md: "Deep nested mutations are silently lost. Always use JSON serialization.")

How to avoid:

// WRONG - deep mutation not persisted
staticData.sessions[sessionId].selected.push('plex');

// CORRECT - top-level assignment persisted
const sessions = JSON.parse(staticData._batchSessions || '{}');
sessions[sessionId].selected.push('plex');
staticData._batchSessions = JSON.stringify(sessions);

Warning signs:

  • Session state reverts to initial state after toggle
  • Selection list shows empty array despite successful toggles
  • Debugging shows correct in-memory values but wrong persisted values
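
One way to make the top-level assignment hard to forget is to route all access through two small helpers. This is a sketch, assuming the static data object is passed in as a parameter so the helpers can be exercised outside n8n; loadSessions and saveSessions are hypothetical names.

```javascript
// Hypothetical wrappers around the serialized session blob.
function loadSessions(staticData) {
  try {
    return JSON.parse(staticData._batchSessions || '{}');
  } catch (err) {
    // Corrupt blob: start fresh rather than failing the whole execution.
    return {};
  }
}

function saveSessions(staticData, sessions) {
  // Top-level assignment of a string, so the change is not a deep mutation.
  staticData._batchSessions = JSON.stringify(sessions);
}

// Usage with a plain object standing in for $getWorkflowStaticData('global')
const fakeStaticData = {};
const loaded = loadSessions(fakeStaticData);
loaded.a7f3d2 = { selected: ['plex'] };
saveSessions(fakeStaticData, loaded);
console.log(typeof fakeStaticData._batchSessions); // string
```

In a Code node the first argument would be the object returned by $getWorkflowStaticData('global').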

Pitfall 3: Callback Data URL Encoding

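What goes wrong: Container names that are long, or that contain multi-byte characters, push callback_data past 64 bytes even though the character count looks safe

Why it happens: The Bot API documents the limit in bytes, not characters, so a multi-byte UTF-8 character counts more than once. Whether Telegram additionally URL-encodes callback_data before enforcing the limit (container name becoming container%20name, +2 bytes per space) is unverified; Docker container names only allow [a-zA-Z0-9_.-], so spaces should not occur in practice.

How to avoid:

  • Normalize container names to remove leading slash (Docker returns /plex, store as plex)
  • Session tokens are alphanumeric only (no encoding needed)
  • Measure the UTF-8 byte length of every callback_data value before sending, and test with long names containing dashes, underscores and dots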

Warning signs:

  • Batch toggle works for plex but fails for my-container-name-v2
  • Telegram API returns 400 Bad Request with no error details
  • Callback data length looks under 64 bytes in code but fails at API
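
A pre-send check can catch oversized buttons before Telegram rejects the message. The sketch below assumes the inline keyboard is an array of button rows, as in the examples in this document; oversizedButtons is a hypothetical name.

```javascript
// Hypothetical validator: returns every callback_data value that exceeds the byte limit.
function oversizedButtons(inlineKeyboard, limit = 64) {
  return inlineKeyboard
    .flat()
    .filter(button => typeof button.callback_data === 'string') // skip URL buttons etc.
    .map(button => button.callback_data)
    .filter(data => Buffer.byteLength(data, 'utf8') > limit);
}

const keyboard = [
  [{ text: 'plex', callback_data: 'batch:toggle:a7f3d2:plex' }],
  [{ text: 'accented', callback_data: 'é'.repeat(33) }], // 33 characters, 66 bytes
  [{ text: 'docs', url: 'https://example.com' }],
];

console.log(oversizedButtons(keyboard).length); // 1
```

Checking string length alone would miss the second button, which is why the check measures bytes.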

Pitfall 4: Update All Without Confirmation

What goes wrong: Adding "Update All" button that immediately triggers batch update without confirmation

Why it happens: Copying pattern from batch start/stop exec buttons, which show confirmation for stop only

How to avoid:

  • ALWAYS show confirmation for update-all (updates are destructive — image pull can fail, container recreation can break state)
  • Reuse existing Build Update All Confirmation code node (already implemented at line 2810 in main workflow)
  • Add inline keyboard entry point that routes to confirmation flow, not direct execution

Warning signs:

  • User reports containers updated without confirmation prompt
  • Update-all triggers immediately on button press
  • No 30-second timeout check for update-all
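
The timeout check on the confirmation callback might be sketched as follows. It assumes the uall:confirm:{timestamp} value carries a millisecond timestamp and uses the 30-second window mentioned above; parseUpdateAllConfirm is a hypothetical name, and the existing node may differ.

```javascript
// Hypothetical parser for uall:confirm:{timestamp} with a freshness check.
function parseUpdateAllConfirm(callbackData, now = Date.now(), timeoutMs = 30000) {
  const match = /^uall:confirm:(\d+)$/.exec(callbackData);
  if (!match) {
    return { valid: false, reason: 'not_confirm' };
  }
  // Assumption: the timestamp is in milliseconds since the Unix epoch.
  const issuedAt = Number(match[1]);
  if (now - issuedAt > timeoutMs) {
    return { valid: false, reason: 'timeout' };
  }
  return { valid: true, issuedAt };
}

console.log(parseUpdateAllConfirm('uall:confirm:1000', 20000).valid); // true
console.log(parseUpdateAllConfirm('uall:confirm:1000', 40000).reason); // timeout
```

A stale confirmation should answer the callback query with an explanation instead of starting the update.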

Code Examples

Verified patterns from existing implementation and Telegram Bot API:

Session-Based Batch Toggle

// Source: n8n-batch-ui.json + Telegram Bot API docs
// Modified from current CSV-in-callback to session-based approach

// Code node: "Handle Toggle With Session"
const triggerData = $('When executed by another workflow').item.json;
const sessionId = triggerData.sessionId;
const toggleName = triggerData.toggleName;
const chatId = triggerData.chatId;

// Load session state
const staticData = $getWorkflowStaticData('global');
const sessions = JSON.parse(staticData._batchSessions || '{}');

const session = sessions[sessionId];

// Treat missing and expired sessions the same way
if (!session || session.expires < Date.now()) {
  return {
    json: {
      success: false,
      action: 'expired',
      queryId: triggerData.queryId,
      chatId: chatId,
      answerText: 'Session expired (5 min timeout)',
      showAlert: true
    }
  };
}

const selectedSet = new Set(session.selected);

// Toggle selection
if (selectedSet.has(toggleName)) {
  selectedSet.delete(toggleName);
} else {
  selectedSet.add(toggleName);
}

session.selected = Array.from(selectedSet);

// CRITICAL: Top-level assignment for persistence
staticData._batchSessions = JSON.stringify(sessions);

return {
  json: {
    success: true,
    action: 'toggle_update',
    sessionId: sessionId,
    selectedCount: selectedSet.size,
    selectedCsv: session.selected.join(','),
    needsKeyboardUpdate: true
  }
};

Update All Inline Keyboard Entry

// Source: n8n-status.json Build Container List node
// Add "Update All" button to container list keyboard

// Code node: "Build Container List" (modified)
// ... existing container list logic ...

// Add Update All button row after pagination
keyboard.push([
  {
    text: '🔄 Update All :latest',
    callback_data: 'uall:start'  // 10 bytes, triggers existing flow
  }
]);

return {
  json: {
    success: true,
    action: 'list',
    chatId: chatId,
    messageId: messageId,
    text: message,
    reply_markup: { inline_keyboard: keyboard }
  }
};

Session Cleanup on Access

// Source: n8n best practices + project patterns
// Clean expired sessions every time static data is accessed

function getSessionsWithCleanup() {
  const staticData = $getWorkflowStaticData('global');
  const sessions = JSON.parse(staticData._batchSessions || '{}');
  const now = Date.now();
  let cleaned = false;

  // Remove expired sessions (5-minute TTL)
  Object.keys(sessions).forEach(id => {
    if (sessions[id].expires < now) {
      delete sessions[id];
      cleaned = true;
    }
  });

  // Persist cleanup
  if (cleaned) {
    staticData._batchSessions = JSON.stringify(sessions);
  }

  return sessions;
}

// Usage in any session-access code
const sessions = getSessionsWithCleanup();

Callback Data Parser Update

// Source: n8n-workflow.json "Parse Callback Data" node (line 589)
// Add session-based batch toggle parsing

// Existing: batch:toggle:{page}:{selectedCsv}:{containerName}
// New:      batch:toggle:{sessionId}:{containerName}

if (rawData.startsWith('batch:toggle:')) {
  const parts = rawData.substring(13).split(':');
  const sessionId = parts[0];  // Changed from page number
  const toggleName = parts.slice(1).join(':');  // Handle names with colons

  return {
    json: {
      queryId,
      chatId,
      messageId,
      isBatchToggle: true,
      sessionId: sessionId,  // NEW field
      toggleName: toggleName,
      // Removed: batchPage, selectedCsv (now in session state)
    }
  };
}

// NEW: Update All start button
if (rawData === 'uall:start') {
  return {
    json: {
      queryId,
      chatId,
      messageId,
      isUpdateAllStart: true,  // Routes to existing confirmation flow
    }
  };
}
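
The keyboard also emits batch:nav:{sessionId}:{page} callbacks, which the parser would need to recognize as well. A minimal sketch, assuming lowercase alphanumeric session IDs; parseBatchNav and the isBatchNav field are hypothetical names.

```javascript
// Hypothetical parser for the navigation callback format batch:nav:{sessionId}:{page}.
function parseBatchNav(rawData) {
  const match = /^batch:nav:([a-z0-9]+):(\d+)$/.exec(rawData);
  if (!match) return null; // not a navigation callback
  return {
    isBatchNav: true,
    sessionId: match[1],
    page: Number(match[2]),
  };
}

console.log(parseBatchNav('batch:nav:a7f3d2:2')); // { isBatchNav: true, sessionId: 'a7f3d2', page: 2 }
```

The parsed page would then be written into the session before the keyboard is rebuilt.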

State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
| CSV in callback_data | Session tokens + server state | The 64-byte callback_data limit has applied since inline keyboards were introduced (Bot API 2.0, 2016) | Libraries like python-telegram-bot provide CallbackDataCache for server-side callback data |
| Manual session cleanup | Inline cleanup on access | n8n lacks background tasks | Must clean on every session read to prevent bloat |
| Direct :latest image pull | Filter then confirm | Docker Hub rate limits (2020) | Always confirm batch operations to avoid wasted pulls |
| Batch exec without limit UI | Multi-select keyboard | Telegram inline keyboard UX (2018+) | Users expect checkbox-style interfaces for batch selection |

Deprecated/outdated:

  • Storing full selection in callback_data: python-telegram-bot offers an opt-in CallbackDataCache (arbitrary callback_data) that keeps the payload server-side and sends only a short key
  • Unlimited batch operations without confirmation: Docker Hub introduced pull rate limits in November 2020 (initially 100 pulls per 6 hours for anonymous users), so always confirm before batch image pulls
  • Using message_id as state key: Early Telegram bots used message ID for state lookup, but message IDs are only unique within a chat; use chat_id + random token

Open Questions

  1. What is the maximum practical selection size?

    • What we know: Session state stored in n8n static data (execution-scoped JSON)
    • What's unclear: n8n static data size limits, if any
    • Recommendation: Cap batch selection at 50 containers (practical UX limit for 6 per page = 9 pages), document limit in keyboard message
  2. Should session TTL be configurable or hardcoded?

    • What we know: Telegram callback queries expire after message age threshold (unclear exact time)
    • What's unclear: Optimal balance between UX (allow user to take time selecting) vs resource usage (cleanup frequency)
    • Recommendation: Hardcode 5-minute TTL (matches Telegram confirmation timeout pattern already in workflow), revisit if users report timeout issues
  3. Does Update All inline keyboard need "Update All (N containers)" dynamic count?

    • What we know: Text command shows count in confirmation (Update 12 containers?)
    • What's unclear: Whether button should show live count (requires Docker API call on every list render) or static text
    • Recommendation: Static text 🔄 Update All :latest in list keyboard, dynamic count shown in confirmation message after click (reduces API calls)
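
The 50-container cap recommended in the first question could be enforced at toggle time. This is a sketch only; toggleWithCap and the limited flag are hypothetical, and the cap value is the recommendation above, not a measured n8n limit.

```javascript
// Hypothetical toggle that refuses to grow the selection past a cap.
const MAX_SELECTED = 50;

function toggleWithCap(selected, name, max = MAX_SELECTED) {
  if (selected.includes(name)) {
    // Deselecting is always allowed, even when the cap has been reached.
    return { selected: selected.filter(n => n !== name), limited: false };
  }
  if (selected.length >= max) {
    return { selected, limited: true };
  }
  return { selected: [...selected, name], limited: false };
}

console.log(toggleWithCap(['a', 'b'], 'c', 2).limited); // true
```

When limited is true, the bot could answer the callback query with a short alert instead of updating the keyboard.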

Sources

Primary (HIGH confidence)

  • Telegram Bot API Official Docs - InlineKeyboardButton callback_data 1-64 bytes limit
  • Telegram Limits Reference - Comprehensive Bot API limits documentation
  • Project codebase: n8n-workflow.json (lines 276-296, 589-1020, 2750-3074) - Existing "update all" implementation, callback parser, :latest filtering
  • Project codebase: n8n-batch-ui.json (lines 236-251) - Current CSV-in-callback approach and 64-byte limit check
  • Project codebase: CLAUDE.md - n8n static data deep mutation pitfall, JSON serialization requirement

Secondary (MEDIUM confidence)

Tertiary (LOW confidence)

Metadata

Confidence breakdown:

  • Standard stack: HIGH - Telegram Bot API and n8n workflow are existing project dependencies with official docs
  • Architecture: HIGH - Session token pattern is standard workaround documented in python-telegram-bot and multiple sources; existing "update all" code verified in workflow JSON
  • Pitfalls: HIGH - n8n static data mutation pitfall directly from project CLAUDE.md; callback_data limit enforced by Telegram API

Research date: 2026-02-08
Valid until: 2026-03-08 (30 days; stable domain, the 64-byte callback_data limit is long-standing)