From 652d877ce1005bdb6fb80c8e01b68a8faafe37cd Mon Sep 17 00:00:00 2001
From: Lucas Berger <me@lucasberger.ca>
Date: Wed, 4 Feb 2026 08:22:15 -0500
Subject: [PATCH] docs(09-04): add comprehensive test plan for batch operations

- 21 test cases across 5 suites (text commands, update all, multi-select, errors, regression)
- Deployment steps for n8n workflow import
- Success criteria mapping to tests
- Test execution log template with issue tracking
- Covers all BAT-01 through BAT-06 requirements
---
 .../DEPLOYMENT-TEST-PLAN.md                   | 499 ++++++++++++++++++
 1 file changed, 499 insertions(+)
 create mode 100644 .planning/phases/09-batch-operations/DEPLOYMENT-TEST-PLAN.md

diff --git a/.planning/phases/09-batch-operations/DEPLOYMENT-TEST-PLAN.md b/.planning/phases/09-batch-operations/DEPLOYMENT-TEST-PLAN.md
new file mode 100644
index 0000000..91caba9
--- /dev/null
+++ b/.planning/phases/09-batch-operations/DEPLOYMENT-TEST-PLAN.md
@@ -0,0 +1,499 @@
+# Phase 09-04 Deployment and Test Plan
+
+**Generated:** 2026-02-04
+**Purpose:** Verification testing for batch operations implementation
+
+## Deployment Steps
+
+### 1. Import Updated Workflow
+
+1. Open your n8n instance (typically at `http://your-server:5678`)
+2. Navigate to Workflows
+3. Select "Docker Manager Bot" workflow
+4. Click the three-dot menu → Export
+5. Save current version as backup: `n8n-workflow-backup-20260204.json`
+6. Return to Workflows → Import from File
+7. Select the updated `n8n-workflow.json` from this repository
+8. Confirm credential mapping (should use existing "Telegram API" credential)
+9. Save and activate the workflow
+
+### 2. Verify Workflow Health
+
+Before testing, confirm:
+- ✅ Workflow is "Active" (toggle in top-right)
+- ✅ No error indicators on nodes
+- ✅ Telegram Trigger shows connected
+- ✅ Your user ID is still configured in IF User/Callback Authenticated nodes
+
+## Test Plan
+
+### Test Suite A: Batch Text Commands
+
+#### Test A1: Multi-Container Update
+**Objective:** Verify batch update with space-separated names
+
+**Steps:**
+1. Send to bot: `update plex sonarr` (use 2-3 of your actual container names)
+2. Observe behavior
+
+**Expected Results:**
+- ✅ Message shows "Updating N containers..."
+- ✅ Progress updates appear for each container individually
+- ✅ Shows "Pulling image..." → "Stopping..." → "Starting..." per container
+- ✅ Final summary message appears with success count
+- ✅ If any container fails, shows failure with reason but continues batch
+- ✅ Summary emphasizes failures (if any) over successes
+
+**Notes:**
+- Record actual execution time for performance assessment
+- Screenshot final summary for documentation
+
+---
+
+#### Test A2: Multi-Container Start
+**Objective:** Verify batch start executes immediately (no confirmation)
+
+**Preparation:**
+1. Manually stop 2-3 containers via existing single commands: `stop plex`, `stop sonarr`
+2. Confirm containers are stopped via `status`
+
+**Steps:**
+1. Send to bot: `start plex sonarr` (use your stopped container names)
+2. Observe behavior
+
+**Expected Results:**
+- ✅ No confirmation prompt (starts immediately)
+- ✅ Progress shows for each container
+- ✅ Summary shows successful starts
+- ✅ `status` shows the containers as running
+
+---
+
+#### Test A3: Multi-Container Stop with Confirmation
+**Objective:** Verify batch stop requires confirmation (safety measure)
+
+**Steps:**
+1. Send to bot: `stop plex sonarr` (use 2+ running containers)
+2. Wait for confirmation prompt
+3. Tap "Confirm" button
+4. Observe execution
+
+**Expected Results:**
+- ✅ Confirmation message appears: "Stop 2 containers?"
+- ✅ Lists container names in confirmation
+- ✅ Has "Confirm" and "Cancel" buttons
+- ✅ After confirm: batch execution proceeds with progress
+- ✅ Summary shows containers stopped
+
+**Follow-up Test:**
+1. Repeat but tap "Cancel" button
+2. Expected: Confirmation deleted, no action taken
+
+**Follow-up Test 2:**
+1. Repeat but do not respond for 30+ seconds
+2. Expected: Confirmation expires and an expiry message is shown
+
+---
+
+#### Test A4: Fuzzy Matching with Exact Match Priority
+**Objective:** Verify exact match takes priority over partial matches
+
+**Scenario 1: Exact match exists**
+1. Send: `update plex` (when both "plex" and "jellyplex" exist)
+2. Expected: Only "plex" container updates (no disambiguation)
+
+**Scenario 2: Only partial matches**
+1. Send: `update jelly` (partially matches "jellyplex"; no container is named exactly "jelly")
+2. Expected: With a single partial match, the update proceeds; with several, a disambiguation prompt appears
+
+**Note:** This test depends on your actual container names. Adjust to match your server.
+
+---
+
+#### Test A5: Disambiguation for Ambiguous Names
+**Objective:** Verify disambiguation prompt appears when multiple containers match
+
+**Steps:**
+1. Send command with ambiguous partial match (e.g., `update lin` if you have multiple "lin*" containers)
+2. Wait for disambiguation prompt
+3. Select intended container
+
+**Expected Results:**
+- ✅ Shows "Multiple containers match: lin"
+- ✅ Lists matching containers with buttons
+- ✅ Selecting one proceeds with single-container action
+- ✅ Batch not triggered (user clarified intent)
+
+**Note:** If your server has no ambiguous container names, record "Cannot test - no ambiguous container names"
+
+---
+
+### Test Suite B: Update All Command
+
+#### Test B1: Update All with Available Updates
+**Objective:** Verify "update all" targets only :latest containers with updates
+
+**Steps:**
+1. Send: `update all` (or `updateall`)
+2. Observe confirmation prompt
+3. Note which containers are listed
+4. Tap "Confirm"
+5. Observe execution
+
+**Expected Results:**
+- ✅ Command recognized and routed
+- ✅ Confirmation shows: "Update N containers?"
+- ✅ Lists containers (max 10 displayed in message)
+- ✅ Only includes containers using :latest tag
+- ✅ 30-second timeout on confirmation
+- ✅ After confirm: batch execution with progress per container
+- ✅ Summary shows results
+
+**Verification:**
+- Check that only :latest containers were updated
+- Containers with specific tags (e.g., `:1.2.3`) should not appear in list
+
+---
+
+#### Test B2: Update All When No Updates Available
+**Objective:** Verify appropriate message when all containers are current
+
+**Preparation:**
+1. Update all containers manually first OR wait until all are current
+
+**Steps:**
+1. Send: `update all`
+2. Observe response
+
+**Expected Results:**
+- ✅ Shows: "All containers are up to date!" (or similar message)
+- ✅ No confirmation prompt
+- ✅ No batch execution attempted
+
+---
+
+#### Test B3: Update All Cancel
+**Objective:** Verify cancel works
+
+**Steps:**
+1. Send: `update all`
+2. Wait for confirmation
+3. Tap "Cancel"
+
+**Expected Results:**
+- ✅ Confirmation message deleted
+- ✅ Shows cancellation feedback
+- ✅ No containers updated
+
+---
+
+#### Test B4: Update All Timeout
+**Objective:** Verify expiration behavior
+
+**Steps:**
+1. Send: `update all`
+2. Wait for confirmation
+3. Don't respond for 30+ seconds
+4. Try tapping button after expiry
+
+**Expected Results:**
+- ✅ After 30s: Shows expiry message
+- ✅ Confirmation becomes inactive
+- ✅ Tapping expired button shows alert
+
+---
+
+### Test Suite C: Inline Keyboard Multi-Select
+
+#### Test C1: Enter Multi-Select Mode
+**Objective:** Verify multi-select keyboard appears
+
+**Steps:**
+1. Send: `status`
+2. Locate the "Select Multiple" button (it may not exist yet and may need to be added in a future plan)
+3. OR trigger the callback manually: use a bot command that sends `batch:mode`
+
+**Note:** If no entry point exists yet, test by either:
+- Temporarily adding a "Select Multiple" button to the status keyboard
+- OR running the node via n8n "Execute Node" with callback_query data: `batch:mode`
+
+**Expected Results:**
+- ✅ Keyboard shows container list with state icons
+- ✅ Running containers: 🟢
+- ✅ Stopped containers: ⚪
+- ✅ Each button shows container name
+- ✅ No checkmarks initially
+- ✅ Bottom row has "Cancel" button
+
+---
+
+#### Test C2: Toggle Selection
+**Objective:** Verify checkmarks toggle on/off
+
+**Steps:**
+1. Enter multi-select mode (from C1)
+2. Tap a container button
+3. Observe keyboard update
+4. Tap same container again
+5. Tap different container
+
+**Expected Results:**
+- ✅ First tap: Checkmark (✓) appears before container name
+- ✅ Second tap: Checkmark disappears
+- ✅ Multiple containers can have checkmarks
+- ✅ Action buttons appear when at least one container is selected
+- ✅ "Clear Selection" button appears once a selection exists
+
+---
+
+#### Test C3: Execute Batch Update from Multi-Select
+**Objective:** Verify batch execution from inline keyboard
+
+**Steps:**
+1. Enter multi-select mode
+2. Select 2-3 containers (tap to add checkmarks)
+3. Tap "Update Selected (N)" button
+4. Observe execution
+
+**Expected Results:**
+- ✅ Immediate execution (no confirmation for update)
+- ✅ Progress shows for each selected container
+- ✅ Summary shows results
+- ✅ Selection message deleted after execution starts
+
+---
+
+#### Test C4: Execute Batch Stop with Confirmation
+**Objective:** Verify stop requires confirmation from multi-select
+
+**Steps:**
+1. Enter multi-select mode
+2. Select 2+ running containers
+3. Tap "Stop Selected (N)" button
+4. Wait for confirmation
+5. Tap "Confirm"
+
+**Expected Results:**
+- ✅ Confirmation prompt appears (doesn't execute immediately)
+- ✅ Lists selected containers
+- ✅ After confirm: batch stop executes
+- ✅ Summary shows stopped containers
+
+---
+
+#### Test C5: Selection Limit Enforcement
+**Objective:** Verify callback size limit prevents overflow
+
+**Steps:**
+1. Enter multi-select mode
+2. Select containers one by one
+3. Attempt to select 9+ containers
+
+**Expected Results:**
+- ✅ Selection works smoothly for first ~8 containers
+- ✅ At limit: Alert appears "Selection limit reached"
+- ✅ Cannot select additional containers
+- ✅ Can deselect and select different containers
+- ✅ Guidance shown (e.g., "Use 'update all' for larger batches")
+
+**Note:** The exact limit depends on container name lengths, because the selection has to fit in Telegram callback data (capped at 64 bytes). Shorter names allow more selections.
+
+---
+
+#### Test C6: Clear Selection
+**Objective:** Verify clear button resets selection
+
+**Steps:**
+1. Select 3+ containers
+2. Tap "Clear Selection" button
+
+**Expected Results:**
+- ✅ All checkmarks removed
+- ✅ Action buttons disappear
+- ✅ Only "Cancel" button remains
+- ✅ Can start new selection
+
+---
+
+#### Test C7: Cancel Multi-Select
+**Objective:** Verify cancel exits cleanly
+
+**Steps:**
+1. Enter multi-select mode (with or without selection)
+2. Tap "Cancel" button
+
+**Expected Results:**
+- ✅ Selection message deleted
+- ✅ Returns to previous state
+- ✅ No actions executed
+
+---
+
+### Test Suite D: Error Handling
+
+#### Test D1: Failure Isolation
+**Objective:** Verify one failure doesn't abort batch
+
+**Steps:**
+1. Build a batch that includes an intentional failure, such as a non-existent container mixed with real ones
+2. Send it, for example: `update plex nonexistent sonarr`
+3. Observe execution
+
+**Expected Results:**
+- ✅ First container processes
+- ✅ Failed container shows error message with reason
+- ✅ Remaining containers still process (batch continues)
+- ✅ Summary shows: "2 succeeded, 1 failed"
+- ✅ Failure details prominent in summary
+
+---
+
+#### Test D2: Warning vs Error Classification
+**Objective:** Verify non-critical warnings don't show as errors
+
+**Steps:**
+1. Stop a container: `stop plex`
+2. Try stopping again: `stop plex`
+
+**Expected Results:**
+- ✅ Shows as warning, not error
+- ✅ Message: "Already stopped" or similar
+- ✅ Summary distinguishes warnings from errors
+
+**Similar tests:**
+- Update container with no update available → Warning, not error
+- Start already-running container → Warning
+
+---
+
+### Test Suite E: Regression Tests
+
+#### Test E1: Single-Container Commands Still Work
+**Objective:** Verify no regression in existing functionality
+
+**Commands to test:**
+1. `status` → Shows container list keyboard
+2. `start plex` → Starts single container
+3. `stop plex` → Shows confirmation, then stops
+4. `restart plex` → Restarts with progress
+5. `update plex` → Updates single container
+6. `logs plex` → Shows last 50 lines
+7. `logs plex 100` → Shows last 100 lines
+
+**Expected Results:**
+- ✅ All single-container commands work exactly as before
+- ✅ No batch behavior triggered for single containers
+- ✅ Confirmation behavior unchanged (stop requires confirm, others don't)
+
+---
+
+#### Test E2: Inline Keyboard Actions Still Work
+**Objective:** Verify Phase 8 keyboard functionality is intact
+
+**Steps:**
+1. Send: `status`
+2. Tap a container name button
+3. Tap an action button (e.g., "▶️ Start", "Update 🔄")
+4. Complete action
+
+**Expected Results:**
+- ✅ Container detail view appears
+- ✅ Action buttons work
+- ✅ Actions execute correctly
+- ✅ No interference from batch infrastructure
+
+---
+
+#### Test E3: Pagination Still Works
+**Objective:** Verify container list pagination for many containers
+
+**Steps:**
+1. Send: `status` (if you have 10+ containers)
+2. Navigate with Previous/Next buttons
+
+**Expected Results:**
+- ✅ Pagination works correctly
+- ✅ Page numbers accurate
+- ✅ No batch selection interference
+
+**Note:** If you have fewer than 10 containers, record "Cannot test - insufficient containers"
+
+---
+
+## Success Criteria Verification
+
+After completing all tests, verify these criteria are met:
+
+- [ ] **BAT-01:** User can update multiple containers in one command
+  - Tests: A1, C3
+
+- [ ] **BAT-02:** Batch updates execute sequentially with per-container feedback
+  - Tests: A1, A2, C3
+
+- [ ] **BAT-03:** "Update all" updates only containers with updates available
+  - Tests: B1, B2
+
+- [ ] **BAT-04:** "Update all" requires confirmation
+  - Tests: B1, B3, B4
+
+- [ ] **BAT-05:** One failure doesn't abort remaining batch
+  - Tests: D1
+
+- [ ] **BAT-06:** Final summary shows success/failure count
+  - Tests: A1, C3, D1
+
+- [ ] **Inline keyboard batch selection works**
+  - Tests: C1-C7
+
+- [ ] **No regression in existing commands**
+  - Tests: E1, E2, E3
+
+## Test Execution Log
+
+**Date:** ___________
+**Tester:** ___________
+**n8n Version:** ___________
+**Workflow Import Time:** ___________
+
+### Results Summary
+
+| Test | Status | Notes |
+|------|--------|-------|
+| A1: Multi-container update | ⬜ Pass / ⬜ Fail | |
+| A2: Multi-container start | ⬜ Pass / ⬜ Fail | |
+| A3: Multi-container stop | ⬜ Pass / ⬜ Fail | |
+| A4: Fuzzy matching | ⬜ Pass / ⬜ Fail / ⬜ N/A | |
+| A5: Disambiguation | ⬜ Pass / ⬜ Fail / ⬜ N/A | |
+| B1: Update all with updates | ⬜ Pass / ⬜ Fail | |
+| B2: Update all (none available) | ⬜ Pass / ⬜ Fail / ⬜ N/A | |
+| B3: Update all cancel | ⬜ Pass / ⬜ Fail | |
+| B4: Update all timeout | ⬜ Pass / ⬜ Fail | |
+| C1: Enter multi-select | ⬜ Pass / ⬜ Fail | |
+| C2: Toggle selection | ⬜ Pass / ⬜ Fail | |
+| C3: Batch update from multi-select | ⬜ Pass / ⬜ Fail | |
+| C4: Batch stop with confirm | ⬜ Pass / ⬜ Fail | |
+| C5: Selection limit | ⬜ Pass / ⬜ Fail | |
+| C6: Clear selection | ⬜ Pass / ⬜ Fail | |
+| C7: Cancel multi-select | ⬜ Pass / ⬜ Fail | |
+| D1: Failure isolation | ⬜ Pass / ⬜ Fail | |
+| D2: Warning vs error | ⬜ Pass / ⬜ Fail | |
+| E1: Single commands regression | ⬜ Pass / ⬜ Fail | |
+| E2: Inline keyboard regression | ⬜ Pass / ⬜ Fail | |
+| E3: Pagination regression | ⬜ Pass / ⬜ Fail / ⬜ N/A | |
+
+### Issues Found
+
+| Issue # | Test | Description | Severity | Status |
+|---------|------|-------------|----------|--------|
+| | | | | |
+
+### Additional Observations
+
+_Record any unexpected behavior, performance notes, UX feedback, etc._
+
+---
+
+**Overall Assessment:** ⬜ Ready for Production / ⬜ Issues Need Resolution
+
+**Notes for Phase 10:**
+_Record any polish/improvement ideas discovered during testing_