Lucas Berger 1ef726942a docs(12): plan phase with Unraid badge research and UAT
Research found Unraid badge issue is architectural (bot bypasses
Unraid's XML template system). Updated plans to document limitation
with workaround instead of attempting programmatic fix. Plan 01
covers docs/env/debt, Plan 02 covers deferred Update All UAT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 18:56:44 -05:00


Phase 12: Polish & Audit - Research

Researched: 2026-02-08
Domain: Unraid Docker update tracking, environment variable documentation, UAT testing
Confidence: MEDIUM

Summary

Phase 12 focuses on three polish areas: (1) clearing Unraid update badges after bot-initiated container updates, (2) documenting environment variable requirements, and (3) completing deferred UAT tests from Phase 11.

The Unraid badge problem is well documented in the community but lacks official API documentation. Unraid tracks container update state in its own file, /var/lib/docker/unraid-update-status.json, separately from Docker's image metadata. Third-party tools like Watchtower face the same issue: containers update successfully, but Unraid keeps showing "update available" until the user applies the update through the Unraid UI.

Environment variable documentation should follow industry standards: clearly distinguish required vs. optional variables, document defaults, and explain when credentials can be hardcoded vs. externalized.

UAT tests deferred from Phase 11 (Update All via text command and inline keyboard button) should be executed as structured validation rather than exploratory testing, with clear pass/fail criteria.

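Primary recommendation: Document the Unraid badge behavior as a known limitation, with workaround instructions (Pattern 2). The initial idea was to trigger Unraid's "Check for Updates" after successful bot updates. That does not work: user testing showed that "Check for Updates" creates the badge rather than clearing it. A programmatic fix would have to go through Unraid's template system (Pattern 1). For environment variables and UAT, use the standard patterns documented below.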

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

Unraid badge clearing:

  • Unraid UI shows "update available" badge that persists after bot updates a container
  • Badge persists until user manually applies the update through Unraid UI
  • When user applies through Unraid UI after bot already pulled the image, it completes fast (just recreates container — image already cached)
  • Root cause unknown: Unraid likely has its own update tracking mechanism (not just image digest comparison) that needs to be notified
  • Research needed: Investigate how Unraid tracks container update state internally (XML files, database, API) and whether the bot can clear/notify it after a successful update
  • The bot's image pull IS working correctly — Unraid just doesn't know about it

Claude's Discretion

  • Environment config documentation approach (TELEGRAM_USERID, TELEGRAM_BOT_TOKEN clarity)
  • Curl --max-time flag consolidation strategy
  • Phase 11 deferred UAT test execution approach (Update All text command + inline keyboard)

Deferred Ideas (OUT OF SCOPE)

None — discussion stayed within phase scope

</user_constraints>

Standard Stack

Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| n8n | Current | Workflow automation platform | Project foundation, no changes needed |
| Docker API | v1.47 | Container management | Already in use via docker-socket-proxy |
| Telegram Bot API | Current | Bot interface | Already in use with n8n credential |

Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| curl | Static binary | HTTP requests to Docker/Telegram APIs | Already mounted in n8n container |
| Unraid API | Unknown | Potential update status management | If programmatic badge clearing is possible |

Alternatives Considered

N/A — This is a polish phase, not introducing new dependencies.

Installation:

No new libraries required. All tooling already in place.

Architecture Patterns

Unraid Update Status Tracking

UPDATED based on user testing (2026-02-08):

User tested with code-server container. Key observations:

  1. Bot pulled update for code-server that wasn't yet showing in Unraid UI
  2. After clicking "Check for Updates" in Unraid, it went from "up to date" to "apply update"
  3. Clicking "Apply Update" just restarted the container (image was already cached)
  4. "Check for Updates" does NOT clear the badge — it actually CREATES it

Root cause (revised): Unraid appears to judge update status from its own tracking state rather than from the image a container is actually running. When the bot pulls a new image and recreates the container via the Docker API, Unraid's template system doesn't register the recreation. On the next "Check for Updates", Unraid sees that a newer image exists than the one it has recorded. It then flags the container, even though the container may already be running that image.

The issue is that Unraid manages containers through its own XML template system (/boot/config/plugins/dockerMan/templates-user/). When Unraid "applies" an update, it:

  1. Removes the old container
  2. Pulls the image (already cached = instant)
  3. Recreates the container FROM ITS XML TEMPLATE
  4. Updates its internal tracking

The bot bypasses this template system entirely — it uses Docker API directly. Unraid doesn't know the container was recreated.

Unraid update tracking mechanism (revised):
```text
├── /boot/config/plugins/dockerMan/templates-user/my-*.xml
│   └── Container creation templates (ports, volumes, env vars)
├── /var/lib/docker/unraid-update-status.json
│   └── Stores image digest comparison state
├── "Check for Updates" button
│   └── Compares running container image ID vs registry latest
│   └── If different → shows "apply update" (even if bot already updated)
├── "Apply Update" button
│   └── Removes container, pulls image (cached), recreates from XML template
│   └── Updates tracking state
└── Key insight: "Check for Updates" does NOT clear badges, it CREATES them
```
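You can check on a live system that a bot update took effect and that only Unraid's tracking is out of date. To do so, compare the image ID a container is running with the image ID its tag currently resolves to. The sketch below does this through the Docker Engine API's container inspect and image inspect endpoints, which the bot already uses. It makes these assumptions:

- A hypothetical `DOCKER_HOST_URL` points at docker-socket-proxy.
- The proxy permits GET requests on `/containers` and `/images`.
- The function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: is a container already running the image its tag currently points to?
# Assumption (hypothetical): DOCKER_HOST_URL reaches docker-socket-proxy and the
# proxy allows GET requests on /containers and /images.
DOCKER_HOST_URL="${DOCKER_HOST_URL:-http://docker-socket-proxy:2375}"
API="${DOCKER_HOST_URL}/v1.47"

# Read JSON on stdin and print the value at a dotted path, e.g. "Config.Image".
json_get() {
  python3 -c 'import json, sys, functools; print(functools.reduce(lambda d, k: d[k], sys.argv[1].split("."), json.load(sys.stdin)))' "$1"
}

# Print "current" when both image IDs match, "stale" otherwise.
compare_ids() {
  if [ "$1" = "$2" ]; then echo "current"; else echo "stale"; fi
}

# Inspect a container, resolve its configured image tag, and compare the two IDs.
check_container() {
  local name="$1" inspect running tag tagged
  inspect=$(curl -sf --max-time 10 "${API}/containers/${name}/json") || return 1
  running=$(printf '%s' "$inspect" | json_get Image)      # image ID the container runs
  tag=$(printf '%s' "$inspect" | json_get Config.Image)   # image reference it was created from
  tagged=$(curl -sf --max-time 10 "${API}/images/${tag}/json" | json_get Id) || return 1
  echo "${name}: $(compare_ids "$running" "$tagged")"
}
```

For example, `check_container code-server` would print `code-server: current` if the recreated container is on the newly pulled image. A `current` result while Unraid shows "apply update" would point at Unraid's tracking state rather than at the bot. These are local image IDs, not registry digests, so the check says nothing about whether the registry has something newer still.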

Pattern 1: Update via Unraid's Template System

What: After pulling the image, use Unraid's own update mechanism to recreate the container

When to use: If Unraid exposes an API endpoint to apply updates

Potential approach: Unraid's web UI makes HTTP calls to its own emhttp backend. The "Apply Update" button likely hits an endpoint like /plugins/dynamix.docker.manager/include/CreateDocker.php or similar. If we can replicate that call, Unraid would properly track the update.

Caveat: The bot would need network access to the Unraid web UI (port 80/443), which is separate from the Docker socket proxy.

Pattern 2: Workaround Documentation (RECOMMENDED)

What: Document the limitation honestly and explain why "Apply Update" is instant

When to use: Default approach — programmatic solution requires Unraid API access which adds significant complexity

```markdown
## Known Limitation: Unraid Update Badges

After the bot updates a container, Unraid's UI may show "apply update"
on the next update check. This is expected: Unraid tracks container
updates through its own template system, which the bot bypasses.

**Why this happens:** The bot uses the Docker API directly to pull images
and recreate containers. Unraid doesn't know the container was updated
because it wasn't done through Unraid's template system.

**What to do:** Click "Apply Update" in Unraid's Docker tab. It completes
instantly because the image is already cached; Unraid just recreates the
container from its template to sync its tracking state.

**Note:** "Check for Updates" does NOT clear the badge. It may actually
cause a badge to appear if the bot updated a container that Unraid hadn't
checked yet.
```

Environment Variable Documentation

Pattern 3: Required vs. Optional Classification

What: Clear documentation distinguishing required variables (fail-fast if missing) from optional variables (have defaults)

Standard structure:

```markdown
## Environment Variables

### Required

| Variable | Description | Example |
|----------|-------------|---------|
| N8N_HOST | n8n instance URL | `http://localhost:5678` |
| N8N_API_KEY | n8n API key for workflow management | `abc123...` |

**Note:** Application fails on startup if required variables are missing.

### Optional

| Variable | Description | Default |
|----------|-------------|---------|
| TELEGRAM_BOT_TOKEN | Telegram bot token (if not using n8n credential) | None (uses n8n credential) |
| TELEGRAM_USERID | Authorized user ID (if not hardcoded in workflow) | Hardcoded in IF nodes |

**Current implementation:** Both TELEGRAM_BOT_TOKEN and TELEGRAM_USERID
are optional:
- TELEGRAM_BOT_TOKEN: Bot token stored in n8n credential (ID: I0xTTiASl7C1NZhJ),
  OR provided via environment variable for HTTP Request nodes
- TELEGRAM_USERID: Hardcoded in workflow IF nodes (User Authenticated,
  Callback Authenticated), OR could be externalized via environment variable

**Recommendation:** Keep current approach (n8n credential + hardcoded user ID)
and document clearly.
```
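The Required/Optional split depends on knowing which variables the workflows actually read. n8n expressions read environment variables as `$env.NAME`; the README pattern uses `{{ $env.TELEGRAM_BOT_TOKEN }}`. A grep over the exported workflow files can therefore list them. The sketch below makes these assumptions:

- The exports are the `n8n-*.json` files.
- Only the dot form `$env.NAME` is used; bracket access would be missed.
- The function name is illustrative.

```bash
#!/usr/bin/env bash
# List every environment variable referenced as $env.NAME in the given n8n
# workflow exports, one name per line, deduplicated and sorted. Each name it
# prints should appear in either the Required or the Optional table.
list_env_refs() {
  grep -ohE '\$env\.[A-Za-z_][A-Za-z0-9_]*' "$@" | sed 's/^\$env\.//' | sort -u
}
```

Usage: `list_env_refs n8n-*.json`. Anything it prints that the tables don't mention is a documentation gap. This is an audit aid, not a runtime validation layer, so it stays within the "document what exists" approach.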

UAT Test Execution

Pattern 4: Structured Deferred Test Execution

What: Execute deferred UAT tests with clear pass/fail criteria, not exploratory testing

Test structure:

```markdown
## UAT Test: Update All via Text Command

**Preconditions:**
- Workflow active and deployed
- At least 2 containers with :latest tag
- At least 1 container with available update

**Test Steps:**
1. Send "update all" to bot
2. Verify confirmation message appears with container count
3. Tap "Confirm" button
4. Verify progress messages during batch execution
5. Verify final summary message with success/failure counts

**Expected Results:**
- Confirmation lists all :latest containers
- Progress updates every N seconds
- Summary shows N succeeded, 0 failed
- Each container shows new digest in /status

**Pass Criteria:** All steps complete without errors

---

## UAT Test: Update All via Inline Keyboard

**Preconditions:**
- Same as above

**Test Steps:**
1. Send "/status" or "/list" to bot
2. Tap "Update All" button in inline keyboard
3. (Same verification as text command test)

**Expected Results:**
- (Same as text command test)

**Pass Criteria:** All steps complete without errors
```

Anti-Patterns to Avoid

  • Modifying unraid-update-status.json without schema documentation: Risk breaking Unraid's tracking system
  • Using environment variables without documenting defaults: Users don't know what's required vs. optional
  • Exploratory UAT without pass/fail criteria: Can't determine if tests passed or issues need fixing

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Unraid update tracking | Custom JSON file parser/writer | Unraid API (if exists) or documented workaround | File schema undocumented, could break with Unraid updates |
| Environment variable validation | Ad-hoc checks in workflow | Standard documentation pattern | Industry standard; users expect a clear required/optional distinction |
| UAT test management | Informal checklist | Structured test cases with pass/fail criteria | Ensures completeness, provides evidence of validation |

Key insight: Polish phases are about finishing well-understood work, not inventing new mechanisms. Use standard patterns and document limitations honestly.

Common Pitfalls

Pitfall 1: Assuming Unraid API Exists

What goes wrong: Spending time searching for an official Unraid API endpoint for update status when it may not exist

Why it happens: Unraid is a commercial product with incomplete public API documentation

How to avoid:

  1. Search Unraid forums and GitHub for "update status API" or "docker update notification"
  2. Check /var/lib/docker/unraid-update-status.json schema by examining actual file
  3. If no clear solution in 1-2 hours of research, pivot to documentation workaround

Warning signs: No official Unraid API docs mention update status management, community discussions focus on manual workarounds

Pitfall 2: Over-Engineering Environment Variable Documentation

What goes wrong: Creating complex validation logic or multiple configuration modes when simple documentation suffices

Why it happens: Desire for "proper" configuration management without considering current implementation simplicity

How to avoid: Current implementation (n8n credential + hardcoded user ID) works fine. Document what exists, recommend keeping it simple.

Warning signs: Considering adding .env file parsing, validation layers, or multi-mode configuration

Pitfall 3: Treating UAT as Open-Ended Testing

What goes wrong: Deferred tests become exploratory sessions without clear completion criteria

Why it happens: "Testing" sounds exploratory, but these are specific feature validations

How to avoid: Write structured test cases BEFORE executing. Each test needs: preconditions, steps, expected results, pass/fail criteria.

Warning signs: Starting testing without written test plan, unsure when testing is "done"

Pitfall 4: Confusing DEBT-02 Status

What goes wrong: Attempting to fix "duplicate --max-time flags" when the issue may already be resolved

Why it happens: Requirements doc still lists DEBT-02 as pending, but code investigation shows only one --max-time 600 flag

How to avoid: Verify current state BEFORE planning fixes. Grep the codebase to confirm the issue exists.

Warning signs: Planning tasks to fix non-existent problems

Code Examples

Unraid Update Status File Investigation

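```bash
STATUS_FILE=/var/lib/docker/unraid-update-status.json

# Check that the file exists, then print its top-level keys and one sample entry
if [ -f "$STATUS_FILE" ]; then
  python3 - "$STATUS_FILE" <<'PY'
import json, sys
with open(sys.argv[1]) as fh:
    data = json.load(fh)
print('Keys:', list(data.keys()))
print('Sample entry:', json.dumps(next(iter(data.values()), {}), indent=2))
PY
else
  echo "Not found: $STATUS_FILE"
fi

# Test safe deletion (community reports say Unraid regenerates the file on the next check)
mv "$STATUS_FILE" "$STATUS_FILE.backup"
# Now trigger "Check for Updates" manually in Unraid's Docker tab, then verify:
ls -l "$STATUS_FILE"
# If the file was not regenerated, restore the backup:
# mv "$STATUS_FILE.backup" "$STATUS_FILE"
```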

Environment Variable Documentation (README.md pattern)

````markdown
## Configuration

### Telegram Credentials

**Option 1: n8n Credential (Recommended)**

Store bot token in n8n's credential manager:
1. Settings > Credentials > Add Credential
2. Type: Telegram API
3. Name: `Telegram API`
4. Access Token: Your bot token from @BotFather

This is the current implementation. HTTP Request nodes use
`{{ $env.TELEGRAM_BOT_TOKEN }}` as fallback only.

**Option 2: Environment Variable**

If not using n8n credential, set:

```bash
TELEGRAM_BOT_TOKEN=your_bot_token_here
```

### User Authorization

The workflow is hardcoded to respond only to Telegram user ID
configured in these nodes:
- IF User Authenticated
- IF Callback Authenticated

**To change authorized user:**
1. Get your Telegram user ID from @userinfobot
2. Edit workflow, find IF User Authenticated node
3. Change `rightValue` from current ID to yours
4. Repeat for IF Callback Authenticated node
5. Save workflow

**Alternative:** Could be externalized to `TELEGRAM_USERID` environment
variable, but current hardcoded approach is simpler and explicit.
````
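Before and after editing, it can help to list the user ID each authentication node checks, so you can confirm both nodes match. The sketch below reads an exported workflow file. It prints every `rightValue` found inside nodes whose name contains "Authenticated". Matching on that substring is an assumption based on the node names above. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Print "<node name>: <rightValue>" for every rightValue inside nodes whose name
# contains "Authenticated" in an exported n8n workflow file.
auth_user_ids() {
  python3 - "$1" <<'PY'
import json, sys

def right_values(obj):
    """Walk nested dicts/lists and yield every value stored under "rightValue"."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "rightValue":
                yield value
            else:
                yield from right_values(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from right_values(item)

with open(sys.argv[1]) as fh:
    workflow = json.load(fh)

for node in workflow.get("nodes", []):
    if "Authenticated" in node.get("name", ""):
        for value in right_values(node.get("parameters", {})):
            print(f"{node['name']}: {value}")
PY
}
```

Usage: `auth_user_ids n8n-workflow.json`. After the change, both lines should show the new ID.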

UAT Test Case Template

```markdown
# UAT Test Case: [Feature Name]

**Test ID:** UAT-12-01
**Feature:** Update All via Text Command
**Priority:** High (deferred from Phase 11)

## Preconditions

- [ ] Workflow deployed and active
- [ ] Test environment has 3+ containers with :latest tag
- [ ] At least 1 container has available update (verify with /status)

## Test Data

Containers for testing:
- Container A: [name], status: [up-to-date/update available]
- Container B: [name], status: [up-to-date/update available]
- Container C: [name], status: [up-to-date/update available]

## Test Steps

| Step | Action | Expected Result | Actual Result | Pass/Fail |
|------|--------|----------------|---------------|-----------|
| 1 | Send "update all" to bot | Confirmation message appears listing all :latest containers | | |
| 2 | Count containers in confirmation | Count matches actual :latest container count | | |
| 3 | Tap "Confirm" button | Progress message appears | | |
| 4 | Wait for batch execution | Progress updates every ~5 seconds | | |
| 5 | Wait for completion | Final summary shows N succeeded, 0 failed | | |
| 6 | Send "/status [container]" for each | Each shows updated digest or "up to date" | | |

## Pass Criteria

- All steps complete without errors
- All :latest containers updated or reported as up-to-date
- No timeout errors or missing responses
- Summary matches actual results

## Defects Found

[Log any issues discovered during testing]

## Test Result

- [ ] PASS: All criteria met
- [ ] FAIL: Issues found (see defects)
- [ ] BLOCKED: Cannot test (explain)

**Tested by:** [Name]
**Date:** [Date]
```
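Once the step table is filled in, the step-level verdict follows directly from its Pass/Fail column. The helper below is hypothetical and not part of the project. It reads a filled-in test case file and prints one of three results:

- PASS only when every numbered step is marked Pass.
- FAIL if any step is marked Fail.
- INCOMPLETE otherwise.

The remaining pass criteria, such as the summary matching actual results, still need a human check.

```bash
#!/usr/bin/env bash
# Derive a verdict from the Pass/Fail column (the last cell) of the numbered
# rows in a markdown step table. Header and separator rows are skipped because
# their first cell is not a number.
uat_verdict() {
  awk -F'|' '
    /^\| *[0-9]+ *\|/ {
      result = tolower($(NF-1)); gsub(/ /, "", result)
      total++
      if (result == "pass") pass++
      else if (result == "fail") fail++
    }
    END {
      if (fail > 0) print "FAIL"
      else if (total > 0 && pass == total) print "PASS"
      else print "INCOMPLETE"
    }' "$1"
}
```

Usage: `uat_verdict UAT-12-01.md`. The file name is illustrative.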

State of the Art

Unraid Docker Update Tracking (as of 2026)

| Aspect | Current Approach | Community Workarounds | Impact |
|--------|------------------|-----------------------|--------|
| Update detection | Unraid checks Docker registries via the "Check for Updates" button | Third-party tools (Watchtower) face the same badge persistence issue | Bot updates work correctly; the UI just doesn't reflect them |
| Status persistence | /var/lib/docker/unraid-update-status.json stores sha256 hashes | Delete the file to force a refresh, or manually apply the update in the UI | Known limitation with documented workarounds |
| API availability | No official public API for update status management | Community relies on file manipulation or manual UI actions | Must document limitation rather than automate |

Deprecated/outdated:

  • Direct /var/run/docker.sock mounting: Replaced with docker-socket-proxy for security (already implemented in project)

Open Questions

Question 1: Can Unraid's update status be refreshed programmatically?

What we know:

  • File location: /var/lib/docker/unraid-update-status.json
  • Community reports deleting file forces refresh
  • Unraid regenerates file on "Check for Updates" click
  • Third-party updaters (Watchtower) can't solve this either

What's unclear:

  • Is there a safe API endpoint to trigger update check?
  • What's the exact JSON schema of the status file?
  • Would modifying the file directly break Unraid's integrity checks?
  • Is there a file-based trigger (like a flag file) Unraid monitors?

Recommendation:

  1. Examine actual unraid-update-status.json on test system
  2. Search Unraid GitHub/forums for "update status refresh API"
  3. If no clear solution after 2 hours research → document as known limitation
  4. Provide workaround instructions in README (manual "Apply Update" is fast since image cached)

Question 2: Is DEBT-02 already fixed?

What we know:

  • DEBT-02: "Fix duplicate --max-time flags in image pull command"
  • Current Pull Image command: curl -s --max-time 600 -X POST ...
  • Only ONE --max-time flag found in n8n-update.json

What's unclear:

  • Was this fixed in a previous phase and not marked complete?
  • Did the requirement description mean something else?
  • Should we verify all HTTP Request node timeout settings?

Recommendation:

  1. Grep all n8n-*.json files for --max-time occurrences
  2. If only one occurrence exists, mark DEBT-02 as already complete
  3. Update REQUIREMENTS.md traceability section
  4. No code changes needed
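Steps 1 and 2 can be made repeatable with a small audit script. It assumes the workflow exports are pretty-printed, so each curl command string sits on a single line. On a minified export, the per-line duplicate check would report false positives. The function names are illustrative.

```bash
#!/usr/bin/env bash
# DEBT-02 audit sketch.

# Total number of --max-time flags across all given files.
count_max_time() {
  grep -o -- '--max-time' "$@" | wc -l | tr -d ' '
}

# Print file:line for every line that carries --max-time two or more times.
dup_max_time_lines() {
  grep -HnE -- '--max-time.*--max-time' "$@" | cut -d: -f1,2
}
```

`count_max_time n8n-*.json` reports the total. `dup_max_time_lines n8n-*.json` prints nothing when no single command repeats the flag, which would support closing DEBT-02 without code changes.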

Question 3: How comprehensive should UAT tests be?

What we know:

  • Two specific tests deferred: "Update All text command" and "Update All inline keyboard"
  • Code already deployed in Phase 11
  • These are validation tests, not exploratory

What's unclear:

  • Should we test only happy path or include error cases?
  • Should we test with 0 available updates, 1 update, multiple updates?
  • Should we verify Unraid badge status after update (ties to Question 1)?

Recommendation:

  • Focus on happy path: 2+ containers with updates available
  • Test both entry points (text command + inline keyboard button)
  • Verify success criteria from Phase 11: confirmation, progress, summary
  • Error cases already covered by Phase 10/11 UAT
  • Unraid badge verification: document as known limitation if no solution found

Sources

Primary (HIGH confidence)

  • Current project codebase: n8n-workflow.json, n8n-update.json, README.md, DEPLOY-SUBWORKFLOWS.md (examined directly)
  • Requirements: .planning/REQUIREMENTS.md, ROADMAP.md (examined directly)

Secondary (MEDIUM confidence)

Tertiary (LOW confidence)

  • Unraid API documentation search results — No official public API docs found for update status management (negative finding, but valuable)

Metadata

Confidence breakdown:

  • Unraid update badge clearing: MEDIUM — Community workarounds documented, but no official API found. File location and behavior confirmed by multiple sources.
  • Environment variable documentation: HIGH — Industry standards well-established, current implementation understood from codebase examination.
  • UAT test execution: HIGH — Standard structured testing patterns, test scope clearly defined in Phase 11 requirements.
  • DEBT-02 status: HIGH — Code examination confirms only one --max-time flag exists, likely already resolved.

Research date: 2026-02-08

Valid until: 30 days (stable domain — Unraid update mechanism unlikely to change rapidly, env var patterns are industry standards)

Research completion notes:

  • Unraid badge clearing requires further investigation of actual server file system
  • DEBT-02 appears already fixed, needs verification task only
  • UAT tests are well-scoped and ready for execution
  • Environment variable documentation needs clarification, not code changes