Phase 12: Polish & Audit - Research

Researched: 2026-02-08
Domain: Unraid Docker update tracking, environment variable documentation, UAT testing
Confidence: MEDIUM

Summary

Phase 12 focuses on three polish areas: (1) clearing Unraid update badges after bot-initiated container updates, (2) documenting environment variable requirements, and (3) completing deferred UAT tests from Phase 11.

The Unraid badge clearing problem is well-documented in the community but lacks official API documentation. Unraid tracks container update state separately from Docker image digests using /var/lib/docker/unraid-update-status.json. Third-party tools like Watchtower face the same issue: containers update successfully but Unraid continues showing "update available" until the user manually clicks "Check for Updates" or applies the update through Unraid UI.

Environment variable documentation should follow industry standards: clearly distinguish required vs. optional variables, document defaults, and explain when credentials can be hardcoded vs. externalized.

UAT tests deferred from Phase 11 (Update All via text command and inline keyboard button) should be executed as structured validation rather than exploratory testing, with clear pass/fail criteria.

Primary recommendation: For Unraid badge clearing, first determine whether Unraid exposes any way to trigger its "Check for Updates" action programmatically; if it does, call it after each successful bot update to refresh the UI state. If no such mechanism exists, document the limitation with workaround instructions. For environment variables and UAT, apply the industry-standard patterns documented below.

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

Unraid badge clearing:

  • Unraid UI shows "update available" badge that persists after bot updates a container
  • Badge persists until user manually applies the update through Unraid UI
  • When user applies through Unraid UI after bot already pulled the image, it completes fast (just recreates container — image already cached)
  • Root cause unknown: Unraid likely has its own update tracking mechanism (not just image digest comparison) that needs to be notified
  • Research needed: Investigate how Unraid tracks container update state internally (XML files, database, API) and whether the bot can clear/notify it after a successful update
  • The bot's image pull IS working correctly — Unraid just doesn't know about it

Claude's Discretion

  • Environment config documentation approach (TELEGRAM_USERID, TELEGRAM_BOT_TOKEN clarity)
  • Curl --max-time flag consolidation strategy
  • Phase 11 deferred UAT test execution approach (Update All text command + inline keyboard)

Deferred Ideas (OUT OF SCOPE)

None — discussion stayed within phase scope

</user_constraints>

Standard Stack

Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| n8n | Current | Workflow automation platform | Project foundation, no changes needed |
| Docker API | v1.47 | Container management | Already in use via docker-socket-proxy |
| Telegram Bot API | Current | Bot interface | Already in use with n8n credential |

Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| curl | Static binary | HTTP requests to Docker/Telegram APIs | Already mounted in n8n container |
| Unraid API | Unknown | Potential update status management | If programmatic badge clearing is possible |

Alternatives Considered

N/A — This is a polish phase, not introducing new dependencies.

Installation:

No new libraries required. All tooling already in place.

Architecture Patterns

Unraid Update Status Tracking

Current understanding:

Unraid update tracking mechanism:
├── /var/lib/docker/unraid-update-status.json
│   └── Stores local sha256 hash information
├── "Check for Updates" button
│   ├── Queries Docker registries for new digests
│   └── Updates unraid-update-status.json
└── Container XML templates
    └── <Registry> element stores Docker Registry URL

Pattern 1: Post-Update Notification

What: After bot successfully pulls image and recreates container, trigger Unraid's update check mechanism

When to use: If Unraid exposes an API endpoint or file-based trigger

Implementation options:

# Option A: Delete update status file to force refresh
# Community reports say Unraid regenerates it on the next check (unverified here)
rm -f /var/lib/docker/unraid-update-status.json

# Option B: Call an Unraid API endpoint, if one exists
# NOTE: placeholder URL only; no such endpoint has been confirmed
curl -X POST http://localhost/api/docker/check-updates

# Option C: Update JSON directly
# Read unraid-update-status.json, modify specific container entry, write back
# RISK: Schema unknown, could break Unraid's tracking
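Option A can be made reversible by moving the file aside instead of deleting it. The following is a minimal sketch under stated assumptions: the function names `set_aside_status_file` and `restore_status_file` are hypothetical, the script is assumed to run somewhere with write access to the path named in the notes above, and whether Unraid actually regenerates the file is a community report that still needs to be confirmed on a test system.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

# Path taken from the research notes; everything else here is illustrative.
STATUS_FILE = Path("/var/lib/docker/unraid-update-status.json")


def set_aside_status_file(path: Path = STATUS_FILE) -> Optional[Path]:
    """Move the status file to a timestamped backup instead of deleting it.

    Returns the backup path, or None if the file does not exist.
    """
    if not path.exists():
        return None
    backup = path.with_name(f"{path.name}.{int(time.time())}.bak")
    shutil.move(str(path), str(backup))
    return backup


def restore_status_file(backup: Path, path: Path = STATUS_FILE) -> None:
    """Put the backup back, e.g. if Unraid did not regenerate the file."""
    if path.exists():
        # Unraid has already written a fresh file; do not clobber it.
        raise FileExistsError(f"{path} already exists; refusing to overwrite")
    shutil.move(str(backup), str(path))
```

Keeping the backup means a failed experiment can be undone with `restore_status_file(backup)` before anything else touches the Docker tab.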

Pattern 2: Workaround Documentation

What: If no programmatic solution exists, document the limitation and provide user instructions

When to use: If Unraid doesn't expose APIs and file manipulation is too risky

Example documentation:

## Known Limitation: Unraid Update Badges

When the bot updates a container, Unraid's UI may continue showing
"update available" until you manually click "Apply Update" in the
Unraid Docker tab. This is safe: the image is already cached, so
Unraid only has to recreate the container, which completes quickly.

**Why this happens:** Unraid tracks updates separately from Docker
image digests. The bot pulls the image correctly, but Unraid's
tracking file is not updated as part of that process.

**Workaround:** Click "Apply Update" in Unraid UI, or click "Check
for Updates" to refresh the status.

Environment Variable Documentation

Pattern 3: Required vs. Optional Classification

What: Clear documentation distinguishing required variables (fail-fast if missing) from optional variables (have defaults)

Standard structure:

## Environment Variables

### Required

| Variable | Description | Example |
|----------|-------------|---------|
| N8N_HOST | n8n instance URL | `http://localhost:5678` |
| N8N_API_KEY | n8n API key for workflow management | `abc123...` |

**Note:** Application fails on startup if required variables are missing.

### Optional

| Variable | Description | Default |
|----------|-------------|---------|
| TELEGRAM_BOT_TOKEN | Telegram bot token (if not using n8n credential) | None — uses n8n credential |
| TELEGRAM_USERID | Authorized user ID (if not hardcoded in workflow) | Hardcoded in IF nodes |

**Current implementation:** Both TELEGRAM_BOT_TOKEN and TELEGRAM_USERID
are optional:
- TELEGRAM_BOT_TOKEN: Bot token stored in n8n credential (ID: I0xTTiASl7C1NZhJ),
  OR provided via environment variable for HTTP Request nodes
- TELEGRAM_USERID: Hardcoded in workflow IF nodes (User Authenticated,
  Callback Authenticated), OR could be externalized via environment variable

**Recommendation:** Keep current approach (n8n credential + hardcoded user ID)
and document clearly.
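To pin down what "required" and "optional" mean in the tables above, the semantics can be written out as a few lines of Python. This is an illustration of the documented behaviour only, not a proposed addition to the workflow (the recommendation remains to keep the current approach). `REQUIRED`, `OPTIONAL_DEFAULTS`, and `load_config` are hypothetical names, and the variable lists simply mirror the example tables.

```python
from typing import Dict, Mapping, Optional

# Mirrors the example tables above; adjust to whatever the README finally lists.
REQUIRED = ("N8N_HOST", "N8N_API_KEY")
OPTIONAL_DEFAULTS: Dict[str, Optional[str]] = {
    "TELEGRAM_BOT_TOKEN": None,  # None: fall back to the n8n credential
    "TELEGRAM_USERID": None,     # None: use the ID hardcoded in the IF nodes
}


def load_config(env: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Fail fast on missing required variables, apply defaults to optional ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    config: Dict[str, Optional[str]] = {name: env[name] for name in REQUIRED}
    for name, default in OPTIONAL_DEFAULTS.items():
        # An unset or empty optional variable falls back to its default.
        config[name] = env.get(name) or default
    return config
```

Calling `load_config(os.environ)` at startup would give exactly the behaviour the "Required" note describes: an immediate, named error instead of a failure deep inside a later request.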

UAT Test Execution

Pattern 4: Structured Deferred Test Execution

What: Execute deferred UAT tests with clear pass/fail criteria, not exploratory testing

Test structure:

## UAT Test: Update All via Text Command

**Preconditions:**
- Workflow active and deployed
- At least 2 containers with :latest tag
- At least 1 container with available update

**Test Steps:**
1. Send "update all" to bot
2. Verify confirmation message appears with container count
3. Tap "Confirm" button
4. Verify progress messages during batch execution
5. Verify final summary message with success/failure counts

**Expected Results:**
- Confirmation lists all :latest containers
- Progress updates every N seconds
- Summary shows N succeeded, 0 failed
- Each container shows new digest in /status

**Pass Criteria:** All steps complete without errors

---

## UAT Test: Update All via Inline Keyboard

**Preconditions:**
- Same as above

**Test Steps:**
1. Send "/status" or "/list" to bot
2. Tap "Update All" button in inline keyboard
3. (Same verification as text command test)

**Expected Results:**
- (Same as text command test)

**Pass Criteria:** All steps complete without errors

Anti-Patterns to Avoid

  • Modifying unraid-update-status.json without schema documentation: Risk breaking Unraid's tracking system
  • Using environment variables without documenting defaults: Users don't know what's required vs. optional
  • Exploratory UAT without pass/fail criteria: Can't determine if tests passed or issues need fixing

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Unraid update tracking | Custom JSON file parser/writer | Unraid API (if exists) or documented workaround | File schema undocumented, could break with Unraid updates |
| Environment variable validation | Ad-hoc checks in workflow | Standard documentation pattern | Industry standard, users expect clear required/optional distinction |
| UAT test management | Informal checklist | Structured test cases with pass/fail criteria | Ensures completeness, provides evidence of validation |

Key insight: Polish phases are about finishing well-understood work, not inventing new mechanisms. Use standard patterns and document limitations honestly.

Common Pitfalls

Pitfall 1: Assuming Unraid API Exists

What goes wrong: Spending time searching for an official Unraid API endpoint for update status when it may not exist

Why it happens: Unraid is a commercial product with incomplete public API documentation

How to avoid:

  1. Search Unraid forums and GitHub for "update status API" or "docker update notification"
  2. Check /var/lib/docker/unraid-update-status.json schema by examining actual file
  3. If no clear solution in 1-2 hours of research, pivot to documentation workaround

Warning signs: No official Unraid API docs mention update status management, community discussions focus on manual workarounds

Pitfall 2: Over-Engineering Environment Variable Documentation

What goes wrong: Creating complex validation logic or multiple configuration modes when simple documentation suffices

Why it happens: Desire for "proper" configuration management without considering current implementation simplicity

How to avoid: Current implementation (n8n credential + hardcoded user ID) works fine. Document what exists, recommend keeping it simple.

Warning signs: Considering adding .env file parsing, validation layers, or multi-mode configuration

Pitfall 3: Treating UAT as Open-Ended Testing

What goes wrong: Deferred tests become exploratory sessions without clear completion criteria

Why it happens: "Testing" sounds exploratory, but these are specific feature validations

How to avoid: Write structured test cases BEFORE executing. Each test needs: preconditions, steps, expected results, pass/fail criteria.

Warning signs: Starting testing without written test plan, unsure when testing is "done"

Pitfall 4: Confusing DEBT-02 Status

What goes wrong: Attempting to fix "duplicate --max-time flags" when the issue may already be resolved

Why it happens: Requirements doc still lists DEBT-02 as pending, but code investigation shows only one --max-time 600 flag

How to avoid: Verify current state BEFORE planning fixes. Grep the codebase to confirm the issue exists.

Warning signs: Planning tasks to fix non-existent problems

Code Examples

Unraid Update Status File Investigation

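# Check that the file exists, then examine its structure
STATUS_FILE=/var/lib/docker/unraid-update-status.json
test -f "$STATUS_FILE" && python3 -c "
import json, sys
data = json.load(sys.stdin)
# The schema is undocumented, so do not assume a dict at the top level
print('Top-level type:', type(data).__name__)
if isinstance(data, dict) and data:
    print('Keys:', list(data.keys()))
    print('Sample entry:', json.dumps(next(iter(data.values())), indent=2))
" < "$STATUS_FILE"

# Test safe removal: move the file aside rather than deleting it
# (community reports say Unraid regenerates it on the next check)
mv "$STATUS_FILE" "$STATUS_FILE.backup"
# Trigger Unraid "Check for Updates" manually in the UI
# Verify the file was regenerated; if it was not, restore the backup:
# mv "$STATUS_FILE.backup" "$STATUS_FILE"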

Environment Variable Documentation (README.md pattern)

## Configuration

### Telegram Credentials

**Option 1: n8n Credential (Recommended)**

Store bot token in n8n's credential manager:
1. Settings > Credentials > Add Credential
2. Type: Telegram API
3. Name: `Telegram API`
4. Access Token: Your bot token from @BotFather

This is the current implementation. HTTP Request nodes use
`{{ $env.TELEGRAM_BOT_TOKEN }}` as fallback only.

**Option 2: Environment Variable**

If not using n8n credential, set:

TELEGRAM_BOT_TOKEN=your_bot_token_here


### User Authorization

The workflow responds only to the Telegram user ID that is hardcoded
in these nodes:
- IF User Authenticated
- IF Callback Authenticated

**To change authorized user:**
1. Get your Telegram user ID from @userinfobot
2. Edit workflow, find IF User Authenticated node
3. Change `rightValue` from current ID to yours
4. Repeat for IF Callback Authenticated node
5. Save workflow

**Alternative:** Could be externalized to `TELEGRAM_USERID` environment
variable, but current hardcoded approach is simpler and explicit.

UAT Test Case Template

# UAT Test Case: [Feature Name]

**Test ID:** UAT-12-01
**Feature:** Update All via Text Command
**Priority:** High (deferred from Phase 11)

## Preconditions

- [ ] Workflow deployed and active
- [ ] Test environment has 3+ containers with :latest tag
- [ ] At least 1 container has available update (verify with /status)

## Test Data

Containers for testing:
- Container A: [name], status: [up-to-date/update available]
- Container B: [name], status: [up-to-date/update available]
- Container C: [name], status: [up-to-date/update available]

## Test Steps

| Step | Action | Expected Result | Actual Result | Pass/Fail |
|------|--------|----------------|---------------|-----------|
| 1 | Send "update all" to bot | Confirmation message appears listing all :latest containers | | |
| 2 | Count containers in confirmation | Count matches actual :latest container count | | |
| 3 | Tap "Confirm" button | Progress message appears | | |
| 4 | Wait for batch execution | Progress updates every ~5 seconds | | |
| 5 | Wait for completion | Final summary shows N succeeded, 0 failed | | |
| 6 | Send "/status [container]" for each | Each shows updated digest or "up to date" | | |

## Pass Criteria

- All steps complete without errors
- All :latest containers updated or reported as up-to-date
- No timeout errors or missing responses
- Summary matches actual results

## Defects Found

[Log any issues discovered during testing]

## Test Result

- [ ] PASS — All criteria met
- [ ] FAIL — Issues found (see defects)
- [ ] BLOCKED — Cannot test (explain)

**Tested by:** [Name]
**Date:** [Date]
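The pass/fail logic implied by the template can be stated precisely. The sketch below is one possible reading, not part of the project: `UatStep`, `UatCase`, and `Outcome` are hypothetical names, and treating any unexecuted step as BLOCKED is an assumption about how the three result checkboxes relate to the step table.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    BLOCKED = "BLOCKED"


@dataclass
class UatStep:
    action: str
    expected: str
    actual: Optional[str] = None   # filled in during execution
    passed: Optional[bool] = None  # None means the step was not executed


@dataclass
class UatCase:
    test_id: str
    feature: str
    steps: List[UatStep] = field(default_factory=list)

    def outcome(self) -> Outcome:
        # Any failed step fails the whole case.
        if any(step.passed is False for step in self.steps):
            return Outcome.FAIL
        # No steps, or steps that were never run: the case could not be completed.
        if not self.steps or any(step.passed is None for step in self.steps):
            return Outcome.BLOCKED
        return Outcome.PASS
```

With this reading, a case is only PASS when every row of the step table has been executed and marked as passing, which matches the "All steps complete without errors" criterion.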

State of the Art

Unraid Docker Update Tracking (as of 2026)

| Aspect | Current Approach | Community Workarounds | Impact |
|--------|------------------|-----------------------|--------|
| Update detection | Unraid checks Docker registries via "Check for Updates" button | Third-party tools (Watchtower) face same badge persistence issue | Bot updates work correctly, UI just doesn't reflect it |
| Status persistence | /var/lib/docker/unraid-update-status.json stores sha256 hashes | Delete file to force refresh, or manually apply update in UI | Known limitation with documented workarounds |
| API availability | No official public API for update status management | Community relies on file manipulation or manual UI actions | Must document limitation rather than automate |

Deprecated/outdated:

  • Direct /var/run/docker.sock mounting: Replaced with docker-socket-proxy for security (already implemented in project)

Open Questions

Question 1: Can Unraid's update status be refreshed programmatically?

What we know:

  • File location: /var/lib/docker/unraid-update-status.json
  • Community reports deleting file forces refresh
  • Unraid regenerates file on "Check for Updates" click
  • Third-party updaters (Watchtower) can't solve this either

What's unclear:

  • Is there a safe API endpoint to trigger update check?
  • What's the exact JSON schema of the status file?
  • Would modifying the file directly break Unraid's integrity checks?
  • Is there a file-based trigger (like a flag file) Unraid monitors?

Recommendation:

  1. Examine actual unraid-update-status.json on test system
  2. Search Unraid GitHub/forums for "update status refresh API"
  3. If no clear solution emerges after 1-2 hours of research, document it as a known limitation
  4. Provide workaround instructions in README (manual "Apply Update" is fast since image cached)
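Step 1 above can be done without assuming anything about the file's schema. The following sketch replaces every leaf value with its type name, so the structure can be pasted into the research notes without copying hashes or container names. `describe_shape` and `describe_file` are hypothetical helper names, and the depth limit is an arbitrary choice.

```python
import json
from typing import Any


def describe_shape(value: Any, depth: int = 0, max_depth: int = 3) -> Any:
    """Return the same nesting as `value`, with leaves replaced by type names."""
    if depth >= max_depth:
        return "..."
    if isinstance(value, dict):
        return {
            key: describe_shape(item, depth + 1, max_depth)
            for key, item in value.items()
        }
    if isinstance(value, list):
        # One representative element is enough to see the element shape.
        return [describe_shape(value[0], depth + 1, max_depth)] if value else []
    return type(value).__name__


def describe_file(path: str) -> str:
    """Load a JSON file and return its shape as pretty-printed JSON."""
    with open(path, encoding="utf-8") as handle:
        return json.dumps(describe_shape(json.load(handle)), indent=2)
```

Running `print(describe_file("/var/lib/docker/unraid-update-status.json"))` on the test system would answer the "exact JSON schema" question at least for the entries currently present.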

Question 2: Is DEBT-02 already fixed?

What we know:

  • DEBT-02: "Fix duplicate --max-time flags in image pull command"
  • Current Pull Image command: curl -s --max-time 600 -X POST ...
  • Only ONE --max-time flag found in n8n-update.json

What's unclear:

  • Was this fixed in a previous phase and not marked complete?
  • Did the requirement description mean something else?
  • Should we verify all HTTP Request node timeout settings?

Recommendation:

  1. Grep all n8n-*.json files for --max-time occurrences
  2. If no single command contains more than one --max-time flag, mark DEBT-02 as already complete
  3. Update REQUIREMENTS.md traceability section
  4. No code changes needed
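A plain grep counts occurrences per file, but DEBT-02 is about a flag repeated within one command. The sketch below walks every string in each workflow export and reports only those strings that contain the flag more than once. It assumes the `n8n-*.json` files are valid JSON in the current directory; `iter_strings`, `find_duplicate_flags`, and `scan` are hypothetical names, and no particular n8n node layout is assumed.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List, Tuple

FLAG = "--max-time"


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string found anywhere inside a parsed JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def find_duplicate_flags(workflow: Any, flag: str = FLAG) -> List[Tuple[int, str]]:
    """Return (count, text) for each string that contains the flag more than once."""
    hits = []
    for text in iter_strings(workflow):
        count = text.count(flag)
        if count > 1:
            hits.append((count, text))
    return hits


def scan(directory: str = ".", pattern: str = "n8n-*.json") -> Dict[str, list]:
    """Map each matching file name to its duplicate-flag hits (files with none are omitted)."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        workflow = json.loads(path.read_text(encoding="utf-8"))
        hits = find_duplicate_flags(workflow)
        if hits:
            results[path.name] = hits
    return results
```

An empty result from `scan()` supports marking DEBT-02 complete. A non-empty result still needs a manual look, since one string field could hold several chained curl commands that each carry a single flag.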

Question 3: How comprehensive should UAT tests be?

What we know:

  • Two specific tests deferred: "Update All text command" and "Update All inline keyboard"
  • Code already deployed in Phase 11
  • These are validation tests, not exploratory

What's unclear:

  • Should we test only happy path or include error cases?
  • Should we test with 0 available updates, 1 update, multiple updates?
  • Should we verify Unraid badge status after update (ties to Question 1)?

Recommendation:

  • Focus on happy path: 2+ containers with updates available
  • Test both entry points (text command + inline keyboard button)
  • Verify success criteria from Phase 11: confirmation, progress, summary
  • Error cases already covered by Phase 10/11 UAT
  • Unraid badge verification: document as known limitation if no solution found

Sources

Primary (HIGH confidence)

  • Current project codebase: n8n-workflow.json, n8n-update.json, README.md, DEPLOY-SUBWORKFLOWS.md (examined directly)
  • Requirements: .planning/REQUIREMENTS.md, ROADMAP.md (examined directly)

Secondary (MEDIUM confidence)

Tertiary (LOW confidence)

  • Unraid API documentation search results — No official public API docs found for update status management (negative finding, but valuable)

Metadata

Confidence breakdown:

  • Unraid update badge clearing: MEDIUM — Community workarounds documented, but no official API found. File location and behavior confirmed by multiple sources.
  • Environment variable documentation: HIGH — Industry standards well-established, current implementation understood from codebase examination.
  • UAT test execution: HIGH — Standard structured testing patterns, test scope clearly defined in Phase 11 requirements.
  • DEBT-02 status: HIGH — Code examination confirms only one --max-time flag exists, likely already resolved.

Research date: 2026-02-08

Valid for: 30 days from the research date (stable domain: the Unraid update mechanism is unlikely to change rapidly, and env var patterns are industry standards)

Research completion notes:

  • Unraid badge clearing requires further investigation of actual server file system
  • DEBT-02 appears already fixed, needs verification task only
  • UAT tests are well-scoped and ready for execution
  • Environment variable documentation needs clarification, not code changes