# Architecture Research: Unraid Update Status Sync Integration
**Domain:** Telegram Bot Docker Management Extension
**Researched:** 2026-02-08
**Confidence:** MEDIUM
## Integration Overview
This research focuses on integrating Unraid update status sync into the existing 287-node n8n workflow system (1 main + 7 sub-workflows). The goal is to clear Unraid's "update available" badges after the bot successfully updates a container.
### Current Architecture (Baseline)
```
┌─────────────────────────────────────────────────────────────┐
│ Telegram User │
└───────────────────────┬─────────────────────────────────────┘
┌───────────────────────▼─────────────────────────────────────┐
│ n8n-workflow.json (Main) │
│ 166 nodes │
├─────────────────────────────────────────────────────────────┤
│ Telegram Trigger → Auth → Correlation ID → Keyword Router │
│ │ │
│ Execute Workflow nodes (17) │
│ │ │
│ ┌────────────┼────────────┐ │
│ ▼ ▼ ▼ │
│ n8n-update n8n-actions n8n-status │
│ (34 nodes) (11 nodes) (11 nodes) │
│ │ │ │ │
│ [5 more sub-workflows: logs, batch-ui, │
│ confirmation, matching] │
└───────────────────────┬─────────────────────────────────────┘
┌───────────────────────▼─────────────────────────────────────┐
│ docker-socket-proxy:2375 │
│ (Tecnativa/LinuxServer) │
└───────────────────────┬─────────────────────────────────────┘
┌───────────────────────▼─────────────────────────────────────┐
│ Docker Engine (Unraid Host) │
└─────────────────────────────────────────────────────────────┘
```
### Target Architecture (With Unraid Sync)
```
┌─────────────────────────────────────────────────────────────┐
│ Telegram User │
└───────────────────────┬─────────────────────────────────────┘
┌───────────────────────▼─────────────────────────────────────┐
│ n8n-workflow.json (Main) │
│ 166 nodes │
├─────────────────────────────────────────────────────────────┤
│ Telegram Trigger → Auth → Correlation ID → Keyword Router │
│ │ │
│ Execute Workflow nodes (17) │
│ │ │
│ ┌────────────┼────────────┐ │
│ ▼ ▼ ▼ │
│ n8n-update n8n-actions n8n-status │
│ (34 nodes) (11 nodes) (11 nodes) │
│ │ │ │ │
│ [5 more sub-workflows] │
└───────────┬───────────────────────┬─────────────────────────┘
│ │
│ ┌───▼──────────────────────┐
│ │ NEW: Clear Unraid │
│ │ Update Status (node) │
│ └───┬──────────────────────┘
│ │
┌───────────▼───────────────────────▼─────────────────────────┐
│ docker-socket-proxy:2375 │
│ (Tecnativa/LinuxServer) │
└───────────────────────┬─────────────────────────────────────┘
┌───────────────────────▼─────────────────────────────────────┐
│ Docker Engine (Unraid Host) │
│ │
│ /var/lib/docker/unraid-update-status.json ← DELETE HERE │
└─────────────────────────────────────────────────────────────┘
```
## Integration Points
### Where to Add Sync Logic
**Option A: Extend n8n-update.json sub-workflow (RECOMMENDED)**
| Pros | Cons |
|------|------|
| Single responsibility: update sub-workflow owns all update-related actions | Couples Unraid-specific logic to generic update flow |
| Minimal changes to main workflow | Breaks if called from non-Unraid systems (not a concern here) |
| Sync executes immediately after successful update | None significant |
| Easier to test in isolation | |
**Option B: Add to main workflow after Execute Update returns**
| Pros | Cons |
|------|------|
| Keeps Unraid logic separate from generic update | More complex routing in main workflow |
| Could conditionally enable based on environment | Requires checking sub-workflow success result |
| | Adds latency between update and sync |
| | Harder to test (requires full workflow execution) |
**Option C: New sub-workflow (n8n-unraid-sync.json)**
| Pros | Cons |
|------|------|
| Complete separation of concerns | Overkill for single operation (file deletion) |
| Reusable if other Unraid integrations added | Adds 8th sub-workflow to manage |
| | Main workflow needs new Execute Workflow node |
| | Extra complexity for minimal benefit |
**Recommendation:** Option A (extend n8n-update.json) because:
1. Sync is tightly coupled to update success
2. Single responsibility: "update a container AND clear its Unraid status"
3. Minimal architectural impact
4. Easiest to test and maintain
### Modification to n8n-update.json
**Current End State (Return Success node, line 499):**
```javascript
return {
json: {
success: true,
updated: true,
message: data.message,
oldDigest: data.oldDigest,
newDigest: data.newDigest,
correlationId: data.correlationId || ''
}
};
```
**New Flow:**
```
Remove Old Image (Success) → Clear Unraid Status → Return Success
(Execute Command node)
rm -f /host/var/lib/docker/unraid-update-status.json
```
**Node Addition (1 new node):**
- **Type:** Execute Command
- **Position:** After "Remove Old Image (Success)", before "Return Success"
- **Command:** `rm -f /host/var/lib/docker/unraid-update-status.json` (path as seen from inside the n8n container)
- **Error Handling:** `continueRegularOutput` (don't fail update if sync fails)
- **Total Nodes:** 34 → 35
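As exported workflow JSON, the new node would look roughly like the sketch below. This is an approximation: `typeVersion`, `position`, and the exact field layout follow n8n's export format and may differ slightly between n8n versions; the `onError` value matches the error-handling setting named above.

```json
{
  "name": "Clear Unraid Status",
  "type": "n8n-nodes-base.executeCommand",
  "typeVersion": 1,
  "onError": "continueRegularOutput",
  "parameters": {
    "command": "rm -f /host/var/lib/docker/unraid-update-status.json"
  }
}
```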
## Data Flow
### Update Completion to Unraid Status Clear
```
1. User: "update plex"
2. Main workflow → n8n-update.json
3. Update sub-workflow:
Inspect Container → Parse Config → Pull Image → Check Pull Success
→ Inspect New Image → Compare Digests → Stop Container
→ Remove Container → Build Create Body → Create Container
→ Start Container → Format Update Success → Send Response
→ Remove Old Image (Success)
4. NEW: Clear Unraid Status
Execute Command: rm -f /host/var/lib/docker/unraid-update-status.json
5. Return Success (existing)
6. Main workflow routes result to Telegram response
```
**Key Properties:**
- Sync happens AFTER the container is updated and the old image removed
- Sync happens BEFORE the sub-workflow returns (ensures completion)
- Sync failure does NOT fail the update (onError: continueRegularOutput)
- User receives success message regardless of sync status
### Why Delete the Entire Status File?
**Unraid's Update Tracking Mechanism:**
1. Unraid stores update status in `/var/lib/docker/unraid-update-status.json`
2. File contains: `{"containerName": "true|false", ...}` (true = updated, false = needs update)
3. When bot updates externally, Unraid's file is stale
4. Unraid only checks for **newly available** updates, not "containers now current"
5. Deleting file forces Unraid to recheck ALL containers on next "Check for Updates"
**Why Not Modify the JSON?**
- File format is internal to Unraid's DockerClient.php
- Could change between Unraid versions
- Parsing/modifying JSON from Execute Command is fragile
- Deletion is simpler and forces full recheck (HIGH confidence)
**User Impact:**
- After bot update, Unraid badge may show "outdated" for ~30 seconds until next UI refresh
- User can manually click "Check for Updates" in Unraid Docker tab to force immediate recheck
- Next automatic Unraid check will rebuild status file correctly
## Container Access Requirements
### How n8n Accesses Unraid Host Filesystem
**Current n8n Container Configuration:**
The n8n container must be able to delete `/var/lib/docker/unraid-update-status.json` on the Unraid host.
**Access Pattern Options:**
| Method | Implementation | Pros | Cons |
|--------|----------------|------|------|
| **Volume Mount (RECOMMENDED)** | `-v /var/lib/docker:/host/var/lib/docker:rw` | Direct filesystem access, simple | Grants access to entire /var/lib/docker |
| **Docker API via exec** | `docker exec unraid-host rm -f /var/lib/docker/...` | No volume mount needed | Requires exec API (security risk) |
| **SSH into host** | Execute Command with SSH | No volume mount | Requires SSH credentials in workflow |
| **Unraid API (future)** | GraphQL mutation to clear status | Proper API layer | Requires Unraid 7.2+, API key setup |
**Recommendation:** Volume mount `/var/lib/docker` as read-write because:
1. n8n already accesses Docker via socket proxy (security boundary established)
2. Unraid status file is Docker-internal data (reasonable scope)
3. No additional credentials or services needed
4. Direct file operations are faster than API calls
5. Works on all Unraid versions (no version dependency)
**Security Consideration:**
- `/var/lib/docker` contains Docker data, not general host filesystem
- Socket proxy already limits Docker API access
- File deletion is least-privilege operation (no read of sensitive data)
- Alternative is exec API (worse security than filesystem mount)
### Volume Mount Configuration
**Add to n8n Container:**
```yaml
services:
n8n:
# ... existing config ...
volumes:
- /var/lib/docker:/host/var/lib/docker:rw # NEW
# ... existing volumes ...
```
**Execute Command Node:**
```bash
# Path accessible from inside n8n container
rm -f /host/var/lib/docker/unraid-update-status.json
```
**Why `/host/` prefix:**
- Inside the container, `/var/lib/docker` is the container's own filesystem
- The volume mount at `/host/var/lib/docker` is the Unraid host's filesystem
- The prefix prevents accidentally deleting files under the n8n container's own `/var/lib/docker`
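On Unraid itself, this mount is normally configured in the container template as a path mapping with Read/Write access; expressed as a plain `docker run` flag it would look like the sketch below. The image name and other options are placeholders for whatever the deployed n8n container actually uses.

```shell
# Sketch only: the same mount as a docker run flag. On Unraid, configure this
# as a template path mapping (host /var/lib/docker -> container
# /host/var/lib/docker, mode RW) rather than a raw docker run command.
docker run -d --name n8n \
  -v /var/lib/docker:/host/var/lib/docker:rw \
  n8nio/n8n
```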
## Component Modifications
### Modified Components
| Component | Type | Change | Impact |
|-----------|------|--------|--------|
| **n8n container** | Infrastructure | Add volume mount `/var/lib/docker:/host/var/lib/docker:rw` | MEDIUM - requires container recreation |
| **n8n-update.json** | Sub-workflow | Add 1 Execute Command node after "Remove Old Image (Success)" | LOW - workflow edit only |
| **Clear Unraid Status (NEW)** | Node | Execute Command: `rm -f /host/var/lib/docker/unraid-update-status.json` | NEW - single operation |
### Unchanged Components
- Main workflow (n8n-workflow.json): No changes
- Other 6 sub-workflows: No changes
- Socket proxy configuration: No changes
- Docker socket access pattern: No changes
- Telegram integration: No changes
## Build Order
Based on dependencies and risk:
### Phase 1: Infrastructure Setup
**Delivers:** n8n container has host filesystem access
**Tasks:**
1. Add volume mount to n8n container configuration
2. Recreate n8n container with new mount
3. Verify mount accessible: `docker exec n8n ls /host/var/lib/docker`
4. Test file deletion: `docker exec n8n rm -f /host/var/lib/docker/test-file`
**Rationale:** Infrastructure change first, before workflow modifications. Ensures mount works before relying on it.
**Risks:**
- Container recreation causes brief downtime (~10 seconds)
- Mount path typo breaks functionality
**Mitigation:**
- Schedule during low-traffic window
- Test mount manually before workflow change
- Document rollback: remove volume mount, recreate container
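The mount checks in steps 3–4 can be wrapped in a small POSIX-shell probe. This is a sketch for manual testing; `check_writable` is a hypothetical helper, not part of any workflow.

```shell
# Probe whether a directory is writable by creating and removing a temp file.
# Inside the n8n container, point it at /host/var/lib/docker to confirm the
# mount is rw before editing the workflow.
check_writable() {
  probe="$1/.n8n-write-probe-$$"
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    echo "writable: $1"
  else
    echo "NOT writable: $1"
    return 1
  fi
}

# Usage (inside the n8n container):
#   check_writable /host/var/lib/docker
```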
### Phase 2: Workflow Modification
**Delivers:** Update sub-workflow clears Unraid status
**Tasks:**
1. Read n8n-update.json via n8n API
2. Add Execute Command node after "Remove Old Image (Success)"
3. Configure command: `rm -f /host/var/lib/docker/unraid-update-status.json`
4. Set error handling: `continueRegularOutput`
5. Connect to "Return Success" node
6. Push updated workflow via n8n API
7. Test with single container update
**Rationale:** Modify workflow only after infrastructure proven working. Single node addition is minimal risk.
**Risks:**
- File path wrong (typo in command)
- Permissions issue (mount is read-only)
- Delete fails silently
**Mitigation:**
- Test command manually first (Phase 1 testing)
- Verify mount is `:rw` not `:ro`
- Check execution logs for errors
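The read-and-push steps above can be sketched against n8n's public REST API (the `/api/v1/workflows/{id}` endpoints with `X-N8N-API-KEY` authentication). `N8N_URL`, `N8N_API_KEY`, and `WORKFLOW_ID` are placeholders for this deployment.

```shell
# Sketch: export, edit, and re-upload the sub-workflow via the n8n API.
curl -s -H "X-N8N-API-KEY: $N8N_API_KEY" \
  "$N8N_URL/api/v1/workflows/$WORKFLOW_ID" > n8n-update.json

# ...add the "Clear Unraid Status" Execute Command node to the JSON...

curl -s -X PUT \
  -H "X-N8N-API-KEY: $N8N_API_KEY" \
  -H "Content-Type: application/json" \
  --data @n8n-update.json \
  "$N8N_URL/api/v1/workflows/$WORKFLOW_ID"
```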
### Phase 3: Validation
**Delivers:** Confirmed end-to-end functionality
**Tasks:**
1. Update container via bot
2. Check Unraid UI - badge should still show "update available" (file deleted)
3. Click "Check for Updates" in Unraid Docker tab
4. Verify badge clears (Unraid rechecked and found container current)
5. Verify workflow execution logs show no errors
**Rationale:** Prove the integration works before considering it complete.
**Success Criteria:**
- Container updates successfully
- Status file deleted (verify: `ls /var/lib/docker/unraid-update-status.json` reports "No such file or directory")
- Unraid recheck clears badge
- No errors in n8n execution log
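The file check in the success criteria can be scripted as a small helper (hypothetical, for manual validation only; the path shown is the in-container view, so drop the `/host` prefix when running on the Unraid host itself).

```shell
# Report whether the sync step removed the Unraid status file.
check_status_file() {
  if [ -e "$1" ]; then
    echo "status file still present: sync did not run"
    return 1
  fi
  echo "status file absent: sync OK"
}

# Usage (inside the n8n container):
#   check_status_file /host/var/lib/docker/unraid-update-status.json
```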
## Architectural Patterns
### Pattern 1: Post-Action Sync
**What:** Execute external system sync after primary operation completes successfully
**When to use:** When primary operation (update container) should trigger related state updates (clear Unraid cache)
**Trade-offs:**
- PRO: Keeps systems consistent
- PRO: User doesn't need manual sync step
- CON: Couples unrelated systems
- CON: Sync failure can confuse (update worked, but Unraid shows stale state)
**Example (this implementation):**
```
Update Container → Remove Old Image → Clear Unraid Status → Return Success
```
**Error Handling Strategy:** Sync failure does NOT fail the primary operation. Use `continueRegularOutput` to log error but continue to success return.
### Pattern 2: Filesystem Access from Containerized Workflow
**What:** Mount host filesystem into container to enable file operations from workflow
**When to use:** When workflow needs to manipulate host-specific files (e.g., clear cache, trigger recheck)
**Trade-offs:**
- PRO: Direct access, no additional services
- PRO: Works regardless of API availability
- CON: Breaks container isolation
- CON: Path changes between environments (host vs container)
**Example (this implementation):**
```yaml
# Host path -> Container path
-v /var/lib/docker:/host/var/lib/docker:rw
```
**Security Boundary:** Limit mount scope to minimum required directory. Here: `/var/lib/docker` (Docker-internal) not `/var/lib` (too broad).
### Pattern 3: Best-Effort External Sync
**What:** Attempt sync with external system but don't fail primary operation if sync fails
**When to use:** When sync is nice-to-have but not critical to primary operation success
**Trade-offs:**
- PRO: Primary operation reliability unaffected
- PRO: Degrades gracefully (manual sync still possible)
- CON: Silent failures can go unnoticed
- CON: Systems drift out of sync
**Example (this implementation):**
```javascript
// Execute Command node configuration
{
"onError": "continueRegularOutput", // Don't throw on sync failure
"command": "rm -f /host/var/lib/docker/unraid-update-status.json"
}
```
**Monitoring:** Log sync failures but return success. User can manually sync if needed (Unraid "Check for Updates").
## Anti-Patterns
### Anti-Pattern 1: Parsing Unraid Status File
**What people might do:** Read `/var/lib/docker/unraid-update-status.json`, parse JSON, update only the changed container's status, write back
**Why it's wrong:**
- File format is internal to Unraid's DockerClient.php implementation
- Could change between Unraid versions without notice
- Parsing JSON in Execute Command (bash) is fragile
- Risk of corrupting file if concurrent Unraid writes happen
**Do this instead:** Delete entire file to force full recheck. Simpler, more robust, version-agnostic.
### Anti-Pattern 2: Using Docker exec for File Deletion
**What people might do:** `docker exec unraid-host rm -f /var/lib/docker/unraid-update-status.json`
**Why it's wrong:**
- Requires EXEC API access on socket proxy (major security risk)
- An `unraid-host` container doesn't exist (Unraid itself is the host, not a container)
- More complex than direct filesystem access
**Do this instead:** Volume mount for direct filesystem access (more secure than exec API).
### Anti-Pattern 3: Blocking Update on Sync Failure
**What people might do:** Fail entire update if Unraid status file deletion fails
**Why it's wrong:**
- Update already completed (container recreated with new image)
- Failing at this point leaves system in inconsistent state (container updated, user told it failed)
- User can manually sync (click "Check for Updates")
**Do this instead:** Log sync failure, return success, document manual sync option.
## Scaling Considerations
| Scale | Approach | Notes |
|-------|----------|-------|
| **1-10 containers** | Current approach works | File deletion is <1ms operation |
| **10-50 containers** | Current approach works | Unraid recheck time increases linearly but still <10s |
| **50+ containers** | Current approach works | Deleting status file forces full recheck (may take 30-60s) but acceptable as one-time cost |
**Optimization Not Needed:**
- File deletion is instant regardless of container count
- Unraid recheck is user-initiated (not blocking bot operation)
- No performance bottleneck identified
**Alternative for Many Containers (future):**
If Unraid provides GraphQL API to selectively clear single container status (not found in research), could optimize to:
- Clear only updated container's status
- Avoid forcing full recheck
- Requires Unraid 7.2+ and API discovery
## Integration Testing Strategy
### Test Cases
| Test | Expected Behavior | Verification |
|------|-------------------|--------------|
| **Update single container** | Container updates, status file deleted | Check file gone: `ls /var/lib/docker/unraid-update-status.json` |
| **Update container, sync fails** | Update succeeds, error logged | Check execution log for error, container still updated |
| **Batch update multiple containers** | Each update clears status file | File deleted after first update, remains deleted |
| **Update with no status file** | Update succeeds, no error | `rm -f` tolerates missing file |
| **Mount not accessible** | Update succeeds, sync error logged | Execution log shows file not found error |
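The "update with no status file" row relies on `rm -f` being idempotent: POSIX `rm -f` exits 0 whether or not the target exists. A quick local demonstration, using a temp directory standing in for the mount:

```shell
# rm -f never errors on a missing file, so repeated updates are safe.
tmp=$(mktemp -d)
touch "$tmp/unraid-update-status.json"
rm -f "$tmp/unraid-update-status.json" && echo "delete existing: exit 0"
rm -f "$tmp/unraid-update-status.json" && echo "delete missing: exit 0"
rmdir "$tmp"
```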
### Rollback Plan
If integration causes issues:
1. **Quick rollback (workflow only):**
- Revert n8n-update.json to previous version via n8n API
- Status sync stops happening
- Core update functionality unaffected
2. **Full rollback (infrastructure):**
- Remove volume mount from n8n container config
- Recreate n8n container
- Revert workflow
- Manual Unraid sync only
**Rollback triggers:**
- Sync consistently fails (execution log errors)
- Permissions issues prevent file deletion
- Unraid behavior changes unexpectedly
## Sources
### Primary (HIGH confidence)
- [Unraid Docker Manager Source Code](https://github.com/limetech/dynamix/blob/master/plugins/dynamix.docker.manager/include/DockerClient.php) - Update status file location and format
- [Unraid Managing Containers](https://docs.unraid.net/unraid-os/using-unraid-to/run-docker-containers/managing-and-customizing-containers/) - Volume mount patterns
### Secondary (MEDIUM confidence)
- [Unraid Forum: Incorrect Update Notification](https://forums.unraid.net/bug-reports/stable-releases/regression-incorrect-docker-update-notification-r2807/) - Status file deletion workaround
- [Watchtower + Unraid Discussion](https://github.com/containrrr/watchtower/discussions/1389) - External update tracking issues
- [Docker Volume Mounting](https://docs.unraid.net/unraid-os/using-unraid-to/run-docker-containers/managing-and-customizing-containers/) - Best practices
### Tertiary (LOW confidence)
- n8n Execute Command documentation (operational patterns)
- Community reports of Unraid update badge behavior (anecdotal)
---
*Architecture research for: Unraid Update Status Sync Integration*
*Researched: 2026-02-08*