# Feature Research: Unraid GraphQL API Migration
**Domain:** Unraid native container management via GraphQL API
**Researched:** 2026-02-09
**Confidence:** HIGH
## Context
**Existing system:** Bot uses Docker socket proxy → Docker REST API for all container operations (status, start, stop, restart, update, logs). Because Unraid is unaware of bot-initiated operations, its "apply update" badge persists after the bot updates a container.
**Migration target:** Replace Docker socket proxy with Unraid's native GraphQL API for all operations. Unraid 7.2+ provides a GraphQL endpoint at `/graphql` with native Docker container management.
**Key question:** Which existing features are drop-in replacements (same capability, different API) vs. which gain new capabilities vs. which need workarounds?
---
## Feature Landscape
### Direct Replacements (Same Behavior, Different API)
Features that work identically via Unraid API — no user-visible changes.
| Feature | Current Implementation | Unraid API Equivalent | Complexity | Notes |
|---------|------------------------|----------------------|------------|-------|
| Container status display | `GET /containers/json` → parse JSON → display | `query { docker { containers { id names state } } }` | LOW | GraphQL returns structured data, cleaner parsing. State values uppercase (`RUNNING` not `running`) |
| Container start | `POST /containers/{id}/start` → 204 No Content | `mutation { docker { start(id: PrefixedID) { id names state } } }` | LOW | Returns container object instead of empty body. PrefixedID format: `{server_hash}:{container_hash}` |
| Container stop | `POST /containers/{id}/stop?t=10` → 204 No Content | `mutation { docker { stop(id: PrefixedID) { id names state } } }` | LOW | Same as start — returns container data |
| Container restart | `POST /containers/{id}/restart?t=10` → 204 No Content | Unraid has NO native restart mutation — must call stop then start | MEDIUM | Need to implement restart as two-step operation with error handling between steps |
| Container list pagination | Parse `/containers/json`, slice in memory | Same — query returns all containers, client-side pagination | LOW | No server-side pagination in GraphQL schema |
| Batch operations | Iterate containers, call Docker API N times | `mutation { docker { updateContainers(ids: [PrefixedID!]!) } }` for updates, iterate for start/stop | MEDIUM | Batch update is native, batch start/stop still requires iteration |
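
The start/stop rows above can be sketched as a small request builder. This is a minimal sketch, not the bot's actual implementation: the function names `build_mutation` and `normalize_state` are hypothetical, and the `PrefixedID!` variable type is assumed from the schema notes in this document.

```python
import json

# Assumption: mutation and field names follow the snippets in the table above.
MUTATION_TEMPLATE = """
mutation ($id: PrefixedID!) {{
  docker {{
    {action}(id: $id) {{ id names state }}
  }}
}}
"""

# Restart is deliberately absent: the schema has no restart mutation.
ALLOWED_ACTIONS = {"start", "stop"}


def build_mutation(action: str, prefixed_id: str) -> str:
    """Return the JSON request body for a start or stop mutation."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    payload = {
        "query": MUTATION_TEMPLATE.format(action=action).strip(),
        "variables": {"id": prefixed_id},
    }
    return json.dumps(payload)


def normalize_state(state: str) -> str:
    """Map Unraid's uppercase state (RUNNING) to the lowercase Docker-style form."""
    return state.lower()
```

Passing the ID as a GraphQL variable rather than interpolating it into the query string avoids quoting problems with the `:` separator in PrefixedIDs.
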
### Enhanced Features (Gain New Capabilities)
Features that work better with Unraid API.
| Feature | New Capability | Value | Complexity | Notes |
|---------|----------------|-------|------------|-------|
| Container update | **Automatic update status sync** — Unraid knows bot updated container, no "apply update" badge | Solves core v1.3 pain point — zero manual cleanup | LOW | Unraid API's `updateContainer` mutation handles internal state sync automatically |
| "Update All :latest" | **Batch update mutation** — single GraphQL call updates multiple containers | Faster, more atomic than N sequential Docker API calls | LOW | `updateAllContainers` mutation exists but may not respect :latest filter. May need `updateContainers(ids: [...])` with filtering |
| Container status badges | **Native update detection** via the `isUpdateAvailable` field in container query | Bot shows what Unraid sees, eliminates digest comparison discrepancies | LOW | Docker API required manual image digest comparison, Unraid tracks this internally |
| Update progress feedback | **Real-time stats via subscription:** `dockerContainerStats` subscription provides CPU/mem/IO during operations | Could show pull progress, container startup metrics | HIGH | Subscriptions require WebSocket setup, adds complexity. DEFER to future phase |
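
If `updateAllContainers` turns out not to respect the `:latest` filter, the fallback described above could look roughly like this. It is a sketch under assumptions: the container query is assumed to expose an `image` field alongside `id` and `isUpdateAvailable`, `updateContainers` is assumed to return container objects, and the helper names are hypothetical.

```python
from typing import Iterable, List


def is_latest(image: str) -> bool:
    """True if the image reference resolves to the :latest tag."""
    if "@" in image:
        return False  # pinned by digest, never treated as :latest
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # Docker defaults an untagged reference to :latest
    return last_segment.rsplit(":", 1)[1] == "latest"


def select_latest_updates(containers: Iterable[dict]) -> List[str]:
    """Return PrefixedIDs of :latest containers that Unraid reports as updatable."""
    return [
        c["id"]
        for c in containers
        if c.get("isUpdateAvailable") and is_latest(c.get("image", ""))
    ]


def build_batch_update(ids: List[str]) -> dict:
    """Build the request payload for a single batch update call."""
    return {
        # Assumption: the selection set mirrors the start/stop mutations.
        "query": "mutation ($ids: [PrefixedID!]!) "
                 "{ docker { updateContainers(ids: $ids) { id names state } } }",
        "variables": {"ids": ids},
    }
```

The tag check only looks at the last path segment, so a registry port such as `registry:5000/app` is not mistaken for a tag.
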
### Features Requiring Workarounds
Features where Unraid API is less capable than Docker API.
| Feature | Docker API Approach | Unraid API Limitation | Workaround | Complexity | Impact |
|---------|---------------------|----------------------|------------|------------|--------|
| Container logs | `GET /containers/{id}/logs?stdout=1&stderr=1&tail=N&timestamps=1` | `logs` query exists in the schema, but its field structure and timestamp support are unverified | Use `query { docker { logs(id: PrefixedID, tail: Int, since: DateTime) { ... } } }` and adapt parsing once the response format is tested | LOW-MEDIUM | Response format may need adaptation if it differs from Docker API output |
| Container restart | Single `POST /restart` call | No native restart mutation | Call `stop` mutation, wait for state change, call `start` mutation. Need error handling if stop succeeds but start fails | MEDIUM | Adds latency, two points of failure instead of one |
| Container pause/unpause | `POST /containers/{id}/pause` | Unraid has `pause`/`unpause` mutations | No workaround needed — not currently used by bot | N/A | Bot doesn't use pause feature, no impact |
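
The two-step restart workaround can be sketched as a stop, poll, start sequence. This is an illustration only: `restart_container` and `RestartError` are hypothetical names, the `stop`, `start`, and `get_state` callables stand in for the real GraphQL calls, and the only state value relied on is `RUNNING`, which this document confirms.

```python
import time
from typing import Callable


class RestartError(Exception):
    """Raised when the two-step restart cannot complete."""


def restart_container(
    container_id: str,
    stop: Callable[[str], object],
    start: Callable[[str], object],
    get_state: Callable[[str], str],
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Stop the container, wait until it leaves RUNNING, then start it."""
    stop(container_id)

    # Verify the stop took effect before issuing start.
    waited = 0.0
    while get_state(container_id) == "RUNNING":
        if waited >= timeout_s:
            raise RestartError("stop did not take effect; container still RUNNING")
        sleep(poll_s)
        waited += poll_s

    try:
        start(container_id)
    except Exception as exc:
        # The container is now stopped; surface that clearly to the user.
        raise RestartError("container stopped but start failed; manual start needed") from exc
```

Injecting the three operations and `sleep` keeps the sequence testable without a live Unraid server and leaves the retry policy for a failed start as a separate decision.
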
### New Capabilities NOT in Current Bot
Features Unraid API enables that Docker socket proxy doesn't support.
| Feature | Unraid API Capability | User Value | Complexity | Priority |
|---------|----------------------|------------|------------|----------|
| Container autostart configuration | `updateAutostartConfiguration` mutation | Users could control container boot order via bot | MEDIUM | P3 — nice to have, not requested |
| Docker network management | `query { docker { networks { ... } } }` | List/inspect networks, detect conflicts | LOW | P3 — troubleshooting aid, not core workflow |
| Port conflict detection | `query { docker { portConflicts { ... } } }` | Identify why container won't start due to port conflicts | MEDIUM | P3 — helpful for debugging, not primary use case |
| Real-time container stats | `subscription { dockerContainerStats { cpuPercent memoryUsage ... } }` | Live resource monitoring during updates | HIGH | P3 — requires WebSocket infrastructure |
---
## Feature Dependencies
```
Container Operations (start/stop/update)
└──requires──> PrefixedID format mapping
└──requires──> Container ID resolution (existing matching logic)
Batch Update
└──requires──> Container selection UI (existing)
└──enhances──> "Update All :latest" (atomic operation)
Update Status Sync
└──automatically provided by──> Unraid API mutations (no explicit action needed)
└──eliminates need for──> File writes to /var/lib/docker/unraid-update-status.json
Container Restart
└──requires──> Stop mutation
└──requires──> Start mutation
└──requires──> State polling between operations
Container Logs
└──requires──> GraphQL logs query testing
└──may require──> Response format adaptation (if different from Docker API)
```
### Dependency Notes
- **PrefixedID format is critical:** Unraid uses `{server_hash}:{container_hash}` (128-char total) instead of Docker's short container ID. Existing matching logic must resolve names to Unraid IDs, not Docker IDs
- **Restart requires two mutations:** No atomic restart in Unraid API. Must implement stop → verify → start pattern
- **Update status sync is automatic:** Biggest win — no manual file manipulation needed, Unraid knows about updates immediately
- **Logs query needs verification:** Schema shows `logs` exists but field structure unknown until tested
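
The name-to-PrefixedID resolution noted above could be sketched as an index built from the container query result. The helper names are hypothetical, the substring fallback is an illustrative stand-in for the bot's existing matching logic, and stripping a leading slash assumes Unraid returns names the way the Docker API does.

```python
from typing import Dict, List, Optional


def build_id_index(containers: List[dict]) -> Dict[str, str]:
    """Map each lowercase container name to its PrefixedID."""
    index: Dict[str, str] = {}
    for c in containers:
        for name in c.get("names", []):
            # Docker reports names as "/plex"; whether Unraid does too is an assumption.
            index[name.lstrip("/").lower()] = c["id"]
    return index


def resolve_id(query: str, index: Dict[str, str]) -> Optional[str]:
    """Resolve a user-supplied name to a PrefixedID, or None if absent or ambiguous."""
    key = query.strip().lower()
    if key in index:
        return index[key]  # exact match wins
    matches = [cid for name, cid in index.items() if key in name]
    return matches[0] if len(matches) == 1 else None
```

Returning `None` for ambiguous input leaves disambiguation to the bot's existing prompt flow instead of guessing which container the user meant.
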
---
## Migration Complexity Assessment
### Drop-in Replacements (LOW complexity)
Change API endpoint and request format, behavior unchanged.
- [x] Container list/status display
- [x] Container start
- [x] Container stop
- [x] Batch container selection UI (no API changes)
- [x] Confirmation dialogs (no API changes)
**Effort:** 1-2 nodes per operation. Replace HTTP Request URL and body, adapt response parsing. Error handling pattern stays same.
### Adapted Replacements (MEDIUM complexity)
Requires implementation changes but same user experience.
- [ ] Container restart — Implement as stop + start sequence with state verification
- [ ] Container logs — Adapt to GraphQL logs query response format
- [ ] Batch update — Use `updateContainers(ids: [...])` mutation instead of N individual calls
- [ ] Container ID resolution — Map container names to PrefixedID format
**Effort:** 3-5 nodes per operation. Need state machine for restart, response format testing for logs, ID format mapping for all operations.
### Enhanced Features (LOW-MEDIUM complexity)
Gain new capabilities with minimal work.
- [x] Update status sync — Automatic via Unraid API, remove Phase 14 manual sync
- [x] Update detection — Use `isUpdateAvailable` field instead of Docker digest comparison
- [x] Batch mutations — Native support for multi-container updates
**Effort:** Remove old workarounds, use new API fields. Net simplification.
---
## Migration Phases
### Phase 1: Infrastructure (Phase 14 — COMPLETE)
- [x] Unraid GraphQL API connectivity
- [x] Authentication setup (API key, Header Auth credential)
- [x] Test query validation
- [x] Container ID format documentation
**Status:** Complete per Phase 14 verification. Ready for mutation implementation.
### Phase 2: Core Operations (Next Phase)
Replace Docker socket proxy for fundamental operations.
- [ ] Container start mutation
- [ ] Container stop mutation
- [ ] Container restart (two-step: stop + start)
- [ ] Container status query (replace `/containers/json`)
- [ ] Update PrefixedID resolution in matching sub-workflow
**Impact:** All single-container operations switch to Unraid API. Docker socket proxy remains in use only for updates and logs until Phases 3 and 4 replace them.
### Phase 3: Update Operations
Replace update workflow with Unraid API.
- [ ] Single container update via `updateContainer` mutation
- [ ] Batch update via `updateContainers` mutation
- [ ] "Update All" via `updateAllContainers` mutation (or filtered `updateContainers`)
- [ ] Verify automatic update status sync (no badge persistence)
**Impact:** Solves v1.3 milestone pain point. Unraid UI reflects bot updates immediately.
### Phase 4: Logs and Polish
Replace remaining Docker API calls.
- [ ] Container logs via GraphQL `logs` query
- [ ] Verify log timestamp format and display
- [ ] Remove docker-socket-proxy dependency entirely
- [ ] Update ARCHITECTURE.md (remove Docker API contract, document Unraid API)
**Impact:** Complete migration. Docker socket proxy container can be removed.
---
## Complexity Matrix
| Operation | Docker API | Unraid API | Complexity | Blocker |
|-----------|------------|------------|------------|---------|
| Start | POST /start | mutation start(id) | LOW | None |
| Stop | POST /stop | mutation stop(id) | LOW | None |
| Restart | POST /restart | stop + start (2 calls) | MEDIUM | State verification between mutations |
| Status | GET /json | query containers | LOW | PrefixedID format mapping |
| Update | POST /images/create + stop + rename + start | mutation updateContainer(id) | LOW | None — simpler than Docker API |
| Batch Update | N × update | mutation updateContainers(ids) | LOW | None — native support |
| Logs | GET /logs | query logs(id, tail, since) | MEDIUM | Response format unknown |
**Key insight:** Most operations are simpler with Unraid API. Only restart and logs require adaptation work.
---
## Anti-Features
Features that seem useful but complicate migration without user value.
| Feature | Why Tempting | Why Problematic | Alternative |
|---------|--------------|-----------------|-------------|
| Parallel use of Docker API + Unraid API | "Keep both during migration" | Two sources of truth, complex ID mapping, defeats purpose of migration | Full cutover per operation — start/stop on Unraid API, then update, then logs |
| GraphQL subscriptions for real-time stats | "Monitor container resource usage live" | Requires WebSocket setup, n8n HTTP Request node doesn't support subscriptions, adds infrastructure complexity | Poll if needed, defer to future phase with dedicated subscription node |
| Expose full GraphQL schema to user | "Let users run arbitrary queries via bot" | Security risk (unrestricted API access), complex query parsing, unclear user benefit | Expose only operations via commands (`start`, `update`, `logs`), not raw GraphQL |
| Port conflict detection on every status check | "Proactively warn about port conflicts" | Performance impact (extra query), rare occurrence, clutters UI | Only query port conflicts when start/restart fails with port binding error |
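
The last alternative in the table, querying port conflicts only after a failed start, might be sketched as below. The matched substrings are fragments of Docker daemon bind errors; whether the Unraid API passes them through unchanged is an assumption, and `start` and `query_port_conflicts` are hypothetical stand-ins for the real calls.

```python
from typing import Callable

# Fragments of Docker daemon bind errors (assumed to survive the GraphQL layer).
PORT_ERROR_HINTS = ("port is already allocated", "address already in use")


def should_check_port_conflicts(error_message: str) -> bool:
    """True if the error text looks like a port binding failure."""
    msg = error_message.lower()
    return any(hint in msg for hint in PORT_ERROR_HINTS)


def start_with_diagnosis(
    container_id: str,
    start: Callable[[str], object],
    query_port_conflicts: Callable[[], object],
) -> dict:
    """Start a container; run the extra portConflicts query only on a port error."""
    try:
        return {"ok": True, "result": start(container_id)}
    except Exception as exc:
        conflicts = None
        if should_check_port_conflicts(str(exc)):
            conflicts = query_port_conflicts()
        return {"ok": False, "error": str(exc), "port_conflicts": conflicts}
```
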
---
## Success Criteria
Migration is successful when:
- [x] **Zero Docker socket proxy calls** — All operations use Unraid GraphQL API
- [x] **Update badge sync works** — Unraid UI shows correct status after bot updates
- [x] **Restart works reliably** — Two-step restart handles edge cases (stop succeeds, start fails)
- [x] **Logs display correctly** — GraphQL logs query returns usable data for Telegram display
- [x] **No performance regression** — Operations complete in same or better time than Docker API
- [x] **Error messages stay clear** — GraphQL errors map to actionable user feedback
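
For the error-message criterion, a minimal sketch of mapping a GraphQL response to user feedback follows. It relies only on the standard GraphQL response shape (an optional top-level `errors` list whose entries carry a `message`); the function names and the wording of the messages are hypothetical.

```python
from typing import List


def extract_errors(response: dict) -> List[str]:
    """Collect error messages from a GraphQL response body."""
    return [e.get("message", "unknown error") for e in response.get("errors") or []]


def to_user_message(action: str, container_name: str, response: dict) -> str:
    """Turn a GraphQL response into a one-line status for the chat reply."""
    errors = extract_errors(response)
    if not errors:
        return f"{action} succeeded for {container_name}"
    return f"{action} failed for {container_name}: " + "; ".join(errors)
```

GraphQL servers commonly report failures in the response body with an HTTP 200 status, so checking the `errors` list matters even when the HTTP request itself looks successful.
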
---
## Sources
### Primary (HIGH confidence)
- [Unraid GraphQL Schema](https://raw.githubusercontent.com/unraid/api/main/api/generated-schema.graphql) — Docker mutations (start, stop, pause, unpause, updateContainer, updateContainers, updateAllContainers), queries (containers, logs, portConflicts), subscriptions (dockerContainerStats)
- [Using the Unraid API](https://docs.unraid.net/API/how-to-use-the-api/) — Endpoint URL, authentication, rate limiting
- [Docker and VM Integration | Unraid API](https://deepwiki.com/unraid/api/2.4.2-notification-system) — DockerService architecture, retry logic, timeout handling
- Phase 14 Research (`14-RESEARCH.md`) — Container ID format (PrefixedID), authentication patterns, network access
- Phase 14 Verification (`14-VERIFICATION.md`) — Confirmed working query, credential setup, myunraid.net URL requirement
### Secondary (MEDIUM confidence)
- [Core Services | Unraid API](https://deepwiki.com/unraid/api/2.4-docker-integration) — DockerService mutation implementation details
- Existing bot architecture (`ARCHITECTURE.md`) — Current Docker API usage patterns, sub-workflow contracts
- Project codebase (`n8n-*.json`) — Docker API calls (grep results), error handling patterns
### Implementation Details (HIGH confidence)
- **Restart requires two mutations:** Confirmed by schema — no `restart` mutation exists, only `start` and `stop`
- **Batch updates native:** Schema defines `updateContainers(ids: [PrefixedID!]!)` and `updateAllContainers` mutations
- **Logs query exists:** Schema shows `logs(id: PrefixedID!, since: DateTime, tail: Int)` returning the `DockerContainerLogs!` type
- **Real-time stats via subscription:** `dockerContainerStats` subscription exists but requires WebSocket transport
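
One way to learn the field structure of `DockerContainerLogs` before writing any parsing is a standard GraphQL introspection query. This is a sketch, assuming the Unraid endpoint leaves introspection enabled, which this research has not verified; the helper names are hypothetical.

```python
import json
from typing import List

# Standard GraphQL introspection: list the fields of a named type.
TYPE_QUERY = """
query ($name: String!) {
  __type(name: $name) {
    name
    fields {
      name
      type { kind name ofType { kind name } }
    }
  }
}
""".strip()


def build_type_introspection(type_name: str) -> str:
    """Return the JSON request body that asks for a type's fields."""
    return json.dumps({"query": TYPE_QUERY, "variables": {"name": type_name}})


def field_names(response: dict) -> List[str]:
    """Extract field names from the introspection response; empty if the type is unknown."""
    type_info = (response.get("data") or {}).get("__type") or {}
    return [f["name"] for f in type_info.get("fields") or []]
```

Sending `build_type_introspection("DockerContainerLogs")` to `/graphql` and reading `field_names(...)` from the result would show which fields the `logs` selection set can request.
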
---
## Open Questions
1. **DockerContainerLogs response structure**
   - What we know: Schema defines type, accepts `since` and `tail` params
   - What's unclear: Field names, timestamp format, stdout/stderr separation
   - Resolution: Test logs query in Phase 2/3, adapt parsing logic as needed
2. **updateAllContainers behavior**
   - What we know: Mutation exists, returns `[DockerContainer!]!`
   - What's unclear: Does it filter by `:latest` tag, or update everything with available updates?
   - Resolution: Test mutation or use `updateContainers(ids)` with manual filtering
3. **Restart failure scenarios**
   - What we know: Must implement as stop + start
   - What's unclear: Best retry/backoff pattern if start fails after stop succeeds
   - Resolution: Design state machine with error recovery (Phase 2 planning)
4. **Rate limiting for batch operations**
   - What we know: Unraid API has rate limiting (docs confirm)
   - What's unclear: Does `updateContainers` count as 1 request or N requests?
   - Resolution: Test batch update with 20+ containers, monitor for 429 errors
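
For the rate-limiting question, a retry wrapper with exponential backoff is one possible way to handle 429 responses while the real limits are still unknown. This is a sketch: `send_with_backoff` is a hypothetical name, `send` stands in for the actual HTTP call and is assumed to return a status code and parsed body, and the delay values are placeholders.

```python
import time
from typing import Callable, Tuple


def send_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send(); on HTTP 429, wait 1x, 2x, 4x... the base delay and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # No sleep after the final attempt; the 429 is returned to the caller.
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)
    return status, body
```

Counting how many retries a 20+ container batch triggers would also give a rough answer to whether `updateContainers` is billed as one request or N.
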
---
*Feature research for: Unraid GraphQL API migration*
*Researched: 2026-02-09*
*Milestone: Replace Docker socket proxy with Unraid native API*