Initial commit: cleaned project structure

- Consolidated documentation from Ralph Loop iterations
- Archived 20+ outdated/superseded files to .archive/
- Kept essential docs: OIDC integration, mobile setup, quick start
- Added operational scripts for health monitoring and backup
- Research artifacts preserved in .tasks/artifacts/

Current state:
- 3 VPS sites (fry, proton, photon) ONLINE in Pangolin
- brn-home site pending for local services (Jellyfin, etc.)
- Mobile access configuration pending

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 06:15:04 +00:00
commit b428721b07
17 changed files with 5749 additions and 0 deletions


@@ -0,0 +1,525 @@
# Architecture Validation: Authentik + Pangolin + Guacamole
**Validation Date:** 2026-01-20
**Purpose:** Review proposed SSO infrastructure architecture for multi-site deployment
---
## Executive Summary
**VERDICT:** ✅ **APPROVED WITH CRITICAL MODIFICATIONS**
The proposed architecture (Authentik + Pangolin + Guacamole) is sound for your use case with **one critical exception**: the Guacamole/RDP integration has fundamental limitations that require architectural workarounds.
### Key Findings
| Component | Status | Confidence | Notes |
|-----------|--------|------------|-------|
| **Authentik** | ✅ RECOMMENDED | High | Best choice for self-hosted SSO in 2026 |
| **Pangolin** | ✅ RECOMMENDED | High | Superior to Cloudflare Tunnel for self-hosted |
| **Guacamole + OIDC** | ⚠️ APPROVED WITH CAVEATS | Medium | RDP NLA incompatibility requires workarounds |
---
## 1. Authentik Validation
### Research Findings
**Market Position (2026):**
- Authentik has emerged as the **leading modern SSO solution** for self-hosted environments
- Superior to Keycloak for small/medium deployments (lower complexity, better UX)
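- Broader than Authelia for this use case (full IdP with user management, SAML, and LDAP, where Authelia is primarily a forward-auth portal with a narrower OIDC provider)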
- MIT licensed, active development, 19.6k GitHub stars
**Key Strengths:**
- **Modern architecture:** Written in Python (Django), not Java like Keycloak
- **Lower resource requirements:** Documented to run well with 2GB RAM total
- **Better UX:** Admin interface significantly easier than Keycloak
- **Full protocol support:** OIDC, OAuth2, SAML2, LDAP, RADIUS
- **Native MFA:** TOTP, WebAuthn, Duo, all built-in
- **Expression policies:** Powerful Python-based policy engine
**For Your Use Case:**
- ✅ Single-user deployment supported (minimal resource config documented)
- ✅ Service account support for API tokens (Jellyfin mobile apps)
- ✅ MFA enforcement per-application (can require for Guacamole only)
- ✅ Proven integration with Guacamole, Jellyfin SSO plugin, OpenWebUI
- ✅ Active documentation for Pangolin integration
**Alternatives Considered:**
- **Keycloak:** Overkill for single-user, 4GB+ RAM, steeper learning curve
- **Authelia:** Primarily forward auth; it ships an OIDC provider but no SAML or LDAP provider and no built-in user management UI
- **Zitadel:** Newer, less proven integrations
**RECOMMENDATION:** ✅ **Use Authentik as proposed**
---
## 2. Pangolin Validation
### Research Findings
**Market Position (2026):**
- Pangolin is the **leading self-hosted alternative to Cloudflare Tunnel**
- Open-source (fosrl/pangolin, 18.2k GitHub stars)
- Built on proven tech: WireGuard + Traefik reverse proxy
- Active community, recently featured in major tech channels (Christian Lempa, NetworkChuck)
**Key Strengths:**
- **Self-hosted control plane:** You own all infrastructure, no third-party dependencies
- **Identity-aware access control:** Native OIDC integration with Authentik
- **Dual mode:** Tunneled reverse proxy + VPN-style private resource access
- **No inbound ports required:** WireGuard outbound tunnels from private networks
- **Automatic SSL:** Let's Encrypt integration via Traefik
- **Mobile support:** Native apps + WireGuard config export
**Architecture Components:**
1. **Pangolin (Control Plane):** Dashboard, API, WebSocket server, auth system
2. **Gerbil (Tunnel Manager):** WireGuard interface management
3. **Newt (Edge Client):** Runs on private networks (brn, VPS hosts)
4. **Traefik (Reverse Proxy):** TLS termination, routing, load balancing
5. **Badger (Auth Middleware):** OIDC authentication enforcement
**For Your Use Case:**
- ✅ **Replaces WireGuard mesh:** Current 10.51.0.0/24 network becomes Pangolin sites
- ✅ **Centralized on brn:** Control plane on physically secure host
- ✅ **VPS integration:** Newt clients on fry, proton, photon for site-to-site routing
- ✅ **Mobile access:** Apps for pixel9pro, pixel6pro
- ✅ **Granular ACLs:** Per-service, per-user access control via Authentik
**Comparison to Alternatives:**
| Solution | Ownership | Cost | Mobile | OIDC | Complexity |
|----------|-----------|------|--------|------|------------|
| **Pangolin** | Self-hosted | Free | ✅ | ✅ | Medium |
| Cloudflare Tunnel | Cloudflare | Free | ⚠️ Limited | ✅ | Low |
| Tailscale | Tailscale | $5/user | ✅ | ⚠️ Enterprise | Low |
| Headscale | Self-hosted | Free | ✅ | ✅ | Medium |
**Critical Findings:**
- ✅ **OIDC redirect URI:** `https://tunnel.obr.sh/api/v1/auth/callback`
- ✅ **Required scopes:** openid, profile, email, groups
- ✅ **Site architecture:** Each location (brn LAN, fry, proton) becomes a "Site"
- ✅ **Resource types:** Public (HTTPS with domains) + Private (TCP/UDP for VPN access)
**RECOMMENDATION:** ✅ **Use Pangolin as proposed**
---
## 3. Guacamole Validation
### Research Findings
**Market Position (2026):**
- Apache Guacamole remains the **leading open-source clientless RDP gateway**
- No viable open-source alternatives with equivalent feature set
- Active Apache project, version 1.6.0 current
**OIDC Support:**
- ✅ Native OIDC extension available
- ✅ Documented Authentik integration guide
- ✅ Works well for **authentication to Guacamole dashboard**
### ⚠️ CRITICAL LIMITATION DISCOVERED
**RDP NLA + OIDC Incompatibility:**
The research uncovered a **fundamental architectural limitation**:
**Problem:**
1. **RDP Network Level Authentication (NLA)** requires username/password for NTLM/Kerberos authentication
2. **OIDC authentication** never provides the user's password to Guacamole
3. Variables available: `${GUAC_USERNAME}` ✅ (populated), `${GUAC_PASSWORD}` ❌ (empty, because OIDC never hands Guacamole a password)
4. **Result:** Cannot use NLA with OIDC authentication
**Security Implications:**
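- **NLA is the recommended security practice** for RDP (the client authenticates via CredSSP before the server creates a session)
- **Disabling NLA** lets unauthenticated clients reach the Windows logon screen and, with standard RDP security, falls back to weaker legacy encryption for the handshake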
- **Windows 11** (argon) defaults to requiring NLA
**Workarounds Available:**
| Option | Security | User Experience | Implementation |
|--------|----------|-----------------|----------------|
| **1. Disable NLA** | ⚠️ Lower | Seamless SSO | Easy - select non-NLA security in the Guacamole connection config and allow non-NLA connections on argon |
| **2. Prompt for credentials** | ✅ High | Double login | Medium - configure in Guacamole |
| **3. Service account** | ⚠️ Medium | Seamless SSO | Easy - hardcode credentials, lose audit trail |
| **4. Use CAS instead of OIDC** | ✅ High | Seamless SSO | Hard - requires ClearPass Receiver on Windows |
### Recommended Approach for Your Deployment
**Since this is single-user (you) accessing your own workstation (argon):**
**RECOMMENDED:** **Option 1 - Disable NLA**
**Rationale:**
- Low risk: You're the only user, accessing your own machine
- Network already secured: Guacamole only accessible via Pangolin tunnel + Authentik SSO + MFA
- User experience: Best (seamless SSO with TOTP)
- Defense in depth: Multiple layers (MFA on Authentik, network isolation via Pangolin)
**Implementation:**
```yaml
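# In Guacamole connection config for argon-rdp:
security: rdp      # Standard RDP encryption instead of NLA ("tls" also avoids NLA and keeps a TLS transport)
ignore-cert: true  # Accept argon's self-signed certificate
# argon itself must also stop requiring NLA, otherwise it refuses the connection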
```
**Additional Security Mitigations:**
1. ✅ Enforce MFA on Guacamole application in Authentik (TOTP required)
2. ✅ Restrict Guacamole to Pangolin tunnel only (no public WAN access)
3. ✅ Enable Guacamole session recording for audit trail
4. ✅ Configure Windows Firewall on argon to only allow RDP from brn (10.50.0.74)
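Mitigation 4 can be spot-checked from a shell. The following is a minimal sketch, assuming bash (for its `/dev/tcp` pseudo-device) and coreutils `timeout`; the function name `port_open` is hypothetical. Run from brn it should succeed against argon's RDP port, and from any other LAN host it should fail.

```bash
#!/usr/bin/env bash
# port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT succeeds, non-zero otherwise.
port_open() {
  local host="$1" port="$2" wait="${3:-3}"
  # A child bash opens the socket; timeout stops it if the firewall drops packets.
  timeout "$wait" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

Example use, with `ARGON_IP` as a placeholder because argon's address is not given here: `port_open "$ARGON_IP" 3389 && echo "RDP reachable" || echo "RDP blocked"`.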
**Alternative for Future Multi-User:**
If you later add users, switch to **Option 2 (prompt for credentials)** to maintain per-user accountability.
**RECOMMENDATION:** ✅ **Use Guacamole with NLA disabled, compensated by MFA + Pangolin isolation**
---
## 4. Service Integration Validation
### Jellyfin SSO
**Status:** ✅ **FULLY SUPPORTED**
**Plugin:** SSO-Auth plugin from Jellyfin catalog
**Key Findings:**
- ✅ Authentik integration well-documented
- ⚠️ **Critical:** Mobile apps (Android/iOS) have limited OIDC support
- ✅ **Solution:** Use "Quick Connect" feature for mobile (6-digit code pairing)
- ✅ Alternative: API tokens for dedicated devices
**Configuration:**
- Provider type: Generic OpenID
- Client auth: `client_secret_post` (NOT `client_secret_basic`)
- Claims: roles via `groups` claim
- Scopes: openid, profile, email, groups
**Mobile App Strategy:**
1. **Primary:** Quick Connect (user logs in via web SSO, enters code in app)
2. **Secondary:** API tokens per device (generated in Jellyfin dashboard)
---
### OpenWebUI SSO
**Status:** ✅ **FULLY SUPPORTED**
**Native OIDC:** No plugin required
**Key Findings:**
- ✅ Robust OIDC implementation since v0.7.1+
- ✅ **Role-based admin designation** via `OAUTH_ADMIN_ROLES`
- ✅ JIT group provisioning with `ENABLE_OAUTH_GROUP_CREATION`
- ✅ Automatic role synchronization on every login
**Configuration Variables:**
```bash
OPENID_PROVIDER_URL=https://sso.obr.sh/application/o/openwebui/.well-known/openid-configuration
OAUTH_CLIENT_ID=<from_authentik>
OAUTH_CLIENT_SECRET=<from_authentik>
ENABLE_OAUTH_ROLE_MANAGEMENT=true
OAUTH_ROLES_CLAIM=groups
OAUTH_ADMIN_ROLES=openwebui-admins
```
**Redirect URI:** `https://ll.obr.sh/oauth/oidc/callback`
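A missing client ID or secret tends to surface only at first login, so a preflight check before starting the container can help. This is a small sketch, not part of OpenWebUI; the helper name `require_env` is hypothetical, and it relies on bash indirect expansion.

```bash
#!/usr/bin/env bash
# require_env NAME...
# Reports every unset or empty variable on stderr; returns 1 if any is missing.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} expands the variable whose name is stored in $name.
    if [ -z "${!name:-}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example use: `require_env OPENID_PROVIDER_URL OAUTH_CLIENT_ID OAUTH_CLIENT_SECRET || exit 1`.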
---
### Gitea SSO (fry + proton)
**Status:** ✅ **FULLY SUPPORTED**
**Native OIDC:** Built-in authentication source
**Configuration:**
- Type: OAuth2
- Provider: OpenID Connect
- Auto Discovery URL: `https://sso.obr.sh/application/o/gitea/.well-known/openid-configuration`
- Admin role mapping: Via Authentik groups
**Note:** Gitea instances remain **publicly accessible** (federated nature), SSO is optional login method
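The same source can be registered from the command line instead of the admin UI. The sketch below assumes the `gitea` binary is on the PATH of the Gitea service user on fry or proton; the source name `authentik` and both function names are arbitrary choices for illustration.

```bash
#!/usr/bin/env bash
# authentik_discovery_url HOST SLUG
# Builds the per-application discovery URL in the pattern used in this section.
authentik_discovery_url() {
  printf 'https://%s/application/o/%s/.well-known/openid-configuration\n' "$1" "$2"
}

# add_gitea_oidc CLIENT_ID CLIENT_SECRET
# Registers Authentik as an OpenID Connect login source in Gitea.
add_gitea_oidc() {
  gitea admin auth add-oauth \
    --name authentik \
    --provider openidConnect \
    --key "$1" \
    --secret "$2" \
    --auto-discover-url "$(authentik_discovery_url sso.obr.sh gitea)"
}
```

Calling `add_gitea_oidc "<from_authentik>" "<from_authentik>"` once per instance would create the source; existing local logins keep working because SSO remains an optional method.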
---
### Transmission
**Status:** ⚠️ **NO SSO SUPPORT**
**Current:** HTTP Basic Authentication
**Recommendation:**
- Keep existing basic auth
- Protect behind Pangolin tunnel only (no public WAN access)
- Consider forward auth middleware via Traefik if SSO required
---
### Mastodon (bern.social)
**Status:** ✅ **NO CHANGES NEEDED**
**Reason:** Public federated service, should remain publicly accessible
**Recommendation:** Do not integrate with SSO, keep existing authentication
---
## 5. Architectural Risks & Mitigations
### Risk Matrix
| Risk | Severity | Probability | Mitigation |
|------|----------|-------------|------------|
| **Authentik failure = total auth outage** | High | Low | Backup recovery codes, PostgreSQL backups, consider HA |
| **Pangolin control plane failure** | Medium | Low | Services still accessible via LAN, failover to WireGuard |
| **RDP NLA disabled security concern** | Medium | Medium | Compensate with MFA + network isolation |
| **Mobile app SSO limitations (Jellyfin)** | Low | High | Use Quick Connect, document for users |
| **DNS failure (sso.obr.sh unreachable)** | High | Low | Local /etc/hosts entries as backup |
### Single Points of Failure
**Authentik (sso.obr.sh):**
- **Impact:** All SSO authentication fails
- **Mitigation:**
- Regular PostgreSQL backups (`pg_dump`)
- Store recovery codes offline
- Document emergency admin access procedure
- Consider Docker volume backups
**Pangolin (tunnel.obr.sh):**
- **Impact:** Mobile/remote access fails, VPS sites unreachable
- **Mitigation:**
- Services still accessible from LAN (Traefik routes remain)
- Keep existing WireGuard as emergency fallback
- Document manual WireGuard reconnection procedure
**brn Host (10.50.0.74):**
- **Impact:** Total control plane failure (Authentik, Pangolin, Guacamole)
- **Mitigation:**
- Physical host security (already planned)
- UPS for power stability
- Backup restore procedure documented
- Consider VM snapshots before changes
### Backup Strategy
**Critical Data:**
1. **Authentik PostgreSQL database** - `pg_dump` daily, keep 7 days
2. **Authentik media files** - `/srv/docker/authentik/media/`
3. **Pangolin configuration** - `/srv/docker/pangolin/` database and config
4. **Guacamole PostgreSQL database** - connection definitions
5. **Traefik dynamic config** - `/srv/docker/traefik/traefik_dynamic.yaml`
**Backup Script:** See `/home/olaf/pangolin/.tasks/artifacts/backup-strategy.md` (TODO: create)
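Until that script exists, items 1 and 4 (daily `pg_dump`, 7-day retention) could be covered by something like the sketch below. It assumes the databases run in Docker containers reachable with `docker exec`; the function names are hypothetical, and container names, database users, and the output directory are left as parameters because this document does not specify them.

```bash
#!/usr/bin/env bash
set -o pipefail  # make a failed pg_dump fail the whole pipeline

# dump_postgres CONTAINER DB_USER DB_NAME OUT_DIR
# Writes a gzip-compressed pg_dump of DB_NAME taken inside CONTAINER.
dump_postgres() {
  local container="$1" db_user="$2" db_name="$3" out_dir="$4"
  local stamp
  stamp="$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$out_dir" || return 1
  docker exec "$container" pg_dump -U "$db_user" "$db_name" \
    | gzip > "${out_dir}/${db_name}-${stamp}.sql.gz"
}

# prune_old_backups DIR [DAYS]
# Deletes *.sql.gz files in DIR that were last modified more than DAYS days ago.
prune_old_backups() {
  local dir="$1" days="${2:-7}"
  find "$dir" -maxdepth 1 -type f -name '*.sql.gz' -mtime "+${days}" -delete
}
```

A daily cron entry would call `dump_postgres` for each database and then `prune_old_backups` on the same directory. Restoring from a dump should be rehearsed at least once before relying on it.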
---
## 6. Alternative Architectures Considered
### Alternative A: Keycloak instead of Authentik
**Pros:**
- More mature (13 years vs 6 years)
- Enterprise-grade features
- Larger community (32k stars vs 19k)
**Cons:**
- Higher resource requirements (4GB+ RAM)
- Steeper learning curve
- Overkill for single-user deployment
- Java-based (vs Python for Authentik)
**Verdict:** ❌ Rejected - unnecessary complexity for use case
---
### Alternative B: Cloudflare Tunnel instead of Pangolin
**Pros:**
- Lower operational burden (managed service)
- Global edge network
- Built-in DDoS protection
- Simpler setup
**Cons:**
- Third-party dependency (Cloudflare controls routing)
- Limited customization
- No VPN-style private resource access
- Privacy concerns (traffic visibility)
**Verdict:** ❌ Rejected - plan specifies self-hosted control
---
### Alternative C: Tailscale instead of Pangolin
**Pros:**
- Easier setup
- Better mobile apps
- NAT traversal superior (DERP relays)
**Cons:**
- Pricing: $5/user/month after 3 devices
- Control plane dependency on Tailscale servers
- Limited reverse proxy features
- No identity-aware access control without ACL tags
**Verdict:** ❌ Rejected - cost and third-party dependency
---
### Alternative D: No RDP Gateway (Direct RDP)
**Pros:**
- Simpler architecture
- No NLA compatibility issues
**Cons:**
- Requires RDP client installation on devices
- No web-based access (can't use from Chromebook, iPad browser)
- No session recording capability
- Less secure (direct exposure vs gateway)
**Verdict:** ❌ Rejected - Guacamole provides superior UX and security
---
## 7. Final Recommendations
### ✅ APPROVED Architecture
**Core Components:**
1. **Authentik** at `sso.obr.sh` - SSO/IdP
2. **Pangolin** at `tunnel.obr.sh` - Tunneled reverse proxy
3. **Guacamole** at `remote.obr.sh` - RDP gateway (NLA disabled)
### 🔧 Required Modifications to Original Plan
1. **Guacamole RDP Connection:**
- Change from "NLA security" to "Standard RDP security"
- Enable session recording for audit trail
- Configure Windows Firewall on argon to only allow brn
2. **Authentik MFA Policy:**
- Create separate policy for Guacamole application (TOTP required)
- Optional for other services (Jellyfin, OpenWebUI) based on preference
3. **Jellyfin Mobile Strategy:**
- Document Quick Connect procedure for mobile apps
- Create API tokens for persistent devices (TV apps)
4. **Transmission:**
- Keep HTTP basic auth (no OIDC support)
- Access via Pangolin tunnel only
### 📋 Implementation Order Validation
The plan's phased approach is sound:
**Phase 1: Authentik**
- Foundation for all SSO
**Phase 2: Pangolin**
- Requires Authentik for OIDC
**Phase 3: Guacamole**
- Requires Authentik for OIDC
**Phase 4: Service Integration**
- Requires Authentik + Pangolin operational
**Phase 5: Traefik Restriction**
- Only after Pangolin sites verified working
**Phase 6: Mobile Setup**
- Final verification step
**Order is correct:** Sequential dependencies respected
### 🎯 Success Criteria
**Deployment successful when:**
1. ✅ Can login to Authentik admin via `sso.obr.sh`
2. ✅ Can login to Pangolin dashboard via `tunnel.obr.sh` (SSO redirect)
3. ✅ Can access Guacamole via `remote.obr.sh` (SSO + MFA)
4. ✅ Can connect to argon RDP via Guacamole web interface
5. ✅ Can access Jellyfin via Pangolin mobile app (with Quick Connect)
6. ✅ Can access OpenWebUI via Pangolin tunnel (SSO login)
7. ✅ Jellyfin/OpenWebUI/Transmission return 404 from public WAN
8. ✅ VPS hosts (fry, proton) show connected in Pangolin dashboard
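Criterion 7 in particular lends itself to an automated check. The sketch below assumes `curl` is installed and that the check runs from a machine outside the tunnel; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# http_status URL
# Prints the HTTP status code for URL ("000" if the request itself fails).
http_status() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# check_status LABEL EXPECTED ACTUAL
# Prints PASS or FAIL for one criterion; returns 1 on FAIL.
check_status() {
  if [ "$2" = "$3" ]; then
    echo "PASS $1"
  else
    echo "FAIL $1 (expected $2, got $3)"
    return 1
  fi
}
```

Example use from the public WAN: `check_status openwebui-public 404 "$(http_status https://ll.obr.sh/)"`. The Jellyfin and Transmission hostnames are not listed in this document and would need to be filled in.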
### ⚠️ Rollback Plan
**Critical checkpoints:**
1. After TASK-005 (Authentik deploy): Services still work without SSO
2. After TASK-009 (Pangolin sites): Traefik routes still public
3. **After TASK-024 (Traefik restriction): CRITICAL CHECKPOINT**
**Rollback procedure:**
```bash
# Emergency: restore public access
sudo cp /home/olaf/pangolin/.tasks/artifacts/traefik_dynamic.yaml.backup \
/srv/docker/traefik/traefik_dynamic.yaml
docker restart traefik # Restart so Traefik re-reads the file (with the file provider's watch option enabled, changes are also picked up automatically)
```
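The rollback only works if the `.backup` file was captured before TASK-024. A minimal sketch for taking and verifying that snapshot follows; the function name `snapshot_config` is hypothetical.

```bash
#!/usr/bin/env bash
# snapshot_config SRC DEST
# Copies SRC to DEST (preserving mode and timestamps) and verifies the copy byte for byte.
snapshot_config() {
  local src="$1" dest="$2"
  cp -p "$src" "$dest" && cmp -s "$src" "$dest"
}
```

Example use with the paths from the rollback procedure: `snapshot_config /srv/docker/traefik/traefik_dynamic.yaml /home/olaf/pangolin/.tasks/artifacts/traefik_dynamic.yaml.backup || echo "snapshot failed, do not proceed"`.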
### 📊 Resource Requirements
**brn Host (10.50.0.74) Additional Load:**
- Authentik: +2GB RAM, +2 CPU cores
- Pangolin: +1GB RAM, +1 CPU core
- Guacamole: +1GB RAM, +1 CPU core
- **Total:** +4GB RAM, +4 CPU cores
**Minimum brn specs:** 8GB RAM and 4 CPU cores (6 cores recommended)
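Whether brn meets these numbers can be checked before Phase 1. The sketch below assumes a Linux host with `/proc/meminfo` and `awk`; the function names are hypothetical, and the defaults of 8 GB and 4 cores mirror the minimum stated above.

```bash
#!/usr/bin/env bash
# meets_minimum HAVE_RAM_GB HAVE_CORES [NEED_RAM_GB] [NEED_CORES]
# Returns 0 when both values reach the required minimum.
meets_minimum() {
  local ram="$1" cores="$2" need_ram="${3:-8}" need_cores="${4:-4}"
  [ "$ram" -ge "$need_ram" ] && [ "$cores" -ge "$need_cores" ]
}

# host_ram_gb
# Prints total RAM in GiB, rounded to the nearest integer.
host_ram_gb() {
  awk '/^MemTotal:/ { print int($2 / 1024 / 1024 + 0.5) }' /proc/meminfo
}
```

Example use: `meets_minimum "$(host_ram_gb)" "$(nproc)" && echo "brn OK" || echo "brn below minimum"`. Rounding is used because the kernel reports slightly less than the installed amount.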
### 🔒 Security Posture
**Improvements:**
- ✅ Centralized authentication (single MFA enrollment)
- ✅ Granular per-service access control
- ✅ Session recording for RDP access
- ✅ Network segmentation via Pangolin tunnels
- ✅ Elimination of password sprawl
**Trade-offs:**
- ⚠️ RDP NLA disabled (compensated by MFA + network isolation)
- ⚠️ Single point of failure (brn host)
**Overall:** ✅ **Net security improvement**
---
## 8. Conclusion
**FINAL VERDICT:** ✅ **ARCHITECTURE APPROVED FOR IMPLEMENTATION**
**The proposed Authentik + Pangolin + Guacamole architecture is sound and recommended with the following conditions:**
1. Acknowledge RDP NLA limitation and implement compensating controls
2. Follow phased implementation order as specified
3. Create backup strategy before starting (TASK-027)
4. Test thoroughly at each phase before proceeding
5. Document emergency rollback procedures
**Confidence Level:** **85%**
**Remaining 15% risk factors:**
- Pangolin relatively new in production (1 year track record)
- Guacamole NLA workaround requires security discipline
- Single-user deployment lacks redundancy
**Recommendation to proceed:** ✅ **YES**
**Next step:** Begin implementation with TASK-003 (Create Authentik Compose), applying the findings from RESEARCH-002 and TASK-001.
---
**Validation completed by:** Claude Code
**Date:** 2026-01-20
**Research artifacts referenced:** RESEARCH-001 through RESEARCH-005, TASK-001