Architecture Validation: Authentik + Pangolin + Guacamole

Validation Date: 2026-01-20
Purpose: Review proposed SSO infrastructure architecture for multi-site deployment


Executive Summary

VERDICT: APPROVED WITH CRITICAL MODIFICATIONS

The proposed architecture (Authentik + Pangolin + Guacamole) is sound for your use case with one critical exception: the Guacamole/RDP integration has fundamental limitations that require architectural workarounds.

Key Findings

| Component | Status | Confidence | Notes |
|---|---|---|---|
| Authentik | RECOMMENDED | High | Best choice for self-hosted SSO in 2026 |
| Pangolin | RECOMMENDED | High | Superior to Cloudflare Tunnel for self-hosted |
| Guacamole + OIDC | ⚠️ APPROVED WITH CAVEATS | Medium | RDP NLA incompatibility requires workarounds |

1. Authentik Validation

Research Findings

Market Position (2026):

  • Authentik has emerged as the leading modern SSO solution for self-hosted environments
  • Superior to Keycloak for small/medium deployments (lower complexity, better UX)
  • Superior to Authelia for this use case (full IdP with built-in user management vs primarily forward auth)
  • MIT licensed, active development, 19.6k GitHub stars

Key Strengths:

  • Modern architecture: Written in Python (Django), not Java like Keycloak
  • Lower resource requirements: Documented to run well with 2GB RAM total
  • Better UX: Admin interface significantly easier than Keycloak
  • Full protocol support: OIDC, OAuth2, SAML2, LDAP, RADIUS
  • Native MFA: TOTP, WebAuthn, Duo, all built-in
  • Expression policies: Powerful Python-based policy engine

For Your Use Case:

  • Single-user deployment supported (minimal resource config documented)
  • Service account support for API tokens (Jellyfin mobile apps)
  • MFA enforcement per-application (can require for Guacamole only)
  • Proven integration with Guacamole, Jellyfin SSO plugin, OpenWebUI
  • Active documentation for Pangolin integration

Alternatives Considered:

  • Keycloak: Overkill for single-user, 4GB+ RAM, steeper learning curve
  • Authelia: Primarily a forward-auth portal; it includes an OIDC provider but has no SAML support and no user-management UI
  • Zitadel: Newer, less proven integrations

RECOMMENDATION: Use Authentik as proposed


2. Pangolin Validation

Research Findings

Market Position (2026):

  • Pangolin is the leading self-hosted alternative to Cloudflare Tunnel
  • Open-source (fosrl/pangolin, 18.2k GitHub stars)
  • Built on proven tech: WireGuard + Traefik reverse proxy
  • Active community, recently featured in major tech channels (Christian Lempa, NetworkChuck)

Key Strengths:

  • Self-hosted control plane: You own all infrastructure, no third-party dependencies
  • Identity-aware access control: Native OIDC integration with Authentik
  • Dual mode: Tunneled reverse proxy + VPN-style private resource access
  • No inbound ports required: WireGuard outbound tunnels from private networks
  • Automatic SSL: Let's Encrypt integration via Traefik
  • Mobile support: Native apps + WireGuard config export

Architecture Components:

  1. Pangolin (Control Plane): Dashboard, API, WebSocket server, auth system
  2. Gerbil (Tunnel Manager): WireGuard interface management
  3. Newt (Edge Client): Runs on private networks (brn, VPS hosts)
  4. Traefik (Reverse Proxy): TLS termination, routing, load balancing
  5. Badger (Auth Middleware): OIDC authentication enforcement

For Your Use Case:

  • Replaces WireGuard mesh: Current 10.51.0.0/24 network becomes Pangolin sites
  • Centralized on brn: Control plane on physically secure host
  • VPS integration: Newt clients on fry, proton, photon for site-to-site routing
  • Mobile access: Apps for pixel9pro, pixel6pro
  • Granular ACLs: Per-service, per-user access control via Authentik

Comparison to Alternatives:

| Solution | Ownership | Cost | Mobile | OIDC | Complexity |
|---|---|---|---|---|---|
| Pangolin | Self-hosted | Free | | | Medium |
| Cloudflare Tunnel | Cloudflare | Free | | ⚠️ Limited | Low |
| Tailscale | Tailscale | $5/user | | ⚠️ Enterprise | Low |
| Headscale | Self-hosted | Free | | | Medium |

Critical Findings:

  • OIDC redirect URI: https://tunnel.obr.sh/api/v1/auth/callback
  • Required scopes: openid, profile, email, groups
  • Site architecture: Each location (brn LAN, fry, proton, photon) becomes a "Site"
  • Resource types: Public (HTTPS with domains) + Private (TCP/UDP for VPN access)
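To make the redirect URI and scope findings concrete, here is a minimal sketch of the authorization request a browser would be sent to at the start of the OIDC code flow. The redirect URI and scopes come from the findings above; the authorize endpoint path and the client ID are assumptions for illustration and should be replaced with the values shown on the Authentik provider page.

```python
from urllib.parse import urlencode

# Taken from the findings above.
REDIRECT_URI = "https://tunnel.obr.sh/api/v1/auth/callback"
SCOPES = ["openid", "profile", "email", "groups"]

# Assumptions, not confirmed by this review: endpoint path and client ID.
AUTHORIZE_ENDPOINT = "https://sso.obr.sh/application/o/authorize/"
CLIENT_ID = "pangolin-client-id-placeholder"


def build_authorize_url(state: str) -> str:
    """Return the URL that starts an OIDC authorization code flow."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": " ".join(SCOPES),  # OIDC scopes are space-delimited
        "state": state,             # opaque value echoed back to the callback
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


print(build_authorize_url(state="example-state"))
```

If the redirect URI registered in Authentik differs from the one in the request by even a trailing slash, the provider will typically reject the login, so it is worth comparing the two strings character by character.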

RECOMMENDATION: Use Pangolin as proposed


3. Guacamole Validation

Research Findings

Market Position (2026):

  • Apache Guacamole remains the leading open-source clientless RDP gateway
  • No viable open-source alternatives with equivalent feature set
  • Active Apache project, version 1.6.0 current

OIDC Support:

  • Native OIDC extension available
  • Documented Authentik integration guide
  • Works well for authentication to Guacamole dashboard

⚠️ CRITICAL LIMITATION DISCOVERED

RDP NLA + OIDC Incompatibility:

The research uncovered a fundamental architectural limitation:

Problem:

  1. RDP Network Level Authentication (NLA) requires username/password for NTLM/Kerberos authentication
  2. OIDC authentication never provides the user's password to Guacamole
  3. Guacamole's parameter tokens ${GUAC_USERNAME} and ${GUAC_PASSWORD} are available, but ${GUAC_PASSWORD} is empty after an OIDC login
  4. Result: Cannot use NLA with OIDC authentication
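The problem can be illustrated with a small simulation of token substitution. This is a simplified sketch, not Guacamole's implementation: the function names are hypothetical, the user name and password are placeholders, and treating a missing token as an empty string is an assumption made to keep the example short.

```python
import re

TOKEN = re.compile(r"\$\{(GUAC_[A-Z_]+)\}")


def substitute_tokens(params: dict[str, str], session: dict[str, str]) -> dict[str, str]:
    """Replace ${GUAC_*} tokens with values from the login session.

    Simplification for this sketch: a token with no session value becomes "".
    """
    return {
        key: TOKEN.sub(lambda m: session.get(m.group(1), ""), value)
        for key, value in params.items()
    }


def nla_possible(resolved: dict[str, str]) -> bool:
    """NLA needs a username and a password before the RDP session starts."""
    return bool(resolved.get("username")) and bool(resolved.get("password"))


connection = {
    "username": "${GUAC_USERNAME}",
    "password": "${GUAC_PASSWORD}",
    "security": "nla",
}

# A password-based login gives Guacamole both values.
password_login = substitute_tokens(
    connection, {"GUAC_USERNAME": "olaf", "GUAC_PASSWORD": "placeholder-password"}
)
# An OIDC login only yields an identity; no password ever reaches Guacamole.
oidc_login = substitute_tokens(connection, {"GUAC_USERNAME": "olaf"})

print(nla_possible(password_login))  # True
print(nla_possible(oidc_login))      # False
```

The second result is the whole issue in one line: the identity arrives, the secret does not, and NLA cannot proceed without it.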

Security Implications:

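  • NLA is a recommended security best practice for RDP (it authenticates the user before a full RDP session is established)
  • Disabling NLA moves authentication into the RDP session itself: the logon screen is reachable before any credentials are checked, and standard RDP security gives weaker protection to credentials in transit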
  • Windows 11 (argon) defaults to requiring NLA

Workarounds Available:

| Option | Security | User Experience | Implementation |
|---|---|---|---|
| 1. Disable NLA | ⚠️ Lower | Seamless SSO | Easy - disable in Guacamole connection config |
| 2. Prompt for credentials | High | Double login | Medium - configure in Guacamole |
| 3. Service account | ⚠️ Medium | Seamless SSO | Easy - hardcode credentials, lose audit trail |
| 4. Use CAS instead of OIDC | High | Seamless SSO | Hard - requires ClearPass Receiver on Windows |

Since this is single-user (you) accessing your own workstation (argon):

RECOMMENDED: Option 1 - Disable NLA

Rationale:

  • Low risk: You're the only user, accessing your own machine
  • Network already secured: Guacamole only accessible via Pangolin tunnel + Authentik SSO + MFA
  • User experience: Best (seamless SSO with TOTP)
  • Defense in depth: Multiple layers (MFA on Authentik, network isolation via Pangolin)

Implementation:

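# In Guacamole connection config for argon-rdp:
security: rdp      # Use standard RDP security instead of NLA
ignore-cert: true  # Accept argon's self-signed certificate
# argon must also stop requiring NLA on the Windows side, since Windows 11 requires it by default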

Additional Security Mitigations:

  1. Enforce MFA on Guacamole application in Authentik (TOTP required)
  2. Restrict Guacamole to Pangolin tunnel only (no public WAN access)
  3. Enable Guacamole session recording for audit trail
  4. Configure Windows Firewall on argon to only allow RDP from brn (10.50.0.74)

Alternative for Future Multi-User: If you later add users, switch to Option 2 (prompt for credentials) to maintain per-user accountability.

RECOMMENDATION: Use Guacamole with NLA disabled, compensated by MFA + Pangolin isolation


4. Service Integration Validation

Jellyfin SSO

Status: FULLY SUPPORTED

Plugin: SSO-Auth plugin from Jellyfin catalog

Key Findings:

  • Authentik integration well-documented
  • ⚠️ Critical: Mobile apps (Android/iOS) have limited OIDC support
  • Solution: Use "Quick Connect" feature for mobile (6-digit code pairing)
  • Alternative: API tokens for dedicated devices

Configuration:

  • Provider type: Generic OpenID
  • Client auth: client_secret_post (NOT client_secret_basic)
  • Claims: roles via groups claim
  • Scopes: openid, profile, email, groups

Mobile App Strategy:

  1. Primary: Quick Connect (user logs in via web SSO, enters code in app)
  2. Secondary: API tokens per device (generated in Jellyfin dashboard)

OpenWebUI SSO

Status: FULLY SUPPORTED

Native OIDC: No plugin required

Key Findings:

  • Robust OIDC implementation in v0.7.1 and later
  • Role-based admin designation via OAUTH_ADMIN_ROLES
  • JIT group provisioning with ENABLE_OAUTH_GROUP_CREATION
  • Automatic role synchronization on every login

Configuration Variables:

OPENID_PROVIDER_URL=https://sso.obr.sh/application/o/openwebui/.well-known/openid-configuration
OAUTH_CLIENT_ID=<from_authentik>
OAUTH_CLIENT_SECRET=<from_authentik>
ENABLE_OAUTH_ROLE_MANAGEMENT=true
OAUTH_ROLES_CLAIM=groups
OAUTH_ADMIN_ROLES=openwebui-admins

Redirect URI: https://ll.obr.sh/oauth/oidc/callback
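As a rough illustration of how the three role variables interact, the sketch below decides whether a login should be treated as an admin from the groups claim in the ID token. It is a simplified model written for this review, not Open WebUI's code; the function name and the fallback claim name are hypothetical.

```python
# Values mirror the configuration variables listed above.
EXAMPLE_ENV = {
    "ENABLE_OAUTH_ROLE_MANAGEMENT": "true",
    "OAUTH_ROLES_CLAIM": "groups",
    "OAUTH_ADMIN_ROLES": "openwebui-admins",
}


def is_admin(id_token_claims: dict, env: dict) -> bool:
    """Return True when the user's groups intersect the configured admin roles."""
    if env.get("ENABLE_OAUTH_ROLE_MANAGEMENT", "false").lower() != "true":
        return False  # role management off: the claim is ignored in this sketch
    claim_name = env.get("OAUTH_ROLES_CLAIM", "roles")  # fallback is an assumption
    admin_roles = {
        role.strip()
        for role in env.get("OAUTH_ADMIN_ROLES", "").split(",")
        if role.strip()
    }
    user_groups = set(id_token_claims.get(claim_name, []))
    return bool(user_groups & admin_roles)


print(is_admin({"groups": ["openwebui-admins", "media"]}, EXAMPLE_ENV))  # True
print(is_admin({"groups": ["media"]}, EXAMPLE_ENV))                      # False
```

The practical consequence is that the Authentik provider must actually emit a groups claim (hence the groups scope) and that a group named openwebui-admins must exist there; otherwise no login will ever map to admin.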


Gitea SSO (fry + proton)

Status: FULLY SUPPORTED

Native OIDC: Built-in authentication source

Configuration:

  • Type: OAuth2
  • Provider: OpenID Connect
  • Auto Discovery URL: https://sso.obr.sh/application/o/gitea/.well-known/openid-configuration
  • Admin role mapping: Via Authentik groups

Note: Gitea instances remain publicly accessible because they are federated; SSO is an optional login method.
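The OpenWebUI and Gitea discovery URLs follow the same per-application pattern, so a small helper can build them and check a discovery document for the scopes this plan relies on. This is a sketch under assumptions: the helper names are hypothetical, and the check relies on the optional scopes_supported field of OIDC discovery metadata, which a provider is not required to publish.

```python
import json
from urllib.request import urlopen

AUTHENTIK_BASE = "https://sso.obr.sh"
REQUIRED_SCOPES = ["openid", "profile", "email", "groups"]


def discovery_url(app_slug: str, base: str = AUTHENTIK_BASE) -> str:
    """Build the per-application discovery URL used throughout this document."""
    return f"{base.rstrip('/')}/application/o/{app_slug}/.well-known/openid-configuration"


def missing_scopes(discovery_doc: dict, required: list[str]) -> list[str]:
    """Return required scopes the provider does not advertise."""
    supported = set(discovery_doc.get("scopes_supported", []))
    return [scope for scope in required if scope not in supported]


def fetch_discovery(app_slug: str) -> dict:
    """Network call; only succeeds once the provider exists in Authentik."""
    with urlopen(discovery_url(app_slug), timeout=10) as resp:
        return json.load(resp)


print(discovery_url("gitea"))
```

Once Authentik is deployed, `missing_scopes(fetch_discovery("gitea"), REQUIRED_SCOPES)` returning an empty list would be a quick pre-flight check before configuring the Gitea authentication source.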


Transmission

Status: ⚠️ NO SSO SUPPORT

Current: HTTP Basic Authentication

Recommendation:

  • Keep existing basic auth
  • Protect behind Pangolin tunnel only (no public WAN access)
  • Consider forward auth middleware via Traefik if SSO required

Mastodon (bern.social)

Status: NO CHANGES NEEDED

Reason: Public federated service, should remain publicly accessible

Recommendation: Do not integrate with SSO, keep existing authentication


5. Architectural Risks & Mitigations

Risk Matrix

| Risk | Severity | Probability | Mitigation |
|---|---|---|---|
| Authentik failure = total auth outage | High | Low | Backup recovery codes, PostgreSQL backups, consider HA |
| Pangolin control plane failure | Medium | Low | Services still accessible via LAN, failover to WireGuard |
| RDP NLA disabled security concern | Medium | Medium | Compensate with MFA + network isolation |
| Mobile app SSO limitations (Jellyfin) | Low | High | Use Quick Connect, document for users |
| DNS failure (sso.obr.sh unreachable) | High | Low | Local /etc/hosts entries as backup |

Single Points of Failure

Authentik (sso.obr.sh):

  • Impact: All SSO authentication fails
  • Mitigation:
    • Regular PostgreSQL backups (pg_dump)
    • Store recovery codes offline
    • Document emergency admin access procedure
    • Consider Docker volume backups

Pangolin (tunnel.obr.sh):

  • Impact: Mobile/remote access fails, VPS sites unreachable
  • Mitigation:
    • Services still accessible from LAN (Traefik routes remain)
    • Keep existing WireGuard as emergency fallback
    • Document manual WireGuard reconnection procedure

brn Host (10.50.0.74):

  • Impact: Total control plane failure (Authentik, Pangolin, Guacamole)
  • Mitigation:
    • Physical host security (already planned)
    • UPS for power stability
    • Backup restore procedure documented
    • Consider VM snapshots before changes

Backup Strategy

Critical Data:

  1. Authentik PostgreSQL database - pg_dump daily, keep 7 days
  2. Authentik media files - /srv/docker/authentik/media/
  3. Pangolin configuration - /srv/docker/pangolin/ database and config
  4. Guacamole PostgreSQL database - connection definitions
  5. Traefik dynamic config - /srv/docker/traefik/traefik_dynamic.yaml

Backup Script: See /home/olaf/pangolin/.tasks/artifacts/backup-strategy.md (TODO: create)
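Until that script exists, the first item (daily pg_dump, keep 7 days) could look roughly like the sketch below. The backup directory, container name, database user, and database name are all assumptions and must be matched to the real Authentik compose file; only the daily cadence and the 7-day retention come from the list above.

```python
import subprocess
from datetime import date, timedelta
from pathlib import Path

# Hypothetical values: adjust to the actual deployment.
BACKUP_DIR = Path("/srv/backups/authentik")
CONTAINER = "authentik-postgresql"
DB_USER = "authentik"
DB_NAME = "authentik"
KEEP_DAYS = 7  # retention from the backup strategy above


def backup_name(day: date) -> str:
    return f"authentik-{day.isoformat()}.sql"


def expired(names: list[str], today: date, keep_days: int = KEEP_DAYS) -> list[str]:
    """Return dump file names that fall outside the retention window."""
    keep = {backup_name(today - timedelta(days=i)) for i in range(keep_days)}
    return sorted(
        n for n in names
        if n.startswith("authentik-") and n.endswith(".sql") and n not in keep
    )


def run_backup(today: date) -> None:
    """Dump the database through docker exec, then prune old dumps."""
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    target = BACKUP_DIR / backup_name(today)
    with target.open("wb") as out:
        subprocess.run(
            ["docker", "exec", CONTAINER, "pg_dump", "-U", DB_USER, DB_NAME],
            stdout=out,
            check=True,  # fail loudly instead of keeping a truncated dump
        )
    for name in expired([p.name for p in BACKUP_DIR.iterdir()], today):
        (BACKUP_DIR / name).unlink()
```

Calling `run_backup(date.today())` from a daily cron job or systemd timer would cover the schedule. Pruning happens only after a successful dump, so a failing backup never deletes the last good copies.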


6. Alternative Architectures Considered

Alternative A: Keycloak instead of Authentik

Pros:

  • More mature (13 years vs 6 years)
  • Enterprise-grade features
  • Larger community (32k stars vs 19k)

Cons:

  • Higher resource requirements (4GB+ RAM)
  • Steeper learning curve
  • Overkill for single-user deployment
  • Java-based (vs Python for Authentik)

Verdict: Rejected - unnecessary complexity for use case


Alternative B: Cloudflare Tunnel instead of Pangolin

Pros:

  • Lower operational burden (managed service)
  • Global edge network
  • Built-in DDoS protection
  • Simpler setup

Cons:

  • Third-party dependency (Cloudflare controls routing)
  • Limited customization
  • No VPN-style private resource access
  • Privacy concerns (traffic visibility)

Verdict: Rejected - plan specifies self-hosted control


Alternative C: Tailscale instead of Pangolin

Pros:

  • Easier setup
  • Better mobile apps
  • NAT traversal superior (DERP relays)

Cons:

  • Pricing: $5/user/month after 3 devices
  • Control plane dependency on Tailscale servers
  • Limited reverse proxy features
  • No identity-aware access control without ACL tags

Verdict: Rejected - cost and third-party dependency


Alternative D: No RDP Gateway (Direct RDP)

Pros:

  • Simpler architecture
  • No NLA compatibility issues

Cons:

  • Requires RDP client installation on devices
  • No web-based access (can't use from Chromebook, iPad browser)
  • No session recording capability
  • Less secure (direct exposure vs gateway)

Verdict: Rejected - Guacamole provides superior UX and security


7. Final Recommendations

APPROVED Architecture

Core Components:

  1. Authentik at sso.obr.sh - SSO/IdP
  2. Pangolin at tunnel.obr.sh - Tunneled reverse proxy
  3. Guacamole at remote.obr.sh - RDP gateway (NLA disabled)

🔧 Required Modifications to Original Plan

  1. Guacamole RDP Connection:
    • Change from "NLA security" to "Standard RDP security"
    • Enable session recording for audit trail
    • Configure Windows Firewall on argon to only allow brn
  2. Authentik MFA Policy:
    • Create separate policy for Guacamole application (TOTP required)
    • Optional for other services (Jellyfin, OpenWebUI) based on preference
  3. Jellyfin Mobile Strategy:
    • Document Quick Connect procedure for mobile apps
    • Create API tokens for persistent devices (TV apps)
  4. Transmission:
    • Keep HTTP basic auth (no OIDC support)
    • Access via Pangolin tunnel only

📋 Implementation Order Validation

The plan's phased approach is sound:

Phase 1: Authentik

  • Foundation for all SSO

Phase 2: Pangolin

  • Requires Authentik for OIDC

Phase 3: Guacamole

  • Requires Authentik for OIDC

Phase 4: Service Integration

  • Requires Authentik + Pangolin operational

Phase 5: Traefik Restriction

  • Only after Pangolin sites verified working

Phase 6: Mobile Setup

  • Final verification step

Order is correct: Sequential dependencies respected
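The dependency claim can be checked mechanically. The sketch below encodes the phases and the requirements stated above and reports any phase scheduled before something it depends on. The short phase keys are invented for this example, and treating Phase 6 as dependent on every earlier phase is an assumption based on it being the final verification step.

```python
# Phases in the order the plan executes them (keys are shorthand for this sketch).
PLAN_ORDER = [
    "authentik", "pangolin", "guacamole",
    "services", "traefik_restriction", "mobile",
]

# phase -> phases that must already be complete
DEPENDS_ON = {
    "authentik": set(),
    "pangolin": {"authentik"},
    "guacamole": {"authentik"},
    "services": {"authentik", "pangolin"},
    "traefik_restriction": {"pangolin"},
    "mobile": set(PLAN_ORDER[:-1]),  # assumption: final step follows everything
}


def violations(order: list[str], depends_on: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (phase, dependency) pairs where the dependency is scheduled later."""
    position = {phase: i for i, phase in enumerate(order)}
    return sorted(
        (phase, dep)
        for phase, deps in depends_on.items()
        for dep in deps
        if position[dep] > position[phase]
    )


print(violations(PLAN_ORDER, DEPENDS_ON))  # []
```

An empty result means every phase runs after its prerequisites. The same check also shows that Phases 2 and 3 could be swapped without breaking anything, since both depend only on Authentik.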

🎯 Success Criteria

Deployment successful when:

  1. Can login to Authentik admin via sso.obr.sh
  2. Can login to Pangolin dashboard via tunnel.obr.sh (SSO redirect)
  3. Can access Guacamole via remote.obr.sh (SSO + MFA)
  4. Can connect to argon RDP via Guacamole web interface
  5. Can access Jellyfin via Pangolin mobile app (with Quick Connect)
  6. Can access OpenWebUI via Pangolin tunnel (SSO login)
  7. Jellyfin/OpenWebUI/Transmission return 404 from public WAN
  8. VPS hosts (fry, proton, photon) show connected in Pangolin dashboard
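Criterion 7 is the easiest one to automate. The following sketch probes each service and lists those that do not answer 404; it only means something when run from outside the LAN and outside the Pangolin tunnel (for example over mobile data). Only ll.obr.sh appears in this document; the Jellyfin and Transmission URLs are placeholders, and the function names are hypothetical.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.request import urlopen

PUBLIC_URLS = {
    "openwebui": "https://ll.obr.sh/",                  # from the OpenWebUI section
    "jellyfin": "https://jellyfin.example.com/",         # placeholder hostname
    "transmission": "https://transmission.example.com/", # placeholder hostname
}


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code, or None if the host cannot be reached."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:  # 4xx and 5xx responses are raised as exceptions
        return err.code
    except OSError:           # DNS failure, refused connection, timeout
        return None


def blocked_from_wan(status: Optional[int]) -> bool:
    """Criterion 7: the service must answer 404 from the public WAN."""
    return status == 404


def failing_services(statuses: dict[str, Optional[int]]) -> list[str]:
    """Return the services that do not meet criterion 7."""
    return sorted(name for name, status in statuses.items() if not blocked_from_wan(status))
```

A run would be `failing_services({name: http_status(url) for name, url in PUBLIC_URLS.items()})`, with an empty list as the passing result. An unreachable host is reported as failing here because the criterion asks specifically for 404; relax that if a dropped connection is also acceptable.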

⚠️ Rollback Plan

Critical checkpoints:

  1. After TASK-005 (Authentik deploy): Services still work without SSO
  2. After TASK-009 (Pangolin sites): Traefik routes still public
  3. After TASK-024 (Traefik restriction): CRITICAL CHECKPOINT

Rollback procedure:

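# Emergency: restore public access
sudo cp /home/olaf/pangolin/.tasks/artifacts/traefik_dynamic.yaml.backup \
        /srv/docker/traefik/traefik_dynamic.yaml
# The file provider normally picks up the restored file on its own when watch is enabled;
# restart to be certain, since Traefik does not reload its config on SIGHUP
docker restart traefik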

📊 Resource Requirements

brn Host (10.50.0.74) Additional Load:

  • Authentik: +2GB RAM, +2 CPU cores
  • Pangolin: +1GB RAM, +1 CPU core
  • Guacamole: +1GB RAM, +1 CPU core
  • Total: +4GB RAM, +4 CPU cores

Required brn specs: minimum 8GB RAM, 4-6 CPU cores recommended
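The totals can be reproduced with a few lines of arithmetic, which also makes it easy to re-run the numbers if a component is added. The per-component figures and the 8GB minimum come from the list above; the headroom figure is simply what remains on a minimum-spec host and is not a measured value.

```python
# (GB RAM, CPU cores) of additional load per component, from the list above.
ADDITIONAL_LOAD = {
    "Authentik": (2, 2),
    "Pangolin": (1, 1),
    "Guacamole": (1, 1),
}
MIN_HOST_RAM_GB = 8


def totals(load: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum RAM and CPU across all components."""
    ram = sum(r for r, _ in load.values())
    cpu = sum(c for _, c in load.values())
    return ram, cpu


ram_gb, cpu_cores = totals(ADDITIONAL_LOAD)
headroom_gb = MIN_HOST_RAM_GB - ram_gb  # left for the OS and existing services

print(f"Additional load: +{ram_gb}GB RAM, +{cpu_cores} CPU cores")  # +4GB RAM, +4 CPU cores
print(f"Headroom on an {MIN_HOST_RAM_GB}GB host: {headroom_gb}GB")   # 4GB
```

Whether 4GB of headroom is enough depends on what brn already runs (Jellyfin, Traefik, and so on), which this review did not measure.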

🔒 Security Posture

Improvements:

  • Centralized authentication (single MFA enrollment)
  • Granular per-service access control
  • Session recording for RDP access
  • Network segmentation via Pangolin tunnels
  • Elimination of password sprawl

Trade-offs:

  • ⚠️ RDP NLA disabled (compensated by MFA + network isolation)
  • ⚠️ Single point of failure (brn host)

Overall: Net security improvement


8. Conclusion

FINAL VERDICT: ARCHITECTURE APPROVED FOR IMPLEMENTATION

The proposed Authentik + Pangolin + Guacamole architecture is sound and recommended with the following conditions:

  1. Acknowledge RDP NLA limitation and implement compensating controls
  2. Follow phased implementation order as specified
  3. Create backup strategy before starting (TASK-027)
  4. Test thoroughly at each phase before proceeding
  5. Document emergency rollback procedures

Confidence Level: 85%

Remaining 15% risk factors:

  • Pangolin relatively new in production (1 year track record)
  • Guacamole NLA workaround requires security discipline
  • Single-user deployment lacks redundancy

Recommendation to proceed: YES

Next step: Execute research-informed implementation starting with TASK-003 (Create Authentik Compose) using insights from RESEARCH-002 and TASK-001 outputs.


Validation completed by: Claude Code
Date: 2026-01-20
Research artifacts referenced: RESEARCH-001 through RESEARCH-005, TASK-001