Runbook: tunnel-recovery

When https://warehouse.caseymanos.com is down or misbehaving.

Architecture (as of 2026-05-05)

The site has two independent processes; either can be down independently.

  • uvicorn: com.casey.warehouse-ui launchd job. Always-on, auto-restarts on crash, starts at login. Sources ~/.warehouse_secrets.sh for env vars before starting. Runs at 127.0.0.1:8765.
  • cloudflared: currently still foreground / kbui-driven (terminal process). Routes warehouse.caseymanos.com → localhost:8765.

This split is intentional — uvicorn is the harder one to keep alive (env vars, secrets, port), so it gets launchd. cloudflared is one binary with one config file and survives terminal restarts cheaply, so the kbui flow is fine for now.

Quick triage

# Is the tunnel reachable from the public internet?
curl -I https://warehouse.caseymanos.com
# Expected: 302 redirect to Cloudflare Access challenge

# Is uvicorn running locally? (launchd-managed)
launchctl list | grep com.casey.warehouse-ui
# Expected: <PID>  0  com.casey.warehouse-ui   (PID = number, exit-code = 0)

# Or check the port directly:
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8765/
# Expected: 200

# Is cloudflared running?
ps aux | grep cloudflared | grep -v grep
# Expected: one cloudflared tunnel ... run warehouse line

# Is the tunnel registered with Cloudflare?
cloudflared tunnel list
# Should show "warehouse" with id edae87c6-61b1-4f84-a776-87d8732693ea
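
The checks above can be folded into one function that names the likely scenario. This is a minimal sketch, not part of the existing tooling: the names `diagnose` and `triage` are hypothetical, and the mapping only encodes the expectations stated in this runbook (302 from the public URL, 200 from 127.0.0.1:8765, a running cloudflared process).

```shell
# diagnose PUBLIC_CODE LOCAL_CODE CLOUDFLARED_UP(yes|no)
# Pure decision logic: maps the three observations onto a runbook scenario.
diagnose() {
  public_code=$1
  local_code=$2
  cloudflared_up=$3
  if [ "$cloudflared_up" != "yes" ]; then
    echo "cloudflared down: see Tunnel offline / 1033"
  elif [ "$local_code" != "200" ]; then
    echo "uvicorn down: see 502 / 504"
  elif [ "$public_code" = "302" ]; then
    echo "healthy"
  else
    echo "both up, public check failed: check ingress rule and DNS"
  fi
}

# triage: gather the observations with the same commands as above, then diagnose.
# curl prints 000 as the status code when it cannot connect at all.
triage() {
  public=$(curl -s -o /dev/null -m 10 -w '%{http_code}' https://warehouse.caseymanos.com)
  localc=$(curl -s -o /dev/null -m 5 -w '%{http_code}' http://127.0.0.1:8765/)
  if pgrep -f "cloudflared tunnel" >/dev/null 2>&1; then
    up=yes
  else
    up=no
  fi
  diagnose "$public" "$localc" "$up"
}
```

Source the snippet and run `triage`; the printed line names the scenario section to read next.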

Common scenarios

Browser shows Cloudflare Access challenge but PIN never arrives

PIN delivery by email can be slow (up to 1 min), or the message can land in spam. Check spam first. If nothing has arrived after 2 min:

  1. Browser → click "Resend code"
  2. Check the Cloudflare Access policy in the dashboard:
       • Zero Trust → Access → Applications → warehouse
       • Policy: email auth, allow only [email protected]
       • Verify the email address matches what's in the policy

Browser shows "Tunnel offline" / Cloudflare 1033

cloudflared isn't running. Restart it:

# Foreground (terminal stays attached, easy to see logs):
cloudflared tunnel --config ~/.cloudflared/config.yml run warehouse

# Or background:
nohup cloudflared tunnel --config ~/.cloudflared/config.yml run warehouse \
  > /tmp/cloudflared.log 2>&1 &
disown

kbui and ~/garmin-warehouse/scripts/run_ui.sh are helpers from before uvicorn moved to launchd. They still work, but they redundantly try to start uvicorn, which launchd already manages. Use them only if uvicorn is somehow offline and you want a one-shot dev session.

Browser shows 502 / 504 from Cloudflare

cloudflared is up but uvicorn isn't responding. Check:

curl http://localhost:8765/
# If timeout/refused: uvicorn died.
# If responds: cloudflared can't reach localhost:8765.

If uvicorn is down, restart via launchd:

# Force-restart in place (preferred):
launchctl kickstart -k gui/$(id -u)/com.casey.warehouse-ui

# Or full unload/load:
launchctl unload ~/Library/LaunchAgents/com.casey.warehouse-ui.plist
launchctl load   ~/Library/LaunchAgents/com.casey.warehouse-ui.plist

# Check it's actually back:
launchctl list | grep com.casey.warehouse-ui
tail -20 ~/garmin-warehouse/logs/uvicorn.err.log

If uvicorn keeps crash-looping (the PID column changes on every check), look at the err log for the actual exception. KeepAlive + ThrottleInterval=10 means launchd restarts the process at most once per 10s; even an exception that kills it right at startup respects that throttle, so you won't fork-bomb yourself.
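
The "PID changes between checks" test can be automated by sampling `launchctl list` twice. A rough sketch, assuming the three-column `PID  status  label` layout shown in Quick triage; `job_pid`, `compare_pids`, and `watch_job` are illustrative names, and the 15s default wait is chosen only so that it exceeds ThrottleInterval=10.

```shell
# Read `launchctl list`-style lines on stdin; print the PID column for LABEL.
job_pid() {
  awk -v label="$1" '$3 == label { print $1 }'
}

# Compare two PID samples: prints "stable", "restarted", or "not running".
# launchctl shows "-" in the PID column when the job has no live process.
compare_pids() {
  if [ -z "$2" ] || [ "$2" = "-" ]; then
    echo "not running"
  elif [ "$1" = "$2" ]; then
    echo "stable"
  else
    echo "restarted"
  fi
}

# Sample the real job twice, WAIT seconds apart (default 15).
watch_job() {
  label=com.casey.warehouse-ui
  first=$(launchctl list | job_pid "$label")
  sleep "${1:-15}"
  second=$(launchctl list | job_pid "$label")
  compare_pids "$first" "$second"
}
```

If `watch_job` prints "restarted" several times in a row, treat it as a crash loop and go to the err log.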

If cloudflared can't reach uvicorn while both are running:

  • Check that the ~/.cloudflared/config.yml ingress rule points at the correct port (http://localhost:8765)
  • Check that no firewall is blocking localhost (rare on Mac)
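
For the ingress check, the target can be read straight out of the config file. The sketch below assumes the usual cloudflared layout, where each ingress entry carries an unquoted `service:` value on its own line; the actual contents of ~/.cloudflared/config.yml are not shown in this runbook, and `ingress_services` and `ingress_points_at` are hypothetical helper names.

```shell
# Print every ingress "service:" target found in a cloudflared config file.
# Strips leading whitespace and the YAML list dash before the key.
ingress_services() {
  sed -n 's/^[[:space:]-]*service:[[:space:]]*//p' "$1"
}

# Succeed (exit 0) if some ingress rule points at exactly the given URL.
ingress_points_at() {
  ingress_services "$1" | grep -Fxq -- "$2"
}
```

Example: `ingress_points_at ~/.cloudflared/config.yml http://localhost:8765 || echo "no ingress rule targets port 8765"`. This only compares values; cloudflared's own `tunnel ingress validate` subcommand is the better tool for catching syntax errors in the rules.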

cloudflared won't start: cert.pem missing or invalid

ls -la ~/.cloudflared/cert.pem
# If missing or 0 bytes:
cloudflared tunnel login
# Browser opens, log in to CF account, cert.pem gets written.

cloudflared won't start: tunnel JWT missing

ls ~/.cloudflared/
# Expected: cert.pem + edae87c6-61b1-4f84-a776-87d8732693ea.json (the JWT)
# If JWT missing:
cloudflared tunnel token --cred-file ~/.cloudflared/edae87c6-61b1-4f84-a776-87d8732693ea.json warehouse
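
Both "won't start" cases come down to a file that is missing or empty, so one pre-flight check can cover them. A minimal sketch: `check_cloudflared_files` is a hypothetical helper, and including config.yml in the list is an assumption based on the `--config` flag used throughout this runbook.

```shell
# check_cloudflared_files DIR TUNNEL_ID
# Prints one line per required file that is missing or zero bytes.
# Returns 0 when everything is present, 1 otherwise.
check_cloudflared_files() {
  dir=$1
  tunnel_id=$2
  status=0
  for f in cert.pem "$tunnel_id.json" config.yml; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $f"
      status=1
    fi
  done
  return $status
}
```

Example: `check_cloudflared_files ~/.cloudflared edae87c6-61b1-4f84-a776-87d8732693ea`. No output means the files are in place and the start failure lies elsewhere.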

DNS not resolving

dig warehouse.caseymanos.com
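# Expected: A/AAAA records for Cloudflare proxy IPs. The zone record is a proxied CNAME
# to <tunnel-id>.cfargotunnel.com, which public DNS hides; check it in the dashboard.

If the name doesn't resolve, or the dashboard shows a wrong or missing CNAME: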

cloudflared tunnel route dns warehouse warehouse.caseymanos.com
# Re-creates the CNAME automatically in caseymanos.com zone.

Tunnel JSON revoked / lost

If the tunnel's JWT is gone and you can't recover:

# Delete and recreate (loses tunnel ID, requires DNS update):
cloudflared tunnel delete warehouse
cloudflared tunnel create warehouse
# Note the new tunnel ID, update ~/.cloudflared/config.yml
cloudflared tunnel route dns warehouse warehouse.caseymanos.com
# Restart tunnel

uvicorn missing env vars (Telegram card grey on /status)

The launchd plist sources ~/.warehouse_secrets.sh at startup. If env vars go missing (TELEGRAM_BOT_TOKEN, RESEND_API_KEY, etc), the secrets file is the place to update — not the plist itself.

ls -la ~/.warehouse_secrets.sh
# Expected: mode 600, owned by you, ~750 bytes
# If the mode is wrong (a missing file must first be recreated from the ~/.zshrc exports):
chmod 600 ~/.warehouse_secrets.sh

# Edit to add/rotate a secret:
vim ~/.warehouse_secrets.sh

# Then force uvicorn to re-source it:
launchctl kickstart -k gui/$(id -u)/com.casey.warehouse-ui

The secrets file is mirrored from ~/.zshrc exports — not auto-synced. If you rotate TELEGRAM_BOT_TOKEN in zshrc, also update the secrets file.
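
Because the two files are not auto-synced, a quick comparison can flag drift after a rotation. A sketch under assumptions: both files use plain `export NAME=value` lines, one per line, and the helper names `exports_of` and `drifted_names` are hypothetical. It prints only variable names, never values.

```shell
# Print "NAME=value" for each `export NAME=value` line in FILE, sorted.
exports_of() {
  sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_]*=.*\)$/\1/p' "$1" | sort
}

# drifted_names SECRETS_FILE ZSHRC
# Print each name exported in SECRETS_FILE whose export line is absent
# from ZSHRC or carries a different value there.
drifted_names() {
  zshrc_exports=$(exports_of "$2")
  exports_of "$1" | while IFS= read -r line; do
    if ! printf '%s\n' "$zshrc_exports" | grep -Fxq -- "$line"; then
      echo "${line%%=*}"
    fi
  done
}
```

Example: `drifted_names ~/.warehouse_secrets.sh ~/.zshrc`. The comparison is textual, so a quoting difference (`A="1"` versus `A=1`) is reported as drift even though the values match.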

Runtime-as-service (current state)

Component                 Mechanism                              Auto-restart?
uvicorn (warehouse UI)    launchd com.casey.warehouse-ui         yes (KeepAlive)
cloudflared tunnel        foreground / kbui                      no (restart manually)
OTQCheckinAgent worker    Cloudflare Workers (cron + webhook)    yes (Cloudflare-managed)

cloudflared could also move to launchd if the foreground experience starts to bite (e.g. it dies during long sleep cycles). Pattern is straightforward: copy com.casey.warehouse-ui.plist, swap the ProgramArguments to cloudflared tunnel --config ~/.cloudflared/config.yml run warehouse, drop the secrets-source line, set RunAtLoad + KeepAlive.
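
A sketch of what that plist could look like, generated from shell so the paths are filled in. Assumptions are flagged in the comments: the label `com.casey.cloudflared` is hypothetical, the existing com.casey.warehouse-ui.plist is not shown here so the key set is limited to the keys this runbook names, and ThrottleInterval=10 is simply carried over from the uvicorn job. launchd does not expand `~` in ProgramArguments, so the function takes absolute paths.

```shell
# render_cloudflared_plist CLOUDFLARED_BIN HOME_DIR
# Prints a LaunchAgent plist on stdout. Both arguments must be absolute
# paths because launchd performs no tilde or variable expansion.
render_cloudflared_plist() {
  bin=$1
  home=$2
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <!-- Hypothetical label; pick whatever matches your naming scheme. -->
  <key>Label</key><string>com.casey.cloudflared</string>
  <key>ProgramArguments</key>
  <array>
    <string>$bin</string>
    <string>tunnel</string>
    <string>--config</string>
    <string>$home/.cloudflared/config.yml</string>
    <string>run</string>
    <string>warehouse</string>
  </array>
  <key>RunAtLoad</key><true/>
  <key>KeepAlive</key><true/>
  <!-- Assumption: same throttle as the uvicorn job. -->
  <key>ThrottleInterval</key><integer>10</integer>
</dict>
</plist>
EOF
}
```

Example: `render_cloudflared_plist "$(command -v cloudflared)" "$HOME" > ~/Library/LaunchAgents/com.casey.cloudflared.plist`, then load it with `launchctl load` the same way as the uvicorn job.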

Cloudflare Access policy quirks

  • Policy changes can take ~30s to propagate
  • "Service tokens" exist for non-browser access but aren't configured here — only Casey's email gets through
  • If you need to allow another email temporarily: dashboard → Zero Trust → Access → Applications → warehouse → Policies → edit

cloudflared doesn't hot-reload config.yml

Symptom: You edited ~/.cloudflared/config.yml (e.g. added a new ingress rule) but the new behavior isn't live.

Cause: cloudflared reads config only at process startup. Config-file edits are silently ignored until you restart the process.

Fix: Kill + restart cloudflared.

# Find the pid:
ps aux | grep cloudflared | grep -v grep
# Or just pkill:
pkill -f "cloudflared tunnel"
sleep 2

# Restart (foreground):
cloudflared tunnel --config ~/.cloudflared/config.yml run warehouse

Note: as of 2026-05-04, the tunnel only routes one thing: warehouse.caseymanos.com → localhost:8765. The docs site is on Cloudflare Pages, not the tunnel (see ADR 006). If you find yourself adding ingress rules to expose more local services, ask whether they should actually live on Pages / Workers instead.