What We’re Testing
The local DNS resolver (pkg/dns/resolver.go) has a layered resolution strategy: tailnet queries are answered locally from the in-memory records map, and all other queries are forwarded upstream. When upstream DNS-over-TLS fails, the resolver’s behavior depends on the AllowPlaintextFallback flag. We test the resolver’s resilience: what happens when the local resolver cannot reach Quad9, when the resolver itself is not running, and when tailnet records are stale.
Key facts from source code:
- Resolution order in handleQuery (resolver.go lines 320-357):
  - Check the DNS threat blocklist: if the domain is blocked, return NXDOMAIN immediately
  - Check if tailnet query (isTailnetQuery): resolve from local records map
  - Forward to upstream DNS (forwardToUpstream)
- Upstream forwarding (resolver.go lines 420-434):
  - First: DNS-over-TLS to Quad9 (9.9.9.9:853, 149.112.112.112:853) with 3-second timeout per server
  - If DoT fails and AllowPlaintextFallback is true: plaintext UDP to system DNS servers
  - If DoT fails and AllowPlaintextFallback is false (default): returns nil, resulting in no response to the client
- Port fallback: If port 53 is unavailable, the resolver binds to port 15353 (resolver.go lines 176-189)
- Tailnet NXDOMAIN: If a tailnet hostname is not found in local records, the resolver returns NXDOMAIN (not forwarded upstream) (resolver.go lines 391-393)
- System DNS detection: getSystemDNS() reads platform-specific resolvers; defaults to ["9.9.9.9:53", "149.112.112.112:53"] if none found (resolver.go lines 602-620)
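The resolution order listed above can be sketched as a small decision function. This is an illustration only, not the code in resolver.go: the classify function, the Outcome labels, and the suffix-based tailnet check are all assumptions made for the example (the real isTailnetQuery logic is not shown in this document).

```go
package main

import (
	"fmt"
	"strings"
)

// Outcome names the branch a query takes. The labels are illustrative only.
type Outcome string

const (
	Blocked      Outcome = "NXDOMAIN (blocklist)"
	LocalAnswer  Outcome = "answer from local records"
	LocalMissing Outcome = "NXDOMAIN (unknown tailnet name)"
	Upstream     Outcome = "forward upstream"
)

// classify follows the documented order: blocklist, then tailnet, then upstream.
// tailnetSuffix, blocklist and records are hypothetical stand-ins for resolver state.
func classify(name, tailnetSuffix string, blocklist map[string]bool, records map[string]string) Outcome {
	name = strings.ToLower(strings.TrimSuffix(name, "."))
	if blocklist[name] {
		return Blocked
	}
	if strings.HasSuffix(name, "."+tailnetSuffix) {
		if _, ok := records[name]; ok {
			return LocalAnswer
		}
		// Tailnet misses are answered locally and never forwarded.
		return LocalMissing
	}
	return Upstream
}

func main() {
	suffix := "yourorg.zt.net"
	blocklist := map[string]bool{"bad.example": true}
	records := map[string]string{"win-a.yourorg.zt.net": "100.64.0.1"}
	queries := []string{"bad.example", "Win-A.yourorg.zt.net.", "ghost.yourorg.zt.net", "example.com"}
	for _, q := range queries {
		fmt.Printf("%-24s -> %s\n", q, classify(q, suffix, blocklist, records))
	}
}
```

The point of the sketch is that only the last branch ever touches the network, which is what ST1 relies on.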
Your Test Setup
| Machine | Role |
|---|---|
| ⊞ Win-A | Peer machine — used as a resolution target |
| 🐧 Linux-C | Test machine — runs DNS queries, simulates failures |
Both machines must be connected (ztna up) with MagicDNS enabled.
ST1 — Tailnet Resolution Does Not Depend on Upstream DNS
What it verifies: Tailnet hostname resolution uses only the local records map and does not forward to upstream DNS servers. Even if upstream DNS is unreachable, tailnet names resolve.
Steps:
- On 🐧 Linux-C, confirm the local resolver is running and can resolve a tailnet name:
nslookup Win-A 127.0.0.53
Expected: Resolves to Win-A’s tailnet IP (e.g., 100.64.0.1).
- Simulate upstream DNS failure by temporarily blocking outbound port 853 (DNS-over-TLS) and port 53. The rules exclude the loopback interface, because the OUTPUT chain also sees traffic to 127.0.0.53 and would otherwise drop the test queries themselves:
sudo iptables -A OUTPUT ! -o lo -p tcp --dport 853 -j DROP
sudo iptables -A OUTPUT ! -o lo -p udp --dport 53 -j DROP
- Retry the tailnet query:
nslookup Win-A 127.0.0.53
Expected: Still resolves to Win-A’s tailnet IP. The resolver answers tailnet queries from its in-memory map without contacting any upstream server.
- Verify that public DNS is indeed broken:
nslookup google.com 127.0.0.53
Expected: Times out or returns SERVFAIL (upstream unreachable).
- Restore network:
sudo iptables -D OUTPUT ! -o lo -p tcp --dport 853 -j DROP
sudo iptables -D OUTPUT ! -o lo -p udp --dport 53 -j DROP
Pass: Tailnet names resolve even when upstream DNS is completely blocked. Public DNS queries fail as expected.
Fail / Common issues:
- Tailnet name also fails: the local resolver may not be running. Check ztna status and verify port 53 or 15353 is bound.
- The iptables rules block all outbound DNS from this machine. Use this test carefully on a machine you can access via tailnet IP (not hostname).
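The two nslookup checks in ST1 can also be scripted. The sketch below uses Go's standard net.Resolver with a custom Dial so that every query goes to the local resolver address used in this test; the helper names resolverAt and probe, and the fully qualified tailnet name, are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolverAt returns a resolver that sends every query to addr (host:port)
// instead of the servers listed in /etc/resolv.conf.
func resolverAt(addr string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in DNS client so Dial is honoured
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, addr)
		},
	}
}

// probe looks up name and reports either its addresses or the error.
func probe(r *net.Resolver, name string) string {
	// 8 seconds leaves room for two 3-second DoT attempts upstream.
	ctx, cancel := context.WithTimeout(context.Background(), 8*time.Second)
	defer cancel()
	addrs, err := r.LookupHost(ctx, name)
	if err != nil {
		return fmt.Sprintf("%s: FAILED (%v)", name, err)
	}
	return fmt.Sprintf("%s: %v", name, addrs)
}

func main() {
	r := resolverAt("127.0.0.53:53")
	fmt.Println(probe(r, "Win-A.yourorg.zt.net")) // should succeed while upstream is blocked
	fmt.Println(probe(r, "google.com"))           // should fail while upstream is blocked
}
```

Run it once before and once after adding the iptables rules: the first line should be unchanged, the second should switch from addresses to a failure.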
ST2 — DNS-over-TLS Failover Between Quad9 Servers
What it verifies: The resolver tries both Quad9 servers (9.9.9.9:853 and 149.112.112.112:853) before giving up.
Steps:
- On 🐧 Linux-C, block only the primary Quad9 server:
sudo iptables -A OUTPUT -d 9.9.9.9 -p tcp --dport 853 -j DROP
- Query a public domain:
nslookup example.com 127.0.0.53
Expected: Resolves successfully. The resolver’s forwardDoT function (resolver.go lines 437-478) iterates over dotServers — when 9.9.9.9:853 fails (connection timeout after 3 seconds), it tries 149.112.112.112:853 which should succeed.
- Block the secondary server too:
sudo iptables -A OUTPUT -d 149.112.112.112 -p tcp --dport 853 -j DROP
- Query again:
nslookup example.com 127.0.0.53
Expected: Fails (timeout or SERVFAIL). Both DoT servers are unreachable. Since AllowPlaintextFallback defaults to false, the resolver does not fall back to plaintext UDP. The log message "DNS-over-TLS failed, plaintext fallback disabled -- returning SERVFAIL" would appear in the client logs.
- Restore:
sudo iptables -D OUTPUT -d 9.9.9.9 -p tcp --dport 853 -j DROP
sudo iptables -D OUTPUT -d 149.112.112.112 -p tcp --dport 853 -j DROP
Pass: With one Quad9 server blocked, DNS still works (failover to the second). With both blocked, DNS fails because plaintext fallback is disabled by default.
Fail / Common issues:
- Resolution succeeds even with both servers blocked: AllowPlaintextFallback may be set to true in the client config, allowing fallback to plaintext UDP on system DNS servers.
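The failover and fallback behaviour that ST2 exercises can be modelled without any network access. The following is a minimal sketch under stated assumptions: exchangeFunc, tryServers, and forward are hypothetical names, and the real DoT and UDP exchanges are replaced by injected functions so the three scenarios (one server blocked, both blocked, both blocked with fallback enabled) can be simulated.

```go
package main

import (
	"errors"
	"fmt"
)

// exchangeFunc sends one query to one server and returns the reply.
// It stands in for the real DoT or UDP exchange, which is not shown here.
type exchangeFunc func(server, query string) (string, error)

var errNoUpstream = errors.New("all upstream attempts failed")

// tryServers returns the first successful reply, trying servers in order.
func tryServers(servers []string, query string, ex exchangeFunc) (string, error) {
	for _, s := range servers {
		if reply, err := ex(s, query); err == nil {
			return reply, nil
		}
	}
	return "", errNoUpstream
}

// forward tries DoT first and uses plaintext only when allowPlaintext is true.
func forward(query string, dot, plain []string, allowPlaintext bool, dotEx, plainEx exchangeFunc) (string, error) {
	if reply, err := tryServers(dot, query, dotEx); err == nil {
		return reply, nil
	}
	if !allowPlaintext {
		return "", errNoUpstream
	}
	return tryServers(plain, query, plainEx)
}

func main() {
	dot := []string{"9.9.9.9:853", "149.112.112.112:853"}
	plain := []string{"9.9.9.9:53"}
	blocked := map[string]bool{"9.9.9.9:853": true} // simulates the first iptables rule
	dotEx := func(server, q string) (string, error) {
		if blocked[server] {
			return "", errors.New("timeout")
		}
		return q + " answered by " + server, nil
	}
	plainEx := func(server, q string) (string, error) {
		return q + " answered in plaintext by " + server, nil
	}

	fmt.Println(forward("example.com", dot, plain, false, dotEx, plainEx))
	blocked["149.112.112.112:853"] = true // simulates the second rule
	fmt.Println(forward("example.com", dot, plain, false, dotEx, plainEx))
	fmt.Println(forward("example.com", dot, plain, true, dotEx, plainEx))
}
```

The third call shows why a client with AllowPlaintextFallback set to true would pass step 2 but not fail as expected in step 4.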
ST3 — CLI DNS Query Bypasses Local Resolver
What it verifies: ztna dns query does NOT use the local DNS resolver. It calls the backend’s resolve action over HTTPS, so it works even when the local resolver is down.
Steps:
- On 🐧 Linux-C, stop the VPN tunnel (this stops the local resolver):
ztna down
- Try a system-level DNS query (should fail for tailnet names):
nslookup Win-A 127.0.0.53
Expected: Connection refused or timeout (the local resolver is no longer running).
- Try the CLI query (uses HTTPS to the backend):
ztna dns query Win-A
Expected output:
Win-A.yourorg.zt.net -> 100.64.0.1
The CLI sends an authenticated HTTPS POST to /api/dns-management with action: "resolve". This does not require the local DNS resolver to be running. It only requires the machine to be authenticated (has a valid JWT or saved tokens).
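To make the request path concrete, here is a hedged sketch of how such a call could be built with Go's net/http. Only the path /api/dns-management and the action value "resolve" come from the description above. The JSON field names hostname and org_id, the bearer-style Authorization header, and serving the API from login.quickztna.com are assumptions for illustration. The request is constructed but not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// resolveRequest is a guess at the payload; only the "action" value comes from the text.
type resolveRequest struct {
	Action   string `json:"action"`
	Hostname string `json:"hostname"` // hypothetical field name
	OrgID    string `json:"org_id"`   // hypothetical field name
}

// newResolveRequest builds (but does not send) an authenticated POST.
func newResolveRequest(baseURL, token, orgID, hostname string) (*http.Request, error) {
	body, err := json.Marshal(resolveRequest{Action: "resolve", Hostname: hostname, OrgID: orgID})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/dns-management", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token) // assumes bearer-style JWT auth
	return req, nil
}

func main() {
	req, err := newResolveRequest("https://login.quickztna.com", "<jwt>", "<org-id>", "Win-A")
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Nothing in this path opens a DNS socket on 127.0.0.53, which is why the query is independent of the local resolver's state.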
Pass: ztna dns query resolves the hostname even when the local resolver is stopped. nslookup fails because the local resolver is not running.
Fail / Common issues:
- "not authenticated. Run 'ztna login' first": ztna down does not log out. But if the config has no org_id, the CLI query will fail. The check is in cmd_dns.go line 63.
- Network error: the machine needs internet access to reach login.quickztna.com. The VPN tunnel being down does not affect HTTPS connectivity.
- Reconnect:
ztna up
ST4 — Port 53 Conflict Fallback to 15353
What it verifies: When port 53 is already occupied (common on Linux with systemd-resolved), the resolver falls back to port 15353.
Steps:
- On 🐧 Linux-C, check what is using port 53:
sudo ss -tlnp | grep :53
Common output on Ubuntu/Debian:
LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=XXX,fd=14))
- If systemd-resolved occupies port 53, the QuickZTNA resolver will have started on port 15353 instead. Check which port the resolver is using by examining the client logs or trying both ports:
nslookup Win-A -port=53 127.0.0.53
nslookup Win-A -port=15353 127.0.0.53
One of these should succeed (whichever port the resolver is bound to).
- The resolver logs "Cannot bind to port 53, trying high port" when the fallback occurs and "DNS resolver started" with the actual listen address.
Pass: The resolver gracefully falls back to port 15353 when port 53 is unavailable. DNS queries work on the fallback port.
Fail / Common issues:
- Neither port works: the resolver may have failed entirely. Check that ztna up is running and that the client config has dns_enabled: true.
- The DNS manager (manager_linux.go) uses resolvectl to configure systemd-resolved. If it points to port 53 but the resolver is on 15353, system-level resolution will fail. This is a known edge case where manual resolvectl configuration may be needed.
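The bind-then-fall-back behaviour tested in ST4 follows a common pattern: try the preferred port, and on failure try the next one. The sketch below is not the resolver's code. The names bindWithFallback, listenFunc, busy, and systemListen are hypothetical, and the bind address 127.0.0.1 is chosen only for the demo. The busy wrapper simulates an occupied port so the fallback can be observed without root or systemd-resolved.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// listenFunc matches net.ListenPacket so a fake can be substituted.
type listenFunc func(network, address string) (net.PacketConn, error)

// systemListen is the real implementation from the standard library.
var systemListen listenFunc = net.ListenPacket

// busy wraps next and pretends that port is already in use.
func busy(port int, next listenFunc) listenFunc {
	return func(network, address string) (net.PacketConn, error) {
		if _, p, err := net.SplitHostPort(address); err == nil && p == fmt.Sprint(port) {
			return nil, fmt.Errorf("listen %s %s: address already in use (simulated)", network, address)
		}
		return next(network, address)
	}
}

// bindWithFallback tries each port in order and returns the first socket that binds.
func bindWithFallback(listen listenFunc, host string, ports []int) (net.PacketConn, int, error) {
	lastErr := errors.New("no ports given")
	for _, p := range ports {
		conn, err := listen("udp", net.JoinHostPort(host, fmt.Sprint(p)))
		if err == nil {
			return conn, p, nil
		}
		lastErr = err
	}
	return nil, 0, fmt.Errorf("no port available: %w", lastErr)
}

func main() {
	conn, port, err := bindWithFallback(busy(53, systemListen), "127.0.0.1", []int{53, 15353})
	if err != nil {
		fmt.Println("bind failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("listening on UDP port", port)
}
```

Returning the port that actually bound is what makes it possible to log the real listen address, and it is the value any system DNS configuration would need to point at.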
ST5 — Stale Local Records After Peer Disconnect
What it verifies: When a peer disconnects, the local resolver’s records may become stale until the next peer list update. The backend always has current data.
Steps:
- On 🐧 Linux-C, confirm ⊞ Win-A resolves locally:
nslookup Win-A 127.0.0.53
Expected: Resolves to Win-A’s tailnet IP.
- On ⊞ Win-A, disconnect:
ztna down
- Immediately on 🐧 Linux-C, query the local resolver again:
nslookup Win-A 127.0.0.53
Expected: May still resolve (stale record). The local resolver keeps records in memory until UpdateRecords() is called with a new peer list. The records map is only replaced when the peer manager pushes an update.
- Compare with the CLI query (goes to backend):
ztna dns query Win-A
Expected: Still resolves. The backend includes machines with status IN ('online', 'offline'), so Win-A appears even after disconnecting.
- Wait for the next heartbeat cycle (typically 30-60 seconds). Then retry the local resolver query:
nslookup Win-A 127.0.0.53
Expected: After the peer list update, if Win-A is removed from the active peer list, the local resolver returns NXDOMAIN. If Win-A remains in the peer list (offline peers are sometimes retained), it continues to resolve.
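Pass: The local resolver may briefly serve a stale record after a peer disconnects, because its records map changes only when a new peer list arrives. The backend answers from its own machine list, which includes offline machines, so it is unaffected by the local cache. This demonstrates the difference between the two resolution paths.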
Cleanup: Reconnect ⊞ Win-A:
ztna up
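The staleness window in ST5 follows directly from replacing the records map only on a peer list update. A minimal model of that behaviour is sketched below. The Records type and its Lookup method are hypothetical stand-ins, UpdateRecords borrows the name mentioned above but not its real signature, and the IP for Linux-C is made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Records is a hypothetical stand-in for the resolver's in-memory name table.
type Records struct {
	mu sync.RWMutex
	m  map[string]string
}

// UpdateRecords replaces the whole table with a copy of peers.
// Names missing from peers disappear only at this point.
func (r *Records) UpdateRecords(peers map[string]string) {
	next := make(map[string]string, len(peers))
	for name, ip := range peers {
		next[name] = ip
	}
	r.mu.Lock()
	r.m = next
	r.mu.Unlock()
}

// Lookup returns the stored IP; ok is false when the name is unknown (NXDOMAIN).
func (r *Records) Lookup(name string) (ip string, ok bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ip, ok = r.m[name]
	return ip, ok
}

func main() {
	var rec Records
	rec.UpdateRecords(map[string]string{"win-a": "100.64.0.1", "linux-c": "100.64.0.3"})

	// Win-A runs "ztna down"; no peer list update has arrived yet.
	fmt.Println(rec.Lookup("win-a")) // stale but still answered

	// The next peer list omits Win-A.
	rec.UpdateRecords(map[string]string{"linux-c": "100.64.0.3"})
	fmt.Println(rec.Lookup("win-a")) // now unknown
}
```

Because nothing expires entries between updates, the length of the stale window is set entirely by how often the peer list is pushed.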
Summary
| Sub-test | What it proves | Pass condition |
|---|---|---|
| ST1 | Tailnet resolution is local-only | Tailnet names resolve even with upstream DNS blocked |
| ST2 | DoT failover between Quad9 servers | Blocking one Quad9 IP still allows resolution via the other |
| ST3 | CLI query bypasses local resolver | ztna dns query works even when ztna down stops the resolver |
| ST4 | Port 53 conflict fallback | Resolver falls back to port 15353 when 53 is occupied |
| ST5 | Stale local records | Local resolver may serve stale data briefly; backend is always current |