What We’re Testing
A machine can be detected as offline through three distinct paths:
Path 1 — Graceful shutdown (ztna down)
The client sends a final heartbeat with status: "offline" before stopping. machine-heartbeat.ts processes this normally, sets status = 'offline' and last_seen = NOW(), and broadcasts a WebSocket UPDATE event because statusChanged = true. Status transitions to offline within seconds.
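The Path 1 transition can be sketched as a pure function. This is a minimal TypeScript sketch, not the code in machine-heartbeat.ts. The MachineRow and WsEvent shapes and the applyHeartbeat name are hypothetical. Only the status/last_seen columns, the statusChanged comparison and the UPDATE event type come from this guide. The sketch covers the normal path only, not the quarantined or admin-disabled cases.

```typescript
// Hypothetical shapes: only status, last_seen and the UPDATE event name come from this guide.
type MachineRow = { id: string; status: string; last_seen: Date };
type WsEvent = {
  event: "UPDATE";
  payload: { id: string; status: string; last_seen: string };
};

// Sketch of a normal heartbeat (machine neither quarantined nor admin-disabled).
function applyHeartbeat(
  machine: MachineRow,
  resolvedStatus: "online" | "offline",
  now: Date,
): { row: MachineRow; broadcast: WsEvent | null } {
  // Same comparison the guide quotes: statusChanged = (machine.status !== resolvedStatus)
  const statusChanged = machine.status !== resolvedStatus;
  // last_seen is set to NOW() even when the client reports "offline".
  const row: MachineRow = { ...machine, status: resolvedStatus, last_seen: now };
  // An UPDATE event is only produced when the status actually changed.
  const broadcast: WsEvent | null = statusChanged
    ? { event: "UPDATE", payload: { id: row.id, status: row.status, last_seen: now.toISOString() } }
    : null;
  return { row, broadcast };
}

// ztna down on a machine that was online: the final heartbeat flips the status.
const before: MachineRow = { id: "m1", status: "online", last_seen: new Date("2026-03-17T10:35:00Z") };
const { row, broadcast } = applyHeartbeat(before, "offline", new Date("2026-03-17T10:35:12.456Z"));
console.log(row.status, broadcast?.event); // offline UPDATE
```

A repeated offline heartbeat would refresh last_seen but produce no event, because statusChanged is false the second time.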
Path 2 — Cleanup cron job (cleanup-machines.ts)
The handleCleanupMachines handler runs on a schedule and executes two SQL operations:
```sql
-- Delete ephemeral machines not seen for 3 minutes
DELETE FROM machines
WHERE ephemeral = TRUE AND last_seen < NOW() - INTERVAL '3 minutes'
RETURNING id, name, org_id;

-- Mark non-ephemeral machines offline
UPDATE machines SET status = 'offline'
WHERE status = 'online' AND last_seen < NOW() - INTERVAL '3 minutes'
RETURNING id, name, org_id;
```
For each machine marked offline or deleted, it calls broadcastEvent to push a WebSocket event: UPDATE for machines marked offline, DELETE for deleted ephemeral machines.
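The two statements can be simulated in memory to show why their order matters. Stale ephemeral rows are removed by the DELETE first, so the UPDATE, which does not filter on ephemeral, can only match non-ephemeral rows. This is a hypothetical TypeScript sketch. The CleanupRow and CleanupEvent shapes and the runCleanup name are illustrative, not the code in cleanup-machines.ts.

```typescript
// Hypothetical row and event shapes mirroring the columns used in the cleanup SQL.
type CleanupRow = {
  id: string;
  name: string;
  org_id: string;
  status: string;
  ephemeral: boolean;
  last_seen: Date;
};
type CleanupEvent = { event: "DELETE" | "UPDATE"; id: string; org_id: string };

const STALE_AFTER_MS = 3 * 60 * 1000; // INTERVAL '3 minutes'

function runCleanup(rows: CleanupRow[], now: Date): { rows: CleanupRow[]; events: CleanupEvent[] } {
  const cutoff = now.getTime() - STALE_AFTER_MS;
  const events: CleanupEvent[] = [];

  // 1. DELETE ... WHERE ephemeral = TRUE AND last_seen < cutoff (any status)
  const kept = rows.filter((r) => {
    const drop = r.ephemeral && r.last_seen.getTime() < cutoff;
    if (drop) events.push({ event: "DELETE", id: r.id, org_id: r.org_id });
    return !drop;
  });

  // 2. UPDATE ... SET status = 'offline' WHERE status = 'online' AND last_seen < cutoff.
  // Stale ephemeral rows are already gone, so only non-ephemeral rows can match here.
  const updated = kept.map((r) => {
    if (r.status === "online" && r.last_seen.getTime() < cutoff) {
      events.push({ event: "UPDATE", id: r.id, org_id: r.org_id });
      return { ...r, status: "offline" };
    }
    return r;
  });

  return { rows: updated, events };
}

const now = new Date("2026-03-17T10:40:00Z");
const old = new Date("2026-03-17T10:35:00Z"); // 5 minutes before now
const result = runCleanup(
  [
    { id: "a", name: "Linux-C", org_id: "o1", status: "online", ephemeral: false, last_seen: old },
    { id: "b", name: "eph", org_id: "o1", status: "online", ephemeral: true, last_seen: old },
    { id: "c", name: "quar", org_id: "o1", status: "quarantined", ephemeral: false, last_seen: old },
  ],
  now,
);
console.log(result.events.map((e) => `${e.event}:${e.id}`).join(",")); // DELETE:b,UPDATE:a
```

Row c keeps its quarantined status even with a stale last_seen, because neither statement matches it.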
Path 3 — Hard kill (no graceful shutdown)
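Same outcome as Path 2, but triggered by a process crash or SIGKILL, so no offline heartbeat is sent. The machine continues to appear online in the DB until the cleanup job runs after its last_seen has become more than 3 minutes old. Detection therefore takes at least 3 minutes after the last heartbeat, plus up to one cleanup-job interval.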
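Path 3's timing can be bounded with simple arithmetic. The row first matches the cleanup WHERE clause just after last_seen + 3 minutes, and the next scheduled run happens at most one job interval later. The TypeScript sketch below assumes the job itself runs in negligible time. The job interval is a placeholder parameter because the real schedule lives in cron.ts and is not stated in this guide.

```typescript
// The 3-minute threshold mirrors INTERVAL '3 minutes' in the cleanup SQL.
const OFFLINE_THRESHOLD_MS = 3 * 60 * 1000;

function offlineDetectionWindow(
  lastHeartbeat: Date,
  jobIntervalMs: number, // placeholder: the real schedule is defined in cron.ts
): { earliest: Date; latest: Date } {
  // The row first matches last_seen < NOW() - INTERVAL '3 minutes' just after this instant...
  const earliest = new Date(lastHeartbeat.getTime() + OFFLINE_THRESHOLD_MS);
  // ...and the next scheduled cleanup run happens at most one interval later.
  const latest = new Date(earliest.getTime() + jobIntervalMs);
  return { earliest, latest };
}

// Example with a hypothetical 1-minute schedule.
const win = offlineDetectionWindow(new Date("2026-03-17T10:30:00Z"), 60 * 1000);
console.log(win.earliest.toISOString(), win.latest.toISOString());
// 2026-03-17T10:33:00.000Z 2026-03-17T10:34:00.000Z
```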
Edge cases that block offline detection:
- Quarantined machines: machine-heartbeat.ts detects status = 'quarantined' and updates last_seen without changing the status. Heartbeats never lift quarantine. It is lifted by posture compliance resolution, or by the admin unquarantine action used in ST4. The cleanup job’s SQL (WHERE status = 'online') does not touch quarantined machines.
- Admin-disabled machines: when admin_disabled = TRUE, the heartbeat handler updates last_seen but returns status: "offline" to the client. The DB status remains whatever it was. While these heartbeats keep last_seen fresh, the cleanup job leaves the row alone. Once they stop, it marks the machine offline only if its DB status is still online.
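The two edge cases can be summarized as a hypothetical TypeScript sketch. The EdgeMachine and HeartbeatReply shapes and the edgeCaseHeartbeat name are illustrative. The status, last_seen and admin_disabled fields and the quarantined reply come from this guide. Checking quarantine before admin_disabled is an assumption, not confirmed behaviour.

```typescript
// Hypothetical row/reply shapes; status, last_seen and admin_disabled mirror this guide.
type EdgeMachine = { status: string; admin_disabled: boolean; last_seen: Date };
type HeartbeatReply = { status: string; quarantined?: boolean };

// Returns the updated row and client reply for the two edge cases, or null for the normal path.
// The order of the two checks is an assumption.
function edgeCaseHeartbeat(
  m: EdgeMachine,
  now: Date,
): { row: EdgeMachine; reply: HeartbeatReply } | null {
  if (m.status === "quarantined") {
    // last_seen refreshed, status untouched, client told it is quarantined
    return { row: { ...m, last_seen: now }, reply: { status: "quarantined", quarantined: true } };
  }
  if (m.admin_disabled) {
    // last_seen refreshed, DB status left as-is, client told it is offline
    return { row: { ...m, last_seen: now }, reply: { status: "offline" } };
  }
  return null; // handled by the normal heartbeat path
}

const t = new Date("2026-03-17T10:45:00Z");
const q = edgeCaseHeartbeat({ status: "quarantined", admin_disabled: false, last_seen: new Date(0) }, t);
console.log(q?.row.status, q?.reply.quarantined); // quarantined true
```

Neither branch writes status = 'offline'. That is why a quarantined machine never enters the cleanup job's WHERE status = 'online' set.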
The data source for the Health page is:
GET /api/db/machines?org_id=<org_id>&select=id,name,tailnet_ip,os,status,last_seen,created_at,version
The Health page re-fetches this list whenever a WebSocket event arrives on the machines channel, so all three detection paths are reflected in real time.
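A hypothetical TypeScript sketch of the client side builds the data-source URL above and re-fetches the whole list on any machines-channel message. It assumes each message is JSON with an "event" field, as in the UPDATE example shown in ST5. The machinesUrl and handleMachinesMessage names are illustrative. Ignoring non-JSON frames is also an assumption.

```typescript
// Builds the Health page data-source URL (query parameters taken from this guide).
function machinesUrl(base: string, orgId: string): string {
  const url = new URL("/api/db/machines", base);
  url.searchParams.set("org_id", orgId);
  url.searchParams.set("select", "id,name,tailnet_ip,os,status,last_seen,created_at,version");
  return url.toString();
}

// Called for each message on the machines channel; returns the event type, or null if ignored.
function handleMachinesMessage(raw: string, refetch: () => void): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON (assumption: such frames are ignored)
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const event = (parsed as { event?: unknown }).event;
  if (typeof event !== "string") return null;
  // Re-fetch the full list rather than patching one row, so UPDATE and DELETE are handled alike.
  refetch();
  return event;
}

console.log(machinesUrl("https://login.quickztna.com", "YOUR_ORG_ID"));
```

Re-fetching the whole list keeps the client simple. A DELETE from the cleanup job and an UPDATE from a heartbeat both end with the page showing exactly what the DB holds.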
Your Test Setup
| Machine | Role |
|---|---|
| ⊞ Win-A | Browser — Health page observation + API queries |
| 🐧 Linux-C | VPN target — controlled shutdown and kill for offline detection tests |
ST1 — Graceful Offline via ztna down
What it verifies: ztna down sends an offline heartbeat that immediately sets status = "offline" in the DB and pushes a WebSocket UPDATE event.
Steps:
- On ⊞ Win-A, open /health. Confirm 🐧 Linux-C shows online (green dot).
- On 🐧 Linux-C, stop the VPN:
  ```bash
  ztna down
  ```
  Expected CLI output:
  ```
  VPN stopped.
  ```
- Within 5-10 seconds, observe the Health page on ⊞ Win-A. Linux-C’s row should update to offline (grey dot, grey badge) without a manual page refresh.
- Confirm via API:
  ```bash
  TOKEN="YOUR_ADMIN_TOKEN"
  ORG_ID="YOUR_ORG_ID"
  curl -s "https://login.quickztna.com/api/db/machines?org_id=$ORG_ID&select=name,status,last_seen" \
    -H "Authorization: Bearer $TOKEN" | python3 -m json.tool
  ```
  Expected:
  ```json
  {
    "success": true,
    "data": [
      {
        "name": "Linux-C",
        "status": "offline",
        "last_seen": "2026-03-17T10:35:12.456Z"
      }
    ]
  }
  ```
- Note that last_seen reflects the time of the final offline heartbeat. It is set to NOW() in machine-heartbeat.ts even for offline-status heartbeats.
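To make the API confirmation a repeatable pass/fail check, save the curl output to a file and run it through a small checker. This hypothetical TypeScript sketch assumes the { success, data: [...] } shape shown in the expected output above. The checkMachine name and the 30-second freshness bound are illustrative; the bound is borrowed from this sub-test's failure threshold.

```typescript
// Assumes the { success, data: [...] } shape shown in the expected output above.
type MachinesResponse = {
  success?: boolean;
  data: Array<{ name: string; status: string; last_seen: string }>;
};

// Returns a list of problems; an empty list means the check passed.
function checkMachine(
  body: string,
  machineName: string,
  expectedStatus: string,
  now: Date,
  maxAgeMs: number,
): string[] {
  const res = JSON.parse(body) as MachinesResponse;
  const row = res.data.find((m) => m.name === machineName);
  if (!row) return [`${machineName} not found`];
  const problems: string[] = [];
  if (row.status !== expectedStatus) {
    problems.push(`status is ${row.status}, expected ${expectedStatus}`);
  }
  // last_seen should be recent: it is set by the final offline heartbeat.
  const ageMs = now.getTime() - Date.parse(row.last_seen);
  if (ageMs > maxAgeMs) {
    problems.push(`last_seen is ${Math.round(ageMs / 1000)} s old`);
  }
  return problems;
}

const sample =
  '{"success":true,"data":[{"name":"Linux-C","status":"offline","last_seen":"2026-03-17T10:35:12.456Z"}]}';
console.log(checkMachine(sample, "Linux-C", "offline", new Date("2026-03-17T10:35:20Z"), 30 * 1000)); // []
```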
Pass: Status transitions to offline within seconds of ztna down. Health page updates without manual refresh. last_seen is current.
Fail / Common issues:
- Status stays online for more than 30 seconds after ztna down: the offline heartbeat may have failed (e.g. the network was already down when ztna down ran). The cleanup job will eventually mark the machine offline once last_seen is more than 3 minutes old.
- ztna down prints VPN not running: the client was already stopped. The status should already be offline, or the cleanup job will set it offline.
ST2 — Hard Kill Offline Detection via Cleanup Job
What it verifies: When a machine is killed without a graceful shutdown, the cleanup cron job marks it offline after 3 minutes of missed heartbeats.
Steps:
- On ⊞ Win-A, open /health. Note the current last_seen for Linux-C.
- On 🐧 Linux-C, force-kill the ztna process:
  ```bash
  sudo pkill -9 ztna
  ```
  No output is expected (hard kill).
- On ⊞ Win-A, observe the Health page. Linux-C will initially still show online. If the cleanup job is delayed, you may see it transition to Stale after 10 minutes (yellow dot, Stale badge, 50% availability) before the job fires.
- Wait about 3 minutes (at least 3 minutes after the last heartbeat, plus up to one cleanup-job interval). The cleanup job runs and executes:
  ```sql
  UPDATE machines SET status = 'offline'
  WHERE status = 'online' AND last_seen < NOW() - INTERVAL '3 minutes'
  ```
- After the job runs, observe the Health page: Linux-C should transition to offline.
- Verify the timing by checking last_seen relative to the current time:
  ```bash
  curl -s "https://login.quickztna.com/api/db/machines?org_id=$ORG_ID&select=name,status,last_seen" \
    -H "Authorization: Bearer $TOKEN" | python3 -m json.tool
  ```
  The gap between last_seen and the current UTC time should be at least 3 minutes when the status changes to offline.
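The timing check reduces to a subtraction plus one inequality. The UPDATE only matches rows whose last_seen is more than 3 minutes old, so a cleanup-driven transition can never be observed with a smaller gap. This TypeScript sketch uses hypothetical helper names and takes the time you saw the status flip as input.

```typescript
// The 3-minute value mirrors INTERVAL '3 minutes' in the cleanup SQL.
const CLEANUP_THRESHOLD_MS = 3 * 60 * 1000;

// Seconds between last_seen (as returned by the API) and when you saw the status flip.
function gapSeconds(lastSeenIso: string, observedAt: Date): number {
  return (observedAt.getTime() - Date.parse(lastSeenIso)) / 1000;
}

// True if the observed transition is consistent with the cleanup job's WHERE clause.
function consistentWithCleanup(lastSeenIso: string, observedAt: Date): boolean {
  return observedAt.getTime() - Date.parse(lastSeenIso) >= CLEANUP_THRESHOLD_MS;
}

const lastSeen = "2026-03-17T10:30:00.000Z";
console.log(gapSeconds(lastSeen, new Date("2026-03-17T10:33:30Z"))); // 210
```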
Pass: Hard-killed machine transitions to offline after ~3 minutes. Health page updates via WebSocket UPDATE event from the cleanup job.
Fail / Common issues:
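- Machine transitions to offline in under 1 minute: the cleanup SQL only matches rows whose last_seen is more than 3 minutes old, so running the job more often cannot explain this. A faster transition suggests that last_seen was already stale before the kill (heartbeats had stopped earlier) or that the deployed threshold differs from 3 minutes. Check cron.ts for the job interval and the SQL in cleanup-machines.ts for the threshold.
- Machine stays online for more than 10 minutes after the kill: the cleanup job may not be running. Check the backend logs and look for cron job execution entries:
  ```bash
  ssh root@172.99.189.211 "docker logs quickztna-api-1 --tail 50"
  ```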
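When diagnosing the second case, the badge itself is a clue. This guide's ST2 steps describe Stale appearing only before the cleanup job fires. A Stale badge would then mean the DB row still says online with an old last_seen, so the job has not flipped it. The TypeScript sketch below is a hypothetical reconstruction of that badge logic, not the Health page component. The rule that Stale means status 'online' with last_seen more than 10 minutes old is an assumption. The colours and the 100%/50% availability figures come from this guide; offline availability is not given, so it is left null.

```typescript
// Hypothetical reconstruction; assumption: Stale = status 'online' with last_seen older than 10 minutes.
type HealthBadge = {
  label: string;
  dot: "green" | "yellow" | "grey" | null;
  availability: number | null;
};
const STALE_AFTER_MS = 10 * 60 * 1000;

function healthBadge(status: string, lastSeenIso: string, now: Date): HealthBadge {
  if (status === "online") {
    const ageMs = now.getTime() - Date.parse(lastSeenIso);
    return ageMs > STALE_AFTER_MS
      ? { label: "Stale", dot: "yellow", availability: 50 } // DB still online: cleanup has not run
      : { label: "online", dot: "green", availability: 100 };
  }
  if (status === "offline") {
    return { label: "offline", dot: "grey", availability: null }; // availability not stated in this guide
  }
  return { label: status, dot: null, availability: null }; // e.g. quarantined: not modelled here
}

console.log(healthBadge("online", "2026-03-17T10:45:00Z", new Date("2026-03-17T11:00:00Z")).label); // Stale
```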
ST3 — Ephemeral Machine Deletion
What it verifies: Machines registered with ephemeral = TRUE are deleted (not just marked offline) by the cleanup job after 3 minutes of no heartbeat. The WebSocket event type is DELETE, not UPDATE.
Steps:
- Register an ephemeral machine via an auth key. First, create an ephemeral auth key:
  ```bash
  curl -s -X POST https://login.quickztna.com/api/db/auth_keys \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{
      \"org_id\": \"$ORG_ID\",
      \"ephemeral\": true,
      \"description\": \"Ephemeral test key\"
    }" | python3 -m json.tool
  ```
  Note the returned key value (format: tskey-auth-xxx).
- On 🐧 Linux-C, register and connect with the ephemeral key:
  ```bash
  ztna up --auth-key=tskey-auth-xxx
  ```
- On ⊞ Win-A, confirm the ephemeral machine appears on the Health page with online status.
- Hard-kill the ztna process on 🐧 Linux-C:
  ```bash
  sudo pkill -9 ztna
  ```
- Wait about 3 minutes (at least 3 minutes after the last heartbeat, plus up to one cleanup-job interval). The cleanup job runs:
  ```sql
  DELETE FROM machines
  WHERE ephemeral = TRUE AND last_seen < NOW() - INTERVAL '3 minutes'
  RETURNING id, name, org_id
  ```
- On ⊞ Win-A, the ephemeral machine row should disappear entirely from the Health page (not just go grey). The WebSocket delivers a DELETE event, which triggers the re-fetch, and the machine is no longer in the DB.
- Verify the machine is gone:
  ```bash
  curl -s "https://login.quickztna.com/api/db/machines?org_id=$ORG_ID&select=name,status,last_seen" \
    -H "Authorization: Bearer $TOKEN" | python3 -m json.tool
  ```
  The ephemeral machine should not appear in the response.
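The ephemeral machine may register under the same hostname as the existing Linux-C record, so checking by name alone can give a misleading result. A more reliable approach is to add id to the select list (select=id,name,status,last_seen), note the ephemeral machine's id while it is online, and confirm that id is absent after cleanup. This hypothetical TypeScript sketch assumes the { data: [...] } response shape used in ST1. The isDeleted name is illustrative.

```typescript
// Assumes the { data: [...] } response shape from ST1, with id included in the select list.
type MachineListResponse = { data: Array<{ id: string; name?: string }> };

// True if no row with the given id remains in the saved curl output.
function isDeleted(body: string, machineId: string): boolean {
  const res = JSON.parse(body) as MachineListResponse;
  return !res.data.some((m) => m.id === machineId);
}

const afterCleanup = '{"success":true,"data":[{"id":"m1","name":"Linux-C"}]}';
console.log(isDeleted(afterCleanup, "m2")); // true
```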
Pass: Ephemeral machine row disappears from the Health page after ~3 minutes. No lingering offline entry remains in the DB.
Fail / Common issues:
- Ephemeral machine remains as an offline entry: verify the machine was registered with ephemeral = TRUE. The cleanup SQL targets WHERE ephemeral = TRUE specifically; non-ephemeral machines are only marked offline, never deleted by cleanup.
- The tskey-auth-xxx key has no ephemeral flag set: check the auth key record with GET /api/db/auth_keys?org_id=$ORG_ID. Ephemeral machines require the auth key itself to have ephemeral = true.
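The auth-key check can be scripted against the saved output of that GET request. This hypothetical TypeScript sketch assumes the response uses the same { data: [...] } wrapper as the machines endpoint. It also assumes each row carries the ephemeral and description fields that the ST3 POST body sets. Other fields are ignored, and the sample data is illustrative.

```typescript
// Assumed row shape: only the fields set by the ST3 POST body.
type AuthKeyRow = { description?: string; ephemeral?: boolean };

// Lists the descriptions of keys whose ephemeral flag is true.
function ephemeralKeyDescriptions(body: string): string[] {
  const res = JSON.parse(body) as { data: AuthKeyRow[] };
  return res.data
    .filter((k) => k.ephemeral === true)
    .map((k) => k.description ?? "(no description)");
}

const keys =
  '{"data":[{"description":"Ephemeral test key","ephemeral":true},{"description":"Non-ephemeral key","ephemeral":false}]}';
console.log(ephemeralKeyDescriptions(keys)); // [ 'Ephemeral test key' ]
```

If "Ephemeral test key" is missing from the result, the key was created without the flag, and machines registered with it will only ever be marked offline.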
ST4 — Quarantined Machine Does Not Go Offline
What it verifies: A quarantined machine keeps sending heartbeats (updating last_seen) but the cleanup job does not mark it offline because its status is not online.
Steps:
- If you have a machine that can be quarantined (this requires a posture policy violation), put it into quarantine via the admin API. Replace LINUX_C_MACHINE_ID with Linux-C’s machine id (add id to the select list of the machines query to see it):
  ```bash
  curl -s -X POST https://login.quickztna.com/api/machine-admin \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{
      \"action\": \"quarantine\",
      \"machine_id\": \"LINUX_C_MACHINE_ID\",
      \"org_id\": \"$ORG_ID\"
    }" | python3 -m json.tool
  ```
- On 🐧 Linux-C, ensure the VPN is still running. The machine will continue to send heartbeats. Each heartbeat to machine-heartbeat.ts detects status = 'quarantined' and:
  - Updates last_seen = NOW()
  - Returns { status: 'quarantined', quarantined: true } to the client
  - Does NOT change the status
- Wait 10 minutes. Verify via API that the machine is still in quarantined status (not offline):
  ```bash
  curl -s "https://login.quickztna.com/api/db/machines?org_id=$ORG_ID&select=name,status,last_seen" \
    -H "Authorization: Bearer $TOKEN" | python3 -m json.tool
  ```
  Expected:
  ```json
  {
    "data": [
      {
        "name": "Linux-C",
        "status": "quarantined",
        "last_seen": "2026-03-17T10:45:00.000Z"
      }
    ]
  }
  ```
- The cleanup job SQL is WHERE status = 'online', so quarantined machines are safe. Confirm on the Health page that the machine shows a quarantined badge and a fresh last_seen. The component renders the badge as a secondary badge, with the label capitalized by the capitalize class.
- Unquarantine to restore normal operation:
  ```bash
  curl -s -X POST https://login.quickztna.com/api/machine-admin \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{
      \"action\": \"unquarantine\",
      \"machine_id\": \"LINUX_C_MACHINE_ID\",
      \"org_id\": \"$ORG_ID\"
    }" | python3 -m json.tool
  ```
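The 10-minute verification can be turned into a verdict that separates the two ways ST4 can fail. Run it against the API response saved during the wait, before unquarantining. This hypothetical TypeScript sketch assumes the { data: [...] } response shape used throughout this guide. The maxAgeMs bound is a placeholder; choose a value comfortably above the client's heartbeat interval, which this guide does not state.

```typescript
type St4Verdict = "pass" | "missing" | "not-quarantined" | "stale-last-seen";

function st4Verdict(body: string, machineName: string, now: Date, maxAgeMs: number): St4Verdict {
  const res = JSON.parse(body) as {
    data: Array<{ name: string; status: string; last_seen: string }>;
  };
  const row = res.data.find((m) => m.name === machineName);
  if (!row) return "missing";
  // Cleanup never matches 'quarantined', so any other status points at the quarantine itself.
  if (row.status !== "quarantined") return "not-quarantined";
  // Heartbeats should keep last_seen fresh even though the status never changes.
  if (now.getTime() - Date.parse(row.last_seen) > maxAgeMs) return "stale-last-seen";
  return "pass";
}

const sampleSt4 =
  '{"data":[{"name":"Linux-C","status":"quarantined","last_seen":"2026-03-17T10:45:00.000Z"}]}';
console.log(st4Verdict(sampleSt4, "Linux-C", new Date("2026-03-17T10:45:20Z"), 60 * 1000)); // pass
```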
Pass: Quarantined machine keeps last_seen current via heartbeats but is never marked offline by the cleanup job.
Fail / Common issues:
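- Machine is marked offline despite being quarantined: the cleanup SQL only matches status = 'online', so the DB status was probably no longer quarantined when the job ran. For example, the quarantine action did not apply, or the quarantine was lifted. If the status is still quarantined but last_seen is stale, the heartbeat is not updating last_seen. Verify the machine is successfully authenticating the heartbeat (node_key is valid).
- Quarantine action returns 403: only org admins can quarantine machines.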
ST5 — Recovery: Offline Machine Returns Online After ztna up
What it verifies: An offline machine returns to online status as soon as it sends a heartbeat with status: "online", and the Health page reflects this immediately.
Steps:
- Ensure 🐧 Linux-C is currently offline (either from ST1 or ST2 above).
- On ⊞ Win-A, keep the Health page open with DevTools WebSocket Messages visible.
- On 🐧 Linux-C, restart the VPN:
  ```bash
  ztna up
  ```
- On ⊞ Win-A, within 5-10 seconds observe:
  - DevTools receives a WebSocket UPDATE event:
    ```json
    { "event": "UPDATE", "payload": { "id": "...", "status": "online", "last_seen": "..." } }
    ```
  - The Health page re-fetches the machine list.
  - The Linux-C row transitions to a green dot, online badge, and 100% availability bar.
- Confirm via API:
  ```bash
  curl -s "https://login.quickztna.com/api/db/machines?org_id=$ORG_ID&select=name,status,last_seen" \
    -H "Authorization: Bearer $TOKEN" | python3 -m json.tool
  ```
  Expected:
  ```json
  {
    "data": [
      {
        "name": "Linux-C",
        "status": "online",
        "last_seen": "2026-03-17T10:50:30.000Z"
      }
    ]
  }
  ```
- The WebSocket UPDATE event is broadcast because statusChanged = (machine.status !== resolvedStatus) in machine-heartbeat.ts evaluates to true (the previous status was offline; the new one is online).
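To confirm the DevTools message programmatically, copy a frame's text and test it against Linux-C's machine id. This hypothetical TypeScript sketch assumes the message shape shown in the UPDATE example above. The isOnlineUpdate name is illustrative, and treating unparseable frames as non-matching is an assumption.

```typescript
// Assumed message shape, following the UPDATE example above.
type WsUpdate = { event?: unknown; payload?: { id?: unknown; status?: unknown } };

// True if the raw frame is an UPDATE that brings the given machine back online.
function isOnlineUpdate(raw: string, machineId: string): boolean {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return false; // not JSON, so not the event we are looking for
  }
  if (typeof msg !== "object" || msg === null) return false;
  const m = msg as WsUpdate;
  return m.event === "UPDATE" && m.payload?.id === machineId && m.payload?.status === "online";
}

const frame =
  '{"event":"UPDATE","payload":{"id":"m1","status":"online","last_seen":"2026-03-17T10:50:30.000Z"}}';
console.log(isOnlineUpdate(frame, "m1")); // true
```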
Pass: Machine transitions back to online within seconds of ztna up. Health page updates automatically. Availability score jumps to 100%.
Fail / Common issues:
- Machine stays offline despite ztna up: the heartbeat may be failing. Check ztna status on Linux-C for the connection state, and look for PENDING_APPROVAL (403) if the machine was moved to pending status while offline.
- Health page does not auto-update: the WebSocket may have disconnected during the offline period. If the page was open for more than 90 seconds without a ping, the server may have evicted the connection. The client will reconnect within 5 seconds and then receive subsequent events. A change broadcast while it was disconnected may not be delivered, so refresh the page manually to see the current state.
Summary
| Sub-test | What it proves | Pass condition |
|---|---|---|
| ST1 | Graceful offline via ztna down | Status = offline within seconds; last_seen updated; WebSocket UPDATE event delivered |
| ST2 | Hard kill offline via cleanup job | Status = offline after ~3 min; cleanup job SQL fires; WebSocket UPDATE event delivered |
| ST3 | Ephemeral machine deletion | Ephemeral machine row deleted after ~3 min; WebSocket DELETE event; no DB remnant |
| ST4 | Quarantine blocks offline marking | Quarantined machine keeps fresh last_seen but cleanup job ignores it (not online) |
| ST5 | Recovery to online | ztna up → heartbeat with status: online → WebSocket UPDATE → Health page shows online |