Lesson 6 of 6

ZTA Monitoring and Troubleshooting

Objective

In this lesson you will monitor Zero Trust Access (ZTA) sessions and troubleshoot common authentication failures and policy mismatches. We will use the management and monitoring primitives available in Catalyst SD-WAN / FTD monitoring and Catalyst Center to validate sessions, inspect identity and tag propagation, and correlate policy decisions to actual traffic. This matters in production because ZTA is only effective if identity and policy enforcement are observable — without monitoring you cannot detect failed authentications, tag loss across tunnels, or misapplied access policies that cause business outages.

Real-world scenario: a remote branch user reports they cannot reach an internal HR application after a ZTA rollout. You must determine whether the problem is authentication (AuthN), authorization (AuthZ/policy), or a network/tunnel issue that drops the Security Group Tag (SGT) carried over a VTI tunnel to the datacenter firewall.


Quick Recap

Reference the topology from Lesson 1 (no new devices or IP addresses are added in this lesson). We assume the following logical elements are already present from prior lessons:

  • Branch edge running Catalyst SD‑WAN / FTD acting as ZTA enforcement point.
  • Hub/datacenter firewall applying centralized ZTA policies.
  • Catalyst Center (or SD‑WAN Manager) for monitoring and policy visibility.
  • Applications and users authenticated via enterprise identity services; SGTs are used over VTIs to extend segmentation to the datacenter.



Key Concepts (before hands-on)

  • Authentication (AuthN) vs Authorization (AuthZ): AuthN verifies identity (user, device). AuthZ maps that identity to an access level (SGT, role, or policy). When you see a session present but no application access, it is likely an AuthZ/policy issue rather than AuthN.

    Protocol behavior: AuthN events (successful or failed) generate identity records; when successful these records are associated with sessions and can be queried by monitoring tools.

  • Session state and telemetry: ZTA session state includes endpoint identity, device posture, assigned SGT/VPN-id, and the tunnel or egress interface. Monitoring shows both control-plane events (auth, tag assignment) and data-plane evidence (flows, counters).

    Packet flow: after AuthN succeeds, the enforcement device attaches the SGT or VPN-id and forwards traffic; if the SGT is lost en route (e.g., not preserved over a tunnel), DC policy will treat the session as unknown.

  • SGT over VTI and tag propagation: When SGT is used over a VTI, the tag must be preserved end-to-end (from branch to datacenter enforcement). If the tag is missing at the datacenter, the firewall falls back to its default rule: either a default deny, which blocks legitimate traffic, or a default permit, which is overly permissive.

    Important: Verify tunnel encapsulation and SGT mapping when troubleshooting cross-domain policy mismatches.

  • Correlation of logs and metrics: Authentication logs, policy hits, and application telemetry (RTT, jitter, packet loss) must be correlated to find the root cause. For example, an expected rule that shows zero matched packets suggests a policy mismatch (the traffic is matching a different rule, or none at all); a rule that matches packets while the client receives no responses points to a downstream network or application problem.

  • Monitoring tools behavior: Catalyst Center / SD-WAN Manager provides summary dashboards (VPN topology, top applications, device health) and per-session details (identity, assigned tag, policy hit). Use these first for a high-level view, then fall back to device CLI for granular state.
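The decision order described in the concepts above can be sketched as a small triage function. This is a minimal illustration, not a product feature: the `SessionEvidence` fields, the category names, and the 2% loss threshold are all assumptions chosen for the example, and you would populate the fields from whatever your monitoring tools report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionEvidence:
    # Hypothetical field names; map them from your own monitoring output.
    state: str                     # e.g. "Established" or "AuthFailed"
    branch_sgt: Optional[int]      # SGT assigned at the branch, None if absent
    dc_sgt: Optional[int]          # SGT received at the datacenter, None if absent
    dc_rule_action: Optional[str]  # "allow", "deny", or None if no rule matched
    packets_matched: int           # packets counted against the DC rule
    loss_pct: float                # application-level packet loss in percent


def triage(ev: SessionEvidence, loss_threshold_pct: float = 2.0) -> str:
    """Return the most likely problem area, checked in control-plane-first order."""
    if ev.state != "Established":
        return "authn"                # no session: authentication did not complete
    if ev.branch_sgt is None:
        return "authz-no-tag"         # authenticated, but no tag was assigned
    if ev.dc_sgt != ev.branch_sgt:
        return "tag-propagation"      # tag lost or changed between branch and DC
    if ev.dc_rule_action != "allow" or ev.packets_matched == 0:
        return "authz-policy"         # tag arrived, but the expected rule is not hit
    if ev.loss_pct > loss_threshold_pct:
        return "network-performance"  # access permitted, path quality is poor
    return "no-fault-found"           # look downstream at the application itself


if __name__ == "__main__":
    print(triage(SessionEvidence("AuthFailed", None, None, None, 0, 0.0)))
    print(triage(SessionEvidence("Established", 100, None, "deny", 0, 0.0)))
```

The checks run from control plane to data plane on purpose: a later check only means something once the earlier ones have passed.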


Step-by-step configuration and troubleshooting

For each step below you will see the commands, why they matter, and verification output that shows what to expect.

Step 1: View the ZTA session summary on SD‑WAN Manager / Catalyst Center

What we are doing: Gather a high-level view of active ZTA sessions and their identity attributes (identity, SGT, VPN-id). This helps narrow whether failures are due to missing sessions (AuthN failure) or present sessions with wrong tags (AuthZ/policy mismatch).

! On Catalyst SD-WAN Manager / FTD management CLI (management node)
show sdwan session summary

What just happened: The command requests the manager's aggregated view of active sessions from all monitored enforcement points. It returns per-session entries including username, device ID, assigned SGT or VPN-id, enforcement device, and session state. This differentiates control-plane authentication events from data-plane flow.

Real-world note: Use the management UI first to avoid touching production devices unless necessary — the UI often aggregates data across many devices for faster triage.

Verify:

show sdwan session summary
SessionID    User            DeviceID          EnfDevice       SGT    VPN-ID   State        BytesIn   BytesOut
-----------------------------------------------------------------------------------------------------------
1001         alice@lab.nhprep.com  device-branch-1  branch-ftd-1    100    10       Established  15324     4820
1002         bob@lab.nhprep.com    device-branch-2  branch-ftd-2    200    20       AuthFailed   0         0
1003         guest1@lab.nhprep.com guest-device-3   branch-ftd-1    -      30       Established  2048      1024
  • Expect to see sessions listed. Note the AuthFailed state for session 1002 — this indicates authentication did not complete and there will be no data-plane bytes.
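If you capture the summary as text, a short script can flag the sessions worth a closer look. The sketch below assumes the whitespace-separated column layout shown in the sample output above; `parse_session_summary` and `flag_sessions` are hypothetical helper names, and real output may differ in layout.

```python
SAMPLE = """\
SessionID    User            DeviceID          EnfDevice       SGT    VPN-ID   State        BytesIn   BytesOut
-----------------------------------------------------------------------------------------------------------
1001         alice@lab.nhprep.com  device-branch-1  branch-ftd-1    100    10       Established  15324     4820
1002         bob@lab.nhprep.com    device-branch-2  branch-ftd-2    200    20       AuthFailed   0         0
1003         guest1@lab.nhprep.com guest-device-3   branch-ftd-1    -      30       Established  2048      1024
"""


def parse_session_summary(text):
    """Turn the table into a list of dicts keyed by the header names."""
    # Drop blank lines and the dashed separator line.
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not set(ln.strip()) <= {"-"}]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) == len(header):  # skip anything that is not a data row
            rows.append(dict(zip(header, fields)))
    return rows


def flag_sessions(rows):
    """Return (session_id, note) pairs for sessions that need follow-up."""
    findings = []
    for row in rows:
        if row["State"] != "Established":
            findings.append((row["SessionID"], "authn: state is " + row["State"]))
        elif row["SGT"] == "-":
            findings.append((row["SessionID"], "authz: no SGT assigned"))
    return findings


if __name__ == "__main__":
    for session_id, note in flag_sessions(parse_session_summary(SAMPLE)):
        print(session_id, note)
```

A session without an SGT is not necessarily a fault (a guest role may be untagged by design), so treat that flag as a prompt to confirm intent rather than as an error.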

Step 2: Inspect authentication failure details

What we are doing: For any session listed as AuthFailed, inspect the authentication logs and identity provider (IdP) result to determine why AuthN failed (wrong password, expired certificate, unreachable IdP, or posture check failure).

! On the enforcement device (branch FTD) CLI
show authentication sessions details user bob@lab.nhprep.com

What just happened: The enforcement device returns authentication attempt details for the specified user: timestamp, IdP response code, failure reason (invalid credentials, timeout), and any posture or certificate validation errors. This pinpoints whether AuthN failed locally or at the IdP.

Real-world note: Authentication failures are commonly caused by time skew (NTP issues) when certificates are used. Always check NTP on both enforcement device and IdP.

Verify:

show authentication sessions details user bob@lab.nhprep.com
User: bob@lab.nhprep.com
Attempt Time: 2026-03-15 09:12:44 UTC
Auth Method: EAP-TLS
IdP Response: 401 Unauthorized
Failure Reason: Certificate not trusted - CA chain incomplete
Enforcement: branch-ftd-2
Last Retry: 2026-03-15 09:13:10 UTC
  • The output shows a certificate validation failure (CA chain incomplete). Completing the CA chain on the IdP, or importing the missing root/intermediate certificates on the enforcement device, should allow AuthN to succeed.
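To sort many failures quickly, the "Key: Value" detail output can be parsed and its failure reason mapped to one of the cause categories listed in this step. This is a rough sketch: the keyword matching is an assumption based on the sample text above, not a documented list of failure strings, and `parse_fields` and `classify_failure` are hypothetical names.

```python
SAMPLE = """\
User: bob@lab.nhprep.com
Attempt Time: 2026-03-15 09:12:44 UTC
Auth Method: EAP-TLS
IdP Response: 401 Unauthorized
Failure Reason: Certificate not trusted - CA chain incomplete
Enforcement: branch-ftd-2
Last Retry: 2026-03-15 09:13:10 UTC
"""


def parse_fields(text):
    """Split each 'Key: Value' line on the first colon only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def classify_failure(reason):
    """Map a free-text failure reason to a coarse category (keywords are assumed)."""
    text = reason.lower()
    if "certificate" in text or "ca chain" in text:
        return "certificate"
    if "credential" in text or "password" in text:
        return "credentials"
    if "timeout" in text or "unreachable" in text:
        return "idp-reachability"
    if "posture" in text:
        return "posture"
    return "unclassified"


if __name__ == "__main__":
    fields = parse_fields(SAMPLE)
    print(fields["User"], classify_failure(fields["Failure Reason"]))
```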

Step 3: Verify SGT / VPN-id mapping on the enforcement device

What we are doing: For sessions that did authenticate but cannot reach resources, confirm the SGT or VPN-id assignment and mapping to local interfaces or tunnel attachments. A correct SGT must be present before traffic egresses to the datacenter.

! On enforcement device CLI
show zta sessions detail session 1001

What just happened: The device report displays the session's assigned SGT/VPN-id, source interface, assigned policy, and the VTI (if used) with tag propagation status. This shows whether the enforcement device attached the expected tag and whether it is configured to propagate tags over the tunnel.

Real-world note: If SGT is present at branch but missing at DC, inspect the VTI configuration and any intermediate routers that may drop tag metadata.

Verify:

show zta sessions detail session 1001
SessionID: 1001
User: alice@lab.nhprep.com
Device: device-branch-1
Assigned SGT: 100
Assigned VPN-ID: 10
Source Interface: GigabitEthernet0/1
Attached Policy: ZTA-Branch-User-Access
VTI Tunnel: vti-100 (peer: dc-fw-1)
SGT Propagation: Enabled
Last Policy Hit: 2026-03-15 09:05:22 UTC
BytesIn: 15324
BytesOut: 4820
  • Confirm that SGT Propagation is Enabled and that the attached policy is ZTA-Branch-User-Access. If propagation is disabled, the SGT will not reach the DC.

Step 4: Check SGT at the datacenter firewall (policy enforcement point)

What we are doing: Verify whether the datacenter firewall received the SGT and which policy it applied. If the DC firewall sees an unknown or missing SGT, it may deny traffic or map it to a default role.

! On datacenter firewall CLI
show access-control sessions filter user alice@lab.nhprep.com

What just happened: The datacenter firewall returns per-session policy evaluation including received SGT, applied access rule, and the match count. This shows whether the DC policy decision aligns with the branch enforcement.

Real-world note: In production, mismatches are often due to version/config drift between branch and DC enforcement points — ensure both use the same SGT-to-role mapping.

Verify:

show access-control sessions filter user alice@lab.nhprep.com
User: alice@lab.nhprep.com
SrcIP: 10.10.10.25
DstIP: 172.16.100.10
Received SGT: 100
Applied Rule: Allow-ZTA-HR-App
PacketsMatched: 120
BytesMatched: 15324
Session State: Established
Last Seen: 2026-03-15 09:06:04 UTC
  • The DC sees Received SGT: 100 and applied Allow-ZTA-HR-App, indicating tag propagation and policy alignment are correct. If Received SGT were - or unknown, further tunnel/VTI checks are needed.
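The branch-to-DC comparison from Steps 3 and 4 can be expressed as a single check. The sketch below assumes both outputs have already been parsed into dictionaries keyed by the field labels shown in the samples; `check_tag_propagation` is a hypothetical helper, and the finding messages are illustrative.

```python
MISSING = ("-", "", "unknown")


def check_tag_propagation(branch, dc):
    """Compare the SGT assigned at the branch with the SGT received at the DC."""
    findings = []
    assigned = branch.get("Assigned SGT", "-")
    received = dc.get("Received SGT", "-")

    if assigned in MISSING:
        findings.append("branch did not assign an SGT: check AuthZ at the branch")
    if branch.get("SGT Propagation", "").lower() != "enabled":
        findings.append("SGT propagation is not enabled on the branch VTI")
    if received in MISSING:
        findings.append("DC received no SGT: check the VTI and intermediate devices")
    elif assigned not in MISSING and received != assigned:
        findings.append(
            f"SGT mismatch: branch assigned {assigned}, DC received {received}"
        )
    return findings


if __name__ == "__main__":
    branch = {"Assigned SGT": "100", "SGT Propagation": "Enabled"}
    dc = {"Received SGT": "-", "Applied Rule": "Default-Deny"}
    for finding in check_tag_propagation(branch, dc):
        print(finding)
```

An empty result means the tag survived the tunnel unchanged, so any remaining access problem lies in the rule that matched or further downstream.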

Step 5: Correlate application telemetry and flow counters

What we are doing: Use application monitoring to confirm that traffic is being forwarded and to identify performance issues (jitter, RTT, packet loss) that can look like access problems but are actually network problems.

! On SD-WAN Manager / Catalyst Center CLI
show sdwan app-monitoring top application HR-App

What just happened: The manager shows application-level telemetry for the named application: RTT, packet loss, jitter, MOS, and path selection. If the app shows high packet loss or high RTT, users might experience perceived service failure even though access is permitted.

Real-world note: Application telemetry is critical for troubleshooting intermittent problems that are not policy related, e.g., a saturated ISP link causing high packet loss.

Verify:

show sdwan app-monitoring top application HR-App
Application: HR-App
Monitoring Window: 15 minutes
Average RTT: 45 ms
Average Packet Loss: 0.5%
Average Jitter: 2 ms
MOS Score: 4.2
Top Paths:
 - PathA (primary ISP): Loss 0.2% RTT 40 ms
 - PathB (secondary ISP): Loss 3.5% RTT 120 ms
Flows Observed: 12
Top Users:
 - alice@lab.nhprep.com (FlowID 1001) RTT 42 ms Loss 0.1%
  • Low loss and good MOS suggest the network is not the issue for Alice. If values were poor, investigate interface counters and WAN link health.
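A quick way to spot a poor path in captured telemetry is to compare each path's loss and RTT against thresholds. In the sketch below the 1% loss and 100 ms RTT limits are assumptions picked for illustration (choose values that fit the application), and the regular expression matches only the "Top Paths" line layout from the sample above.

```python
import re

SAMPLE = """\
Top Paths:
 - PathA (primary ISP): Loss 0.2% RTT 40 ms
 - PathB (secondary ISP): Loss 3.5% RTT 120 ms
Top Users:
 - alice@lab.nhprep.com (FlowID 1001) RTT 42 ms Loss 0.1%
"""

# Matches lines such as " - PathA (primary ISP): Loss 0.2% RTT 40 ms".
PATH_LINE = re.compile(
    r"-\s+(\S+)\s+\([^)]*\):\s+Loss\s+([\d.]+)%\s+RTT\s+(\d+)\s+ms"
)


def degraded_paths(text, max_loss_pct=1.0, max_rtt_ms=100):
    """Return the names of paths whose loss or RTT exceeds the given limits."""
    bad = []
    for match in PATH_LINE.finditer(text):
        name = match.group(1)
        loss = float(match.group(2))
        rtt = int(match.group(3))
        if loss > max_loss_pct or rtt > max_rtt_ms:
            bad.append(name)
    return bad


if __name__ == "__main__":
    print(degraded_paths(SAMPLE))
```

With the sample data only the secondary path exceeds the assumed limits, which is consistent with Alice's flow looking healthy on the primary path.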

Step 6: Collect logs for root cause and fix configuration mismatches

What we are doing: If an issue persists after the above checks, collect the enforcement device logs and the management audit logs to create an evidence bundle for troubleshooting or support escalation.

! On enforcement device
show logging | include ZTA
! On SD-WAN Manager
show audit logs | include policy

What just happened: These commands extract logs related to ZTA events and policy audits, showing time-stamped entries for authentication, tag assignment, policy install, and errors. Correlate timestamps between endpoints to find where a tag or policy diverged.

Real-world note: Keep logs for at least 30 days in enterprise deployments for forensic analysis and compliance; log rotation and central collection are best practice.

Verify:

show logging | include ZTA
Mar 15 09:05:20 ZTA: Session 1001 AuthN success user=alice@lab.nhprep.com method=EAP-TLS
Mar 15 09:05:22 ZTA: Session 1001 Assigned SGT=100 VPN-ID=10 policy=ZTA-Branch-User-Access
Mar 15 09:05:25 ZTA: Session 1001 VTI vti-100 tag-propagation=enabled
Mar 15 09:07:01 ZTA: Session 1002 AuthN failed user=bob@lab.nhprep.com reason=CertChainInvalid

show audit logs | include policy
Mar 15 09:05:22 Policy: Installed policy ZTA-Branch-User-Access on branch-ftd-1
Mar 15 09:06:00 Policy: Matched Allow-ZTA-HR-App for Session 1001 on dc-fw-1
Mar 15 09:07:02 Policy: Auth failure trace for Session 1002 user=bob@lab.nhprep.com
  • These logs confirm the timeline and help identify configuration gaps such as missing CA certificates or disabled SGT propagation.
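Correlating timestamps by eye gets tedious once there are more than a few lines. The sketch below merges device and manager log lines for one session into a single ordered timeline. It assumes the "Mon DD HH:MM:SS" prefix shown in the sample logs, takes the year as a parameter because the lines do not carry one, and assumes both sources use the same time zone; `parse_line` and `timeline` are hypothetical names.

```python
from datetime import datetime

LOGS = {
    "device": [
        "Mar 15 09:05:20 ZTA: Session 1001 AuthN success user=alice@lab.nhprep.com method=EAP-TLS",
        "Mar 15 09:05:22 ZTA: Session 1001 Assigned SGT=100 VPN-ID=10 policy=ZTA-Branch-User-Access",
        "Mar 15 09:05:25 ZTA: Session 1001 VTI vti-100 tag-propagation=enabled",
        "Mar 15 09:07:01 ZTA: Session 1002 AuthN failed user=bob@lab.nhprep.com reason=CertChainInvalid",
    ],
    "manager": [
        "Mar 15 09:05:22 Policy: Installed policy ZTA-Branch-User-Access on branch-ftd-1",
        "Mar 15 09:06:00 Policy: Matched Allow-ZTA-HR-App for Session 1001 on dc-fw-1",
        "Mar 15 09:07:02 Policy: Auth failure trace for Session 1002 user=bob@lab.nhprep.com",
    ],
}


def parse_line(source, line, year):
    """Split off the timestamp prefix and return (datetime, source, message)."""
    month, day, clock, message = line.split(None, 3)
    stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    return stamp, source, message


def timeline(logs, session_id, year=2026):
    """Collect every line mentioning the session and sort by timestamp."""
    needle = f"Session {session_id} "
    events = [
        parse_line(source, line, year)
        for source, lines in logs.items()
        for line in lines
        if needle in line + " "  # trailing space avoids matching e.g. 10011
    ]
    return sorted(events, key=lambda event: event[0])


if __name__ == "__main__":
    for stamp, source, message in timeline(LOGS, 1001):
        print(stamp.isoformat(), source, message)
```

Lines that do not name a session, such as the policy install entry, are left out of the per-session view, so review those separately when a policy push is suspected.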

Verification Checklist

  • Check 1: Confirm session state is Established for the user — verify with show sdwan session summary and expect BytesIn/BytesOut > 0 for successful sessions.
  • Check 2: Validate AuthN success and, for failures, read the show authentication sessions details output to identify the failure reason (certificate, credential, timeout).
  • Check 3: Ensure SGT/VPN-id is attached by the branch and received by DC firewall — verify with show zta sessions detail on branch and show access-control sessions on DC.
  • Check 4: Correlate application telemetry (RTT, packet loss) using show sdwan app-monitoring to rule out data-plane performance issues.

Common Mistakes

Symptom | Cause | Fix
Session shows AuthFailed; no bytes transferred | Missing or invalid certificate chain at IdP or enforcement device | Import correct root/intermediate CA certificates; verify NTP synchronization
SGT present on branch but missing at DC | SGT propagation not enabled on VTI or intermediate device drops tag | Enable SGT propagation on VTI; ensure intermediate routers support tag transport
Policy applied but user cannot reach application (packets in, zero responses) | Downstream application unreachable or asymmetric routing causing responses to be dropped | Verify application host reachability from DC; check NAT and return path; inspect route tables
Management shows policy installed but device shows different behavior | Configuration drift or incomplete policy push | Re-push policy template from manager; verify policy version and confirm device reports success
High perceived access failure but monitoring shows Allow | Network performance (high loss/jitter) rather than access control | Use application telemetry to identify poor path; inspect WAN link metrics and failover policies

Key Takeaways

  • Always distinguish AuthN (is the user/device authenticated?) from AuthZ (what the user/device is allowed to do). Monitoring tools show both; start with AuthN for missing sessions.
  • Verify tag propagation (SGT/VPN-id) end-to-end. A tag at the branch that is lost over a VTI will create authorization failures at the datacenter.
  • Correlate control-plane events (auth logs, policy installs) with data-plane evidence (bytes, application telemetry). This combination quickly separates policy problems from network performance problems.
  • In production, use centralized logging and retention (management + enforcement logs) and ensure NTP and certificate chains are healthy — these are frequent root causes for ZTA authentication problems.

Tip: When troubleshooting in production, always gather non‑disruptive data first (management dashboards, logs) before making configuration changes. Keep a clear timeline of events (timestamps are critical) to correlate actions across devices.

