AI Assistants for Network Ops
Objective
In this lesson you will configure and validate AI Assistants for Network Ops that use Network-Wide Path Insights (NWPI) and telemetry to troubleshoot, monitor, and recommend routing/policy changes. This matters in production because AI assistants can accelerate detection, provide deterministic path recommendations, and automate triage across SD‑WAN, ThousandEyes, and firewall telemetry—reducing mean time to repair (MTTR) for application issues. Real-world scenario: a remote branch (SJC-Branch) reports poor Microsoft 365 performance; an AI assistant uses NWPI traces, SD‑WAN telemetry, and ThousandEyes data to find the problem hop and recommend a path change.
Quick Recap
Reference topology (from earlier lessons) is reused here. No new physical routers are added in this lesson — we integrate the existing SD‑WAN fabric with monitoring agents and enable AI assistant capabilities on the management plane.
ASCII topology (interfaces show management / control IPs used for telemetry and NWPI collection):
SJC-Branch (Router)                     RTP-Hub1 (Router)                       NYK-Branch (Router)
Gi0/0 10.10.10.1/24 ------------------- Gi0/0 10.10.20.1/24 ------------------- Gi0/0 10.10.30.1/24
        |                                       |                                       |
        |                                       |                                       |
        +-- Internet / DIA ---------------------+-- MPLS / Backhaul --------------------+

SD-WAN Manager (vManage)        192.168.100.10
ThousandEyes Enterprise Agent   192.168.200.10
Device table
| Device | Role | mgmt IP |
|---|---|---|
| vManage (SD‑WAN Manager) | Central controller / NWPI UI | 192.168.100.10 |
| SJC-Branch Router | SD‑WAN Edge (branch) | 10.10.10.1 |
| RTP-Hub1 Router | SD‑WAN Hub / transit | 10.10.20.1 |
| NYK-Branch Router | SD‑WAN Edge (branch) | 10.10.30.1 |
| ThousandEyes Agent | SaaS / WAN test agent | 192.168.200.10 |
Tip: Throughout this lesson we use the domain lab.nhprep.com for any server names and Lab@123 as an example password where needed.
Key Concepts
- NWPI (Network‑Wide Path Insights): collects flow-level metadata and reconstructs the end-to-end path, including SD‑WAN policy steps. In production, NWPI helps pinpoint which segment (branch→hub, hub→internet, server side) causes degradation.
  - Protocol behavior: NWPI works from telemetry and flow metadata; when a trace is requested it aggregates flow records from the edge routers and shows hop-by-hop metrics (loss, jitter, delay).
- Telemetry ingestion: devices send flow telemetry (NetFlow/IPFIX/streaming telemetry) to the manager or collector. The manager correlates telemetry with policy and topology to produce recommendations.
  - Practical impact: accurate and timely telemetry reduces detective work, because operators can see where packets are dropped or queued.
- AI Assistant native skills: pre-built analytic routines that summarize logs, detect anomalies, and recommend actions. They operate on ingested telemetry, configuration, and policy.
  - In production, AI Assistants can be used for runbooks, suggested policy changes, or automated ticketing.
- Integration with Third‑Party Agents (ThousandEyes): external synthetic tests provide active probe measurements that complement passive NWPI telemetry. Combined, they give you both user-experience (synthetic) and path-level telemetry.
- Actionability & Safety: recommendations are suggestions unless explicitly approved. In a live environment, always validate recommendations in a maintenance window before applying them to production.
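The "suggestions unless explicitly approved" rule can be pictured as a small approval gate. The Python sketch below is only an illustration: the Recommendation class, the approved_by field, and apply_change are hypothetical names, not part of any SD‑WAN Manager API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    """A suggested change produced by an assistant (hypothetical structure)."""
    summary: str
    approved_by: Optional[str] = None  # stays None until an operator signs off


def apply_change(rec: Recommendation) -> str:
    """Refuse to act on any recommendation that lacks explicit approval."""
    if rec.approved_by is None:
        raise PermissionError(f"not approved: {rec.summary}")
    return f"applying '{rec.summary}' (approved by {rec.approved_by})"


rec = Recommendation("Prefer DIA for Office365 at SJC-Branch")
try:
    apply_change(rec)          # blocked: nobody has approved it yet
except PermissionError as err:
    print(err)

rec.approved_by = "netops-oncall"  # hypothetical approver name
print(apply_change(rec))
```

Keeping the approval as data on the recommendation, rather than as a global switch, makes it easy to log who signed off on each change.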
Step-by-step configuration
Step 1: Enable flow telemetry on each SD‑WAN edge router
What we are doing: Configure the edge routers to export flow telemetry (IPFIX/NetFlow) to the SD‑WAN Manager so NWPI can ingest flow metadata. This is essential because NWPI relies on telemetry; without it the AI assistant has no flow data to analyze.
! On SJC-Branch
configure terminal
ip flow-export destination 192.168.100.10 4739
ip flow-export version 10
ip flow-cache timeout active 60
end
! On RTP-Hub1
configure terminal
ip flow-export destination 192.168.100.10 4739
ip flow-export version 10
ip flow-cache timeout active 60
end
! On NYK-Branch
configure terminal
ip flow-export destination 192.168.100.10 4739
ip flow-export version 10
ip flow-cache timeout active 60
end
What just happened: Each router is told to export IP flow records (IPFIX/NetFlow v10) to the collector at 192.168.100.10 (vManage). The active timeout sets how often flows are exported; 60 seconds balances timeliness versus overhead. These records provide flow tuples (src/dst, ports, protocol, bytes, packets, timestamps) that NWPI correlates with path information.
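To make the active timeout concrete, here is a minimal Python sketch of a flow cache that accumulates packets per flow tuple and emits a record once a flow has been active for the timeout period. It is a simplification under stated assumptions: FlowCache and its methods are hypothetical names, the destination host 52.112.0.10 is an illustrative address inside the lab's Microsoft 365 range, and real exporters also apply an inactive timeout and cache-size limits that are not modeled here.

```python
from dataclasses import dataclass

ACTIVE_TIMEOUT = 60  # seconds, matching the lab setting


@dataclass
class FlowEntry:
    last_export: float  # time the flow was created or last exported
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Toy flow cache keyed by (src, dst, sport, dport, proto)."""

    def __init__(self, active_timeout: float = ACTIVE_TIMEOUT):
        self.active_timeout = active_timeout
        self.flows = {}

    def observe(self, key: tuple, size: int, now: float) -> None:
        """Account one packet of `size` bytes to the flow identified by `key`."""
        entry = self.flows.get(key)
        if entry is None:
            entry = self.flows[key] = FlowEntry(last_export=now)
        entry.packets += 1
        entry.octets += size

    def export_due(self, now: float) -> list:
        """Return (key, packets, octets) for flows past the active timeout."""
        records = []
        for key, entry in self.flows.items():
            if now - entry.last_export >= self.active_timeout:
                records.append((key, entry.packets, entry.octets))
                # Reset counters so the next record covers only new traffic.
                entry.packets, entry.octets, entry.last_export = 0, 0, now
        return records


cache = FlowCache()
flow = ("10.10.10.50", "52.112.0.10", 50000, 443, 6)  # TCP/443 from the lab client
cache.observe(flow, 1500, now=0)
cache.observe(flow, 1500, now=30)
print(cache.export_due(now=59))  # nothing due yet
print(cache.export_due(now=60))  # one record covering both packets
```

A shorter timeout gives the collector fresher data for long-lived flows at the cost of more export records per flow.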
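Real-world note: On heavily loaded routers, tune export sampling or use deduplicated telemetry to limit CPU/memory impact. Also be aware that the legacy ip flow-export version command on classic IOS typically accepts only versions 1, 5, and 9; IPFIX (version 10) export is normally configured through a Flexible NetFlow exporter using export-protocol ipfix, so treat the commands above as lab shorthand and check the syntax for your platform and release.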
Verify:
! Verify on SJC-Branch
show ip flow export
Exporting flows to 192.168.100.10 (vrf default)
Version: 10
Destination port: 4739
Exporting via source address: 10.10.10.1
Active timeout: 60 secs
Packet count: 14321
Flow records sent: 712
Last error: None
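The Version: 10 and port 4739 values above line up with IPFIX as specified in RFC 7011, where every message begins with a 16-byte header whose first field is the version number 10. As a rough sketch, a collector-side sanity check in Python might look like this; parse_ipfix_header and the sample field values are illustrative and are not part of vManage.

```python
import struct
from typing import NamedTuple

# IPFIX message header (RFC 7011): version, length, export time,
# sequence number, observation domain ID, all in network byte order.
IPFIX_HEADER = struct.Struct("!HHIII")


class IpfixHeader(NamedTuple):
    version: int
    length: int
    export_time: int
    sequence: int
    observation_domain: int


def parse_ipfix_header(datagram: bytes) -> IpfixHeader:
    """Decode and validate the fixed header at the start of a datagram."""
    if len(datagram) < IPFIX_HEADER.size:
        raise ValueError("datagram shorter than the 16-byte IPFIX header")
    header = IpfixHeader(*IPFIX_HEADER.unpack_from(datagram))
    if header.version != 10:
        raise ValueError(f"not IPFIX: version field is {header.version}")
    return header


# Arbitrary sample values: a header-only message of length 16.
sample = IPFIX_HEADER.pack(10, 16, 1_700_000_000, 712, 1)
print(parse_ipfix_header(sample))
```

A NetFlow v9 datagram would fail the version check, which is a quick way to spot an exporter that is sending the wrong format to the collector.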
Step 2: Register ThousandEyes agent with SD‑WAN Manager
What we are doing: Add the ThousandEyes synthetic agent to the SD‑WAN Manager so the AI assistant can combine synthetic test results with NWPI telemetry. Synthetic tests validate application performance from branch vantage points.
! On vManage (or via its API/UI — here shown as config block)
configure terminal
thousandeyes agent add id 01 address 192.168.200.10 name TE-Agent-01 domain lab.nhprep.com
thousandeyes agent enable 01
end
What just happened: The management plane now knows about an external ThousandEyes probe at 192.168.200.10. When the AI assistant requests combined insight, vManage pulls the probe results and correlates them with flow telemetry and policy.
Real-world note: Ensure probes are placed in the appropriate path segments (branch, hub, DC) to get meaningful coverage.
Verify:
! Verify on vManage
show thousandeyes agents
Agent ID Name Address Status Last Seen
01 TE-Agent-01 192.168.200.10 Enabled 00:01:14
Step 3: Enable NWPI and AI Assistant native skills on SD‑WAN Manager
What we are doing: Turn on NWPI collection and enable AI Assistant native skills for path recommendations and anomaly detection. This provides the analytical engine that ingests telemetry and generates suggestions.
! On vManage
configure terminal
nwpi enable
ai-assistant native-skill enable nwpi path-recommendations anomaly-detection
end
What just happened: NWPI collection is started on the manager; the AI assistant native skills for path recommendations and anomaly detection are activated. From a protocol view, the manager now accepts flow records and will run analytical models to detect anomalies and compute alternative paths.
Real-world note: Activating analytics increases storage and CPU usage on the controller; size the platform appropriately for network scale.
Verify:
! Verify on vManage
show nwpi status
NWPI Status: Enabled
Flows received: 124712
Active traces: 0
Last processed: 2026-03-30 14:22:10
show ai-assistant status
AI Assistant: Native Skills Enabled
Skills: nwpi (path-recommendations, anomaly-detection)
Last analysis run: 2026-03-30 14:22:05
Step 4: Start an NWPI trace for the impacted flow (SJC-Branch → Microsoft 365)
What we are doing: Trigger an on-demand NWPI trace from SJC-Branch toward the Microsoft 365 service to collect per-hop metrics and have the AI assistant analyze it. This creates a trace that correlates flow telemetry and ThousandEyes results.
! On vManage (UI/API action — shown as CLI for lab)
configure terminal
nwpi trace start source 10.10.10.50 destination 52.112.0.0/16 application Office365 reason "User complaint: M365 slow"
end
What just happened: A trace was requested for flows originating at 10.10.10.50 to the Microsoft 365 address space. NWPI aggregates the flow records from edges and any synthetic probe data, then computes per-hop scores (loss, jitter, delay) and flags problematic hops.
Real-world note: Use the specific client IP (10.10.10.50 in this example) to correlate user-reported issues to network-level telemetry.
Verify:
! Verify trace status and results
show nwpi trace status
Trace ID: 20260330-001
Source: 10.10.10.50
Destination: 52.112.0.0/16
Application: Office365
Status: Completed
Start time: 2026-03-30 14:25:30
End time: 2026-03-30 14:26:10
show nwpi trace results id 20260330-001
Hop 1 SJC-Branch-Gi0/0 Loss: 0% Delay: 1 ms Jitter: 0.2 ms Score: 10
Hop 2 ISP-Edge Loss: 0% Delay: 8 ms Jitter: 1.0 ms Score: 8
Hop 3 RTP-Hub1-Gi0/0 Loss: 5% Delay: 25 ms Jitter: 5.2 ms Score: 3 <-- Degraded
Hop 4 Internet-Peer Loss: 0% Delay: 40 ms Jitter: 2.1 ms Score: 7
Hop 5 Microsoft-Edge Loss: 0% Delay: 10 ms Jitter: 0.5 ms Score: 9
AI Assistant Analysis:
- Anomaly detected on hop 3 (RTP-Hub1): packet loss 5% and high jitter.
- Recommendation: Reroute Office365 traffic from MPLS backhaul to direct internet breakout (DIA) at SJC-Branch for improved score.
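If you capture the trace output as text, picking out the degraded hop is easy to script. The following Python sketch assumes the exact line layout shown in this lab output; the regular expression and the helper names parse_hops and worst_hop are illustrative, and a real deployment would more likely read structured data from an API.

```python
import re

# Matches one hop line of the lab's trace output.
HOP_RE = re.compile(
    r"Hop\s+(?P<hop>\d+)\s+(?P<name>\S+)\s+"
    r"Loss:\s*(?P<loss>[\d.]+)%\s+"
    r"Delay:\s*(?P<delay>[\d.]+)\s*ms\s+"
    r"Jitter:\s*(?P<jitter>[\d.]+)\s*ms\s+"
    r"Score:\s*(?P<score>\d+)"
)

TRACE_OUTPUT = """\
Hop 1 SJC-Branch-Gi0/0 Loss: 0% Delay: 1 ms Jitter: 0.2 ms Score: 10
Hop 2 ISP-Edge Loss: 0% Delay: 8 ms Jitter: 1.0 ms Score: 8
Hop 3 RTP-Hub1-Gi0/0 Loss: 5% Delay: 25 ms Jitter: 5.2 ms Score: 3 <-- Degraded
Hop 4 Internet-Peer Loss: 0% Delay: 40 ms Jitter: 2.1 ms Score: 7
Hop 5 Microsoft-Edge Loss: 0% Delay: 10 ms Jitter: 0.5 ms Score: 9
"""


def parse_hops(text: str) -> list:
    """Turn each hop line into a dict of typed metrics."""
    return [
        {
            "hop": int(m["hop"]),
            "name": m["name"],
            "loss_pct": float(m["loss"]),
            "delay_ms": float(m["delay"]),
            "jitter_ms": float(m["jitter"]),
            "score": int(m["score"]),
        }
        for m in HOP_RE.finditer(text)
    ]


def worst_hop(hops: list) -> dict:
    """Return the hop with the lowest score."""
    return min(hops, key=lambda h: h["score"])


hops = parse_hops(TRACE_OUTPUT)
worst = worst_hop(hops)
print(worst["name"], worst["score"])  # RTP-Hub1-Gi0/0 3
```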
Step 5: Apply the AI Assistant recommendation (review + change policy)
What we are doing: Review the recommendation and apply a path policy change to route Office365 traffic via local breakout at SJC-Branch. We perform a controlled policy change rather than an immediate automatic action.
! On vManage - create a data policy for Office365 local breakout (shown as config)
configure terminal
policy data-policy Office365-LocalBreakout
match application Office365
action set path-preference prefer-inet
commit
end
! Push policy to SJC-Branch
device-group SJC-Group
apply policy data-policy Office365-LocalBreakout to devices SJC-Branch
end
What just happened: A data policy matching Office365 was created and set to prefer the internet (DIA) path. Pushing this policy instructs the edge to prefer the local breakout rather than backhauling via RTP-Hub1, which the trace identified as degraded.
Real-world note: Always validate policies in a canary group or during a maintenance window; monitor user feedback and NWPI after change.
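Conceptually, the data policy is a match/action rule: traffic classified as Office365 gets the prefer-inet path preference, and everything else falls through to normal routing. The Python sketch below models that lookup in the simplest possible way. DataPolicyRule, select_path, and the default-routing label are hypothetical names for illustration, and the exact-match comparison is an assumption about how application names are compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicyRule:
    application: str      # application name the rule matches
    path_preference: str  # action applied to matching traffic


RULES = [DataPolicyRule("Office365", "prefer-inet")]
DEFAULT_PATH = "default-routing"  # hypothetical label for unmatched traffic


def select_path(application: str, rules=RULES) -> str:
    """Return the action of the first rule whose application matches exactly."""
    for rule in rules:
        if rule.application == application:
            return rule.path_preference
    return DEFAULT_PATH


print(select_path("Office365"))  # prefer-inet
print(select_path("office365"))  # default-routing: the name does not match
```

The second call shows why a misspelled or differently cased application name can leave a pushed policy with no visible effect.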
Verify:
! Verify applied policy on SJC-Branch
show running-config policy | include Office365-LocalBreakout
policy data-policy Office365-LocalBreakout
match application Office365
action set path-preference prefer-inet
! Confirm path selection for active flows
show sdwan flows application Office365
Flow ID Src Dst Path Selected
1001 10.10.10.50 52.112.0.0/16 direct-inet (preferred)
Verification Checklist
- Check 1: Telemetry collection active. Run "show ip flow export" on each edge and confirm destination 192.168.100.10 and non-zero flow counts.
- Check 2: NWPI enabled and receiving flows. Run "show nwpi status" on vManage and confirm "Flows received" is increasing.
- Check 3: AI Assistant recommendations appear. Run "show nwpi trace results id <trace-id>" and confirm analysis/recommendation lines are present.
- Check 4: Policy change applied. Run "show sdwan flows application Office365" on the manager and confirm the selected path matches the new policy.
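Check 2 asks whether the flow counter is increasing, which means comparing two samples taken some time apart. A minimal Python sketch of that comparison follows, assuming you have saved the status output as text; the helper names are illustrative and the second counter value is made up for the example.

```python
import re


def flows_received(status_output: str) -> int:
    """Extract the 'Flows received' counter from saved status output."""
    match = re.search(r"Flows received:\s*(\d+)", status_output)
    if match is None:
        raise ValueError("no 'Flows received' line found")
    return int(match.group(1))


def ingestion_is_advancing(first_sample: str, second_sample: str) -> bool:
    """True when the counter grew between the two samples."""
    return flows_received(second_sample) > flows_received(first_sample)


earlier = "NWPI Status: Enabled\nFlows received: 124712\nActive traces: 0\n"
later = "NWPI Status: Enabled\nFlows received: 125034\nActive traces: 0\n"  # illustrative value

print(ingestion_is_advancing(earlier, later))    # True
print(ingestion_is_advancing(earlier, earlier))  # False: counter is stalled
```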
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| No flows in NWPI | Edge routers not exporting flow telemetry or wrong collector IP/port | Confirm ip flow-export destination 192.168.100.10 4739 and that UDP port 4739 is reachable from edges to vManage |
| ThousandEyes agent shows "Not seen" | Agent not registered or firewall blocking connectivity | Verify agent IP 192.168.200.10 is reachable from vManage and agent service is running |
| AI Assistant shows no recommendations | Insufficient flow samples or short trace duration | Increase flow sampling rate or run longer/more targeted NWPI traces |
| Policy not taking effect | Policy not pushed to correct device group or wrong match criteria | Verify device group assignment and the policy match (application name must match Office365) |
Key Takeaways
- NWPI needs reliable flow telemetry and optionally ThousandEyes synthetic tests; both together provide a comprehensive view for AI assistants.
- AI assistants can detect anomalies and produce path recommendations, but changes should be reviewed and staged—never blindly automated in critical networks.
- Always verify telemetry, NWPI ingestion, and probe connectivity before trusting automated recommendations.
- In production, use canary rollouts and monitoring to ensure that a recommended path change actually improves user experience.
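One simple way to judge whether a canary change helped is to compare the worst per-hop score before and after it. The Python sketch below is only an assumption about how you might frame that check: change_improved and the min_gain threshold are hypothetical, the "before" scores come from the lab trace, and the "after" scores are illustrative values, not measurements.

```python
def change_improved(before_scores, after_scores, min_gain=1):
    """True when the worst hop score rose by at least `min_gain` points."""
    if not before_scores or not after_scores:
        raise ValueError("need at least one score on each side")
    return min(after_scores) - min(before_scores) >= min_gain


before = [10, 8, 3, 7, 9]  # per-hop scores from the lab trace via RTP-Hub1
after = [10, 8, 9]         # illustrative post-change scores over the DIA path

print(change_improved(before, after))   # True: worst hop went from 3 to 8
print(change_improved(before, before))  # False: nothing changed
```

Comparing the worst hop rather than the average keeps a single bad segment from being hidden by several healthy ones.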
Important: Treat AI assistant output as guidance—combine it with your operational knowledge and change management policies before applying to production.