Predictive Analytics for WAN
Objective
In this lesson you will enable and validate Predictive Path Recommendations in the SD‑WAN analytics plane so the system recommends path changes (for example: switch from load balancing between private1 & private2 to private2) before users see degraded performance. This matters in production because proactive recommendations reduce mean time to repair and can be applied automatically via closed‑loop automation, avoiding user impact during peak traffic. Real-world scenario: a retail chain with multiple branch sites observes intermittent voice quality drops; predictive recommendations identify a persistent, degrading path and recommend switching to a more stable private circuit before calls fail.
Topology
Refer to the topology introduced in Lesson 1 for device locations, WAN circuits, and site IDs. This lesson adds no new physical devices — we operate in the SD‑WAN Manager (analytics) plane and on existing site configurations.
Tip: You access the SD‑WAN Manager GUI at https://vmanage.lab.nhprep.com using the NHPREP organizational account (use password Lab@123 for lab exercises). All configuration actions in this lesson originate in the SD‑WAN Manager.
Device Table
| Device Name | Role | Site ID / Notes |
|---|---|---|
| SD-WAN Manager | Analytics & Control | Centralized analytics and recommendations |
| Edge Router HQ | Regional hub | Existing site configuration (see Lesson 1) |
| Edge Router Branch-101 | Branch | Existing site (site-id: site-101) |
| Edge Router Branch-202 | Branch | Existing site (site-id: site-202) |
Quick Recap
- The network uses multiple underlay paths per site (examples: private1, private2, Internet).
- Lesson 1 created the base fabric: site entities, AAR (Application Aware Routing) policy, and site lists.
- This lesson does not touch underlay routing; it enables analytics recommendations and applies them at the policy layer.
Key Concepts
- Predictive Path Recommendations: analytics uses historical telemetry and predictive models to forecast path quality and recommends switching paths to improve application experience. Think of it like a weather forecast for the network — it tells you a route will worsen, and suggests moving traffic before the storm hits.
- Closed‑Loop Automation: when enabled, the system automatically creates a copy of the AAR policy, modifies the sequence for the targeted app/site, and applies it to the site. This is like having an automated operations assistant that edits and deploys a policy change for you.
- NWPI (Network-Wide Path Insights) Metadata: adding lightweight per‑flow metadata to the SD‑WAN header enables richer path telemetry and improves recommendation accuracy. Flow-level tagging lets the analytics engine correlate per-hop behavior with end-to-end application quality.
- Recommendation behavior: recommendations include current path, recommended path, path quality metrics (loss/latency/jitter) and an estimated % gain in path quality. The system may recommend switching from load-balanced paths to a single preferred path (e.g., private2).
- Why this matters: in production, proactively applying recommendations minimizes user-visible outages and stabilizes application SLAs; this is critical for voice, video, and real-time apps.
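The Python sketch below is a toy illustration of the "forecast, then recommend" idea. It is not the SD‑WAN Manager's actual model, which this lesson does not describe. It fits a least-squares linear trend to each path's daily loss, projects one day ahead, and recommends the path with the lowest forecast loss. The loss samples and the `forecast_next` and `recommend` helpers are hypothetical; only the path names come from this lesson.

```python
from statistics import mean

def forecast_next(samples: list[float]) -> float:
    """Project the next value with an ordinary least-squares linear trend."""
    n = len(samples)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(samples)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return max(0.0, intercept + slope * n)  # loss cannot be negative

def recommend(history: dict[str, list[float]]) -> tuple[str, dict[str, float]]:
    """Return the path with the lowest forecast loss, plus every path's forecast."""
    forecasts = {path: forecast_next(s) for path, s in history.items()}
    best = min(forecasts, key=forecasts.get)
    return best, forecasts

# Hypothetical daily loss (%) per path over 7 days: private1 is degrading.
history = {
    "private1": [1.0, 1.4, 1.9, 2.3, 2.6, 3.0, 3.2],
    "private2": [0.6, 0.5, 0.5, 0.6, 0.4, 0.5, 0.5],
}
best, forecasts = recommend(history)
print(best, {p: round(v, 2) for p, v in forecasts.items()})
```

With these hypothetical samples, private1's rising trend projects to about 3.7% loss against roughly 0.46% for private2, so the sketch recommends private2. A production engine would also weigh latency and jitter and would use far more history.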
Steps
Step 1: Enable Predictive Path Recommendations in SD‑WAN Manager
What we are doing: Turn on the analytics feature that generates predictive path recommendations. This is the foundation — without it nothing downstream (recommendations, forecasts, automation) will run.
vmanage# configure analytics predictive-path-recommendations enable
vmanage(config-analytics)# license predeposit TE-EMBED-WANI
vmanage(config-analytics)# commit
What just happened: The first command enables the analytics engine to generate predictive recommendations. The second command ensures that the pre-deposited analytics license (TE‑EMBED‑WANI) is recognized; this license entitles the analytics feature set. The commit persists the analytics configuration and starts background model generation from historical telemetry.
Real-world note: On large production fleets, enable this during a maintenance window so the initial model training can run without triggering automated changes.
Verify:
vmanage# show analytics predictive-path-recommendations status
Predictive Path Recommendations Status: Enabled
License: TE-EMBED-WANI (pre-deposited)
Model Training: In progress (historical telemetry ingestion)
Next Run: 2025-04-02 02:00:00 UTC
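If you script the Step 1 check (for example, against output captured from an SSH session), a small parser like the one below can assert the expected fields. This is a minimal sketch that works on the sample output above. The `parse_status` helper is hypothetical, and real output formatting may differ between releases.

```python
def parse_status(text: str) -> dict[str, str]:
    """Split 'Key: Value' lines on the first colon, so timestamps keep theirs."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

# Sample output from the Verify step above.
STATUS = """\
Predictive Path Recommendations Status: Enabled
License: TE-EMBED-WANI (pre-deposited)
Model Training: In progress (historical telemetry ingestion)
Next Run: 2025-04-02 02:00:00 UTC
"""

status = parse_status(STATUS)
assert status["Predictive Path Recommendations Status"] == "Enabled"
assert status["License"].startswith("TE-EMBED-WANI")
print(status["Model Training"])
```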
Step 2: Review Recommendation Summary by Application Group
What we are doing: Query the system for current recommendations across application groups so you can evaluate which apps/sites are impacted and the recommended action (for example, move Office365 traffic from load-balanced private1/private2 to private2).
vmanage# show analytics predictive-path-recommendations summary app-group Office365
App Group: Office365
Sites with Recommendations: 2
- site-101: Current Path: private1/private2 (load-balance) Recommended Path: private2 Estimated Quality Gain: 36%
- site-202: Current Path: internet Recommended Path: private1 Estimated Quality Gain: 18%
Recommendation Timestamp: 2025-03-31 18:40:00 UTC
What just happened: The command lists recommendations for the Office365 application group, showing site-level recommended path changes and the estimated percent improvement in path quality. The analytics engine correlates historical path metrics (loss/latency/jitter) and outputs the recommendation summary.
Real-world note: Prioritize apps that are highly sensitive to loss and jitter (voice, video). An estimated gain of 36%, as reported here for site-101, is substantial; for a voice application it would warrant prompt action.
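When many sites carry recommendations, ranking them by estimated gain helps you decide where to act first. The sketch below parses the summary lines shown in this step and sorts them. The regular expression assumes exactly this line layout, and the `min_gain` threshold is a hypothetical triage setting, not a product option.

```python
import re

# Matches one site line of the summary output shown above.
LINE = re.compile(
    r"-\s*(?P<site>\S+):\s*Current Path:\s*(?P<current>.+?)\s+"
    r"Recommended Path:\s*(?P<recommended>\S+)\s+"
    r"Estimated Quality Gain:\s*(?P<gain>\d+(?:\.\d+)?)%"
)

SUMMARY = """\
- site-101: Current Path: private1/private2 (load-balance) Recommended Path: private2 Estimated Quality Gain: 36%
- site-202: Current Path: internet Recommended Path: private1 Estimated Quality Gain: 18%
"""

def rank_recommendations(text: str, min_gain: float = 0.0) -> list[dict]:
    """Return site recommendations sorted by estimated gain, highest first."""
    recs = []
    for m in LINE.finditer(text):
        rec = m.groupdict()
        rec["gain"] = float(rec["gain"])
        if rec["gain"] >= min_gain:
            recs.append(rec)
    return sorted(recs, key=lambda r: r["gain"], reverse=True)

for rec in rank_recommendations(SUMMARY, min_gain=20):
    print(rec["site"], rec["current"], "->", rec["recommended"], f'{rec["gain"]:.0f}%')
```

With `min_gain=20`, only site-101 (36%) is returned; site-202 (18%) falls below the threshold.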
Verify:
vmanage# show analytics predictive-path-recommendations details site site-101
Site ID: site-101
Application Group: Office365
Current Path(s): private1, private2 (load-balance)
Recommended Path: private2
Quality Metrics (Current): Loss: 3.2% Latency: 86 ms Jitter: 12 ms
Quality Metrics (Recommended): Loss: 0.5% Latency: 40 ms Jitter: 4 ms
Estimated Quality Gain: 36%
Recommendation Basis: Persistence of loss on private1 observed over 7 days
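The lesson does not say how the 36% Estimated Quality Gain is computed, but the per-metric numbers in the details output can be compared directly. The sketch below computes the relative reduction for each metric. This is plain arithmetic on the sample values, not the engine's formula.

```python
# Sample values from the details output above.
current = {"loss_pct": 3.2, "latency_ms": 86, "jitter_ms": 12}
recommended = {"loss_pct": 0.5, "latency_ms": 40, "jitter_ms": 4}

def relative_reduction(before: float, after: float) -> float:
    """Percent reduction from before to after; positive means the metric improved."""
    return (before - after) / before * 100

reductions = {m: relative_reduction(current[m], recommended[m]) for m in current}
for metric, pct in reductions.items():
    print(f"{metric}: {pct:.1f}% lower")
average = sum(reductions.values()) / len(reductions)
print(f"plain average: {average:.1f}%")
```

The reductions come out to about 84.4% (loss), 53.5% (latency), and 66.7% (jitter). Their plain average (about 68.2%) is far above 36%, so the engine's composite score evidently uses a different weighting or baseline. Treat the reported gain as the engine's own estimate rather than something to recompute by hand.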
Step 3: Simulate Applying Recommendation (Create Policy Copy)
What we are doing: Use the closed‑loop automation workflow to create a copy of the AAR policy and a site list for the targeted site. This demonstrates how the system prepares a non‑destructive policy change that can be reviewed before commit.
vmanage# request analytics predictive-path apply-recommendation simulate site site-101 app-group Office365
Simulation Result:
AAR Policy: Original: Centralized-AAR
AAR Policy: Proposed: Centralized-AAR-reco-site-101
Policy Changes:
- Sequence for Office365 modified to prefer private2
Site List Created: site-list-site-101 (contains site-101)
No configuration changes applied to devices (simulation only)
What just happened: The system made a copy of the centralized AAR policy and created a site-specific policy variant with the recommended sequence for Office365. Because this is a simulation, no live changes were pushed to edge devices — enabling safe review.
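The non-destructive pattern in this step (copy the policy, change only the targeted sequence, and scope the copy to a new site list) can be sketched with plain Python dictionaries. The dictionary layout, the `all-sites` list name, the original sequence contents, and the `simulate_recommendation` helper are hypothetical. Only the naming pattern (`Centralized-AAR-reco-site-101`, `site-list-site-101`) comes from the simulation output above.

```python
import copy

# Hypothetical representation of the central AAR policy before the change.
central_aar = {
    "name": "Centralized-AAR",
    "applied_to": "all-sites",  # hypothetical site-list name
    "sequences": [
        {"seq": 10, "app": "Office365", "paths": ["private1", "private2"], "action": "load-balance"},
        {"seq": 20, "app": "Default", "paths": ["private1", "private2"], "action": "load-balance"},
    ],
}

def simulate_recommendation(policy: dict, site: str, app: str, path: str) -> tuple[dict, dict]:
    """Build a site-specific policy copy and site list; the original is untouched."""
    site_list = {"name": f"site-list-{site}", "sites": [site]}
    proposed = copy.deepcopy(policy)  # deep copy: nested sequences are duplicated too
    proposed["name"] = f"{policy['name']}-reco-{site}"
    proposed["applied_to"] = site_list["name"]
    for seq in proposed["sequences"]:
        if seq["app"] == app:
            seq["paths"], seq["action"] = [path], "prefer"
    return proposed, site_list

proposed, site_list = simulate_recommendation(central_aar, "site-101", "Office365", "private2")
print(proposed["name"], proposed["applied_to"], proposed["sequences"][0])
assert central_aar["sequences"][0]["action"] == "load-balance"  # original unchanged
```

Because `copy.deepcopy` duplicates the nested sequence list, editing the proposed policy cannot leak back into `Centralized-AAR`, which matches the "simulation only" guarantee in the output above.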
Real-world note: Production teams should always review simulated policy changes. Automatic application is powerful but can cause unexpected shifts if central policies are modified later.
Verify:
vmanage# show policy aar Centralized-AAR-reco-site-101
Policy Name: Centralized-AAR-reco-site-101
Applied To: site-list-site-101
Sequence 10: Office365 -> preferred path: private2 -> action: prefer
Sequence 20: Default -> keep existing sequence
Policy Status: Ready for deployment
Step 4: Apply Recommendation via Closed‑Loop Automation (Commit)
What we are doing: Commit the recommended policy to the target site. The closed‑loop mechanism applies the policy copy to the specific site and updates the site AAR sequence. This is the actual change that will alter traffic steering.
vmanage# request analytics predictive-path apply-recommendation commit site site-101 app-group Office365
Applying Recommendation...
Created policy copy: Centralized-AAR-reco-site-101
Updated AAR sequence for Office365 on site-101
Pushed configuration to devices: Edge Router Branch-101
Deployment Status: Success
What just happened: The manager created the AAR policy copy, updated the relevant sequence to prefer private2 for Office365, and pushed the configuration to the edge router at site-101. Edge devices update their runtime policy and begin steering Office365 flows according to the new sequence.
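Once the policy is active, steering on the edge can be pictured as ordered sequence matching. The sketch below assumes first-match evaluation over the sequences and per-flow hashing for load balancing; both are simplifications for illustration. The sequence table, flow identifiers, and the `select_path` helper are hypothetical.

```python
import zlib

# Hypothetical runtime view of the Branch-101 AAR policy after the push.
SEQUENCES = [
    (10, "Office365", ["private2"]),            # recommendation applied: prefer private2
    (20, "Default", ["private1", "private2"]),  # unchanged: load-balance
]

def select_path(app: str, flow_id: str) -> str:
    """Return the path for a flow using first-match sequence evaluation."""
    for _seq, match_app, paths in SEQUENCES:
        if match_app == app or match_app == "Default":
            # Stable per-flow hash so a given flow always uses the same path.
            return paths[zlib.crc32(flow_id.encode()) % len(paths)]
    raise LookupError(f"no sequence matches {app}")

print(select_path("Office365", "flow-a"))
print(sorted({select_path("Webex", f"flow-{i}") for i in range(20)}))
```

Office365 flows always land on private2. Other applications fall through to the default sequence and are spread across private1 and private2 by a stable CRC32 hash. Python's built-in `hash()` is randomized per process for strings, so it would not give repeatable results here.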
Real-world note: If central policy is later changed, the system will detect divergence and may revert recommendations. Keep a record of automated changes and coordinate with central policy owners.
Verify:
vmanage# show policy device-status device Branch-101
Device: Branch-101
Policy Installed: Centralized-AAR-reco-site-101 (active)
AAR Sequence: Office365 -> private2 (preferred)
Last Push Time: 2025-03-31 19:05:23 UTC
Operational State: Active
Step 5: Enable NWPI Metadata on Edge Devices to Improve Telemetry
What we are doing: Turn on NWPI metadata insertion so edge devices append path metadata that enhances analytics accuracy for future recommendations. This improves how the analytics engine correlates flows to path conditions.
vmanage# configure nwpi metadata enable
vmanage(config-nwpi)# apply to site-list site-list-all-edges
vmanage(config-nwpi)# commit
What just happened: Enabling NWPI causes the manager to instruct edge devices to write NWPI metadata into the SD‑WAN header for traced flows. Subsequent telemetry will include per-hop metadata that allows more precise inference of where loss/jitter occurs along a path.
Real-world note: NWPI metadata increases visibility with minimal overhead; however, confirm your monitoring policy and privacy requirements before enabling per-flow metadata in production.
Verify:
vmanage# show nwpi status
NWPI Status: Enabled
Applied To: site-list-all-edges
Metadata Version: 1
Last Metadata Ingest: 2025-03-31 19:10:00 UTC
Sample Trace:
Flow ID: 0x1a2b
Path: Branch-101 -> Transit-1 -> Hub -> Transit-2 -> Branch-202
Per-hop loss: [0.2%, 0.0%, 0.1%, 0.0%]
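Per-hop values let you both localize and aggregate loss. Assuming each hop drops packets independently (a simplification), end-to-end loss is 1 - prod(1 - p_i). The sketch below applies that formula to the sample trace and picks the worst hop. It illustrates the reasoning only and is not how SD‑WAN Manager computes its metrics.

```python
from math import prod

# Hops and per-hop loss from the sample NWPI trace above.
hops = ["Branch-101 -> Transit-1", "Transit-1 -> Hub", "Hub -> Transit-2", "Transit-2 -> Branch-202"]
per_hop_loss_pct = [0.2, 0.0, 0.1, 0.0]

def end_to_end_loss(losses_pct: list[float]) -> float:
    """Loss across all hops, assuming hops drop packets independently."""
    delivered = prod(1 - p / 100 for p in losses_pct)
    return (1 - delivered) * 100

worst = max(range(len(hops)), key=lambda i: per_hop_loss_pct[i])
print(f"end-to-end loss approx {end_to_end_loss(per_hop_loss_pct):.2f}%")
print("worst hop:", hops[worst])
```

With the sample trace this prints an end-to-end loss of about 0.30% and identifies Branch-101 -> Transit-1 as the worst hop.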
Step 6: Review Bandwidth Forecasting for Capacity Planning
What we are doing: Generate a bandwidth forecast for the top circuits to anticipate capacity needs. Forecasting helps decide whether a path recommendation can be made permanent or should remain temporary (for example, because the recommended private circuit cannot absorb forecast growth).
vmanage# show analytics bandwidth-forecast top-circuits
Top 5 Circuits Forecast:
1) Circuit: Branch-101 - private2
Historical Peak: 85 Mbps
Forecast Peak (3 months): 110 Mbps
Seasonality: +20% weekly spikes (Mon-Fri 09:00-11:00)
2) Circuit: HQ - private1
Historical Peak: 120 Mbps
Forecast Peak (3 months): 125 Mbps
...
What just happened: The analytics engine used historical interface statistics and its models to project future circuit usage and seasonality. High forecast growth on a circuit may indicate that shifting more traffic onto it would cause contention later; this influences whether to implement a permanent policy change or explore capacity upgrades.
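This sketch shows the mechanics of a bandwidth forecast using Holt's linear (double exponential) smoothing, a common statistical forecasting method. Holt's method is swapped in here for illustration; the lesson only says the engine uses an ensemble of neural and statistical models. The weekly peak values and smoothing constants are hypothetical, and seasonality is ignored for brevity.

```python
def holt_forecast(series: list[float], horizon: int,
                  alpha: float = 0.5, beta: float = 0.3) -> list[float]:
    """Holt's linear exponential smoothing; returns forecasts 1..horizon steps ahead."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)        # smooth the level
        trend = beta * (level - prev_level) + (1 - beta) * trend  # smooth the slope
    return [level + h * trend for h in range(1, horizon + 1)]

# Hypothetical weekly peak utilization (Mbps) for one circuit over 12 weeks.
weekly_peaks = [70, 72, 71, 74, 76, 75, 78, 80, 79, 82, 84, 85]
forecast = holt_forecast(weekly_peaks, horizon=12)
print(f"forecast peak over the next 12 weeks: {max(forecast):.1f} Mbps")
```

Holt's method tracks a level and a trend, so it extrapolates steady growth. Capturing the weekday spikes noted in the output above would need a seasonal extension such as Holt-Winters.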
Real-world note: Use forecasting when recommendations point traffic to circuits nearing capacity — otherwise you create a new bottleneck.
Verify:
vmanage# show analytics bandwidth-forecast circuit Branch-101-private2 detail
Circuit: Branch-101-private2
Historical Data Window: 52 weeks
Model: Ensemble (neural + statistical)
MAPE: 4.8%
Forecast Horizon: 12 weeks
Forecast Peak: 110 Mbps
Current Provisioned Bandwidth: 100 Mbps
Action Recommendation: Consider capacity upgrade to 150 Mbps within 8 weeks if growth trend continues
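The detail output provides enough numbers for a quick sanity check. MAPE is an average error, not a guaranteed bound, so treating plus or minus 4.8% as an error band around the 110 Mbps peak is a rough heuristic (an assumption, not a product feature). The `capacity_check` helper is hypothetical.

```python
def capacity_check(forecast_peak: float, mape_pct: float, provisioned: float) -> dict:
    """Compare a forecast peak, widened by its MAPE, against provisioned bandwidth."""
    low = forecast_peak * (1 - mape_pct / 100)
    high = forecast_peak * (1 + mape_pct / 100)
    return {
        "band_mbps": (round(low, 2), round(high, 2)),
        "utilization_pct": round(forecast_peak / provisioned * 100, 1),
        "exceeds_even_at_low_end": low > provisioned,
    }

print(capacity_check(110, 4.8, 100))  # current 100 Mbps circuit
print(capacity_check(110, 4.8, 150))  # proposed 150 Mbps upgrade
```

Even the low end of the band (about 104.7 Mbps) exceeds the 100 Mbps provisioned today, while a 150 Mbps circuit would run at about 73% at the forecast peak. Because Step 4 steered site-101 Office365 traffic onto this same circuit, this is the situation the Common Mistakes table warns about.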
Verification Checklist
- Check 1: Predictive recommendations are enabled.
  Verify with: show analytics predictive-path-recommendations status
  Expected: Predictive Path Recommendations Status: Enabled; License: TE-EMBED-WANI.
- Check 2: The recommendation is applied to site-101.
  Verify with: show policy device-status device Branch-101
  Expected: Policy Installed: Centralized-AAR-reco-site-101 (active); AAR Sequence shows Office365 preferred via private2.
- Check 3: NWPI metadata is active and being ingested.
  Verify with: show nwpi status
  Expected: NWPI Status: Enabled; Last Metadata Ingest shows a recent timestamp, and sample trace entries are present.
- Check 4: Bandwidth forecasts are available for the top circuits.
  Verify with: show analytics bandwidth-forecast top-circuits
  Expected: a list of circuits with historical and forecast peaks. Per-circuit action recommendations appear in the detail view (show analytics bandwidth-forecast circuit <circuit> detail).
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| Recommendations show but cannot be applied | Centralized policy is locked, or the user lacks privileges | Ensure you have Policy Admin rights; unlock the central policy, or create a site-specific copy via simulation and then apply it |
| After applying a recommendation, traffic still follows the old path | The policy push failed, the edge device did not receive it, or the installed device policy does not match | Check device connectivity to the manager and the push status with show policy device-status device <device>; re-push the policy if needed |
| NWPI shows no traces after enabling | Edge devices run an older agent, or telemetry aggregation is delayed | Confirm NWPI was pushed to all edges and wait for ingestion; use show nwpi status to verify the applied site lists |
| Recommendation leads to congestion on the recommended circuit | The recommended circuit is near capacity and forecast growth was not considered | Review the bandwidth-forecast results; if the forecast peak exceeds provisioned bandwidth, plan an upgrade or distribute load across alternate circuits |
Key Takeaways
- Predictive analytics converts historical telemetry into proactive, site‑level path recommendations; this reduces user impact by acting before degradation becomes user-visible.
- Closed‑loop automation creates isolated policy copies for safe application and may automatically revert them if central policy changes; always review simulated changes before commit.
- NWPI metadata enhances telemetry quality and improves the accuracy of path recommendations by supplying per-hop flow details.
- Forecasting and capacity planning are essential companions to recommendations: always validate that the recommended path can handle expected future growth.
Warning: Automated changes are powerful. In production, combine automated recommendations with change control practices (review, approval, rollback plan) to avoid unintended policy drift.