Building an SD-WAN Dashboard
Objective
In this lesson you will build a lightweight SD‑WAN monitoring dashboard by consuming the SD‑WAN Manager REST APIs. You will authenticate to the manager, extract device inventory and alarm data, transform the API JSON into useful metrics, and implement a simple alert script that can run periodically. In production, this lets NOC teams detect site outages or critical alarms quickly and integrate SD‑WAN state into existing monitoring systems.
Introduction
We will use the SD‑WAN Manager API to gather device inventory, health and alarm data, and create a simple reporter that raises alerts for critical conditions. This matters in production because the centralized manager is the single source of truth for control and telemetry in an SD‑WAN fabric — querying it programmatically allows custom reporting, SLA verification, and integration with ITSM/alerting tools. Real-world scenario: the NHPREP operations team needs a daily executive report and immediate notification when branch edges lose control connectivity or generate critical alarms.
Quick Recap
Reference topology from Lesson 1. No SD‑WAN devices are added in this lesson; the only new host is the dashboard server, which joins the lab management network.
ASCII Topology (management/monitoring network):
[Dashboard Server] dashboard.lab.nhprep.com
eth0: 192.168.100.50/24
        |
        |  192.168.100.0/24
+------------------------------------------------------+
|                                                      |
|   vManage           vBond            vSmart          |
|   192.168.100.10    192.168.100.11   192.168.100.12  |
|                                                      |
+------------------------------------------------------+
        |
   Management network
Device Table
| Device | Role | Management IP | Access name |
|---|---|---|---|
| vManage | SD‑WAN Manager (API provider) | 192.168.100.10 | vmanage.lab.nhprep.com |
| vBond | Orchestrator | 192.168.100.11 | vbond.lab.nhprep.com |
| vSmart | Controller | 192.168.100.12 | vsmart.lab.nhprep.com |
| Dashboard Server | Monitoring / API client | 192.168.100.50 | dashboard.lab.nhprep.com |
Credentials (used in examples): username = admin, password = Lab@123, organization = NHPREP
Important: all API calls in this lesson are directed at vManage (192.168.100.10). The dashboard server is a separate host that queries vManage via its REST API.
Key Concepts
- Manager API Authentication: SD‑WAN managers typically require a login flow that creates a session cookie (JSESSIONID) and may require an XSRF token. You must authenticate before calling protected API endpoints. In production this is handled by a service account with restricted permissions.
- Inventory vs Telemetry: Device inventory APIs return static configuration/identifiers (host‑name, system IP, UUID). Telemetry/alarms APIs provide dynamic state (alarms, interfaces, throughput). Inventory is used to identify devices for targeted telemetry queries.
- JSON and Processing: API endpoints return JSON. Scripts parse JSON (jq or Python) to convert into human‑readable metrics and to detect thresholds that trigger alerts.
- Polling cadence and rate limits: In production, polling too frequently may overload the manager or your monitoring pipeline. Use sensible intervals (e.g., 1–5 minutes) and rely on event/stream APIs where available for high granularity.
- Alerting: Your script can classify alarms by severity (e.g., Critical, Major) and trigger actions (email, ticket, webhook). Real automation uses idempotent operations — avoid spamming alerts for the same persistent condition.
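The idempotent alerting idea above can be sketched as a small state file that remembers which alarms were already reported. This is a minimal illustration, not part of the lesson's script: the function names and the choice of deviceHostname, severity and description as the alarm identity are assumptions made for the example.

```python
import json
from pathlib import Path


def alarm_key(alarm: dict) -> str:
    """Build a stable identity for an alarm (field choice is an assumption)."""
    return "|".join(
        str(alarm.get(field, ""))
        for field in ("deviceHostname", "severity", "description")
    )


def new_alarms(active: list, state_file: Path) -> list:
    """Return alarms not seen on the previous run and persist the current set."""
    try:
        seen = set(json.loads(state_file.read_text()))
    except (FileNotFoundError, ValueError):
        seen = set()  # first run or unreadable state: treat everything as new
    current = {alarm_key(a): a for a in active}
    fresh = [a for key, a in current.items() if key not in seen]
    # Only currently active keys are stored, so an alarm that clears and
    # later returns is reported again.
    state_file.write_text(json.dumps(sorted(current)))
    return fresh
```

Calling new_alarms on every polling run and notifying only on its return value sends one alert per condition instead of one alert per poll.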
Steps
Step 1: Authenticate to the SD‑WAN Manager API
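What we are doing: Create an authenticated session to vManage so subsequent API calls succeed. Authentication establishes a session cookie. Later POST/PUT calls may additionally need an XSRF token, which recent releases issue from a separate token endpoint (/dataservice/client/token) after login; the read-only GET calls in this lesson need only the cookie.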
# On Dashboard server: create directory for scripts and cookies
mkdir -p /opt/nhprep/sdwan && cd /opt/nhprep/sdwan
# Authenticate to vManage and save cookies
curl -k -c cookie.txt -X POST "https://192.168.100.10/j_security_check" -d "j_username=admin&j_password=Lab@123"
What just happened: The curl POST to /j_security_check submitted the credentials. The manager validated them and returned an HTTP response with a session cookie (JSESSIONID), which curl stored in cookie.txt. This session cookie authenticates subsequent API requests. In protocol terms, the client performed an HTTP POST to initiate a server session and the server issued Set‑Cookie headers. Note that a failed login typically still returns HTTP 200, with the HTML login page as the body, so judge success by the response body and the cookie rather than by the status code.
Real-world note: Use a service account with least privilege for API access. For high security, rotate credentials and use certificate-based auth where supported.
Verify:
# Display cookie file to confirm session cookie
cat cookie.txt
# Expected output (example)
# Netscape HTTP Cookie File
# https://192.168.100.10
192.168.100.10 TRUE / FALSE 0 JSESSIONID ABCDEF1234567890abcdef
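The same login flow, including the XSRF token that POST/PUT calls may need, can be sketched in Python. This is a hedged sketch rather than a reference client: the login function is hypothetical, session is assumed to be any object with requests.Session-style post/get methods and a headers dict, and the /dataservice/client/token endpoint is assumed to be available on your release.

```python
def login(session, base_url: str, username: str, password: str) -> None:
    """Log in and, if a token is offered, attach it as an X-XSRF-TOKEN header."""
    resp = session.post(
        f"{base_url}/j_security_check",
        data={"j_username": username, "j_password": password},
        verify=False,  # lab only: self-signed certificate
    )
    # Assumption: a failed login still returns HTTP 200 but with the HTML
    # login page as the body, so the status code alone is not enough.
    if resp.status_code != 200 or "<html" in resp.text.lower():
        raise RuntimeError("vManage login failed")
    # Assumption: the token endpoint exists on this release; if it does not
    # answer with a token, the header is simply not set.
    token = session.get(f"{base_url}/dataservice/client/token", verify=False)
    if token.status_code == 200 and token.text:
        session.headers["X-XSRF-TOKEN"] = token.text.strip()
```

With the requests library installed, login(requests.Session(), "https://192.168.100.10", "admin", "Lab@123") would prepare a session for both read and write calls in the lab.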
Step 2: Retrieve device inventory (device list)
What we are doing: Query the manager's device inventory to learn which devices exist, their system IPs and UUIDs. This lets the dashboard map alarms to human names and identify devices to query for telemetry.
# Query devices API and save JSON output
curl -k -b cookie.txt -X GET "https://192.168.100.10/dataservice/device" -H "Accept: application/json" -o devices.json
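What just happened: We issued a GET to /dataservice/device using the session cookie (cookie.txt). The manager returned JSON describing each device, with host-name, system-ip and uuid fields. On real vManage releases the device list is usually wrapped in an envelope object under a data key; the example output below shows only the array of device objects for brevity. This is a read-only inventory API call: it does not push changes to devices, it simply returns the manager's persisted inventory.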
Verify:
# Show the devices.json content
cat devices.json
# Expected output (example)
[
{
"host-name": "vmanage",
"system-ip": "192.168.100.10",
"uuid": "11111111-aaaa-1111-aaaa-111111111111",
"device-type": "vmanage",
"version": "20.9.1"
},
{
"host-name": "vbond",
"system-ip": "192.168.100.11",
"uuid": "22222222-bbbb-2222-bbbb-222222222222",
"device-type": "vbond",
"version": "20.9.1"
},
{
"host-name": "vsmart",
"system-ip": "192.168.100.12",
"uuid": "33333333-cccc-3333-cccc-333333333333",
"device-type": "vsmart",
"version": "20.9.1"
},
{
"host-name": "vedge-branch1",
"system-ip": "10.0.1.1",
"uuid": "44444444-dddd-4444-dddd-444444444444",
"device-type": "vedge",
"version": "20.9.1"
},
{
"host-name": "vedge-branch2",
"system-ip": "10.0.2.1",
"uuid": "55555555-eeee-5555-eeee-555555555555",
"device-type": "vedge",
"version": "20.9.1"
}
]
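To map alarms back to human-readable names, the inventory can be turned into a lookup table. The sketch below assumes a list of device objects shaped like the example output above; index_by_system_ip is a hypothetical helper name.

```python
def index_by_system_ip(devices: list) -> dict:
    """Map each system-ip to its host-name for quick alarm-to-device lookups."""
    return {
        d["system-ip"]: d.get("host-name", "unknown")
        for d in devices
        if "system-ip" in d  # skip malformed entries without a system IP
    }
```

Loading devices.json with json.load and passing the device list to this function gives a dictionary in which, for the sample inventory, the key 10.0.1.1 resolves to vedge-branch1.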
Step 3: Query alarms and filter critical issues
What we are doing: Pull active alarms from the manager and filter for severity levels that require immediate attention (Critical/Major). The dashboard uses this to populate an Alerts section.
# Get active alarms
curl -k -b cookie.txt -X GET "https://192.168.100.10/dataservice/alarms" -H "Accept: application/json" -o alarms.json
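# Keep only CRITICAL/MAJOR alarms and the fields the dashboard needs (install jq if not present)
jq '(.data? // .)[] | select((.severity // "" | ascii_upcase) as $s | $s == "CRITICAL" or $s == "MAJOR") | {deviceHostname: .deviceHostname, severity: .severity, description: .description, timestamp: .timeStamp}' alarms.json > alarms_filtered.json
What just happened: The GET to /dataservice/alarms returned the manager's current alarm list. We then used jq to keep only alarms whose severity is CRITICAL or MAJOR and to extract the fields our dashboard needs. The (.data? // .) prefix accepts either a bare array or an array wrapped under a data key. The output file is a stream of JSON objects, one per alarm, rather than a single array. The alarms API reflects the manager's aggregation of device-reported conditions (control connectivity, datapath state, process failures). Alarms are time‑stamped so you can correlate events.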
Real-world note: Some managers support event subscriptions or WebSocket telemetry that are more efficient than polling — use those for high-scale monitoring.
Verify:
# Show alarms_filtered.json
cat alarms_filtered.json
# Expected output (example)
{
"deviceHostname": "vedge-branch1",
"severity": "CRITICAL",
"description": "Control connection to vSmart lost",
"timestamp": "2026-04-02T09:12:34Z"
}
{
"deviceHostname": "vedge-branch2",
"severity": "MAJOR",
"description": "Interface ge0/1 down",
"timestamp": "2026-04-02T08:55:01Z"
}
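The same severity filter can be expressed in Python when jq is not available on the dashboard server. This is a small sketch using the field names from the example output; actionable and severity_counts are hypothetical helper names.

```python
from collections import Counter

# Severities that this lesson treats as needing immediate attention.
ALERT_SEVERITIES = {"CRITICAL", "MAJOR"}


def actionable(alarms: list) -> list:
    """Keep only alarms whose severity is CRITICAL or MAJOR (case-insensitive)."""
    return [
        a for a in alarms
        if str(a.get("severity", "")).upper() in ALERT_SEVERITIES
    ]


def severity_counts(alarms: list) -> Counter:
    """Count alarms per upper-cased severity; missing severities count as UNKNOWN."""
    return Counter(str(a.get("severity", "UNKNOWN")).upper() for a in alarms)
```

Normalizing the case before comparing means the filter keeps working whether the manager reports Critical or CRITICAL.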
Step 4: Create a simple Python dashboard script that summarizes devices and raises alerts
What we are doing: Implement a script (/opt/nhprep/sdwan/sdwan_dashboard.py) that authenticates, fetches inventory and alarms, and prints a concise summary. If any CRITICAL alarms are present, it exits with a non‑zero code (so monitoring systems can alert). This is the core of our dashboard logic.
# Create the dashboard script
cat > /opt/nhprep/sdwan/sdwan_dashboard.py << 'PY'
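#!/usr/bin/env python3
# Simple SD-WAN Dashboard Script for NHPREP (lab use)
import sys

import requests
import urllib3

VMANAGE = "https://192.168.100.10"
USER = "admin"
PASS = "Lab@123"

# Lab only: vManage presents a self-signed certificate, so TLS verification
# is disabled and the resulting warnings are silenced.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def fetch(session, path):
    """GET an API path and return its list of records; exit 2 on failure."""
    try:
        resp = session.get(f"{VMANAGE}{path}", verify=False, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
    except (requests.RequestException, ValueError) as exc:
        print(f"API call to {path} failed: {exc}")
        sys.exit(2)
    # vManage usually wraps results as {"data": [...]}; accept a bare list too.
    return payload.get("data", []) if isinstance(payload, dict) else payload


s = requests.Session()

# Login. A failed login can still return HTTP 200 with the HTML login page
# as the body, so check the body as well as the status code.
try:
    resp = s.post(f"{VMANAGE}/j_security_check",
                  data={"j_username": USER, "j_password": PASS},
                  verify=False, timeout=30)
except requests.RequestException as exc:
    print("Authentication failed:", exc)
    sys.exit(2)
if resp.status_code != 200 or "<html" in resp.text.lower():
    print("Authentication failed", resp.status_code)
    sys.exit(2)

devices = fetch(s, "/dataservice/device")
alarms = fetch(s, "/dataservice/alarms")

# Summarize
print("SD-WAN Dashboard Summary - NHPREP")
print("Total devices:", len(devices))
critical = [al for al in alarms if str(al.get("severity", "")).upper() == "CRITICAL"]
major = [al for al in alarms if str(al.get("severity", "")).upper() == "MAJOR"]
print("Critical alarms:", len(critical))
print("Major alarms:", len(major))

if critical:
    print("\nCRITICAL ALERTS:")
    for c in critical:
        print(f"- {c.get('deviceHostname')} : {c.get('description')} at {c.get('timeStamp')}")
    # Exit non-zero for monitoring systems
    sys.exit(1)

print("\nNo critical alarms. Top 5 devices:")
for d in devices[:5]:
    print(f"- {d.get('host-name')} ({d.get('system-ip')}) - {d.get('device-type')}")
sys.exit(0)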
PY
# Make script executable
chmod +x /opt/nhprep/sdwan/sdwan_dashboard.py
# Run the script
/opt/nhprep/sdwan/sdwan_dashboard.py
What just happened: The script logs in to vManage using the same POST to /j_security_check, requests inventory and alarms, then summarizes counts. If any critical alarms exist, it prints details and returns exit code 1, which can be picked up by monitoring systems that run check scripts (Nagios, Icinga). If you register the script as a Nagios-style plugin, remember that the plugin convention reserves 1 for WARNING and 2 for CRITICAL, so map the exit codes accordingly. Architecturally, this script implements the "collector" role in a monitoring pipeline: it gathers, reduces, and signals.
Real-world note: Replace plaintext credentials with a secrets store (Vault) and run the script as a non‑privileged user. For production dashboards, push metrics into a time-series database rather than printing text.
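As one way to feed a time-series database instead of printing text, the summary counts can be rendered in the Prometheus text exposition format and written to a file that a collector scrapes. This is only a sketch: the metric names sdwan_devices and sdwan_active_alarms and the render_metrics helper are invented for illustration.

```python
def render_metrics(device_count: int, severity_counts: dict) -> str:
    """Render dashboard counts as Prometheus text exposition lines."""
    lines = [
        "# TYPE sdwan_devices gauge",
        f"sdwan_devices {device_count}",
        "# TYPE sdwan_active_alarms gauge",
    ]
    # One sample per severity, sorted so the output is stable between runs.
    for severity in sorted(severity_counts):
        lines.append(
            f'sdwan_active_alarms{{severity="{severity.lower()}"}} '
            f"{severity_counts[severity]}"
        )
    return "\n".join(lines) + "\n"
```

Gauges fit here because both values can go down as well as up between polls.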
Verify:
# Example expected output when a critical alarm exists
SD-WAN Dashboard Summary - NHPREP
Total devices: 5
Critical alarms: 1
Major alarms: 1
CRITICAL ALERTS:
- vedge-branch1 : Control connection to vSmart lost at 2026-04-02T09:12:34Z
# Exit code indicates an alert
echo $?
# Expected: 1
Step 5: Schedule the script to run periodically (simple polling)
What we are doing: Use cron to run the dashboard every 5 minutes for continuous monitoring. Scheduling ensures automated checks and consistent alerting cadence.
# Install crontab entry for root or monitoring user
# Example: run as root every 5 minutes
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/nhprep/sdwan/sdwan_dashboard.py >> /var/log/sdwan_dashboard.log 2>&1") | crontab -
# Verify cron entry
crontab -l
What just happened: We added a cron entry that runs the dashboard script every 5 minutes and appends output to /var/log/sdwan_dashboard.log. Cron is a simple scheduler — in production you may use systemd timers or a monitoring scheduler for better control and logging.
Verify:
# Show crontab entries
crontab -l
# Expected output (example)
*/5 * * * * /opt/nhprep/sdwan/sdwan_dashboard.py >> /var/log/sdwan_dashboard.log 2>&1
Verification Checklist
- Check 1: Authenticate to vManage manually and confirm the cookie file contains JSESSIONID. Command: cat cookie.txt (expected: a JSESSIONID line is present).
- Check 2: Device inventory returned by the API contains all devices. Command: cat devices.json (expected: a JSON array with host-name and system-ip entries).
- Check 3: Alarms API returns the current alarm list with severity fields. Command: cat alarms_filtered.json (expected: entries showing severity CRITICAL or MAJOR if there are issues).
- Check 4: Dashboard script exits with non-zero when a CRITICAL alarm exists. Command: run /opt/nhprep/sdwan/sdwan_dashboard.py and then echo $? (expected: exit code 1 for critical, 0 otherwise).
- Check 5: Cron job is listed. Command: crontab -l (expected: a cron entry for every 5 minutes).
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| Authentication fails; curl returns HTML login page | Wrong credentials or missing POST to /j_security_check | Verify username/password (admin / Lab@123) and re-run the POST. Check cookie file for JSESSIONID. |
| devices.json empty or truncated | Request blocked by missing cookie or TLS verification | Ensure you supply cookie file (-b cookie.txt) and use -k for lab self-signed certs, or install proper CA certs. |
| Script cannot parse JSON (ValueError) | API returned HTML error or session expired | Inspect raw API output (cat alarms.json). Re-authenticate before parsing; add error handling in script. |
| Cron runs but no output in log | Script lacks execute permission or environment PATH differences | Ensure chmod +x on script and use full interpreter path in cron, redirect stdout/stderr to log. |
| Repeated alerts for same persistent condition | Script does not deduplicate or acknowledge alarms | Implement deduplication logic or stateful suppression (record alert IDs and suppress repeats until cleared). |
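The JSON parsing failure in the table above can be guarded against with a small wrapper that recognizes an HTML body before decoding. This is a minimal sketch; ApiResponseError and parse_api_body are hypothetical names, and treating a data key as the record list is an assumption about the response envelope.

```python
import json


class ApiResponseError(Exception):
    """Raised when the manager returns something other than usable JSON."""


def parse_api_body(body: str) -> list:
    """Parse an API response body and return its list of records."""
    stripped = body.lstrip()
    if stripped.lower().startswith(("<html", "<!doctype")):
        # Typical sign of an expired session: the login page came back.
        raise ApiResponseError("got HTML instead of JSON; re-authenticate")
    try:
        payload = json.loads(stripped)
    except ValueError as exc:
        raise ApiResponseError(f"invalid JSON: {exc}") from exc
    if isinstance(payload, dict):
        # Assumption: records are wrapped as {"data": [...]}.
        payload = payload.get("data", [])
    if not isinstance(payload, list):
        raise ApiResponseError("unexpected JSON shape")
    return payload
```

Catching ApiResponseError in the caller gives a single place to log the problem, re-authenticate, and retry instead of crashing with a traceback.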
Key Takeaways
- The SD‑WAN Manager provides REST APIs for inventory and alarm telemetry — use them to build custom dashboards and integrate with existing monitoring systems.
- Always authenticate correctly (session cookie / XSRF) before calling protected endpoints; treat API credentials like any other secret.
- Polling is simple but has scaling and timeliness tradeoffs — consider event or streaming APIs for large fabrics.
- Automate safely: use non‑privileged service accounts, secret managers for credentials, idempotent alerts, and sensible polling intervals to avoid alert storms.
Tip: For production dashboards, export parsed metrics to a TSDB (Prometheus, InfluxDB) and use visualization tools (Grafana) for long-term trending and SLA reporting.
This concludes Lesson 6: Building an SD‑WAN Dashboard. Use the sample script and techniques here as the foundation for richer dashboards, ticketing integrations, and SLA reports in real NHPREP deployments.