Device Monitoring via API
Objective
In this lesson you will learn how to programmatically query device status, overlay tunnel health (including BFD statistics), and interface statistics from an SD‑WAN Manager using its REST APIs. You will also enable a webhook on the SD‑WAN Manager to push alarms/events to a webhook consumer and verify the delivery. This matters in production because automated monitoring and alerting reduce mean time to detect and repair (MTTD/MTTR) and let NOC workflows respond to degraded tunnels and device failures without manual CLI polling.
Topology
ASCII topology showing the management network and the webhook consumer. Every interface includes exact IPs.
      Admin-PC (eth0: 10.10.10.100)
                 |
                 | 10.10.10.0/24
                 |
+----------------------+                 +------------------------+
| SD-WAN Manager       |                 | Webhook Server         |
| host: lab.nhprep.com |  10.10.10.0/24  | host: webhook.lab      |
| eth0: 10.10.10.10    |-----------------| eth0: 10.10.10.20      |
| (API server)         |                 |                        |
+----------------------+                 +------------------------+
           |
           | 10.10.10.0/24
           |
+----------------------+
| Edge Router          |
| eth0: 10.10.10.1     |
+----------------------+
Device Table
| Device | Hostname / Domain | Interface | IP Address | Purpose |
|---|---|---|---|---|
| Admin PC | admin.lab.nhprep.com | eth0 | 10.10.10.100/24 | Run API calls and verification |
| SD‑WAN Manager | lab.nhprep.com | eth0 | 10.10.10.10/24 | API provider — Aggregation, Alarms, Webhooks |
| Webhook Server | webhook.lab.nhprep.com | eth0 | 10.10.10.20/24 | HTTP listener that consumes webhook events |
| Edge Router | edge1.lab.nhprep.com | eth0 | 10.10.10.1/24 | SD‑WAN edge device (source of tunnel/BFD stats) |
Credentials and naming conventions used in examples:
- Username: admin
- Password: Lab@123
- Organization: NHPREP
Tip: In production, the SD‑WAN Manager runs on a management network separated from data plane traffic. We use dedicated management IPs here to mirror that separation.
Introduction
In this lesson we will query device and tunnel health using the SD‑WAN Manager's Aggregation Query APIs and Alarms (Simple Query) APIs, and enable a webhook to consume alarm events on a listener server. Programmatic monitoring is essential in enterprise networks to detect degraded connectivity (latency, jitter, loss) and BFD failures on overlay tunnels and to integrate those events with downstream automation and ticketing systems.
Real-world scenario: A NOC dashboard polls SD‑WAN Manager daily for average tunnel latency and vQoE score; when the vQoE drops below a threshold, an alarm is delivered via webhook to an automation platform which opens a ticket and runs remedial actions.
Quick Recap
Refer back to the topology used in Lesson 1. No new physical devices are added in this lesson; we will use:
- SD‑WAN Manager: lab.nhprep.com (10.10.10.10)
- Webhook consumer: webhook.lab.nhprep.com (10.10.10.20)
- Admin PC running the API client: 10.10.10.100
Key Concepts
- Aggregation Query APIs: Used to retrieve aggregated telemetry such as BFD counters, latency/loss/jitter and computed vQoE for overlay tunnels. These APIs accept filter/query payloads and return JSON arrays of metrics across devices and tunnels.
  - Protocol behavior: the SD‑WAN Manager aggregates the statistics it receives from edges and exposes them over HTTPS. The API client must specify the time window to aggregate over.
- BFD (Bidirectional Forwarding Detection): A fast failure detection mechanism for tunnels. When BFD on an overlay session detects that the remote peer is unreachable, the SD‑WAN Manager reports that session's status and counters (session up/down, last change, packet loss).
  - Packet flow: BFD peers exchange periodic control packets; when no packet arrives within the detection time (the transmit interval multiplied by the detection multiplier), the session is declared down.
- Alarms (Simple Query APIs): These return alarm records (category, severity, timestamp, consumed events), allowing you to correlate symptoms with underlying events.
  - In production, alarms are used to drive incident processes.
- Webhooks: Push-style notifications where the SD‑WAN Manager POSTs JSON payloads to a configured external URL. This removes the need for constant polling and enables near-real-time automation.
  - Real-world note: Ensure the webhook endpoint is reachable, so that events are not lost, and secured (TLS and authentication), so that spoofed events are rejected.
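The BFD detection rule above comes down to simple arithmetic. The following is a minimal sketch, assuming a fixed transmit interval and detection multiplier; the function name and the example values (1000 ms, multiplier 7) are illustrative and are not read from the lab devices.

```python
def bfd_detection_time_ms(tx_interval_ms: int, multiplier: int) -> int:
    """Return how long a peer may stay silent before the session is declared down."""
    if tx_interval_ms <= 0 or multiplier <= 0:
        raise ValueError("interval and multiplier must be positive")
    # Detection time = transmit interval x detection multiplier
    return tx_interval_ms * multiplier


# Illustrative values only (assumed, not taken from this lab's configuration)
print(bfd_detection_time_ms(1000, 7))  # 7000
```

A shorter interval or smaller multiplier detects failures faster, at the cost of more control traffic and a higher chance of false positives on lossy links.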
Step-by-step configuration
Step 1: Verify connectivity to the SD‑WAN Manager API
What we are doing: Ensure the admin workstation can reach the SD‑WAN Manager API endpoint; basic network reachability must be in place before sending REST requests.
# From Admin PC: ping the SD-WAN Manager
ping -c 3 10.10.10.10
What just happened: The ping command tests ICMP reachability between the Admin PC (10.10.10.100) and the SD‑WAN Manager (10.10.10.10). Without network connectivity, API calls will fail due to connection errors.
Real-world note: Management interfaces are often firewall-restricted — allow only admin subnets and automation hosts.
Verify:
# Expected output (complete)
PING 10.10.10.10 (10.10.10.10) 56(84) bytes of data.
64 bytes from 10.10.10.10: icmp_seq=1 ttl=64 time=1.23 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=64 time=1.15 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=64 time=1.10 ms
--- 10.10.10.10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2042ms
rtt min/avg/max/mdev = 1.100/1.160/1.230/0.054 ms
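A successful ping only proves ICMP reachability; the API itself is served over HTTPS. The sketch below is one way to check that the TCP port accepts connections. The function name is hypothetical and port 443 is assumed from the https:// URLs used in this lesson.

```python
import socket


def https_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection performs the TCP handshake; the socket is closed on exit
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks
        return False
```

From the Admin PC you could call `https_port_open("10.10.10.10")`. A `False` result with a working ping usually points at a firewall rule or a stopped service rather than a routing problem.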
Step 2: Query overlay tunnel BFD / Application Aware Routing statistics (Aggregation Query API)
What we are doing: Request aggregated BFD and tunnel metrics from the SD‑WAN Manager for Edge Router edge1.lab.nhprep.com (10.10.10.1) over the last 15 minutes. This provides average latency, loss, jitter and vQoE so you can detect degraded tunnels.
# POST to Aggregation Query API to request BFD/tunnel metrics
curl -s -u admin:Lab@123 -X POST "https://lab.nhprep.com/api/statistics/aggregation" \
  -H "Content-Type: application/json" \
  -d '{
        "queryType": "tunnel_bfd_metrics",
        "deviceIp": "10.10.10.1",
        "timeRangeMinutes": 15,
        "aggregation": ["avg_latency_ms","avg_loss_pct","avg_jitter_ms","vQoE"]
      }'
What just happened: The curl command authenticates with basic credentials and asks the SD‑WAN Manager to aggregate tunnel/BFD metrics for the specified device and time window. The manager computes averages from telemetry submitted by the edge and returns a JSON array of metrics per tunnel.
Real-world note: Aggregation reduces telemetry noise and gives meaningful averages for SLA tracking in dashboards.
Verify:
# Expected JSON response (complete; example)
[
  {
    "deviceIp": "10.10.10.1",
    "tunnelId": "tunnel-1001",
    "remoteIp": "198.51.100.2",
    "avg_latency_ms": 45.2,
    "avg_loss_pct": 0.2,
    "avg_jitter_ms": 8.1,
    "vQoE": 4.3,
    "bfdState": "up",
    "lastChange": "2026-03-30T10:14:12Z"
  },
  {
    "deviceIp": "10.10.10.1",
    "tunnelId": "tunnel-1002",
    "remoteIp": "198.51.100.3",
    "avg_latency_ms": 120.6,
    "avg_loss_pct": 1.5,
    "avg_jitter_ms": 25.0,
    "vQoE": 2.1,
    "bfdState": "down",
    "lastChange": "2026-03-30T10:05:50Z"
  }
]
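Once the response is parsed, flagging degraded tunnels is a small filtering step. The sketch below assumes the field names shown in the example response above; the function name and the thresholds (vQoE below 3.0, loss above 1.0%) are hypothetical and should be replaced with your own SLA values.

```python
import json

# Trimmed copy of the example response from this step
SAMPLE_RESPONSE = """
[
  {"tunnelId": "tunnel-1001", "avg_loss_pct": 0.2, "vQoE": 4.3, "bfdState": "up"},
  {"tunnelId": "tunnel-1002", "avg_loss_pct": 1.5, "vQoE": 2.1, "bfdState": "down"}
]
"""


def find_degraded_tunnels(metrics, min_vqoe=3.0, max_loss_pct=1.0):
    """Return (tunnelId, reasons) pairs for tunnels that breach any threshold."""
    degraded = []
    for m in metrics:
        reasons = []
        if m.get("bfdState") != "up":
            reasons.append(f"bfd {m.get('bfdState')}")
        vqoe = m.get("vQoE")
        if vqoe is not None and vqoe < min_vqoe:
            reasons.append(f"vQoE {vqoe} < {min_vqoe}")
        loss = m.get("avg_loss_pct")
        if loss is not None and loss > max_loss_pct:
            reasons.append(f"loss {loss}% > {max_loss_pct}%")
        if reasons:
            degraded.append((m.get("tunnelId"), reasons))
    return degraded


for tunnel_id, reasons in find_degraded_tunnels(json.loads(SAMPLE_RESPONSE)):
    print(tunnel_id, "->", ", ".join(reasons))
```

With the example data only tunnel-1002 is reported, because it fails all three checks while tunnel-1001 passes them.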
Step 3: Retrieve alarms for tunnel or BFD categories (Alarms Simple Query API)
What we are doing: Pull recent alarms for the category "tunnel" so you can correlate alarms to aggregation-derived anomalies (e.g., vQoE low, BFD down).
# POST to Simple Query Alarms API requesting recent tunnel-related alarms
curl -s -u admin:Lab@123 -X POST "https://lab.nhprep.com/api/alarms/simple" \
  -H "Content-Type: application/json" \
  -d '{
        "category": "tunnel",
        "severity": ["critical","major"],
        "timeRangeMinutes": 60
      }'
What just happened: The SD‑WAN Manager returns alarm records matching the filters. Each alarm contains metadata and references to consumed events that caused it. Alarms can be used to prioritize remediation.
Real-world note: Use alarm categories to drive routing of tickets to the correct NOC team (WAN vs Security vs Edge).
Verify:
# Expected JSON response (complete; example)
[
  {
    "alarmId": "alarm-9001",
    "category": "tunnel",
    "severity": "critical",
    "description": "Overlay tunnel tunnel-1002 BFD down",
    "deviceIp": "10.10.10.1",
    "timestamp": "2026-03-30T10:05:50Z",
    "consumedEvents": [
      {
        "eventId": "evt-5001",
        "type": "bfd_session",
        "details": "BFD state changed to down"
      }
    ]
  },
  {
    "alarmId": "alarm-9002",
    "category": "tunnel",
    "severity": "major",
    "description": "High latency on tunnel-1001",
    "deviceIp": "10.10.10.1",
    "timestamp": "2026-03-30T09:55:10Z",
    "consumedEvents": [
      {
        "eventId": "evt-5002",
        "type": "latency_spike",
        "details": "15min avg latency 120 ms"
      }
    ]
  }
]
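To use alarms for prioritizing remediation, a client can order them by severity and then by recency. The sketch below assumes the `severity` and `timestamp` fields from the example response; the ranking table (including the `minor` level) and the function names are hypothetical.

```python
from datetime import datetime

# Lower rank sorts first; "minor" is an assumed level not shown in the example
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


def parse_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-03-30T10:05:50Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def prioritize(alarms):
    """Sort alarms by severity (most severe first), then newest first."""
    return sorted(
        alarms,
        key=lambda a: (
            SEVERITY_RANK.get(a["severity"], len(SEVERITY_RANK)),
            -parse_timestamp(a["timestamp"]).timestamp(),
        ),
    )


alarms = [
    {"alarmId": "alarm-9002", "severity": "major", "timestamp": "2026-03-30T09:55:10Z"},
    {"alarmId": "alarm-9001", "severity": "critical", "timestamp": "2026-03-30T10:05:50Z"},
]
print([a["alarmId"] for a in prioritize(alarms)])  # ['alarm-9001', 'alarm-9002']
```

Unknown severities sort after the known ones, so an unexpected value never hides a critical alarm.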
Step 4: Enable a webhook on the SD‑WAN Manager to push alarms to an external endpoint
What we are doing: Configure the SD‑WAN Manager to POST alarm notifications to our webhook consumer at https://10.10.10.20:5000/hooks/alarm. This allows real‑time alarm delivery without polling.
# Register a webhook on the SD-WAN Manager
curl -s -u admin:Lab@123 -X POST "https://lab.nhprep.com/api/webhooks" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "NHPREP_alarm_sink",
        "url": "https://10.10.10.20:5000/hooks/alarm",
        "events": ["alarm.created","alarm.updated"],
        "headers": {
          "X-Org": "NHPREP"
        },
        "enabled": true
      }'
What just happened: The SD‑WAN Manager saved a webhook subscription. Whenever an alarm matching the subscribed events is raised or updated, the manager will POST the alarm payload to the webhook URL. Headers (e.g., X-Org) let the receiver validate or tag events.
Real-world note: Always secure webhook endpoints with TLS and additional authentication (HMAC signatures) to prevent spoofing; this example uses basic registration for clarity.
Verify:
# Query list of configured webhooks to confirm registration
curl -s -u admin:Lab@123 "https://lab.nhprep.com/api/webhooks"
# Expected JSON response (complete; example)
[
  {
    "id": "wh-3001",
    "name": "NHPREP_alarm_sink",
    "url": "https://10.10.10.20:5000/hooks/alarm",
    "events": ["alarm.created","alarm.updated"],
    "headers": {"X-Org":"NHPREP"},
    "enabled": true,
    "lastDeliveryStatus": "unknown"
  }
]
Step 5: Deploy a simple webhook consumer and verify reception of webhook deliveries
What we are doing: Start a minimal HTTP listener on the webhook server to accept POSTs from the SD‑WAN Manager. This demonstrates how the webhook payload appears and how to validate it before acting.
# Example: minimal Flask app to receive webhook events (run on webhook.lab 10.10.10.20)
# Save as webhook_receiver.py and run with: python3 webhook_receiver.py
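from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/hooks/alarm', methods=['POST'])
def alarm_hook():
    # silent=True returns None instead of raising on a missing or malformed JSON body
    payload = request.get_json(silent=True)
    if payload is None:
        return jsonify({"status": "invalid JSON"}), 400
    print("Received webhook payload:", payload)
    # respond quickly to acknowledge receipt
    return jsonify({"status": "received"}), 200

if __name__ == '__main__':
    # ssl_context='adhoc' generates a throwaway self-signed certificate
    # and requires the 'cryptography' package to be installed
    app.run(host='0.0.0.0', port=5000, ssl_context='adhoc')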
What just happened: The Flask app listens on port 5000 for HTTPS POST requests at /hooks/alarm and prints the JSON payload to stdout. In production this consumer would validate the payload, enrich it, and forward it to automation or ticketing systems.
Real-world note: Replace adhoc TLS with a proper certificate and implement authentication (HMAC/verify headers) before trusting or processing events.
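The HMAC check mentioned in the note can be sketched with the Python standard library. This assumes the sender computes an HMAC-SHA256 over the raw request body with a shared secret and sends the hex digest in a header; the lesson does not confirm that the SD‑WAN Manager does this, so the header name, the secret and the function names are all hypothetical.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare the received signature with the expected one in constant time."""
    expected = sign_body(secret, body)
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))


# Hypothetical shared secret and payload, for illustration only
secret = b"change-me"
body = b'{"alarmId": "alarm-TEST"}'
signature = sign_body(secret, body)
print(is_valid_signature(secret, body, signature))         # True
print(is_valid_signature(secret, body + b" ", signature))  # False
```

In the Flask handler, the check would run on the raw bytes from `request.get_data()` before the JSON is parsed, because re-serializing the parsed payload can change the bytes and invalidate the signature.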
Verify:
# Trigger a test alarm from the SD-WAN Manager (if the API supports test delivery), or send a sample event to the webhook to simulate delivery:
curl -s -u admin:Lab@123 -X POST "https://lab.nhprep.com/api/webhooks/wh-3001/test" -d '{}'
# Expected response from SD-WAN Manager if it supports test run:
{
  "status": "test_triggered",
  "delivery": {
    "target": "https://10.10.10.20:5000/hooks/alarm",
    "httpStatus": 200,
    "body": {
      "alarmId": "alarm-TEST",
      "category": "tunnel",
      "severity": "critical",
      "description": "Test webhook from NHPREP"
    }
  }
}
# On webhook server stdout, expected printed output (complete; example)
Received webhook payload: {'alarmId': 'alarm-TEST', 'category': 'tunnel', 'severity': 'critical', 'description': 'Test webhook from NHPREP'}
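If the manager offers no test-delivery call, you can simulate a delivery by posting a sample event to the listener yourself. The sketch below uses only the standard library; the payload and the X-Org header are copied from this lesson's examples, the function names are hypothetical, and disabling certificate verification is an assumption that suits only the lab's adhoc certificate.

```python
import json
import ssl
import urllib.request

SAMPLE_ALARM = {
    "alarmId": "alarm-TEST",
    "category": "tunnel",
    "severity": "critical",
    "description": "Test webhook from NHPREP",
}


def build_alarm_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request that mimics a webhook delivery."""
    data = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", "X-Org": "NHPREP"}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def post_sample_alarm(url: str, payload: dict, verify_tls: bool = True, timeout: float = 5.0):
    """Send the sample alarm and return (HTTP status, response body)."""
    context = None
    if url.startswith("https://"):
        context = ssl.create_default_context()
        if not verify_tls:
            # Lab only: accept the self-signed adhoc certificate
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
    request = build_alarm_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.status, response.read().decode("utf-8")
```

From the Admin PC, `post_sample_alarm("https://10.10.10.20:5000/hooks/alarm", SAMPLE_ALARM, verify_tls=False)` should return status 200 and make the listener print the payload.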
Verification Checklist
- Check 1: Admin PC can reach SD‑WAN Manager — verify with ping and expect 0% packet loss.
- Check 2: Aggregation Query API returns BFD/tunnel metrics — verify a JSON array with avg_latency_ms, avg_loss_pct, avg_jitter_ms, vQoE and bfdState fields.
- Check 3: Simple Query Alarms API returns recent tunnel alarms — verify alarms contain alarmId, category, severity and consumedEvents arrays.
- Check 4: Webhook registration succeeded and test delivery reaches webhook server — verify webhook entry exists and webhook server prints received JSON payload.
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| API requests time out | Management network blocked by firewall or wrong IP | Verify network reachability (ping/traceroute) and open HTTPS (443) to lab.nhprep.com from admin host |
| Aggregation query returns empty array | Wrong deviceIp or time range too small; no telemetry reported | Confirm device IP (10.10.10.1) is registered and increase timeRangeMinutes or check edge telemetry |
| Webhook POSTs fail (connection refused) | Webhook server firewall or listener not running on port 5000 | Start the webhook consumer, ensure the host binds to 0.0.0.0:5000 and open firewall for incoming HTTPS |
| Webhook deliveries received but payloads unauthenticated | Missing signature/HMAC header validation | Implement signature verification; configure SD‑WAN Manager to include HMAC header or use mutual TLS |
Key Takeaways
- Use Aggregation Query APIs to obtain meaningful averages (latency, loss, jitter, vQoE) for overlay tunnels; these metrics help prioritize remediation.
- BFD state is critical for fast failure detection; combine BFD state from aggregation with alarms to detect both sudden outages and slow degradations.
- Alarms (Simple Query APIs) provide the event context behind symptoms — always correlate alarms with telemetry before taking corrective action.
- Webhooks convert polling into push notifications, enabling real-time automation. In production, secure webhook endpoints (TLS, authentication) and implement retry/backoff handling for deliveries.
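The retry/backoff handling mentioned in the last takeaway can be sketched as follows, for example when a webhook consumer forwards an event to a ticketing system that is briefly unavailable. The function names, attempt count and delays are hypothetical; the `sleep` parameter is injectable so the logic can be exercised without waiting.

```python
import time


def backoff_delays(attempts: int, base_delay: float = 1.0, factor: float = 2.0, max_delay: float = 30.0):
    """Return the waits between attempts, growing exponentially up to max_delay."""
    return [min(base_delay * factor ** i, max_delay) for i in range(attempts - 1)]


def retry_with_backoff(operation, attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Call operation until it succeeds or the attempts are exhausted."""
    delays = backoff_delays(attempts, base_delay)
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Broad catch for brevity; real code should retry only transient errors
            if attempt == attempts - 1:
                raise
            sleep(delays[attempt])


print(backoff_delays(4))  # [1.0, 2.0, 4.0]
```

Capping the delay keeps a long outage from producing unbounded waits; adding random jitter to each delay is a common refinement when many clients retry at once.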
Final note: This lesson demonstrated both pull-style (Aggregation and Alarms APIs) and push-style (Webhooks) integrations. In production networks you will typically combine both: pull periodic summaries for dashboards and rely on webhooks for real-time incidents and automated remediation.