Anomaly Detection in Production
Objective
In this lesson you will verify telemetry readiness for anomaly detection in production networks: confirm telemetry connections, validate subscription counts, and interpret telemetry health so baseline learning and threshold-free alerting can run reliably. This matters because anomaly detection systems depend on continuous, complete telemetry streams — if telemetry is missing or subscriptions are invalid, the learning model cannot build an accurate baseline, and false positives/negatives increase. Real-world scenario: before enabling automated root-cause correlation for a campus or WAN, you must ensure all devices are streaming the correct telemetry and that the collector sees the expected subscriptions.
Quick Recap
The reference topology from Lesson 1 is unchanged. This lesson adds no new devices or IP addresses and uses the same device telemetry endpoints. The key IPs used in the verification steps below are:
- Telemetry collector/peer: 172.100.1.53
- Device source address: 192.168.4.7
ASCII topology (interfaces and IPs used in this lesson):

Router/Device A (device sending telemetry)
  Gi0/0: 192.168.4.7
        |
        |   IP network
        |
Telemetry Collector (central receiver)
  Gi0/1: 172.100.1.53
Tip: The anomaly-detection pipeline requires both the device-side telemetry agent and the collector to maintain an active session. If either side drops, baseline learning pauses.
Device Table
| Device | Role | Management IP |
|---|---|---|
| Router/Device A | Telemetry data source | 192.168.4.7 |
| Telemetry Collector | Receives telemetry, feeds anomaly detection | 172.100.1.53 |
Key Concepts
- Telemetry connections: Telemetry is typically streamed from devices to a collector over persistent sessions. At the protocol level, the device opens a transport session to the collector (peer) and then streams the configured data models over it. If the session is not active, nothing is streamed and anomaly detection has no data to learn from.
- Subscriptions: A subscription defines the set of telemetry data (models, paths, frequency) the collector expects. Devices report subscription counts by type (all, dynamic, configured, permanent), each split into total, valid, and invalid. If subscriptions are invalid, the collector will not receive the expected streams.
- Baseline learning: Baseline (normal behavior) models are derived from historical telemetry. Baseline learning requires sustained, continuous, and representative telemetry. Gaps, invalid streams, or intermittent connectivity bias the baseline.
- Threshold-free alerting & correlation: Modern anomaly detection often uses threshold-free methods (statistical baselines, ML). These methods correlate multiple telemetry signals; correlation needs coherent timestamps and complete streams from multiple sources to link events (e.g., increased interface errors + spike in latency = likely fault).
- Verification practice: Use device show commands to confirm telemetry connections and subscription health before relying on anomaly-detection outputs.
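To make "statistical baseline" and "threshold-free" concrete, here is a minimal sketch of one simple technique, a rolling z-score: each new sample is compared with the mean and standard deviation of the previous `window` valid samples, so no fixed alert threshold is configured per metric. It is not the algorithm of any particular anomaly-detection product. The function name, the window of 12 samples, and k = 3 are illustrative assumptions; `None` stands in for a telemetry gap, which is skipped so that missing data does not enter the baseline.

```python
from __future__ import annotations

from statistics import mean, stdev
from typing import Optional, Sequence


def detect_anomalies(samples: Sequence[Optional[float]],
                     window: int = 12, k: float = 3.0) -> list[int]:
    """Return indices of samples that deviate by more than k standard
    deviations from a rolling baseline of the previous `window` valid samples.

    Hypothetical helper for illustration only. None marks a telemetry gap:
    it is neither scored nor added to the baseline (learning pauses).
    """
    history: list[float] = []
    anomalies: list[int] = []
    for i, value in enumerate(samples):
        if value is None:
            continue  # gap in the stream: nothing to learn from or score
        if len(history) >= window:
            baseline = history[-window:]
            mu = mean(baseline)
            sigma = stdev(baseline)
            # Threshold-free: the limit is derived from the data itself.
            if abs(value - mu) > k * sigma:
                anomalies.append(i)
        history.append(value)  # simplification: every sample feeds the baseline
    return anomalies
```

For example, `detect_anomalies([10, 11, 10, 9, 10, 11, 10, 9, 10, 11, 10, 9, 30])` flags only the final latency spike. Note that the first `window` samples are never scored, which mirrors why early alerts are unreliable until the learning period completes.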
Step-by-step configuration and verification
Step 1: Validate Telemetry Transport Connection
What we are doing: We verify the device has an active transport session to the telemetry collector. This confirms that the device can actually send telemetry to the collector and that the collector peer is reachable.
show telemetry connection all
What just happened: The command queries the device telemetry subsystem and reports active telemetry sessions. The output lists peer addresses, transport ports, VRF, the device source address used for the session, and the session state. An "Active" state with "Connection up" indicates the TCP/transport session is established and telemetry can flow.
Real-world note: In production, a common failure is a firewall or ACL blocking the telemetry port — verify connectivity if the state is not Active.
Verify:
Telemetry connections
Index Peer Address Port VRF Source Address State State Description
----- ------------------- ----- --- ------------------- ---------- --------------------
109 172.100.1.53 25103 0 192.168.4.7 Active Connection up
- Expected: One or more entries showing the collector IP 172.100.1.53 with State "Active" and "Connection up".
- If you see "Connecting" or no entries for 172.100.1.53, the device is not establishing the telemetry transport.
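If you automate this check, a small parser can read the connection table and confirm the collector entry. This is a sketch that assumes the whitespace-separated column layout shown in the output above (Index, Peer Address, Port, VRF, Source Address, State, State Description); the layout may differ between software releases, and the function names are hypothetical.

```python
from __future__ import annotations

CONNECTIONS_SAMPLE = """Telemetry connections
Index Peer Address Port VRF Source Address State State Description
----- ------------------- ----- --- ------------------- ---------- --------------------
109 172.100.1.53 25103 0 192.168.4.7 Active Connection up
"""


def parse_connections(output: str) -> list[dict]:
    """Parse rows of 'show telemetry connection all' (assumed column layout)."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric index; titles, headers, separators do not.
        if len(fields) >= 6 and fields[0].isdigit():
            rows.append({
                "index": int(fields[0]),
                "peer": fields[1],
                "port": int(fields[2]),
                "vrf": fields[3],
                "source": fields[4],
                "state": fields[5],
                "description": " ".join(fields[6:]),
            })
    return rows


def collector_is_up(output: str, collector: str = "172.100.1.53") -> bool:
    """True if at least one session to the collector is in the Active state."""
    return any(r["peer"] == collector and r["state"] == "Active"
               for r in parse_connections(output))
```

`collector_is_up(CONNECTIONS_SAMPLE)` returns True for the healthy output; a "Connecting" state or a missing 172.100.1.53 row returns False.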
Step 2: Check IETF Telemetry Subscription Summary
What we are doing: We check the subscription summary to confirm the number of telemetry subscriptions the device reports as configured and valid. Subscriptions are the logical definitions of what data to stream.
show telemetry ietf subscription summary
What just happened: This command reports the maximum number of supported subscriptions and, for All, Dynamic, Configured, and Permanent subscriptions, the Total, Valid, and Invalid counts. When Total equals Valid, every subscription is valid; a non-zero Invalid count means the device is not streaming some of the requested data paths.
Real-world note: Some telemetry paths require specific feature licenses or enabled feature modules; an invalid subscription can point to a missing feature on the device.
Verify:
Subscription Summary
====================
Maximum supported: 128
Subscription Total Valid Invalid
-----------------------------------------------
All 112 112 0
Dynamic 0 0 0
Configured 112 112 0
Permanent 0 0 0
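Connection state legend (State column of show telemetry connection all):
- Active: all good
- Connecting: certificate or firewall issue
- N/A: telemetry configuration missing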
- Expected: "Configured" Total equals Valid, and Invalid is 0 (e.g., 112 112 0). Any "Invalid" subscriptions should be investigated.
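The subscription check can be scripted the same way. This sketch assumes the four-column count rows shown above (type, Total, Valid, Invalid); the function names are hypothetical. It treats the output as healthy only if at least one subscription exists and every row has Total equal to Valid with zero Invalid.

```python
from __future__ import annotations

SUMMARY_SAMPLE = """Subscription Summary
====================
Maximum supported: 128
Subscription Total Valid Invalid
-----------------------------------------------
All 112 112 0
Dynamic 0 0 0
Configured 112 112 0
Permanent 0 0 0
"""


def parse_summary(output: str) -> dict[str, tuple[int, int, int]]:
    """Map subscription type -> (total, valid, invalid) from the summary output."""
    counts: dict[str, tuple[int, int, int]] = {}
    for line in output.splitlines():
        fields = line.split()
        # Count rows look like: <type> <total> <valid> <invalid>
        if len(fields) == 4 and all(f.isdigit() for f in fields[1:]):
            total, valid, invalid = (int(f) for f in fields[1:])
            counts[fields[0]] = (total, valid, invalid)
    return counts


def subscriptions_healthy(output: str) -> bool:
    """True if subscriptions exist and none are invalid."""
    counts = parse_summary(output)
    if counts.get("All", (0, 0, 0))[0] == 0:
        return False  # nothing reported: telemetry config is likely missing
    return all(total == valid and invalid == 0
               for total, valid, invalid in counts.values())
```

Requiring a non-zero "All" total is a design choice: an empty summary should not count as "no invalid subscriptions".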
Step 3: Interpret Telemetry Health for Baseline Learning
What we are doing: We interpret the outputs from Steps 1–2 to decide if baseline learning can proceed. Baseline learning requires both active connections and valid subscriptions. This step is analysis rather than a config change, but it's essential before enabling anomaly detection models.
show telemetry connection all
show telemetry ietf subscription summary
What just happened: Running both commands together confirms both transport and logical subscription health. If both show healthy results (Active + all subscriptions valid), the device is ready to feed the anomaly-detection engine and build baselines. If either is unhealthy, baseline learning will be incomplete or corrupted.
Real-world note: Baseline learning is time-dependent. Even with healthy telemetry, allow sufficient time (hours to days depending on variability) for the model to gather representative samples across daily/weekly cycles.
Verify:
Telemetry connections
Index Peer Address Port VRF Source Address State State Description
----- ------------------- ----- --- ------------------- ---------- --------------------
109 172.100.1.53 25103 0 192.168.4.7 Active Connection up
Subscription Summary
====================
Maximum supported: 128
Subscription Total Valid Invalid
-----------------------------------------------
All 112 112 0
Dynamic 0 0 0
Configured 112 112 0
Permanent 0 0 0
- Expected: the same healthy outputs shown above. This is your "green light" for baseline learning.
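The go/no-go decision in this step combines both results. Below is a minimal sketch that takes values already read from the two outputs and returns a readiness flag plus the reasons for any failure. The hint strings mirror the connection-state legend shown in Step 2; the function name and message wording are illustrative assumptions.

```python
from __future__ import annotations

# Hints derived from the connection-state legend (Active / Connecting / N/A).
STATE_HINTS = {
    "Active": "transport up",
    "Connecting": "check certificates and firewall/ACL toward the collector",
    "N/A": "telemetry configuration missing on the device",
}


def baseline_readiness(state: str, total: int, valid: int,
                       invalid: int) -> tuple[bool, list[str]]:
    """Return (ready, problems) for baseline learning. Hypothetical helper."""
    problems: list[str] = []
    if state != "Active":
        hint = STATE_HINTS.get(state, "unknown state")
        problems.append(f"connection {state}: {hint}")
    if total == 0:
        problems.append("no subscriptions configured")
    elif invalid > 0 or valid != total:
        problems.append(f"{invalid} invalid of {total} subscriptions")
    # Green light only when both transport and subscriptions are healthy.
    return (not problems, problems)
```

With the outputs above, `baseline_readiness("Active", 112, 112, 0)` returns `(True, [])`, the "green light" described in this step.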
Step 4: Force Telemetry Configuration Push (Controller-side Action) and Re-verify
What we are doing: If the device shows stale or missing subscriptions, a forced configuration update from the collector/controller can push the correct telemetry config to the device. This is typically done from the inventory or configuration management interface on the collector/controller; after forcing a push, re-verify the device state.
! Note: the push is a UI action on the collector/controller, not a device CLI command
! Force push telemetry config from Inventory > Actions > Telemetry > Update Telemetry Settings
show telemetry connection all
show telemetry ietf subscription summary
What just happened: The UI-driven "force push" causes the collector to send or reconcile telemetry subscription configurations to the device. On completion, the device should reflect any new or corrected subscriptions and re-establish any missing transports. Re-running the show commands verifies the effect: transport should become Active and subscription Invalid counts should drop to zero.
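Real-world note: Many production environments automate config pushes through orchestration tools. A manual forced push is a common, quick troubleshooting step for configuration drift or stale telemetry certificates; it does not open a firewall or ACL that blocks the telemetry port, so check the network path separately.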
Verify:
Telemetry connections
Index Peer Address Port VRF Source Address State State Description
----- ------------------- ----- --- ------------------- ---------- --------------------
109 172.100.1.53 25103 0 192.168.4.7 Active Connection up
Subscription Summary
====================
Maximum supported: 128
Subscription Total Valid Invalid
-----------------------------------------------
All 112 112 0
Dynamic 0 0 0
Configured 112 112 0
Permanent 0 0 0
- Expected: After the push, the device shows Active connection and all subscriptions Valid.
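Because the device may take a while to reconcile pushed settings, re-verification after a forced push is usually a poll with a time limit rather than a single check. This sketch assumes you supply a `check` callable (for example, one that collects the two show outputs over SSH and applies the checks from Steps 1 and 2); the 300 s timeout and 15 s interval are illustrative, and `sleep`/`clock` are injectable so the loop can be tested without waiting.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool],
                       timeout_s: float = 300.0,
                       interval_s: float = 15.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Re-run `check` until it returns True or `timeout_s` elapses.

    Hypothetical helper: returns True as soon as the device looks healthy,
    False if it is still unhealthy at the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)  # give the device time to reconcile the pushed config
```

Polling with a deadline avoids two failure modes: declaring failure before the push has been applied, and waiting indefinitely on a device that will never recover without further action.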
Step 5: Correlate Telemetry Health to Anomaly Alerts
What we are doing: We confirm that telemetry health translates into a reliable anomaly-detection pipeline. With the device streaming properly, the anomaly-detection system can correlate signals across sources. The dependency chain is: good telemetry → valid baselines → accurate anomaly and correlation outputs.
show telemetry connection all
show telemetry ietf subscription summary
What just happened: Re-checking confirms that the healthy state observed earlier persists. Once telemetry has been stable for a reasonable observation window, the anomaly-detection engine has the data it needs to run unsupervised detection, clustering, and cross-signal correlation.
Real-world note: If anomalies appear immediately after bringing up telemetry, verify that the initial baseline period has completed — early alerts can be transient while models stabilize.
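Cross-signal correlation depends on coherent timestamps. The sketch below pairs anomaly timestamps from two signals (say, interface errors and latency) when they fall within a time window of each other. It is a deliberately simple illustration, not how any specific engine correlates events; the function name and the 60 s window are assumptions.

```python
from __future__ import annotations


def correlate(events_a: list[float], events_b: list[float],
              window_s: float = 60.0) -> list[tuple[float, float]]:
    """Pair anomaly timestamps (epoch seconds) from signal A and signal B
    that occur within +/- window_s of each other. Hypothetical helper."""
    pairs: list[tuple[float, float]] = []
    for ta in events_a:
        for tb in events_b:
            if abs(ta - tb) <= window_s:
                pairs.append((ta, tb))
    return pairs
```

For example, an interface-error anomaly at t=100 and a latency anomaly at t=130 are linked, while events hours apart are not. If one source's clock were skewed by a couple of minutes, the same fault would no longer correlate, which is why coherent timestamps and complete streams matter.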
Verify:
Telemetry connections
Index Peer Address Port VRF Source Address State State Description
----- ------------------- ----- --- ------------------- ---------- --------------------
109 172.100.1.53 25103 0 192.168.4.7 Active Connection up
Subscription Summary
====================
Maximum supported: 128
Subscription Total Valid Invalid
-----------------------------------------------
All 112 112 0
Dynamic 0 0 0
Configured 112 112 0
Permanent 0 0 0
- Expected: Stable "Active" connection and zero invalid subscriptions across your observation window.
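"Stable across your observation window" can be stated precisely: every health check in the window passed, the checks span the whole window, and there is no long stretch without a check. This sketch assumes health samples recorded as `(timestamp_seconds, healthy)` pairs, for example from periodic runs of the two show commands; the 5-minute polling interval, 24-hour window, and 600 s maximum gap used below are illustrative values, not recommendations from this lesson.

```python
from __future__ import annotations


def stable_over_window(samples: list[tuple[float, bool]],
                       window_s: float, max_gap_s: float) -> bool:
    """True if health checks cover window_s seconds, all passed, and no two
    consecutive checks are more than max_gap_s apart. Hypothetical helper."""
    if not samples:
        return False
    samples = sorted(samples)  # order by timestamp
    times = [t for t, _ in samples]
    if times[-1] - times[0] < window_s:
        return False  # not observed long enough yet
    if not all(ok for _, ok in samples):
        return False  # at least one unhealthy check inside the window
    # A long gap between checks means stability was not actually observed.
    return all(b - a <= max_gap_s for a, b in zip(times, times[1:]))
```

For example, 289 passing checks taken every 300 s cover exactly 24 hours and return True with `max_gap_s=600`; dropping ten consecutive checks creates a 3300 s gap and returns False.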
Verification Checklist
- Check 1: Telemetry transport to collector 172.100.1.53 is active. Verify with show telemetry connection all and expect "Active" / "Connection up".
- Check 2: All telemetry subscriptions are valid. Verify with show telemetry ietf subscription summary and expect Configured Total = Valid with Invalid = 0 (e.g., 112 112 0).
- Check 3: Baseline learning can proceed. Observe stable telemetry for the required learning window, and confirm baseline status in the anomaly system's UI or logs on the collector side.
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| Telemetry state shows "Connecting" or no entry for 172.100.1.53 | Transport blocked by firewall/ACL, routing issue, or certificate failure | Verify connectivity (ping, routing), open telemetry port (e.g., 25103) on network devices, and confirm certificates on both ends |
| Subscription Summary shows Invalid > 0 | Subscription paths reference unsupported features or missing feature licenses on the device | Enable required features or adjust subscriptions to supported telemetry paths |
| Subscriptions present but data missing / sporadic | Intermittent transport or oversubscribing (device CPU/memory limits) | Check device resource usage, reduce subscription sampling rates, or redistribute telemetry load |
| Immediate high number of anomalies after enabling detection | Baseline learning period not completed; transient anomalies recorded as baseline | Allow full learning window (hours/days depending on service); mute initial alerts or use rollout policy |
| Configuration changes on device not reflected | Collector/controller config not pushed or reconciled | Force update from collector (Inventory > Actions > Telemetry > Update Telemetry Settings), then re-verify show telemetry outputs |
Key Takeaways
- Always verify both transport and subscription health before trusting anomaly-detection outputs: "Active" transport + Valid subscriptions = readiness to learn baselines.
- Baseline learning is data-driven and time-dependent; ensure continuous telemetry streams over the full observation window to avoid biased baselines.
- Correlation and threshold-free alerting require coherent, complete telemetry from multiple data sources; partial telemetry leads to poor correlation and noisy alerts.
- Troubleshooting telemetry commonly involves network connectivity, ACL/firewall checks, feature support checks, and controller-side config pushes.
Final tip: Build a verification routine (the two show commands in this lesson) into your day-one checklist whenever you enable or modify telemetry in an environment that uses anomaly detection. This prevents misleading alerts and supports reliable, automated remediation.