Lesson 2 of 6

Anomaly Detection in Production

Objective

In this lesson you will verify telemetry readiness for anomaly detection in production networks: confirm telemetry connections, validate subscription counts, and interpret telemetry health so baseline learning and threshold-free alerting can run reliably. This matters because anomaly detection systems depend on continuous, complete telemetry streams — if telemetry is missing or subscriptions are invalid, the learning model cannot build an accurate baseline, and false positives/negatives increase. Real-world scenario: before enabling automated root-cause correlation for a campus or WAN, you must ensure all devices are streaming the correct telemetry and that the collector sees the expected subscriptions.

Quick Recap

Reference topology from Lesson 1 remains in place. This lesson adds no new devices or IP addresses; we use the same device telemetry endpoints. Key IPs used in verification below (exact from the reference material):

  • Telemetry collector/peer: 172.100.1.53
  • Device source address: 192.168.4.7

ASCII topology (showing the exact IPs on every interface used in this lesson):

Router/Device A (device sending telemetry)

  • Gi0/0 : 192.168.4.7

Telemetry Collector (central receiver)

  • Gi0/1 : 172.100.1.53

Simple ASCII diagram:

Router/Device A (Gi0/0: 192.168.4.7)
        |
        |  IP network
        |
Telemetry Collector (Gi0/1: 172.100.1.53)

Tip: The anomaly-detection pipeline requires both the device-side telemetry agent and the collector to maintain an active session. If either side drops, baseline learning pauses.

Device Table

Device               Role                                         Management IP
-------------------  -------------------------------------------  -------------
Router/Device A      Telemetry data source                        192.168.4.7
Telemetry Collector  Receives telemetry, feeds anomaly detection  172.100.1.53

Key Concepts

  • Telemetry connections: Telemetry is typically streamed from devices to a collector over persistent sessions. At the protocol level, the device opens a transport session to the collector (the peer) and begins streaming the configured data models. If the session is not active, nothing is streamed and anomaly detection has no data to learn from.
  • Subscriptions: A subscription defines the set of telemetry data (models, paths, frequency) the collector expects. Devices report subscription counts (total, valid, invalid) for each subscription type (all, dynamic, configured, permanent). If a subscription is invalid, the collector will not receive that stream.
  • Baseline learning: Baseline (normal behavior) models are derived from historical telemetry. Baseline learning requires sustained, continuous, and representative telemetry. Gaps, invalid streams, or intermittent connectivity bias the baseline.
  • Threshold-free alerting & correlation: Modern anomaly detection often uses threshold-free methods (statistical baselines, ML). These methods correlate multiple telemetry signals; correlation needs coherent timestamps and complete streams from multiple sources to link events (e.g., increased interface errors + spike in latency = likely fault).
  • Verification practice: Use device show commands to confirm telemetry connections and subscription health before relying on anomaly-detection outputs.
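
To make the baseline-learning and threshold-free concepts above more concrete, here is a minimal Python sketch. It assumes a simple mean and standard-deviation baseline, which is only one possible method and not necessarily what any particular anomaly engine uses. The function names, the sample values, and the sensitivity factor k are hypothetical.

```python
from statistics import mean, stdev

def learn_baseline(history):
    """Return (mean, standard deviation) of the telemetry samples seen so far."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, k=3.0):
    """Flag a value more than k standard deviations from the learned mean.

    k is an illustrative sensitivity factor (an assumption), not a static
    per-metric threshold: the band itself is derived from the data.
    """
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > k

# Hypothetical interface-utilization samples (%): four off-peak, four peak.
full_history = [10, 11, 9, 10, 50, 52, 49, 51]
# Same period, but telemetry was missing during peak hours (a gap in the stream).
gapped_history = full_history[:4]

peak_value = 50
print(is_anomalous(peak_value, learn_baseline(full_history)))    # complete stream
print(is_anomalous(peak_value, learn_baseline(gapped_history)))  # gapped stream
```

With the complete history, the normal peak value is not flagged. With the gapped history, the same value is flagged, which illustrates how missing telemetry can bias a baseline toward false positives.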

Step-by-step configuration and verification

Step 1: Validate Telemetry Transport Connection

What we are doing: We verify the device has an active transport session to the telemetry collector. This confirms that the device can actually send telemetry to the collector and that the collector peer is reachable.

show telemetry connection all

What just happened: The command queries the device telemetry subsystem and reports active telemetry sessions. The output lists peer addresses, transport ports, VRF, the device source address used for the session, and the session state. An "Active" state with "Connection up" indicates the TCP/transport session is established and telemetry can flow.

Real-world note: In production, a common failure is a firewall or ACL blocking the telemetry port — verify connectivity if the state is not Active.

Verify:

Telemetry connections
Index Peer Address        Port VRF Source Address      State      State Description
----- ------------------- ----- --- -------------------  ---------- --------------------
109   172.100.1.53        25103 0   192.168.4.7          Active     Connection up
  • Expected: One or more entries showing the collector IP 172.100.1.53 with State "Active" and "Connection up".
  • If you see "Connecting" or no entries for 172.100.1.53, the device is not establishing the telemetry transport.
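
If you collect this output with a script, the check can be automated. The following Python sketch parses the table shown above and reports whether the collector has an Active session. It assumes the column layout of the sample output; the names TelemetryConnection, parse_connections, and collector_is_active are hypothetical helpers, not part of any vendor library.

```python
import re
from typing import List, NamedTuple

class TelemetryConnection(NamedTuple):
    index: int
    peer: str
    port: int
    vrf: str
    source: str
    state: str
    description: str

# Matches a data row: index, peer, port, VRF, source, state, free-text description.
# Header and separator lines do not start with a digit, so they are skipped.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*(.*)$")

def parse_connections(text: str) -> List[TelemetryConnection]:
    """Extract one TelemetryConnection per data row of the CLI output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            idx, peer, port, vrf, src, state, desc = match.groups()
            rows.append(TelemetryConnection(
                int(idx), peer, int(port), vrf, src, state, desc.strip()))
    return rows

def collector_is_active(text: str, collector_ip: str) -> bool:
    """True if at least one session to collector_ip is in state Active."""
    return any(c.peer == collector_ip and c.state == "Active"
               for c in parse_connections(text))

SAMPLE = """\
Telemetry connections
Index Peer Address        Port VRF Source Address      State      State Description
----- ------------------- ----- --- -------------------  ---------- --------------------
109   172.100.1.53        25103 0   192.168.4.7          Active     Connection up
"""

print(collector_is_active(SAMPLE, "172.100.1.53"))
```

The parser is deliberately tolerant of column spacing, but it would need adjusting if a platform prints additional columns.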

Step 2: Check IETF Telemetry Subscription Summary

What we are doing: We check the subscription summary to confirm the number of telemetry subscriptions the device reports as configured and valid. Subscriptions are the logical definitions of what data to stream.

show telemetry ietf subscription summary

What just happened: This command reports the maximum supported subscriptions and counts for All/Dynamic/Configured/Permanent subscriptions and their validity. A match between "Total" and "Valid" indicates all configured subscriptions are valid; invalid subscription counts (non-zero) imply the device is not sending some of the requested data paths.

Real-world note: Some telemetry paths require specific feature licenses or enabled feature modules; an invalid subscription can point to a missing feature on the device.

Verify:

Subscription Summary
====================
Maximum supported: 128
Subscription   Total  Valid  Invalid
-----------------------------------------------
All            112    112    0
Dynamic        0      0      0
Configured     112    112    0
Permanent      0      0      0
  • Expected: "Configured" Total equals Valid, and Invalid is 0 (e.g., 112 112 0). Investigate any "Invalid" subscriptions.

Connection state quick reference (applies to the State column from Step 1):

  • Active: all good
  • Connecting: certificate or firewall issue
  • N/A: telemetry configuration missing
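
The Total/Valid/Invalid comparison can also be scripted. This Python sketch parses the summary rows shown above and applies the same rule: Total equals Valid and Invalid is 0 on the All row. The function names are hypothetical, and treating a device with zero subscriptions as unhealthy is an assumption made for this lesson's scenario.

```python
def parse_subscription_summary(text):
    """Return {row name: {'total', 'valid', 'invalid'}} from the CLI summary."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: <name> <total> <valid> <invalid>
        if len(parts) == 4 and all(p.isdigit() for p in parts[1:]):
            total, valid, invalid = map(int, parts[1:])
            counts[parts[0]] = {"total": total, "valid": valid, "invalid": invalid}
    return counts

def subscriptions_healthy(text):
    """True if the 'All' row has at least one subscription and none invalid."""
    row = parse_subscription_summary(text).get("All")
    if row is None:
        return False
    # Assumption: zero subscriptions means telemetry was never configured.
    return row["total"] > 0 and row["invalid"] == 0 and row["total"] == row["valid"]

SAMPLE = """\
Subscription Summary
====================
Maximum supported: 128
Subscription   Total  Valid  Invalid
-----------------------------------------------
All            112    112    0
Dynamic        0      0      0
Configured     112    112    0
Permanent      0      0      0
"""

print(subscriptions_healthy(SAMPLE))
```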

Step 3: Interpret Telemetry Health for Baseline Learning

What we are doing: We interpret the outputs from Steps 1–2 to decide if baseline learning can proceed. Baseline learning requires both active connections and valid subscriptions. This step is analysis rather than a config change, but it's essential before enabling anomaly detection models.

show telemetry connection all
show telemetry ietf subscription summary

What just happened: Running both commands together confirms both transport and logical subscription health. If both show healthy results (Active + all subscriptions valid), the device is ready to feed the anomaly-detection engine and build baselines. If either is unhealthy, baseline learning will be incomplete or corrupted.

Real-world note: Baseline learning is time-dependent. Even with healthy telemetry, allow sufficient time (hours to days depending on variability) for the model to gather representative samples across daily/weekly cycles.

Verify:

Telemetry connections
Index Peer Address        Port VRF Source Address      State      State Description
----- ------------------- ----- --- -------------------  ---------- --------------------
109   172.100.1.53        25103 0   192.168.4.7          Active     Connection up

Subscription Summary
====================
Maximum supported: 128
Subscription   Total  Valid  Invalid
-----------------------------------------------
All            112    112    0
Dynamic        0      0      0
Configured     112    112    0
Permanent      0      0      0
  • Expected: the same healthy outputs shown above. This is your "green light" for baseline learning.
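
The go/no-go decision in this step can be written down as a small rule. The sketch below is a hedged illustration in Python: it takes the values you read from the two show commands and returns whether baseline learning can proceed, along with the reasons if it cannot. TelemetryHealth and check_readiness are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class TelemetryHealth:
    connection_state: Optional[str]  # None means no entry for the collector
    total: int                       # "All" row, Total column
    valid: int                       # "All" row, Valid column
    invalid: int                     # "All" row, Invalid column

def check_readiness(health: TelemetryHealth) -> Tuple[bool, List[str]]:
    """Return (ready, reasons). Ready requires Active transport and all-valid subscriptions."""
    reasons = []
    if health.connection_state is None:
        reasons.append("no telemetry connection entry for the collector")
    elif health.connection_state != "Active":
        reasons.append(f"connection state is {health.connection_state}, not Active")
    if health.total == 0:
        reasons.append("no subscriptions present")
    elif health.invalid > 0 or health.valid != health.total:
        reasons.append(f"{health.invalid} invalid subscription(s) out of {health.total}")
    return (not reasons, reasons)

# Values taken from the healthy sample output in this lesson.
ready, why = check_readiness(TelemetryHealth("Active", 112, 112, 0))
print(ready, why)
```

Returning the reasons alongside the verdict makes the result usable in a pre-change checklist, since each reason points back to the command that needs rechecking.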

Step 4: Force Telemetry Configuration Push (Controller-side Action) and Re-verify

What we are doing: If the device shows stale or missing subscriptions, a forced configuration update from the collector/controller can push the correct telemetry config to the device. This is typically done from the inventory or configuration management interface on the collector/controller; after forcing a push, re-verify the device state.

! (Note: The push action is done from the collector/controller UI; perform the UI action)
! Force push telemetry config from Inventory > Actions > Telemetry > Update Telemetry Settings
show telemetry connection all
show telemetry ietf subscription summary

What just happened: The UI-driven "force push" causes the collector to send or reconcile telemetry subscription configurations to the device. On completion, the device should reflect any new or corrected subscriptions and re-establish any missing transports. Re-running the show commands verifies the effect: transport should become Active and subscription Invalid counts should drop to zero.

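Real-world note: Many production environments automate config pushes through orchestration tools. A manual force push is a common quick fix for config drift and for stale telemetry or certificate settings on the device. It cannot clear a firewall or ACL block on the network path, so resolve those separately before re-verifying.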

Verify:

Telemetry connections
Index Peer Address        Port VRF Source Address      State      State Description
----- ------------------- ----- --- -------------------  ---------- --------------------
109   172.100.1.53        25103 0   192.168.4.7          Active     Connection up

Subscription Summary
====================
Maximum supported: 128
Subscription   Total  Valid  Invalid
-----------------------------------------------
All            112    112    0
Dynamic        0      0      0
Configured     112    112    0
Permanent      0      0      0
  • Expected: After the push, the device shows Active connection and all subscriptions Valid.

Step 5: Correlate Telemetry Health to Anomaly Alerts

What we are doing: We confirm that telemetry health maps to a reliable anomaly-detection pipeline. With the device streaming properly, anomaly detection systems can correlate signals. This step documents the translation: good telemetry → valid baselines → accurate anomaly and correlation outputs.

show telemetry connection all
show telemetry ietf subscription summary

What just happened: Re-checking ensures the previously observed healthy state persists. Once stable for a reasonable observation window, the anomaly-detection engine has the data needed to run unsupervised detection, clustering, and cross-signal correlation.

Real-world note: If anomalies appear immediately after bringing up telemetry, verify that the initial baseline period has completed — early alerts can be transient while models stabilize.

Verify:

Telemetry connections
Index Peer Address        Port VRF Source Address      State      State Description
----- ------------------- ----- --- -------------------  ---------- --------------------
109   172.100.1.53        25103 0   192.168.4.7          Active     Connection up

Subscription Summary
====================
Maximum supported: 128
Subscription   Total  Valid  Invalid
-----------------------------------------------
All            112    112    0
Dynamic        0      0      0
Configured     112    112    0
Permanent      0      0      0
  • Expected: Stable "Active" connection and zero invalid subscriptions across your observation window.
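
One way to make "stable across your observation window" checkable is to poll both commands on a schedule and evaluate the collected samples. The Python sketch below assumes each sample is a (timestamp, connection state, invalid count) tuple sorted by time. The function name, the 5-minute polling interval, and the 10-minute gap tolerance are illustrative assumptions, not recommended values.

```python
from datetime import datetime, timedelta

def stable_over_window(samples, window, max_gap):
    """True if samples span at least `window`, have no polling gap longer
    than `max_gap`, and every sample is Active with zero invalid subscriptions."""
    if not samples:
        return False
    timestamps = [ts for ts, _, _ in samples]
    if timestamps[-1] - timestamps[0] < window:
        return False  # not enough history yet
    for earlier, later in zip(timestamps, timestamps[1:]):
        if later - earlier > max_gap:
            return False  # a gap means health is unknown for that period
    return all(state == "Active" and invalid == 0 for _, state, invalid in samples)

# Hypothetical polling results: one sample every 5 minutes for one hour.
start = datetime(2024, 1, 1)
healthy = [(start + timedelta(minutes=5 * i), "Active", 0) for i in range(13)]

# Same samples, but the session was re-establishing at the 30-minute mark.
flapping = list(healthy)
flapping[6] = (healthy[6][0], "Connecting", 0)

print(stable_over_window(healthy, timedelta(hours=1), timedelta(minutes=10)))
print(stable_over_window(flapping, timedelta(hours=1), timedelta(minutes=10)))
```

Treating a polling gap as a failure is a conservative choice: if the check itself did not run, the telemetry state during that period is unknown.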

Verification Checklist

  • Check 1: Telemetry transport is active to collector 172.100.1.53. Verify with show telemetry connection all and expect State "Active" with State Description "Connection up".
  • Check 2: All telemetry subscriptions are valid. Verify with show telemetry ietf subscription summary and expect Total equal to Valid with Invalid at 0 (e.g., 112 112 0).
  • Check 3: Baseline learning can proceed — observe stable telemetry for the required learning window; confirm anomaly system logs indicate baseline status (use the anomaly system UI/logs on the collector side).

Common Mistakes

  • Symptom: Telemetry state shows "Connecting" or no entry for 172.100.1.53
    Cause: Transport blocked by firewall/ACL, routing issue, or certificate failure
    Fix: Verify connectivity (ping, routing), open telemetry port (e.g., 25103) on network devices, and confirm certificates on both ends

  • Symptom: Subscription Summary shows Invalid > 0
    Cause: Subscription paths reference unsupported features or missing feature licenses on the device
    Fix: Enable required features or adjust subscriptions to supported telemetry paths

  • Symptom: Subscriptions present but data missing or sporadic
    Cause: Intermittent transport or oversubscription (device CPU/memory limits)
    Fix: Check device resource usage, reduce subscription sampling rates, or redistribute telemetry load

  • Symptom: Immediate high number of anomalies after enabling detection
    Cause: Baseline learning period not completed; transient anomalies recorded as baseline
    Fix: Allow the full learning window (hours/days depending on service); mute initial alerts or use a rollout policy

  • Symptom: Configuration changes on device not reflected
    Cause: Collector/controller config not pushed or reconciled
    Fix: Force update from collector (Inventory > Actions > Telemetry > Update Telemetry Settings), then re-verify show telemetry outputs

Key Takeaways

  • Always verify both transport and subscription health before trusting anomaly-detection outputs: "Active" transport + Valid subscriptions = readiness to learn baselines.
  • Baseline learning is data-driven and time-dependent; ensure continuous telemetry streams over the full observation window to avoid biased baselines.
  • Correlation and threshold-free alerting require coherent, complete telemetry from multiple data sources; partial telemetry leads to poor correlation and noisy alerts.
  • Troubleshooting telemetry commonly involves network connectivity, ACL/firewall checks, feature support checks, and controller-side config pushes.

Final tip: Build a verification routine (the two show commands in this lesson) into your day-one checklist whenever you enable or modify telemetry for an environment that will use anomaly-detection — this prevents misleading alerts and supports reliable, automated remediation.