Building a Telemetry Pipeline
Objective
Build an end-to-end telemetry pipeline: device → collector → database → dashboard. In this lesson you will configure a Model-Driven Telemetry push subscription on a Cisco IOS XE device, validate secure gRPC connectivity used by telemetry, and verify that telemetry traffic reaches the collector. This matters in production because telemetry replaces slow polling with real-time, high-fidelity streaming of operational state (CPU, interface counters, events), enabling rapid troubleshooting, capacity planning, and automation in large networks.
Real-world scenario: In a campus with hundreds of access switches and several distribution/core routers, operations teams push YANG-modeled telemetry from devices to a central collector (mTLS-protected gRPC), which decodes GPB (protobuf) payloads into a time-series database and dashboards. Alerts and capacity trends are derived from that data.
Quick Recap
We use the same topology from Lesson 1. This lesson does not add new routers, but introduces a telemetry collector host and shows service ports used by telemetry and gNOI.
ASCII topology (interfaces show exact IPs used in the reference material):
[IOS-XE Device]                            [Collector Host]
Device name: ios-xe-1                      Host name: collector-1
mgmt0 / Gi0/0: 10.85.134.92/24             eth0: 10.1.1.3/24
gNOI service: 10.85.134.92:9339            telemetry receiver: 10.1.1.3:57500
ios-xe-1 (ephemeral source port) ---- TCP/gRPC dial-out ----> collector-1 (listening at 10.1.1.3:57500)
Device table
| Device | Role | Management IP |
|---|---|---|
| ios-xe-1 | Telemetry publisher (IOS XE) | 10.85.134.92 |
| collector-1 | Telemetry collector / receiver | 10.1.1.3 |
IP addressing
| Interface or service | IP address or port |
|---|---|
| ios-xe-1 mgmt0 / Gi0/0 | 10.85.134.92 |
| collector-1 eth0 | 10.1.1.3 |
| telemetry receiver port | 57500 |
| gNOI service port on device | 9339 |
Key Concepts
- Model-Driven Telemetry (MDT) push model: The device opens a gRPC session to a receiver (dial-out) and streams YANG-modeled data periodically or on change, rather than waiting for a collector to poll it. In production, push reduces polling load and control-plane overhead.
- Encodings (GPB / protobuf): An encoding defines how structured YANG data is serialized. GPB (Google Protocol Buffers) is a compact binary format suited to high-frequency telemetry streamed to collectors. Think of encodings as "file formats" for the same data: CSV vs JSON vs binary protobuf.
- Update-policy: periodic vs on-change: Periodic sends samples at a fixed interval (for example, every 60 seconds) and suits regular sampling for trending. On-change sends only when the data changes, which reduces volume for infrequent events.
- Minimum interval and timing: IOS XE expresses the periodic interval in centiseconds, and the system enforces a minimum periodic interval (the reference notes 100 centiseconds = 1 second). Choose intervals in production to balance resolution against bandwidth and CPU cost.
- Security: mTLS and certificates: Telemetry transport often uses mTLS (mutual TLS) for authentication and confidentiality. The gNOI examples in the reference illustrate using certificates to authenticate and manage device services. In production, certificate management is required for secure telemetry.
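The timing rule above can be sketched as a small helper. IOS XE takes the periodic value in centiseconds, so a desired interval in seconds has to be multiplied by 100 before it goes into update-policy periodic. The function names are hypothetical, and the 100-centisecond floor is taken from the reference note above rather than queried from a device.

```python
# Minimum periodic interval noted in the reference: 100 centiseconds = 1 second.
MIN_PERIOD_CS = 100


def period_centiseconds(seconds: float) -> int:
    """Convert a desired sampling interval in seconds to centiseconds."""
    cs = round(seconds * 100)
    if cs < MIN_PERIOD_CS:
        raise ValueError(f"{seconds}s is below the {MIN_PERIOD_CS} cs minimum")
    return cs


def samples_per_day(period_cs: int) -> int:
    """Number of samples one subscription produces in 24 hours."""
    return 86_400 * 100 // period_cs


print(period_centiseconds(60))  # 6000
print(samples_per_day(6000))    # 1440
```

A 60-second interval therefore corresponds to the value 6000, and a value of 60000 corresponds to 600 seconds.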
Step-by-step configuration
Step 1: Configure a YANG push telemetry subscription on the IOS XE device
What we are doing: Create a Model-Driven Telemetry subscription that pushes CPU utilization data (from the ios-xe YANG operational model) to the collector at 10.1.1.3:57500 using gRPC and GPB encoding. This is the core configuration that makes the device initiate telemetry to the collector.
configure terminal
telemetry ietf subscription 1
 encoding encode-kvgpb
 filter xpath /process-cpu-ios-xe-oper:cpu-usage/cpu-utilization
 stream yang-push
 update-policy periodic 60000
 receiver ip address 10.1.1.3 57500 protocol grpc-tcp
end
write memory
What just happened:
- telemetry ietf subscription 1 creates subscription instance 1.
- encoding encode-kvgpb selects key-value GPB encoding (self-describing protobuf binary) for efficient transport.
- filter xpath ... tells the device which YANG path to sample (CPU utilization from the ios-xe operational YANG module). The device reads that path at each sample interval.
- stream yang-push and update-policy periodic 60000 create a push stream with a fixed period. IOS XE takes the period in centiseconds, so 60000 means 600 seconds (10 minutes); use 6000 for a 60-second interval. The platform enforces a minimum interval of 1 second (100 centiseconds).
- receiver ip address 10.1.1.3 57500 protocol grpc-tcp directs the device to open a gRPC session over plain TCP to the collector at 10.1.1.3 port 57500 and transmit the GPB payloads.
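When the same subscription has to be applied to many devices, the CLI block can be generated from parameters. The following is a minimal sketch, not a Cisco tool: the function name and its arguments are hypothetical, and the output uses the IOS XE spelling telemetry ietf subscription and receiver ip address.

```python
def render_subscription(sub_id: int, xpath: str, period_cs: int,
                        receiver_ip: str, receiver_port: int,
                        protocol: str = "grpc-tcp") -> str:
    """Render an MDT dial-out subscription as IOS XE CLI text."""
    if not xpath.startswith("/"):
        raise ValueError("xpath filter should start with '/'")
    lines = [
        f"telemetry ietf subscription {sub_id}",
        " encoding encode-kvgpb",
        f" filter xpath {xpath}",
        " stream yang-push",
        # period_cs is in centiseconds (6000 = 60 seconds)
        f" update-policy periodic {period_cs}",
        f" receiver ip address {receiver_ip} {receiver_port} protocol {protocol}",
    ]
    return "\n".join(lines)


print(render_subscription(
    1, "/process-cpu-ios-xe-oper:cpu-usage/cpu-utilization",
    6000, "10.1.1.3", 57500))
```

Generating the text keeps the interval, receiver, and path consistent across devices; pushing it to a device is left to whatever automation you already use.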
Real-world note: Use GPB (protobuf) for high-frequency telemetry in production because it reduces bandwidth and CPU cost on both device and collector compared to verbose formats like JSON or XML.
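To make the size argument concrete, the sketch below serializes the same three counters as JSON text and as packed binary. It is only an illustration: struct packing stands in for a binary encoding and is not real GPB, the leaf names and values are made up for the example, and key-value GPB also carries field names on the wire, so real savings are smaller than this toy suggests.

```python
import json
import struct

# Illustrative CPU sample; names and values are assumptions for the example.
sample = {"five-seconds": 7, "one-minute": 5, "five-minutes": 4}

# Text encoding: field names and values are sent as characters.
as_json = json.dumps(sample).encode("utf-8")

# Binary encoding: three unsigned bytes; the schema is agreed out of band.
as_binary = struct.pack(
    "!3B", sample["five-seconds"], sample["one-minute"], sample["five-minutes"]
)

print(len(as_json), "bytes as JSON")
print(len(as_binary), "bytes as packed binary")
```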
Verify:
show running-config | section telemetry
Expected output (the subscription block should appear as follows):
telemetry ietf subscription 1
 encoding encode-kvgpb
 filter xpath /process-cpu-ios-xe-oper:cpu-usage/cpu-utilization
 stream yang-push
 update-policy periodic 60000
 receiver ip address 10.1.1.3 57500 protocol grpc-tcp
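If you collect the show running-config output with a script, the comparison can be automated. This is a minimal sketch that assumes you already have the command output as a string; the function name and the list of required fragments are hypothetical and cover only the settings used in this lesson.

```python
# Fragments that must appear in the telemetry section for this lesson's setup.
REQUIRED = (
    "encoding encode-kvgpb",
    "stream yang-push",
    "update-policy periodic",
    "10.1.1.3 57500",
    "grpc-tcp",
)


def missing_settings(config_text: str, required=REQUIRED) -> list[str]:
    """Return the required fragments that do not appear in the config text."""
    # Collapse all whitespace so indentation and line breaks do not matter.
    normalized = " ".join(config_text.split())
    return [item for item in required if item not in normalized]
```

An empty list means every expected setting was found; anything else names what to look at on the device.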
Step 2: Validate gRPC / gNOI reachability and certificate-based connectivity to the device
What we are doing: Use the gNOI OS client to verify secure gRPC connectivity to the device's management gNOI service (10.85.134.92:9339). This demonstrates certificate-based client connectivity that mirrors telemetry mTLS flows — verifying connectivity to the device is a prerequisite for a live telemetry stream.
cd ~/certs
gnoi_os -insecure -target_addr 10.85.134.92:9339 -op verify -target_name c9300 -alsologtostderr -cert ./client.crt -ca ./rootCA.pem -key ./rootCA.key
What just happened:
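- The gnoi_os client attempted a gNOI OS "verify" RPC to the device at 10.85.134.92:9339 using client certificates. gNOI runs over gRPC; the example presents certificates for TLS and follows the same certificate-based authentication model used for secure telemetry collectors. A successful verify shows that network connectivity and the gRPC/TLS session are functional. Because the command passes -insecure, which in the gnxi tools typically skips validation of the server certificate, rerun it without that flag if you also need to prove the full trust chain.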
Real-world note: Administrators commonly validate gRPC/mTLS connectivity using gNOI/gNMI clients before enabling telemetry to ensure certificate chains and firewall rules allow the control-plane handshake.
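Before debugging certificates, it can help to rule out plain reachability. The sketch below only checks that a TCP connection to a host and port can be opened; it says nothing about TLS or gRPC, and the function name is hypothetical.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example (addresses from this lesson):
#   tcp_reachable("10.85.134.92", 9339)   # gNOI service on the device
#   tcp_reachable("10.1.1.3", 57500)      # telemetry receiver on the collector
```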
Verify:
gnoi_os -insecure -target_addr 10.85.134.92:9339 -op verify -target_name c9300 -alsologtostderr -cert ./client.crt -ca ./rootCA.pem -key ./rootCA.key
Expected output:
Running OS version: 17.05.01.0.144.1617180620
(Exact version text is from the device's installed image and confirms successful gNOI RPC and TLS handshake.)
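If the verify step runs from a script, the version string can be pulled out of the client output. This is a small sketch that assumes the output contains a line in the form shown above; the function name is hypothetical.

```python
import re

# Matches the line printed by the verify operation, e.g.
# "Running OS version: 17.05.01.0.144.1617180620"
VERSION_RE = re.compile(r"Running OS version:\s*(\S+)")


def running_version(output: str):
    """Return the reported OS version, or None if the line is absent."""
    match = VERSION_RE.search(output)
    return match.group(1) if match else None


print(running_version("Running OS version: 17.05.01.0.144.1617180620"))
```

A return value of None is a reasonable signal that the RPC did not complete and the client log needs a closer look.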
Step 3: Observe telemetry traffic arriving at the collector (network-level verification)
What we are doing: At the collector host, capture packets on the telemetry receiver port (57500) to verify the device is opening a gRPC/TCP connection and sending telemetry payloads. This step validates network reachability and that the device actively connects to the collector.
tcpdump -n -i eth0 port 57500 -c 20
What just happened:
tcpdump listens on the collector interface (eth0) and captures the first 20 packets matching port 57500. You should observe the three-way TCP handshake from source 10.85.134.92 to destination 10.1.1.3:57500, followed by application data (gRPC over HTTP/2). Because the subscription uses protocol grpc-tcp, the stream is not encrypted; TLS handshake frames appear only when the subscription uses grpc-tls. Either way, the handshake plus payload shows that the telemetry push session is established.
Real-world note: If you see TCP SYNs but no SYN-ACK, check firewall rules and confirm that the collector is listening on port 57500. With grpc-tls, if the TLS handshake starts and then stops, look at certificate trust or authorization on the collector.
Verify:
tcpdump -n -i eth0 port 57500 -c 20
Sample expected output (example lines you should see):
15:03:12.123456 IP 10.85.134.92.49212 > 10.1.1.3.57500: Flags [S], seq 100000000, win 64240, options [mss 1460,sackOK,TS val 123456 ecr 0,nop,wscale 7], length 0
15:03:12.123678 IP 10.1.1.3.57500 > 10.85.134.92.49212: Flags [S.], seq 200000000, ack 100000001, win 65535, options [mss 1460,sackOK,TS val 654321 ecr 123456], length 0
15:03:12.123789 IP 10.85.134.92.49212 > 10.1.1.3.57500: Flags [.], ack 1, win 64224, length 0
15:03:12.124012 IP 10.85.134.92.49212 > 10.1.1.3.57500: Flags [P.], seq 1:512, ack 1, win 64224, length 511
15:03:12.124050 IP 10.1.1.3.57500 > 10.85.134.92.49212: Flags [.], ack 512, win 65535, length 0
15:03:12.124200 IP 10.85.134.92.49212 > 10.1.1.3.57500: Flags [P.], seq 512:1024, ack 1, win 64224, length 512
Interpretation:
- The TCP SYN/SYN-ACK/ACK indicates the device initiated a TCP connection to the collector.
- Subsequent packets containing application data (length > 0 after the handshake) carry the gRPC payload, that is, the telemetry stream. With grpc-tls they would carry TLS records instead of cleartext gRPC.
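The interpretation above can be checked mechanically against captured lines. The sketch below parses tcpdump text output in the format shown in the sample, reports whether a SYN and SYN-ACK were seen, and totals the payload bytes sent by the device. The regular expression and function name are assumptions fitted to that sample format, not a general tcpdump parser.

```python
import re

LINE_RE = re.compile(
    r"IP (\S+)\.(\d+) > (\S+)\.(\d+): Flags \[([^\]]+)\].*length (\d+)$"
)

SAMPLE = [
    "15:03:12.123456 IP 10.85.134.92.49212 > 10.1.1.3.57500: Flags [S], seq 100000000, win 64240, length 0",
    "15:03:12.123678 IP 10.1.1.3.57500 > 10.85.134.92.49212: Flags [S.], seq 200000000, ack 100000001, win 65535, length 0",
    "15:03:12.123789 IP 10.85.134.92.49212 > 10.1.1.3.57500: Flags [.], ack 1, win 64224, length 0",
    "15:03:12.124012 IP 10.85.134.92.49212 > 10.1.1.3.57500: Flags [P.], seq 1:512, ack 1, win 64224, length 511",
]


def summarize(lines, device_ip: str, collector_port: int) -> dict:
    """Report handshake presence and device-to-collector payload bytes."""
    syn = syn_ack = False
    payload = 0
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue
        src, sport, dst, dport, flags, length = match.groups()
        if src == device_ip and int(dport) == collector_port:
            syn = syn or flags == "S"
            payload += int(length)
        elif dst == device_ip and int(sport) == collector_port:
            syn_ack = syn_ack or flags == "S."
    return {"handshake": syn and syn_ack, "payload_bytes": payload}


print(summarize(SAMPLE, "10.85.134.92", 57500))
```

A result with handshake True and payload_bytes greater than zero matches the two bullets above; handshake True with zero payload points at the device or subscription rather than the network path.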
Step 4: Decode, store and visualize (conceptual / pipeline explanation)
What we are doing: Describe the collector responsibilities: accept the gRPC GPB stream, decode GPB into structured metrics, write time-series fields to a database (e.g., InfluxDB, Timescale), and feed a dashboard (e.g., Grafana). This step is about where the device data goes after it reaches the collector.
- Collector must decode the key-value GPB messages and understand the device-specific YANG schema (e.g., ios-xe models).
- Collector maps YANG leaves to database measurement names and timestamps.
- Dashboards query the database and visualize CPU utilization trends, thresholds, and alerts.
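The mapping from decoded YANG leaves to database records can be sketched for one common target. The example below formats a decoded sample as InfluxDB line protocol (measurement, tags, fields, nanosecond timestamp). The measurement name, tag names, field name, and values are hypothetical choices for illustration; a real collector defines its own mapping.

```python
def _escape(value) -> str:
    """Escape characters that are special in tag keys, tag values, field keys."""
    text = str(value)
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def _field(value) -> str:
    """Format a field value: integers get an 'i' suffix, strings are quoted."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Build one line-protocol record from a decoded telemetry sample."""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={_field(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {ts_ns}"


print(to_line_protocol(
    "cpu_utilization",
    {"device": "10.85.134.92", "subscription": "1"},
    {"five_seconds": 7},
    1700000000000000000,
))
```

Keeping the device address and subscription id as tags lets dashboards filter per device without parsing the YANG path.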
Real-world note: In production, collector design must consider rate limiting, schema evolution, and durable buffering (a burst of telemetry during a network event can overwhelm a collector).
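One way to keep a burst from exhausting collector memory is a bounded buffer between the receiver and the database writer. The sketch below is in-memory only, so it illustrates the overload behavior rather than durability; the class name and the drop-oldest policy are assumptions, and a production collector would more likely use a disk-backed queue or message bus.

```python
from collections import deque


class BoundedBuffer:
    """Fixed-capacity buffer that drops the oldest sample when full."""

    def __init__(self, capacity: int):
        self._items = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, item) -> None:
        # deque(maxlen=...) evicts from the left on overflow; count those.
        if len(self._items) == self._items.maxlen:
            self.dropped += 1
        self._items.append(item)

    def drain(self, limit: int) -> list:
        """Remove and return up to `limit` samples, oldest first."""
        out = []
        while self._items and len(out) < limit:
            out.append(self._items.popleft())
        return out


buf = BoundedBuffer(capacity=3)
for sample_id in range(5):
    buf.push(sample_id)
print(buf.dropped, buf.drain(10))
```

Exposing the dropped counter as a metric gives an early signal that the ingest rate exceeds what the database writer can absorb.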
Verify: (At minimum, confirm that the collector decoded a sample and stored it by querying the database.)
# Example: query the collector's DB for the latest cpu-utilization sample (syntax varies by DB)
# This is a conceptual verification; your collector and DB will provide exact query commands.
Expected result:
- A recent timestamped CPU utilization metric for device 10.85.134.92 is present in the database with the sample value and metadata (encoding kvGPB, subscription id 1, path).
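The shape of that check can be sketched with an in-memory SQLite table standing in for the time-series database. Everything here is a placeholder: the table and column names are hypothetical, the two rows are made-up values, and your actual database will use its own query language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cpu_utilization ("
    " ts INTEGER, device TEXT, subscription_id INTEGER, five_seconds INTEGER)"
)
# Placeholder rows standing in for decoded telemetry samples.
conn.executemany(
    "INSERT INTO cpu_utilization VALUES (?, ?, ?, ?)",
    [
        (1700000000, "10.85.134.92", 1, 7),
        (1700000060, "10.85.134.92", 1, 9),
    ],
)


def latest_sample(connection, device: str):
    """Return (ts, five_seconds) for the newest sample from a device, or None."""
    return connection.execute(
        "SELECT ts, five_seconds FROM cpu_utilization"
        " WHERE device = ? ORDER BY ts DESC LIMIT 1",
        (device,),
    ).fetchone()


print(latest_sample(conn, "10.85.134.92"))
```

Comparing the returned timestamp with the current time and the configured interval tells you whether the stream is still live or has stalled.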
Verification Checklist
- Check 1: The IOS XE device has subscription 1 configured. Verify with show running-config | section telemetry and confirm the GPB/gRPC receiver line for 10.1.1.3:57500.
- Check 2: gRPC connectivity to the device is functional. Run the gNOI verify client and expect the device to respond with its running OS version string.
- Check 3: Collector receives telemetry traffic — use a packet capture on collector eth0 port 57500 and confirm TCP handshake and application payloads from 10.85.134.92.
- Check 4: Collector decodes at least one GPB payload and stores a time-series sample for device 10.85.134.92 (verify via your DB query).
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| No TCP connection from device to collector | Collector IP/port unreachable (routing or firewall) | Ensure collector is reachable from device mgmt plane; open port 57500 and verify routing. |
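| TLS handshake fails (connection resets after the TCP handshake; applies when the subscription uses grpc-tls) | Certificate trust or wrong certificate/key used by client or server | Confirm client and server certs and the CA chain: the device must trust the collector's certificate, and with mTLS the collector must also trust the device certificate. |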
| Telemetry data appears but fields are empty or unreadable | Wrong encoding selected or collector doesn't support GPB for this YANG | Ensure encoding encode-kvgpb matches collector decoding capability, or switch to a supported encoding. |
| Subscription present but no telemetry after configure | Device cannot initiate gRPC due to missing gRPC service or platform restriction | Verify that the device software supports MDT YANG push and that netconf-yang is enabled; check subscription and receiver state with show telemetry ietf subscription 1 detail and show telemetry ietf subscription 1 receiver. |
| Excessive bandwidth from telemetry | Very short periodic interval configured | Increase the update-policy periodic interval (the value is in centiseconds, so 6000 = 60 seconds) or move to on-change for sparse updates. |
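The bandwidth row in the table can be estimated before a rollout. The sketch below multiplies payload size by device count and divides by the interval; the 512-byte payload and the 300-device count are assumptions chosen for illustration, and the result ignores TCP, HTTP/2, and TLS overhead.

```python
def telemetry_bps(payload_bytes: int, period_cs: int, devices: int = 1) -> float:
    """Approximate aggregate bits per second for one subscription per device.

    period_cs is the update-policy periodic value in centiseconds.
    """
    samples_per_second = 100 / period_cs
    return payload_bytes * 8 * devices * samples_per_second


# 300 devices, 512-byte samples: 60-second vs 1-second interval.
print(telemetry_bps(512, 6000, devices=300))
print(telemetry_bps(512, 100, devices=300))
```

The estimate scales linearly with the number of subscriptions per device, so repeat it for each path you plan to stream.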
Key Takeaways
- Model-Driven Telemetry (push) lets devices stream YANG-modeled data to a collector over gRPC, reducing polling load and enabling real-time monitoring. Remember the push vs pull analogy: push is a device streaming updates; pull is periodic polling by the collector.
- Use efficient encodings (GPB/protobuf) for high-frequency streams; ensure the collector supports the selected encoding and YANG schema.
- Secure telemetry with mTLS and certificates — validating gRPC connectivity (gNOI/gNMI clients) is a practical pre-flight check before enabling production streams.
- Always verify at multiple layers: device config, network connectivity (TCP handshake), TLS negotiation, and collector-level decoding/storage. In production, run scale tests to ensure the collector and database can handle the ingest rate.
Tip: Think of telemetry subscriptions like a scheduled delivery service: the device is the sender, the collector is the receiver/warehouse, encodings are the packaging format, and update-policy controls how often the courier shows up. Proper packaging (encoding) and secure credentials ensure the shipment arrives intact and usable.