
Cisco Multivendor Telemetry Collection With Crosswork

Admin
March 26, 2026
Tags: multivendor telemetry, network telemetry, streaming telemetry, crosswork data gateway, network automation

Cisco Multivendor Telemetry Collection Solution

Introduction

Modern enterprise networks are rarely built on a single vendor's equipment. Routers from one manufacturer sit alongside switches from another, firewalls from a third, and wireless controllers from yet another. When it comes time to collect multivendor telemetry from all of these devices, engineers face a serious challenge: how do you unify data collection across platforms that speak different protocols, use different data models, and expose different metrics? The answer lies in a purpose-built collection architecture that abstracts the complexity of individual device protocols and presents a single, scalable interface for gathering and distributing network telemetry data.

This article explores how a centralized telemetry collection solution works in multivendor environments. We will cover the architecture and deployment models, walk through every supported collection protocol, explain the anatomy of collection job payloads, break down monitoring and troubleshooting workflows, and examine the powerful "collect once, distribute many" optimization paradigm. Whether you are building a custom automation pipeline or integrating with existing network management applications, this guide provides the technical depth you need to design and operate a production-grade network telemetry collection system.

By the end of this article, you will understand how to create collection jobs via both GUI and API, how to structure payloads for different sensor types, how to monitor collection and distribution health independently, and how to consume collected data in downstream applications using messaging buses like Kafka and gRPC.

What Is Multivendor Telemetry Collection?

At its core, multivendor telemetry collection is the process of gathering operational data from network devices made by different vendors through a single, unified collection layer. Rather than deploying separate collectors for each vendor's equipment or protocol, a centralized solution normalizes the collection workflow so that engineers interact with one set of APIs and one management interface regardless of the underlying device type.

The Problem It Solves

In a typical enterprise network, you might need to collect interface counters from Cisco IOS-XE routers using Model-Driven Telemetry (MDT), pull CPU utilization from third-party switches using SNMP, and gather routing table information from another vendor's equipment using gNMI. Without a unified collection layer, each of these workflows requires its own tooling, its own monitoring, and its own integration with downstream analytics platforms.

A centralized telemetry collection solution addresses this by providing:

  • Protocol abstraction — support for SNMP, MDT, gNMI, CLI, SNMP TRAP, and syslog through a single platform
  • Vendor-agnostic collection — the ability to collect from both Cisco and third-party devices
  • Unified API and GUI — a single interface for creating, managing, and monitoring collection jobs
  • Scalable distribution — collected data flows to downstream applications through standard messaging buses

Beyond Telemetry

Although streaming telemetry was the original use case for this type of collection architecture, its utility extends well beyond real-time metrics gathering. The same collection infrastructure serves three distinct categories of network data needs:

| Use Case Category | Description | Protocols Used |
|---|---|---|
| Telemetry Systems | Real-time operational metrics and counters | gNMI, MDT, SNMP (poll) |
| Inventory Systems | Device and component inventory data | SNMP, CLI, gNMI |
| Monitoring Systems | Network event detection and alerting | SNMP TRAP, Syslog |

This "single collection point" philosophy means that one deployment can satisfy telemetry, inventory, and monitoring requirements simultaneously, reducing operational overhead and eliminating redundant collection infrastructure.

Architecture of the Multivendor Telemetry Collection Solution

The collection architecture is built around a decoupled design where the collection layer operates independently from the application layer. This separation is critical for scalability and resilience.

Core Components

The architecture consists of several key components working together:

  1. Infrastructure Layer — provides the management plane, including the GUI and API interfaces used to configure and monitor collection jobs
  2. Data Gateway — the workhorse of the system, responsible for actually connecting to network devices, collecting data, and forwarding it to destinations
  3. Plugin System — protocol-specific plugins (MDT, SNMP, gNMI, CLI) that handle the details of each collection protocol
  4. Messaging Bus — Kafka or gRPC servers that act as the distribution layer between the Data Gateway and downstream applications
  5. Downstream Applications — customer or vendor applications (such as Grafana, InfluxDB, or custom tools) that consume the collected data

Decoupled Collection and Application Layers

One of the most important architectural decisions is the complete decoupling of the collection layer from the application layer. The Data Gateway operates as a standalone collection entity that can be deployed, scaled, and managed independently of the applications consuming its data. This provides several advantages:

  • Horizontal scaling — additional Data Gateway instances can be deployed to handle increased collection load without modifying the application layer
  • Application offloading — the collection workload is removed from application servers, freeing them to focus on data processing and presentation
  • N+M redundancy — multiple Data Gateway instances provide fault tolerance without requiring application-level redundancy changes

Pro Tip: When planning your deployment, size your Data Gateway instances based on the number of devices and collection cadence, not on the number of downstream applications. The collection side is almost always the bottleneck, not the distribution side.

Lab Topology Example

A practical lab environment for learning multivendor telemetry collection typically includes:

  • The infrastructure management platform with its GUI and API interfaces
  • One or more Data Gateway instances
  • Network devices from multiple vendors
  • A Kafka messaging bus for data distribution
  • Downstream visualization and storage tools such as Grafana, InfluxDB, and Kafka consumers

The target in a lab environment is often to collect interface counters from multiple devices using different protocols (MDT, SNMP, gNMI) and verify that the data arrives correctly at each destination.

What Protocols Does Multivendor Telemetry Support?

The collection solution supports a comprehensive set of protocols, each suited to different device types, data requirements, and operational scenarios.

SNMP (Simple Network Management Protocol)

SNMP remains one of the most widely supported monitoring protocols across network vendors. The collection platform supports both standard and proprietary MIBs for gathering data from any SNMP-capable device. For SNMP polling, sensor data is specified using:

  • A MIB name
  • A MIB table name
  • A MIB scalar variable

SNMP is particularly useful for third-party devices that may not support modern telemetry protocols.

Model-Driven Telemetry (MDT)

MDT provides streaming telemetry capabilities where the network device pushes data to the collector at a configured cadence. MDT supports:

  • OpenConfig YANG data models — vendor-neutral models for common operational data
  • Native YANG data models — vendor-specific models that expose platform-specific counters and features
  • OpenConfig data models with vendor extensions — hybrid models that augment standard schemas with vendor-specific leaves

Sensor data for MDT is specified using an XPath expression that identifies the data tree path within the YANG model.
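As a sketch, an MDT sensor entry might look like the following Python dict. The key names are illustrative placeholders modeled on the payload sections described later in this article, not the platform's exact schema; the XPath shown is the standard OpenConfig interface-counters path.

```python
# Illustrative MDT sensor entry; key names are placeholders.
mdt_sensor_entry = {
    "sensor_data": {
        "mdt_sensor": {
            # XPath into the OpenConfig YANG model for interface counters
            "path": "openconfig-interfaces:interfaces/interface/state/counters",
        }
    },
    "cadence": 30000,  # collection frequency in milliseconds (30 s)
}
```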

gNMI (gRPC Network Management Interface)

gNMI is a modern, gRPC-based protocol for network management that supports both streaming telemetry and configuration management. Like MDT, gNMI uses YANG data models and XPath-style paths for sensor specification. The key difference lies in message encoding: gNMI defines multiple encoding options:

| Encoding | Description |
|---|---|
| PROTO | Protocol Buffers binary encoding |
| JSON | Standard JSON encoding |
| JSON_IETF | JSON encoding following IETF conventions |
| ASCII | Human-readable text encoding |
| Binary | Raw binary encoding |

The actual encoding available depends on the vendor's implementation. Not all devices support all encoding types.

Pro Tip: When collecting from multi-vendor environments, pay close attention to which encodings each device supports. A mismatch between the requested encoding and what the device supports is one of the most common causes of collection failures.

CLI (Command Line Interface)

CLI-based collection allows data to be gathered by executing CLI commands on devices and parsing the output. Sensor data for CLI collection is simply the CLI command to execute. It is important to note that CLI collection support is limited to Cisco devices. For third-party devices, CLI collection would require additional customization.

SNMP TRAP

Unlike SNMP polling, which actively queries devices for data, SNMP TRAP collection listens for unsolicited notifications sent by devices. Sensor data for TRAP collection is specified using a Trap OID (Object Identifier) that identifies the specific trap to listen for.

Syslog

Syslog collection captures log messages sent by network devices. Sensor data for syslog is specified using a combination of:

  • Severity number — indicating the importance level of the message
  • Facility number — indicating the subsystem that generated the message
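These two numbers are the same facility and severity that make up a syslog message's PRI value: RFC 5424 defines PRI as facility times 8 plus severity. A small helper makes the relationship concrete:

```python
def syslog_pri(facility: int, severity: int) -> int:
    """Compose the syslog PRI value (RFC 5424: PRI = facility * 8 + severity)."""
    return facility * 8 + severity

def split_pri(pri: int) -> tuple[int, int]:
    """Recover (facility, severity) from a received PRI value."""
    return pri // 8, pri % 8

# Example: facility local7 (23) at severity warning (4) yields PRI 188.
```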

Protocol Comparison Summary

| Protocol | Direction | Sensor Specification | Multi-Vendor | Best For |
|---|---|---|---|---|
| SNMP Poll | Collector pulls | MIB name/table/scalar | Yes | Legacy devices, broad compatibility |
| MDT | Device pushes | YANG XPath | Cisco + select vendors | High-frequency Cisco telemetry |
| gNMI | Device pushes or collector pulls | YANG XPath | Yes | Modern multi-vendor streaming |
| CLI | Collector pulls | CLI command | Cisco only | Data not available via other protocols |
| SNMP TRAP | Device pushes | Trap OID | Yes | Event-driven monitoring |
| Syslog | Device pushes | Severity + Facility | Yes | Log aggregation and alerting |

Deployment Models for Multivendor Telemetry

The collection solution supports multiple deployment models to accommodate different organizational requirements and infrastructure architectures.

On-Premises Deployment

In an on-premises deployment, the entire collection infrastructure runs within the customer's data center. This model is suited for organizations that need to keep all collected telemetry data within their own network boundaries. On-premises deployments support:

  • One or more Data Gateway instances collecting from multi-vendor devices
  • Software offloading through dynamic Data Gateway deployment
  • Multi-vendor enablement across the device fleet
  • Multiple instances for large-scale environments

Cloud Deployment

Cloud-based deployments enable secure data collection and distribution to cloud-hosted applications. In this model, Data Gateway instances typically run on-premises, close to the devices, while forwarding collected data securely to cloud-based analytics and management platforms.

Customer Application Integration

For organizations building custom applications, the Data Gateway provides data collection capabilities that support both Cisco and third-party devices. This deployment model is particularly valuable for:

  • Custom SNMP and telemetry data collection
  • Third-party device data collection testing in customer labs
  • Integration with proprietary analytics and automation pipelines

Pro Tip: When evaluating deployment models, consider starting with a lab environment that mirrors your production topology. Use the lab to validate collection from all device types and protocols before deploying to production.

How to Create Collection Jobs for Multivendor Telemetry

Collection jobs are the fundamental unit of work in the telemetry collection system. Each job defines what data to collect, from which devices, using which protocol, and where to send the results. Jobs can be created in three ways: through the GUI, through the API (using tools like Postman), or through automation tools.

Creating Jobs via the GUI

The graphical interface provides a guided workflow for creating collection jobs. This is the simplest method and is ideal for ad-hoc collection needs or for learning the system. The GUI is part of the infrastructure management platform and provides forms for specifying all job parameters.

Creating Jobs via the API

For programmatic and repeatable job creation, the API accepts JSON payloads that define every aspect of a collection job. This method is essential for integrating collection into automation pipelines.

Creating Jobs via Automation Tools

For large-scale deployments, automation tools can be used to create collection jobs programmatically, enabling infrastructure-as-code approaches to telemetry collection management.

Anatomy of a Collection Job Payload

The API payload for creating a collection job (createCollectionJob) consists of five major sections:

1. Device Set

The device set defines which network devices are in scope for the collection job. Devices can be specified in two ways:

  • By individual device — using an array of device_ids identified by their Universally Unique Identifiers (UUIDs). Device UUIDs can be retrieved either through the GUI or via a nodes API query.
  • By device group — using a device group ID identified by device "tags" configured in the management UI. This is useful for collecting from logical groups of devices (e.g., all core routers or all branch switches).
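The two device-set shapes can be sketched as follows. The UUIDs, the group name, and the key names are placeholders for illustration, not the platform's exact schema:

```python
# Illustrative device-set fragments; UUIDs, group name, and key names
# are placeholders.
device_set_by_uuid = {
    "device_ids": [
        "11111111-1111-1111-1111-111111111111",  # placeholder node UUID
        "22222222-2222-2222-2222-222222222222",
    ]
}

device_set_by_group = {
    "device_group": "core-routers",  # device tag configured in the management UI
}
```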

2. Sensor Input Configuration

The sensor_input_configs section is a list of sensor definitions, each containing:

  • sensor_data — the specific data to collect (varies by protocol type)
  • cadence — the collection frequency in milliseconds

The sensor data field accepts one of seven types depending on the collection protocol:

| Sensor Type | Protocol | Data Specification |
|---|---|---|
| snmp_sensor | SNMP | MIB name, table, or scalar variable |
| cli_sensor | CLI | CLI command string |
| mdt_sensor | MDT | YANG XPath expression |
| gnmi_sensor | gNMI | YANG XPath expression |
| gnmi_standard_sensor | gNMI (standard) | YANG XPath expression |
| trap_sensor | SNMP TRAP | Trap OID |
| syslog_sensor | Syslog | Severity and Facility numbers |

A single collection job can include multiple sensor data entries for different sensors, allowing you to collect multiple metrics in a single job.
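A multi-sensor input list might be sketched like this, mixing an SNMP sensor and an MDT sensor in one job. Key names are illustrative placeholders; IF-MIB/ifTable and the OpenConfig path are standard identifiers:

```python
# Illustrative sensor_input_configs list with two sensors in one job;
# key names are placeholders, not the platform's exact schema.
sensor_input_configs = [
    {   # SNMP sensor: poll the ifTable from IF-MIB every 60 s
        "sensor_data": {"snmp_sensor": {"mib": "IF-MIB", "table": "ifTable"}},
        "cadence": 60000,  # milliseconds
    },
    {   # MDT sensor: stream interface counters every 30 s
        "sensor_data": {"mdt_sensor": {
            "path": "openconfig-interfaces:interfaces/interface/state/counters"}},
        "cadence": 30000,
    },
]
```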

3. Sensor Output Configuration

The sensor_output_configs section defines where collected data should be sent. It is also a list, and each entry contains:

  • sensor_data — must match the corresponding sensor data from the input configuration
  • destination — the Kafka or gRPC server to receive the data

The destination is composed of two identifiers:

  • destination_id — the UUID of the Kafka or gRPC server, previously configured under Data Gateway Global Settings in the GUI
  • context_id — the Kafka topic name. If the destination is a gRPC server, the context_id is ignored

Pro Tip: Only one destination can be defined per sensor_data entry. If you need to send the same sensor data to multiple destinations, you will need to create separate collection jobs or leverage the collect-once-distribute-many paradigm described later in this article.

4. Application Context

The application context serves as the unique identifier for each collection job within the system. It consists of two user-defined strings:

  • context_id — a string identifying the collection context
  • application_id — a string identifying the application requesting the collection

Together, these values form a unique job identifier. Duplicate application contexts are not allowed — each collection job must have a unique combination of context_id and application_id.

5. Collection Mode

The collection mode specifies which protocol to use for the collection job. This determines how the Data Gateway communicates with the target devices to gather the requested sensor data.

Payload Example Walkthrough

A complete collection job payload brings all five sections together. Here is the logical flow of a typical payload:

  1. Specify the Node UUID of the target device
  2. Define the YANG Data Model and Data Tree Path for the sensor input
  3. Set the Kafka UUID as the destination
  4. Specify the Kafka Topic name for data routing
  5. Assign a unique Application Context for job identification

Each of these elements maps directly to a field in the JSON payload, making it straightforward to construct jobs programmatically once you understand the structure.
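The five steps above can be sketched as one assembled payload. Everything here is a hedged illustration: the UUIDs, topic name, key names, and API URL are placeholders, and the real field schema comes from your platform's API reference.

```python
import json
import urllib.request

# Hypothetical end-to-end createCollectionJob payload assembling the five
# sections described above. All identifiers below are placeholders.
INTERFACE_COUNTERS = "openconfig-interfaces:interfaces/interface/state/counters"

payload = {
    # 1. Device set: the target device's node UUID (placeholder value)
    "device_set": {"device_ids": ["11111111-1111-1111-1111-111111111111"]},
    # 2. Sensor input: YANG data model path and collection cadence
    "sensor_input_configs": [
        {"sensor_data": {"mdt_sensor": {"path": INTERFACE_COUNTERS}},
         "cadence": 30000},  # milliseconds
    ],
    # 3. Sensor output: Kafka server UUID plus topic name (context_id);
    #    sensor_data must match the input entry above
    "sensor_output_configs": [
        {"sensor_data": {"mdt_sensor": {"path": INTERFACE_COUNTERS}},
         "destination": {
             "destination_id": "22222222-2222-2222-2222-222222222222",
             "context_id": "telemetry-interface-counters",  # Kafka topic
         }},
    ],
    # 4. Application context: the combination must be unique per job
    "application_context": {"context_id": "lab-demo",
                            "application_id": "my-telemetry-app"},
    # 5. Collection mode: protocol used to reach the devices
    "collection_mode": "MDT",
}

def submit_job(api_url: str, token: str) -> bytes:
    """POST the payload to a (placeholder) collection-job endpoint."""
    req = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # network call; not run here
        return resp.read()
```

Note how the output entry repeats the input's sensor_data verbatim; that match is what ties a destination to a sensor.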

How to Monitor Multivendor Telemetry Collection Jobs

Effective monitoring is essential for maintaining reliable telemetry collection at scale. The platform provides comprehensive monitoring capabilities on both the ingress (collection) and egress (distribution) sides.

Understanding Ingress and Egress Monitoring

Monitoring is implemented on both sides of the data pipeline:

  • Ingress monitoring — tracks incoming messages from network devices to the Data Gateway
  • Egress monitoring — tracks outgoing messages from the Data Gateway to the messaging bus (Kafka or gRPC)

This dual-sided monitoring is available through both the API and the GUI and applies to both internally initiated and customer-initiated collections.

Collection Monitoring

The collection monitoring interface provides a hierarchical view:

  1. Jobs List — a top-level view of all active collection jobs and their status
  2. Per-Job Device List — drill down into any job to see individual devices, including devices impacted by collection issues
  3. Per-Device Collection Metrics — detailed metrics for each device, including the collection protocol in use, success rates, and error information

This hierarchy makes it easy to identify exactly which devices are experiencing collection problems and what protocol-level issues may be causing them.

Distribution Monitoring

Distribution monitoring provides per-destination metrics that track whether collected data is successfully reaching its configured Kafka or gRPC endpoint.

A critical behavior to understand is the independence of collection and distribution status:

  • If a destination (Kafka or gRPC server) becomes unreachable, collection from network devices will still be reported as successful
  • However, distribution will be reported as failed
  • The overall job status will be marked as degraded

This distinction is important for troubleshooting. A degraded job status does not necessarily mean that data collection from devices has failed — it may mean that the downstream messaging bus is experiencing issues while collection continues normally.

Pro Tip: When investigating degraded collection jobs, always check both the collection and distribution monitoring views independently. The root cause is often on the distribution side (e.g., a Kafka broker that is down) rather than the collection side.
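The status logic described above can be captured in a few lines. This is a sketch of the reported behavior, not the platform's actual implementation:

```python
def job_status(collection_ok: bool, distribution_ok: bool) -> str:
    """Derive overall job status from independent collection and
    distribution health, per the behavior described above."""
    if collection_ok and distribution_ok:
        return "success"
    if collection_ok and not distribution_ok:
        # Devices are still being polled; only the messaging bus is failing.
        return "degraded"
    return "failed"
```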

Inside the Data Gateway Message Stream

Understanding the format and structure of Data Gateway messages is essential for building applications that consume collected telemetry data.

Message Format

Data Gateway messages follow the Google Protocol Buffers (protobuf) definition. The proto files that define the message schema can be compiled into client libraries for multiple programming languages, making it possible to build consumers in Python, Go, Java, and others.

These proto files must be used to parse messages that the Data Gateway posts to the Kafka or gRPC messaging bus. Without the correct protobuf definitions, consumer applications will not be able to deserialize the message payload.

Message Structure

Each Data Gateway message contains a header and a payload:

Message Header includes:

  • Node name — the hostname of the device that produced the data
  • Node UUID — the unique identifier of the source device
  • Collection start and end times — timestamps bracketing the collection interval
  • Sensor data — identifies which sensor path produced this data
  • Application Context and ID — maps the message back to the originating collection job

Message Payload:

The payload contains the actual collected data in the format dictated by the collection protocol and encoding.
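While prototyping a consumer, the header fields above could be mirrored in a plain dataclass. This is only a stand-in: the authoritative schema is whatever the compiled Data Gateway proto files generate.

```python
from dataclasses import dataclass

@dataclass
class MessageHeader:
    """Illustrative mirror of the Data Gateway message header fields;
    the real types come from the platform's protobuf definitions."""
    node_name: str             # hostname of the source device
    node_uuid: str             # unique identifier of the source device
    collection_start_ms: int   # collection interval start timestamp
    collection_end_ms: int     # collection interval end timestamp
    sensor_data: str           # sensor path that produced the data
    application_context: str   # maps back to the originating job
    application_id: str
```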

Customer Application Integration

The Data Gateway does not offer direct integration options with customer applications. Instead, it requires an external messaging bus — either Kafka or gRPC — as an intermediary.

The integration flow is:

  1. Data Gateway collects data from network devices
  2. Data Gateway publishes messages to the Kafka topic or gRPC server
  3. Customer applications consume messages from the messaging bus

The message format on the bus can be either PROTO (Protocol Buffers binary) or JSON, depending on the configuration.

If customer applications need data retention (historical queries rather than real-time streaming), they should implement an intermediate data lake between the messaging bus and the application. The messaging bus itself is a transit layer, not a storage layer.

The Collect Once, Distribute Many Paradigm in Multivendor Telemetry

One of the most powerful optimization features of the collection architecture is the collect once, distribute many paradigm. This feature dramatically reduces the load on network devices by ensuring that each data path is collected only once, regardless of how many applications request it.

How Cadence Optimization Works

When multiple applications request the same sensor path from the same device but at different cadences, the system applies intelligent optimization:

  1. Input-side optimization — the Data Gateway collects at the lowest cadence (highest frequency) configured across all requesting applications
  2. Output-side distribution — each destination receives data at its requested cadence, as long as it is a multiple of the input-side cadence
  3. Rounding behavior — if an output cadence is not an exact multiple of the input cadence, the output is rounded down to the nearest multiple

Practical Example

Consider three applications requesting the same sensor path from the same device:

| Job | Device | Sensor | Destination | Requested Cadence |
|---|---|---|---|---|
| Job A | Device 10 | Sensor X | Destination 1 | 5 seconds |
| Job B | Device 10 | Sensor X | Destination 2 | 25 seconds |
| Job C | Device 10 | Sensor X | Destination 3 | 43 seconds |

Without optimization, the Data Gateway would create three separate collection streams to the same device for the same data, tripling the load on the device. With the collect-once paradigm:

Input Stage:

  • Collection cadence is set to 5 seconds (the minimum of 5, 25, and 43)
  • Only one collection stream is created to the device

Output Stage:

  • Destination 1 receives data every 5 seconds (exact multiple of 5)
  • Destination 2 receives data every 25 seconds (exact multiple of 5)
  • Destination 3 receives data every 40 seconds (43 rounded down to nearest multiple of 5)
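The cadence arithmetic in this example can be captured in a few lines. This is a sketch of the stated rules, not the platform's actual scheduler:

```python
def effective_cadences(requested: list[int]) -> dict[int, int]:
    """Apply the collect-once rules: collect at the minimum requested
    cadence, then round each output cadence down to the nearest
    multiple of that base."""
    base = min(requested)
    return {r: (r // base) * base for r in requested}

# effective_cadences([5, 25, 43]) -> {5: 5, 25: 25, 43: 40}
```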

Cadence Removal Behavior

When a collection job with the lowest cadence is removed, the system automatically adjusts:

  • The next lowest cadence among remaining jobs becomes the new collection cadence
  • Distribution cadences for remaining destinations are recalculated based on the new input cadence

This dynamic adjustment ensures that collection always operates at the optimal frequency without manual intervention.

Pro Tip: When designing your collection jobs, be intentional about cadence values. Using cadences that are multiples of a common base (e.g., 5, 10, 15, 30 seconds) ensures that the rounding behavior does not cause unexpected deviations from your desired collection frequency.

Building a Python Consumer for Multivendor Telemetry Data

Once collection jobs are running and data is flowing to the messaging bus, the next step is building consumer applications that process the collected data. The reference architecture supports implementing consumers in any programming language that can compile Protocol Buffers definitions.

Consumer Architecture

A simple consumer application follows this pattern:

  1. Compile the protobuf definitions — use the Data Gateway proto files to generate language-specific message classes
  2. Connect to the messaging bus — establish a connection to the Kafka cluster or gRPC server
  3. Subscribe to topics — listen on the Kafka topic (context_id) configured in the collection job's output configuration
  4. Deserialize messages — use the compiled protobuf classes to parse incoming messages
  5. Process data — extract the relevant metrics from the message payload and forward them to your analytics pipeline, database, or alerting system
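The five steps above can be sketched as a minimal consumer loop. This assumes the third-party `kafka-python` package and a compiled protobuf class named `TelemetryMessage` generated from the Data Gateway proto files; both names are assumptions for illustration, not part of the platform's documented interface.

```python
def consume_telemetry(topic: str, brokers: list[str]) -> None:
    """Minimal consumer-loop sketch; assumes kafka-python and a
    hypothetical generated protobuf class TelemetryMessage."""
    from kafka import KafkaConsumer             # third-party: kafka-python
    from dg_proto_pb2 import TelemetryMessage   # hypothetical generated class

    # Subscribe to the Kafka topic (the job's context_id)
    consumer = KafkaConsumer(topic, bootstrap_servers=brokers)
    for record in consumer:
        msg = TelemetryMessage()
        msg.ParseFromString(record.value)  # deserialize the protobuf payload
        # Hand off to your analytics pipeline, database, or alerting system
        print(msg)
```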

Integration with Data Storage

For applications that require historical data analysis rather than just real-time streaming, an intermediate data lake should be implemented between the messaging bus and the application layer. Common architectures include:

  • Kafka to InfluxDB — for time-series metrics storage and visualization with Grafana
  • Kafka to Elasticsearch — for log-style data with Kibana dashboards
  • gRPC to custom database — for specialized analytics applications

The messaging bus serves as a transit layer, not a persistent storage layer. Customer applications must implement their own data retention strategy.

Optimizing Collection Jobs for Scale

As your telemetry collection deployment grows, optimization becomes critical for maintaining performance and minimizing impact on network devices.

Job Consolidation

Rather than creating individual collection jobs for each metric on each device, consolidate related sensors into fewer, broader jobs. Each collection job can contain multiple sensor input configurations, allowing you to collect several metrics from the same device in a single job.

Device Grouping

Use device tags and group-based device sets rather than individual device UUIDs when possible. This approach:

  • Simplifies job management as devices are added or removed
  • Enables automatic inclusion of new devices that match the group criteria
  • Reduces the number of API calls needed to manage collection

Cadence Planning

Align collection cadences across applications to maximize the benefit of the collect-once paradigm:

  • Use base cadences that are common factors (e.g., 5, 10, 30 seconds)
  • Avoid prime-number cadences that do not align well with other applications
  • Consider the minimum cadence your network devices can support without performance impact

Monitoring at Scale

At scale, proactive monitoring of both collection and distribution health is essential. Set up alerting on:

  • Degraded job status — indicates distribution issues even when collection succeeds
  • Failed collection metrics — indicates device-level issues (unreachable devices, authentication failures, unsupported paths)
  • Cadence drift — monitor whether actual collection frequency matches the configured cadence

Frequently Asked Questions

What protocols does multivendor telemetry collection support?

The solution supports six collection protocols: SNMP (polling with standard and proprietary MIBs), Model-Driven Telemetry (MDT), gNMI, CLI, SNMP TRAP, and Syslog. Each protocol is implemented as a plugin within the Data Gateway, and a single collection deployment can use all protocols simultaneously across different devices and collection jobs.

Can I collect telemetry from non-Cisco devices?

Yes, multivendor device support is a core capability. SNMP, gNMI, SNMP TRAP, and Syslog collection work with third-party devices that support these standard protocols. MDT collection works with devices that support YANG-based streaming telemetry. CLI collection, however, is currently limited to Cisco devices — third-party CLI collection would require additional customization.

How does collected data reach my custom applications?

The Data Gateway does not provide direct integration with customer applications. Instead, it publishes collected data to an external messaging bus — either Kafka or gRPC. Your applications connect to the messaging bus to consume data. Messages follow the Google Protocol Buffers (protobuf) format and must be deserialized using the Data Gateway proto files, which can be compiled for multiple programming languages. If you need data retention for historical analysis, you should implement an intermediate data lake between the messaging bus and your application.

What happens when a Kafka broker goes down?

If the distribution destination (Kafka or gRPC server) becomes unreachable, the Data Gateway continues collecting data from network devices — collection is reported as successful. However, distribution is reported as failed, and the overall job status is marked as degraded. This design ensures that temporary messaging bus outages do not disrupt the collection process itself. Once the destination recovers, distribution resumes.

How does the collect-once paradigm reduce device load?

When multiple applications request the same sensor path from the same device at different cadences, the Data Gateway collects only once at the lowest configured cadence (highest frequency). It then distributes data to each destination at the appropriate cadence. For example, if three applications request the same interface counters at 5, 25, and 43-second intervals, the device is polled only every 5 seconds. Destination 1 gets every sample, Destination 2 gets every fifth sample, and Destination 3 gets every eighth sample (43 rounded down to 40, which is 8 times 5).

Can I create collection jobs programmatically?

Yes, collection jobs can be created through the API using structured JSON payloads, through the GUI for manual operations, or through automation tools for large-scale deployments. The API approach is essential for integrating telemetry collection into infrastructure-as-code workflows and CI/CD pipelines. Each payload defines the device set, sensor inputs, output destinations, application context, and collection mode.

Conclusion

Building a unified multivendor telemetry collection architecture is essential for any organization managing a diverse network infrastructure. The key takeaways from this guide are:

  1. Protocol flexibility matters — supporting SNMP, MDT, gNMI, CLI, TRAP, and Syslog through a single platform eliminates the need for protocol-specific collection tools
  2. Decoupled architecture scales — separating the collection layer from the application layer enables horizontal scaling and N+M redundancy
  3. Structured payloads enable automation — understanding the five sections of a collection job payload (device set, sensor input, sensor output, application context, and collection mode) is the foundation for programmatic job management
  4. Dual-sided monitoring prevents blind spots — independently monitoring collection and distribution health reveals issues that a single status indicator would hide
  5. Collect once, distribute many saves resources — intelligent cadence optimization minimizes device load while satisfying multiple application requirements

As networks continue to grow in complexity and vendor diversity, mastering telemetry collection becomes an increasingly valuable skill for network engineers and automation professionals. The concepts covered in this article — from payload construction to cadence optimization — apply broadly across any environment where centralized, multivendor data collection is needed.

To deepen your understanding of network automation and telemetry, explore the hands-on courses available at NHPREP that cover related topics including gRPC, gNMI, model-driven telemetry, and network programmability.