Lesson 2 of 5

Troubleshooting Device Discovery

Objective

Fix device discovery failures in Catalyst Center by troubleshooting SNMP, SSH/NETCONF, and basic IP reachability. In production, discovery failures prevent inventory, monitoring, and automation tools from managing devices — causing configuration drift and delayed incident response. This lesson walks you through the exact checks and CLI commands used to locate and remediate common discovery blockers so devices return to a managed state.

Quick Recap

This lesson continues from Lesson 1 using the same topology. No new devices are added in this lesson.

ASCII topology (showing exact IP placeholders as used in the reference material):

[ Catalyst Center Server ]
    eth0: <IP address>
         |
         | (IP network)
         |
[ Support Host ]                    [ Device: border-1 ]
    eth0: <IP address>               mgmt0: <IP address>
                                     prompt: border-1#

Notes:

  • The topology keeps the placeholders from the reference material: <IP address> on each interface and the device name border-1 (prompt: border-1#).
  • All troubleshooting commands in this lesson reference the same placeholders so you can substitute your real management IPs when performing this in your lab.

Key Concepts

  • ICMP reachability (ping/traceroute): Ping uses ICMP Echo Request/Reply to verify IP-layer reachability. Traceroute uses incrementing TTLs so you can identify the path and any intermediate device dropping traffic. In production, traceroute helps locate a blocked firewall hop.
  • SNMP polling (snmpget): SNMP uses UDP (commonly to port 161) to fetch MIB values. A successful snmpget proves SNMP configuration (community/credentials) and UDP reachability. If SNMP fails but ICMP works, check SNMP credentials, ACLs, or UDP filtering.
  • NETCONF over SSH (netconf connectivity ssh -p 830): NETCONF sessions use SSH (often on port 830) for device configuration exchange. A successful SSH/NETCONF connect validates management-plane connectivity and credentials. If NETCONF fails while SSH on port 22 works, check port 830 access or NETCONF agent state.
  • Inventory resync and logs: Catalyst Center periodically syncs inventory (default every 24 hours) and supports manual resync via the Inventory Dashboard (Actions → Inventory → ‘Resync Device’). Logs from the inventory service show whether discovery steps succeeded or failed.
  • Remote interactive support (RADKit-like behavior): The platform supports starting interactive sessions to device contexts (example: service.inventory['border-1'].interactive()), which is useful for live troubleshooting when direct management access is limited.

Tip: Think of these checks as layered — first verify IP reachability (ICMP), then UDP/TCP services (SNMP, SSH/NETCONF), then higher-level application (inventory resync and logs). Each layer narrows down where the failure lies.
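The layered order in the tip can be sketched as a small helper that reports the lowest layer whose check failed. This is an illustration only: the layer names and the results dictionary are hypothetical and are not part of Catalyst Center.

```python
from typing import Dict, Optional

# Checks in the order the tip recommends: lowest layer first.
# These layer names are made up for this sketch.
LAYERS = ["icmp", "snmp", "netconf", "inventory"]

def first_failing_layer(results: Dict[str, bool]) -> Optional[str]:
    """Return the lowest layer whose check failed, or None if all passed.

    A layer missing from `results` is treated as not yet verified and is
    reported as the place to look next.
    """
    for layer in LAYERS:
        if not results.get(layer, False):
            return layer
    return None

# ICMP works, SNMP fails: start with SNMP credentials, ACLs, or UDP filtering.
print(first_failing_layer({"icmp": True, "snmp": False, "netconf": True}))
```

Stopping at the first failing layer keeps you from chasing NETCONF or inventory errors that are only side effects of a lower-layer problem.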


Step-by-step configuration

Step 1: Verify IP reachability (ICMP) from Catalyst Center to the device

What we are doing: Confirm the Catalyst Center can reach the device at the IP layer. If the device is unreachable, discovery cannot proceed. This step isolates network reachability problems (routing, ACLs, firewall).

ping <IP address>
traceroute <IP address>

What just happened:

  • ping <IP address> sends ICMP Echo Requests; successful replies confirm basic IP connectivity between Catalyst Center and the device.
  • traceroute <IP address> increments TTLs to reveal intermediate hops; it helps identify the network hop or firewall blocking access.

Real-world note: In production, many discovery problems are simply routing or firewall issues between the management plane and devices. Always verify reachability first.

Verify:

ping <IP address>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to <IP address>, timeout is 2 seconds:
!!!!! 
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms

traceroute <IP address>
traceroute to <IP address> (using UDP, 30 hops max):
 1  10.0.0.1  1 ms  1 ms  1 ms
 2  10.0.1.1  2 ms  2 ms  2 ms
 3  <IP address>  3 ms  3 ms  3 ms
  • Expected result: ping shows successful replies (exclamation marks), and traceroute reaches the final <IP address>. If traceroute stops or times out at a hop, that hop is the likely blocker.
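If you capture the ping and traceroute output as text, the two results that matter (success rate and last responding hop) can be pulled out with a short script. The sketch below assumes output shaped like the sample above; the function names are hypothetical and 192.0.2.10 is a documentation address standing in for <IP address>.

```python
import re
from typing import Optional, Tuple

def ping_success_rate(output: str) -> Optional[int]:
    """Return the success percentage from the ping summary line, or None."""
    match = re.search(r"Success rate is (\d+) percent", output)
    return int(match.group(1)) if match else None

def last_responding_hop(output: str) -> Optional[Tuple[int, str]]:
    """Return (hop number, address) of the last traceroute hop that answered."""
    last = None
    for line in output.splitlines():
        # Hop lines look like " 2  10.0.1.1  2 ms  2 ms  2 ms".
        # Timed-out hops (" 3  * * *") and the header do not match.
        match = re.match(r"\s*(\d+)\s+(\S+)\s+\d", line)
        if match:
            last = (int(match.group(1)), match.group(2))
    return last

SAMPLE_TRACE = """traceroute to 192.0.2.10 (using UDP, 30 hops max):
 1  10.0.0.1  1 ms  1 ms  1 ms
 2  10.0.1.1  2 ms  2 ms  2 ms
 3  * * *
"""

print(last_responding_hop(SAMPLE_TRACE))
```

When the last responding hop is not the target, the device or firewall immediately after that hop is the first place to check.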

Step 2: Validate SNMP from Catalyst Center to the device

What we are doing: Confirm Catalyst Center can poll the device using SNMP. SNMP is heavily used by Inventory and discovery to collect device information and detect configuration changes.

snmpget -v <version> -c <community> <IP address> <OID>

What just happened:

  • The snmpget command attempts a single SNMP GET to the device. A valid response returns the requested OID value (for example, sysDescr). If SNMP returns a timeout or error, SNMP credentials or UDP port filtering is likely the cause.

Real-world note: In production, SNMP community strings or SNMPv3 credentials must match exactly and SNMP must be allowed by device ACLs and any intermediate firewall.

Verify:

snmpget -v 2c -c public <IP address> 1.3.6.1.2.1.1.1.0
SNMPv2-SMI::mib-2.1.1.0 = STRING: "Device OS version X.Y.Z, NHPREP build"
  • Expected result: A value for the requested OID (for example, a sysDescr string). A Timeout usually points to a wrong community string or SNMP version, a device ACL, or blocked UDP 161. No Such Object means the agent answered but does not expose that OID to this community, so reachability is fine; recheck the OID and the SNMP view.
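The three outcomes of an snmpget call can be told apart mechanically. The sketch below assumes Net-SNMP style messages ("Timeout: No Response from ..." and "No Such Object available on this agent at this OID"); the function name and return labels are hypothetical.

```python
def classify_snmpget(output: str) -> str:
    """Classify captured snmpget output as 'ok', 'timeout',
    'no-such-object', or 'unknown'."""
    text = output.strip()
    if text.startswith("Timeout"):
        # No reply at all: community, version, ACL, or UDP 161 filtering.
        return "timeout"
    if "No Such Object" in text or "No Such Instance" in text:
        # The agent replied, so reachability and credentials work,
        # but the OID is not available to this request.
        return "no-such-object"
    if " = " in text:
        # An "OID = TYPE: value" line means the GET succeeded.
        return "ok"
    return "unknown"

print(classify_snmpget('SNMPv2-SMI::mib-2.1.1.0 = STRING: "Device OS version X.Y.Z"'))
```

The "No Such Object" test runs before the generic " = " test because that message is also printed in "OID = ..." form.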

Step 3: Test NETCONF (SSH) connectivity to the device

What we are doing: Confirm NETCONF connectivity (NETCONF typically runs over SSH on port 830). Discovery and some provisioning workflows rely on NETCONF to push or retrieve configuration data.

netconf connectivity ssh -p 830 <username>@<IP address>

What just happened:

  • This command attempts an SSH-based NETCONF connection to the device on port 830. A successful connection proves that SSH/NETCONF is reachable and that credentials/keys are accepted. If the command fails, it indicates a problem with port 830 access, SSH keys, or NETCONF agent status on the device.

Real-world note: Some devices use NETCONF on port 22; others use 830. The platform expects NETCONF on 830 in many deployments — confirm device expectations before troubleshooting.

Verify:

netconf connectivity ssh -p 830 <username>@<IP address>
Attempting NETCONF SSH connection to <username>@<IP address>:830 ...
Connected to <IP address>
Session established. NETCONF capabilities exchanged.
Connection closed.
  • Expected result: A successful connection sequence including capability exchange. If you see a connection refused or timeout, check firewall rules for port 830 and device NETCONF configuration.
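To separate "port 830 is blocked" from "the NETCONF agent or credentials are the problem", a plain TCP connect test is often enough. The sketch below uses only the Python standard library; the function name and return labels are hypothetical, and it does not attempt an SSH login or a NETCONF capability exchange.

```python
import socket

def tcp_port_state(host: str, port: int = 830, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout',
    or 'unreachable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer at all: commonly a firewall silently dropping packets.
        return "timeout"
    except OSError:
        # Routing or name-resolution failures, among others.
        return "unreachable"
```

For example, tcp_port_state("192.0.2.10") checks port 830 on a placeholder address. A result of "refused" suggests the NETCONF agent is not listening, while "timeout" suggests filtering along the path. An "open" result only proves TCP reachability; authentication and the capability exchange still need the full NETCONF test.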

Step 4: Start an interactive inventory session (attach to device context)

What we are doing: Attach to the inventory service’s interactive context for the device border-1. This allows you to run device-context commands through the Catalyst Center control plane or support utilities (read-only access to device context is provided).

service.inventory['border-1'].interactive()

What just happened:

  • The command begins an interactive session tied to the inventory context of border-1. The platform returns informational messages and presents a device prompt (border-1#) allowing direct troubleshooting or support-level commands where permitted.

Real-world note: Interactive sessions are valuable when device management ports are accessible only via the management platform's control plane (for example, when direct SSH from your workstation is blocked).

Verify:

service.inventory['border-1'].interactive()
08:05:41.928Z INFO | starting interactive session (will be closed when detached)
Attaching to border-1 ...
Type: ~. to detach. ~? for other shortcuts.
When using nested SSH sessions, add an extra ~ per level of nesting.
border-1#
  • Expected result: You should see the informational lines and the device prompt border-1#. If the session fails to attach, consult inventory logs for details.

Step 5: Review Inventory service logs to confirm discovery/resync actions

What we are doing: Pull logs from the inventory service to find errors or failure reasons for discovery/resync attempts. The magctl service logs command supports multiple options to view recent logs, raw logs, and follow output.

magctl service logs --help
magctl service logs -r inventory
magctl service logs -rt10 inventory
magctl service logs -rf inventory

What just happened:

  • magctl service logs --help shows available options and how to filter logs.
  • magctl service logs -r inventory retrieves raw logs for the inventory service.
  • magctl service logs -rt10 inventory returns the last 10 lines of the raw inventory log (-t is --tail; to limit by time instead, use -m/--mins).
  • magctl service logs -rf inventory shows raw logs and follows them for live updates during a manual resync.

Real-world note: For clustered or XL deployments, check logs on the specific inventory instance handling the device (the dashboard will show which inventory instance is used).

Verify:

magctl service logs --help
Usage: magctl service logs [OPTIONS] SERVICE
Connects to Elastic Search and pulls logs
Options:
  -o, --output [json]     Print log records in json
  -m, --mins TEXT         How many minutes in the past to search for logs
  -r, --raw               View raw log files
  -c, --container TEXT    Show logs for this container
  -t, --timezone TEXT     View logs in selected timezone ie America/Los_Angeles, Asia/Calcutta
  -f, --follow            Follow logs when using --raw
  -p, --previous          Show logs from previous running instance of service (if available)
  -t, --tail INTEGER      lines of recent log file to display. Defaults to -1, showing all log lines
  -a, --appstack TEXT     AppStack on which to perform the operation
  --help                  Show this message and exit.

magctl service logs -r inventory
[2025-03-10T08:05:41Z] INFO  InventoryService - Starting resync for device border-1
[2025-03-10T08:05:41Z] INFO  InventoryService - SNMP GET to <IP address> using community 'public' succeeded
[2025-03-10T08:05:42Z] INFO  InventoryService - NETCONF session to <IP address> succeeded
[2025-03-10T08:05:43Z] INFO  InventoryService - Device border-1 marked as Managed

magctl service logs -rt10 inventory
[Last 10 lines of the raw inventory log...]
  • Expected result: Look for lines indicating SNMP and NETCONF success, or explicit error messages that explain why discovery failed (timeouts, auth failures, port blocked). Use the log timestamps to correlate with your manual resync.
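When the log output is long, filtering it for one device can speed up the search. The sketch below assumes every line follows the layout of the sample output above ([timestamp] LEVEL Source - message); real inventory logs may be formatted differently. The ERROR line in the sample data is invented for illustration, and all names are hypothetical.

```python
import re
from typing import List, NamedTuple

# Assumed layout, taken from the sample output: [timestamp] LEVEL  Source - message
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<source>\S+)\s+-\s+(?P<msg>.*)$"
)

class Entry(NamedTuple):
    ts: str
    level: str
    source: str
    msg: str

def parse_log(text: str) -> List[Entry]:
    """Parse lines that match the assumed layout and skip everything else."""
    entries = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            entries.append(Entry(**match.groupdict()))
    return entries

def problems_for(entries: List[Entry], device: str) -> List[Entry]:
    """Return WARN/ERROR entries whose message mentions the device."""
    return [e for e in entries if e.level in {"WARN", "ERROR"} and device in e.msg]

# Illustrative input only: the ERROR line is made up for this example.
SAMPLE = """\
[2025-03-10T08:05:41Z] INFO  InventoryService - Starting resync for device border-1
[2025-03-10T08:05:42Z] ERROR InventoryService - NETCONF session to border-1 failed (hypothetical message)
"""

for entry in problems_for(parse_log(SAMPLE), "border-1"):
    print(entry.ts, entry.msg)
```

Keeping the timestamp in each entry makes it easy to line the failures up against the time you triggered the manual resync.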

Verification Checklist

  • Check 1: ICMP reachability. Run ping <IP address> and confirm replies. If there are none, check the network path and intermediate firewalls.
  • Check 2: SNMP polling. Run snmpget -v <version> -c <community> <IP address> <OID> and confirm a MIB value is returned.
  • Check 3: NETCONF/SSH connectivity. Run netconf connectivity ssh -p 830 <username>@<IP address> and confirm session establishment and capability exchange.

Common Mistakes

Symptom | Cause | Fix
Ping succeeds but SNMP times out | UDP 161 blocked by firewall, or SNMP community/credentials incorrect | Allow UDP 161 between Catalyst Center and device; verify SNMP community or SNMPv3 credentials
NETCONF connect times out or is refused | Port 830 blocked, or NETCONF agent disabled on device | Ensure port 830 is allowed; verify the device NETCONF agent is enabled, or use the configured NETCONF port
Device listed as “Unreachable” in Inventory UI | Management path blocked, or device IP changed | Verify routing and firewall; run traceroute <IP address> to identify the blocked hop
Resync completes but inventory still shows errors | Inventory service logs show auth failure or parsing error | Use magctl service logs -r inventory to identify the precise error and fix credentials or template mismatch

Key Takeaways

  • First verify IP-layer reachability (ping/traceroute) before blaming higher-layer services — it narrows the problem quickly.
  • SNMP and NETCONF are both required for comprehensive discovery; success of one does not guarantee the other. An snmpget test exercises UDP 161 and the SNMP credentials; a NETCONF test exercises SSH on port 830 and the device’s NETCONF agent.
  • The Inventory Dashboard supports manual resync (Actions → Inventory → ‘Resync Device’), but logs (magctl service logs) are essential to see the actual failure cause.
  • Use interactive inventory sessions (service.inventory['border-1'].interactive()) when direct device access is restricted; the platform provides a device context for live troubleshooting.

Warning: In production, ensure you have authorization before initiating interactive sessions or resyncs. Unauthorized changes or repeated resyncs can generate noise and confusion in large deployments.

If you complete the checklist and still see discovery failures, gather the inventory logs for the device and share them with your operations team following your support procedures. The output of magctl service logs -r inventory is the primary artifact support will request.