Troubleshooting Device Discovery
Objective
Fix device discovery failures in Catalyst Center by troubleshooting SNMP, SSH/NETCONF, and basic IP reachability. In production, discovery failures prevent inventory, monitoring, and automation tools from managing devices — causing configuration drift and delayed incident response. This lesson walks you through the exact checks and CLI commands used to locate and remediate common discovery blockers so devices return to a managed state.
Quick Recap
This lesson continues from Lesson 1 using the same topology. No new devices are added in this lesson.
ASCII topology (showing exact IP placeholders as used in the reference material):
              [ Catalyst Center Server ]
                  eth0: <IP address>
                          |
                          |  (IP network)
                          |
  [ Support Host ]                 [ Device: border-1 ]
  eth0: <IP address>               mgmt0: <IP address>
                                   prompt: border-1#
Notes:
- The topology uses the placeholders from the reference material: `<IP address>` on each interface and the device name `border-1` (prompt: `border-1#`).
- All troubleshooting commands in this lesson use the same placeholders, so you can substitute your real management IPs when working through them in your lab.
Key Concepts
- ICMP reachability (ping/traceroute): Ping uses ICMP Echo Request/Reply to verify IP-layer reachability. Traceroute uses incrementing TTLs so you can identify the path and any intermediate device dropping traffic. In production, traceroute helps locate a blocked firewall hop.
- SNMP polling (snmpget): SNMP uses UDP (commonly to port 161) to fetch MIB values. A successful snmpget proves SNMP configuration (community/credentials) and UDP reachability. If SNMP fails but ICMP works, check SNMP credentials, ACLs, or UDP filtering.
- NETCONF over SSH (netconf connectivity ssh -p 830): NETCONF sessions use SSH (often on port 830) for device configuration exchange. A successful SSH/NETCONF connect validates management-plane connectivity and credentials. If NETCONF fails while SSH on port 22 works, check port 830 access or NETCONF agent state.
- Inventory resync and logs: Catalyst Center periodically syncs inventory (default every 24 hours) and supports manual resync via the Inventory Dashboard (Actions → Inventory → ‘Resync Device’). Logs from the inventory service show whether discovery steps succeeded or failed.
- Remote interactive support (RADKit-like behavior): The platform supports starting interactive sessions to device contexts (example: `service.inventory['border-1'].interactive()`), which is useful for live troubleshooting when direct management access is limited.
Tip: Think of these checks as layered — first verify IP reachability (ICMP), then UDP/TCP services (SNMP, SSH/NETCONF), then higher-level application (inventory resync and logs). Each layer narrows down where the failure lies.
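The layered approach in the tip can be sketched as a small helper that reports the lowest layer that has not yet passed. This is an illustrative sketch only: the layer names, the hint text, and the `first_failing_layer` function are assumptions made for this example, not part of Catalyst Center.

```python
from typing import Dict, Optional

# Ordered from the lowest layer to the highest. The names are illustrative labels.
LAYERS = ["icmp", "snmp", "netconf", "inventory_resync"]

# Hypothetical hints, paraphrasing the checks described in this lesson.
HINTS = {
    "icmp": "check routing, ACLs, and firewalls on the management path",
    "snmp": "check community/SNMPv3 credentials, device ACLs, and UDP 161 filtering",
    "netconf": "check TCP 830 access and the NETCONF agent state on the device",
    "inventory_resync": "review the inventory service logs for the failure reason",
}


def first_failing_layer(results: Dict[str, bool]) -> Optional[str]:
    """Return the lowest layer whose check did not pass, or None if all passed.

    A layer missing from `results` counts as not verified, so it is reported
    before any higher layer is considered.
    """
    for layer in LAYERS:
        if not results.get(layer, False):
            return layer
    return None


if __name__ == "__main__":
    observed = {"icmp": True, "snmp": False}
    layer = first_failing_layer(observed)
    if layer is None:
        print("All layers passed")
    else:
        print(f"Start with {layer}: {HINTS[layer]}")
```

Stopping at the first unverified layer keeps the investigation from jumping to logs or credentials while a basic reachability problem is still open.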
Step-by-step configuration
Step 1: Verify IP reachability (ICMP) from Catalyst Center to the device
What we are doing: Confirm the Catalyst Center can reach the device at the IP layer. If the device is unreachable, discovery cannot proceed. This step isolates network reachability problems (routing, ACLs, firewall).
ping <IP address>
traceroute <IP address>
What just happened:
- `ping <IP address>` sends ICMP Echo Requests; successful replies confirm basic IP connectivity between Catalyst Center and the device.
- `traceroute <IP address>` sends probes with incrementing TTLs to reveal intermediate hops; it helps identify the network hop or firewall blocking access.
Real-world note: In production, many discovery problems are simply routing or firewall issues between the management plane and devices. Always verify reachability first.
Verify:
ping <IP address>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to <IP address>, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms
traceroute <IP address>
traceroute to <IP address> (using UDP, 30 hops max):
1 10.0.0.1 1 ms 1 ms 1 ms
2 10.0.1.1 2 ms 2 ms 2 ms
3 <IP address> 3 ms 3 ms 3 ms
- Expected result: `ping` shows successful replies (exclamation marks), and `traceroute` reaches the final `<IP address>`. If traceroute stops or times out partway, the blocker is most likely at or just beyond the last hop that responded.
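If you capture the output of these commands as text, the two results that matter (the reply count and the last hop that answered) can be pulled out with a short script. The sketch below assumes output shaped like the samples in this step; the function names `ping_success` and `last_responding_hop` are hypothetical, and the addresses in the usage example come from the documentation range rather than from the lab.

```python
import re
from typing import Optional, Tuple


def ping_success(output: str) -> Optional[Tuple[int, int]]:
    """Return (received, sent) from a 'Success rate is N percent (r/s)' line.

    Returns None when no such line is present in the output.
    """
    match = re.search(r"Success rate is \d+ percent \((\d+)/(\d+)\)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def last_responding_hop(output: str) -> Optional[Tuple[int, str]]:
    """Return (hop_number, address) for the last traceroute hop that answered.

    Lines that do not start with a hop number are ignored, as are hops
    that only show '*' (no response).
    """
    last = None
    for line in output.splitlines():
        match = re.match(r"\s*(\d+)\s+(\S+)", line)
        if match and match.group(2) != "*":
            last = (int(match.group(1)), match.group(2))
    return last


if __name__ == "__main__":
    sample = """traceroute to 192.0.2.10 (using UDP, 30 hops max):
 1 10.0.0.1 1 ms 1 ms 1 ms
 2 10.0.1.1 2 ms 2 ms 2 ms
 3 * * *"""
    print(last_responding_hop(sample))
```

In the usage example the trace dies after hop 2, so the device or firewall just beyond `10.0.1.1` is the first place to look.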
Step 2: Validate SNMP from Catalyst Center to the device
What we are doing: Confirm Catalyst Center can poll the device using SNMP. SNMP is heavily used by Inventory and discovery to collect device information and detect configuration changes.
snmpget -v <version> <IP address> -c <community> <OID>
What just happened:
- The `snmpget` command attempts a single SNMP GET to the device. A valid response returns the requested OID value (for example, sysDescr). If SNMP returns a timeout or error, SNMP credentials or UDP port filtering is likely the cause.
Real-world note: In production, SNMP community strings or SNMPv3 credentials must match exactly and SNMP must be allowed by device ACLs and any intermediate firewall.
Verify:
snmpget -v 2c <IP address> -c public 1.3.6.1.2.1.1.1.0
SNMPv2-SMI::mib-2.1.1.0 = STRING: "Device OS version X.Y.Z, NHPREP build"
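- Expected result: A value for the requested OID (for example, a sysDescr string). If you see `Timeout`, check the community string, SNMP version, and UDP 161 reachability. If you see `No Such Object`, the agent answered but does not expose that OID, so verify the OID and the device's SNMP view configuration.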
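When running this check against many devices, it can help to sort the `snmpget` results into a few buckets before investigating. The following is a minimal sketch that assumes the message wording commonly printed by Net-SNMP tools (`Timeout: No Response from ...` and `No Such Object available on this agent at this OID`); the `classify_snmpget` function and its bucket names are hypothetical.

```python
def classify_snmpget(output: str) -> str:
    """Classify one line of snmpget output as ok, timeout, no_such_object, or unknown.

    The matched strings are assumptions based on typical Net-SNMP wording;
    adjust them if your tool prints different messages.
    """
    text = output.strip()
    if text.startswith("Timeout"):
        # No reply at all: credentials, SNMP version, ACLs, or UDP 161 filtering.
        return "timeout"
    if "No Such Object" in text or "No Such Instance" in text:
        # The agent replied, so reachability and credentials are fine;
        # the requested OID is not available.
        return "no_such_object"
    if " = " in text:
        # 'OID = TYPE: value' form, for example a sysDescr string.
        return "ok"
    return "unknown"


if __name__ == "__main__":
    print(classify_snmpget("Timeout: No Response from 192.0.2.10."))
```

The `No Such Object` test runs before the generic `" = "` test because that message also contains an equals sign.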
Step 3: Test NETCONF (SSH) connectivity to the device
What we are doing: Confirm NETCONF connectivity (NETCONF typically runs over SSH on port 830). Discovery and some provisioning workflows rely on NETCONF to push or retrieve configuration data.
netconf connectivity ssh -p 830 <username>@<IP address>
What just happened:
- This command attempts an SSH-based NETCONF connection to the device on port 830. A successful connection proves that SSH/NETCONF is reachable and that credentials/keys are accepted. If the command fails, it indicates a problem with port 830 access, SSH keys, or NETCONF agent status on the device.
Real-world note: Some devices use NETCONF on port 22; others use 830. The platform expects NETCONF on 830 in many deployments — confirm device expectations before troubleshooting.
Verify:
netconf connectivity ssh -p 830 <username>@<IP address>
Attempting NETCONF SSH connection to <username>@<IP address>:830 ...
Connected to <IP address>
Session established. NETCONF capabilities exchanged.
Connection closed.
- Expected result: A successful connection sequence including capability exchange. If you see a connection refused or timeout, check firewall rules for port 830 and device NETCONF configuration.
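The difference between "connection refused" and "timeout" is worth capturing, because a refusal usually means the port is reachable but nothing is listening, while a timeout points to filtering on the path. The sketch below uses only the Python standard library to attempt a TCP connect to port 830. It checks TCP reachability only and does not authenticate or exchange NETCONF capabilities; the `probe_tcp` name and the example address are assumptions for illustration.

```python
import socket


def probe_tcp(host: str, port: int = 830, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and return 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: port reachable, no listener.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: often a firewall silently dropping traffic.
        return "timeout"
    except OSError as exc:
        # Anything else, for example no route to host or a name resolution failure.
        return f"error: {exc}"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; substitute your device's management IP.
    print(probe_tcp("192.0.2.10", 830, timeout=2.0))
```

Running the same probe against port 22 gives a quick comparison when SSH works but NETCONF does not.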
Step 4: Start an interactive inventory session (attach to device context)
What we are doing: Attach to the inventory service’s interactive context for the device `border-1`. This allows you to run device-context commands through the Catalyst Center control plane or support utilities (read-only access to device context is provided).
service.inventory['border-1'].interactive()
What just happened:
- The command begins an interactive session tied to the inventory context of `border-1`. The platform returns informational messages and presents a device prompt (`border-1#`), allowing direct troubleshooting or support-level commands where permitted.
Real-world note: Interactive sessions are valuable when device management ports are accessible only via the management platform's control plane (for example, when direct SSH from your workstation is blocked).
Verify:
service.inventory['border-1'].interactive()
08:05:41.928Z INFO | starting interactive session (will be closed when detached)
Attaching to border-1 ...
Type: ~. to detach. ~? for other shortcuts.
When using nested SSH sessions, add an extra ~ per level of nesting.
border-1#
- Expected result: You should see the informational lines and the device prompt `border-1#`. If the session fails to attach, consult inventory logs for details.
Step 5: Review Inventory service logs to confirm discovery/resync actions
What we are doing: Pull logs from the inventory service to find errors or failure reasons for discovery/resync attempts. The `magctl service logs` command supports multiple options to view recent logs, raw logs, and follow output.
magctl service logs --help
magctl service logs -r inventory
magctl service logs -rt10 inventory
magctl service logs -rf inventory
What just happened:
- `magctl service logs --help` shows available options and how to filter logs.
- `magctl service logs -r inventory` retrieves the raw logs for the `inventory` service.
- `magctl service logs -rt10 inventory` returns the last 10 lines of the raw log (`-t`/`--tail` counts lines; use `-m`/`--mins` to search by minutes).
- `magctl service logs -rf inventory` shows raw logs and follows them for live updates during a manual resync.
Real-world note: For clustered or XL deployments, check logs on the specific inventory instance handling the device (the dashboard will show which inventory instance is used).
Verify:
magctl service logs --help
Usage: magctl service logs [OPTIONS] SERVICE

  Connects to Elastic Search and pulls logs

Options:
  -o, --output [json]    Print log records in json
  -m, --mins TEXT        How many minutes in the past to search for logs
  -r, --raw              View raw log files
  -c, --container TEXT   Show logs for this container
  -t, --timezone TEXT    View logs in selected timezone ie America/Los_Angeles, Asia/Calcutta
  -f, --follow           Follow logs when using --raw
  -p, --previous         Show logs from previous running instance of service (if available)
  -t, --tail INTEGER     lines of recent log file to display. Defaults to -1, showing all log lines
  -a, --appstack TEXT    AppStack on which to perform the operation
  --help                 Show this message and exit.
magctl service logs -r inventory
[2025-03-10T08:05:41Z] INFO InventoryService - Starting resync for device border-1
[2025-03-10T08:05:41Z] INFO InventoryService - SNMP GET to <IP address> using community 'public' succeeded
[2025-03-10T08:05:42Z] INFO InventoryService - NETCONF session to <IP address> succeeded
[2025-03-10T08:05:43Z] INFO InventoryService - Device border-1 marked as Managed
magctl service logs -rt10 inventory
[Last 10 lines of the raw inventory log...]
- Expected result: Look for lines indicating SNMP and NETCONF success, or explicit error messages that explain why discovery failed (timeouts, auth failures, port blocked). Use the log timestamps to correlate with your manual resync.
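Scanning a long log capture by eye is error-prone, so a small filter that keeps only the suspicious lines can speed up the correlation step. The sketch below assumes each line follows the `[timestamp] LEVEL Source - message` shape of the sample output above; real inventory logs may be formatted differently. The `find_problems` function, the keyword list, and the `ERROR` line in the usage example are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Assumed line shape, based on the sample output in this lesson:
# [2025-03-10T08:05:41Z] INFO InventoryService - message text
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s+(?P<source>\S+)\s+-\s+(?P<msg>.*)$"
)

# Illustrative keywords that often accompany discovery failures.
FAILURE_WORDS = ("timeout", "timed out", "refused", "failed", "failure", "denied")


def find_problems(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return parsed records that are above INFO level or mention a failure keyword.

    Lines that do not match the assumed format are skipped.
    """
    problems = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        record = match.groupdict()
        elevated = record["level"].upper() in ("WARN", "WARNING", "ERROR")
        mentions_failure = any(word in record["msg"].lower() for word in FAILURE_WORDS)
        if elevated or mentions_failure:
            problems.append(record)
    return problems


if __name__ == "__main__":
    capture = [
        "[2025-03-10T08:05:41Z] INFO InventoryService - Starting resync for device border-1",
        "[2025-03-10T08:05:42Z] ERROR InventoryService - SNMP GET to 192.0.2.10 timed out",
    ]
    for rec in find_problems(capture):
        print(rec["ts"], rec["level"], rec["msg"])
```

The timestamps in the returned records can then be matched against the time you triggered the manual resync.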
Verification Checklist
- Check 1: ICMP reachability. Run `ping <IP address>` and confirm replies. If there are none, check the network path and intermediate firewalls.
- Check 2: SNMP polling. Run `snmpget -v <version> <IP address> -c <community> <OID>` and confirm a MIB value is returned.
- Check 3: NETCONF/SSH connectivity. Run `netconf connectivity ssh -p 830 <username>@<IP address>` and confirm session establishment and capability exchange.
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| Ping succeeds but SNMP times out | UDP 161 blocked by firewall or SNMP community/credentials incorrect | Open/allow UDP 161 between Catalyst Center and device; verify SNMP community or SNMPv3 credentials |
| NETCONF connect times out or refused | Port 830 blocked or NETCONF agent disabled on device | Ensure port 830 is allowed; verify device NETCONF agent is enabled or use configured NETCONF port |
| Device listed as “Unreachable” in Inventory UI | Management path blocked, or device IP changed | Verify routing and firewall; run traceroute <IP address> to identify the blocked hop |
| Resync completes but inventory still shows errors | Inventory service logs show auth failure or parsing error | Use `magctl service logs -r inventory` to identify the precise error and fix credentials or template mismatch |
Key Takeaways
- First verify IP-layer reachability (ping/traceroute) before blaming higher-layer services — it narrows the problem quickly.
- SNMP and NETCONF are both required for comprehensive discovery; success of one does not guarantee the other. SNMP tests SNMP/UDP reachability; NETCONF tests SSH on port 830 and the device’s config agent.
- The Inventory Dashboard supports manual resync (Actions → Inventory → ‘Resync Device’), but logs (`magctl service logs`) are essential to see the actual failure cause.
- Use interactive inventory sessions (`service.inventory['border-1'].interactive()`) when direct device access is restricted; the platform provides a device context for live troubleshooting.
Warning: In production, ensure you have authorization before initiating interactive sessions or resyncs. Unauthorized changes or repeated resyncs can generate noise and confusion in large deployments.
If you complete the checklist and still see discovery failures, gather the inventory logs for the device and share them with your operations team following your support procedures. The output of `magctl service logs -r inventory` is the primary artifact support will request.