Catalyst Center Service Architecture
Objective
Understand the Catalyst Center microservices architecture, how core services relate to each other, and the primary operational tooling for service inspection, logging, and restarts. This lesson teaches the practical CLI workflow (via SSH and magctl) to inspect service health, view logs, and restart services safely — skills used in production to recover failing microservices, validate dependency issues, and minimize downtime during troubleshooting.
In production, Catalyst Center runs as a set of microservices (Kubernetes pods/containers). Knowing how to check service health, follow logs, and perform controlled restarts is essential for maintaining Inventory, Provisioning, and other critical automation workflows. Example: when a device provisioning run fails, an engineer uses magctl to trace the failure across provisioning-service → task-service → orchestration-engine and restart the appropriate service without impacting unrelated workloads.
Key Concepts (theory and protocol behavior)
- Microservices & Pods
  - Theory: A pod is a group of one or more containers that share network and storage. Multiple pods implementing a service are grouped under a Kubernetes Service abstraction.
  - Practical: When you inspect a service via `magctl`, you are typically inspecting the pods (containers) running that service.
- Namespaces (isolation)
  - Theory: A namespace provides logical isolation within a cluster — like having separate virtual clusters inside a single physical cluster.
  - Practical: Catalyst Center maps application stacks to namespaces; this affects resource scoping and troubleshooting boundaries.
- Nodes & High Availability
  - Theory: Nodes are worker machines that host pods. In an HA cluster, the pods of a namespace are spread across nodes to tolerate node failures.
  - Practical: A service that is “running” can still be unhealthy if its pod is repeatedly rescheduled across nodes — logs and restart counts reveal that.
- maglev vs magctl
  - Theory: `maglev` is the SSH user and includes some tooling; `magctl` is the primary command used to monitor and manipulate system services (similar to `kubectl` but tailored to this environment).
  - Practical: You SSH as `maglev` to run `magctl appstack status`, inspect logs, and restart services.
- Service lifecycle & restarts
  - Theory: A soft restart restarts the container while keeping the pod object; a hard restart deletes and recreates the pod (non-persistent container storage is lost).
  - Practical: Use soft restarts for routine service reloads; use hard restarts only when a full pod replacement is required and you accept the state loss.
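The soft/hard distinction maps directly onto the two restart command forms used later in this lesson. As a minimal shell sketch (the `choose_restart` helper is ours, not part of magctl; only the echoed command strings are real magctl syntax):

```shell
# Hypothetical helper: print the magctl command for a soft or hard restart.
choose_restart() {
  local svc="$1" mode="$2"
  if [ "$mode" = "hard" ]; then
    echo "magctl service restart -d $svc"   # pod delete + recreate (ephemeral state lost)
  else
    echo "magctl service restart $svc"      # container restart in place, pod kept
  fi
}

choose_restart inventory-manager soft   # → magctl service restart inventory-manager
choose_restart inventory-manager hard   # → magctl service restart -d inventory-manager
```

Printing the command instead of executing it doubles as a dry-run: you can review exactly what would run before touching the service.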
Step-by-step configuration (operational steps)
Step 1: SSH into the Catalyst Center management node
What we are doing: We establish an SSH session to the Catalyst Center as the maglev user on TCP port 2222. This gives us the CLI environment where magctl runs. SSH is the management plane access for operations.
ssh maglev@<Catalyst_Center_IP> -p 2222
What just happened:
- The SSH client initiated a TCP connection to `<Catalyst_Center_IP>` on port `2222`.
- Upon successful authentication as user `maglev`, you receive a shell where `magctl` and other management tools are available. This step is required because `magctl` is intended to be executed from the Catalyst Center management environment.
Real-world note: In production, SSH to management endpoints is often restricted by bastion hosts and MFA. Ensure proper audit logging and least-privilege access.
Verify:
maglev@catalyst-center:~$ whoami
maglev
maglev@catalyst-center:~$ hostname
catalyst-center
maglev@catalyst-center:~$
Expected output above shows you are on the Catalyst Center host as maglev and can proceed to use magctl.
Step 2: Check overall appstack status
What we are doing: Query the cluster/application stack to get the health status of the running microservices using magctl appstack status. This establishes a baseline of which services are desired vs. current and whether any service is degraded.
maglev@catalyst-center:~$ magctl appstack status
What just happened:
- `magctl appstack status` queries the orchestration layer for the desired and current state of each application service.
- The command reports service-level health (running, pending, crashed), allowing you to identify problematic services at a glance.
Verify:
maglev@catalyst-center:~$ magctl appstack status
APPSTACK: catalyst-center
Service Name Desired Current Status
postgres 1 1 running
inventory 1 1 running
inventory-manager 1 1 running
provisioning-service 1 1 running
task-service 1 1 running
spf-service 1 1 running
Expected output: A table showing each service with Desired and Current replica counts and a Status column. Any mismatch or non-running status indicates a problem.
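When the table is long, it helps to filter for degraded rows automatically. A minimal sketch, using a hypothetical sample in place of live `magctl appstack status` output (the column order — name, Desired, Current, Status — is assumed to match the table above):

```shell
# Hypothetical sample rows standing in for `magctl appstack status` output
# (strip the header lines before filtering the live command's output).
appstack_sample() {
cat <<'EOF'
postgres 1 1 running
inventory 1 1 running
provisioning-service 1 0 pending
EOF
}

# Flag any service whose Current replica count differs from Desired,
# or whose Status is not "running".
appstack_sample | awk '$2 != $3 || $4 != "running" {print $1 " => " $4}'
```

Running this against the sample prints only the degraded row (`provisioning-service => pending`), which is the at-a-glance triage list you want on a busy cluster.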
Step 3: Retrieve service logs (raw, tail, and follow)
What we are doing: We fetch logs for a service (example: inventory-manager) to inspect recent events and error traces. We demonstrate raw full logs, tailing the last N lines, and following live logs.
maglev@catalyst-center:~$ magctl service logs -r inventory-manager
maglev@catalyst-center:~$ magctl service logs -r inventory-manager | tail -n 50
maglev@catalyst-center:~$ magctl service logs -rf inventory-manager
What just happened:
- `magctl service logs -r <service>` prints raw logs for the requested service.
- Piping to `tail -n 50` limits output to the most recent 50 lines for focused troubleshooting.
- `magctl service logs -rf <service>` follows logs in real time (like `tail -f`), useful when reproducing an issue.
Real-world note: Follow logs during repro to capture transient errors that disappear after a restart.
Verify:
maglev@catalyst-center:~$ magctl service logs -r inventory-manager
2025-03-15T09:12:23Z INFO [inventory-manager] Starting inventory-manager v1.12.3
2025-03-15T09:12:25Z INFO [inventory-manager] Connected to postgres at postgres:5432
2025-03-15T09:14:07Z WARN [inventory-manager] Device sync delayed: retrying in 10s
2025-03-15T09:14:17Z ERROR [inventory-manager] Failed to parse device response from 10.0.0.45: timeout
2025-03-15T09:15:01Z INFO [inventory-manager] Processing backlog: 12 devices
Or tail last 50 lines:
maglev@catalyst-center:~$ magctl service logs -r inventory-manager | tail -n 50
2025-03-15T09:14:07Z WARN [inventory-manager] Device sync delayed: retrying in 10s
2025-03-15T09:14:17Z ERROR [inventory-manager] Failed to parse device response from 10.0.0.45: timeout
2025-03-15T09:15:01Z INFO [inventory-manager] Processing backlog: 12 devices
Follow live logs (shows streaming lines; sample snapshot):
maglev@catalyst-center:~$ magctl service logs -rf inventory-manager
2025-03-15T09:15:02Z INFO [inventory-manager] Sync worker started
2025-03-15T09:15:04Z INFO [inventory-manager] Device 10.0.0.22: sync complete
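During triage you usually care about warnings and errors first. A sketch using the sample lines shown above (in practice, pipe `magctl service logs -r inventory-manager` into the filter instead of the heredoc):

```shell
# Sample log lines copied from the output above; replace log_sample with
# the live command when working on a real cluster.
log_sample() {
cat <<'EOF'
2025-03-15T09:14:07Z WARN [inventory-manager] Device sync delayed: retrying in 10s
2025-03-15T09:14:17Z ERROR [inventory-manager] Failed to parse device response from 10.0.0.45: timeout
2025-03-15T09:15:01Z INFO [inventory-manager] Processing backlog: 12 devices
EOF
}

# Keep only WARN/ERROR lines (severity is assumed to be a space-delimited
# field, as in the samples above).
log_sample | grep -E ' (WARN|ERROR) '
```

Against the sample this keeps the two WARN/ERROR lines and drops the INFO line, giving you the error trail without the routine noise.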
Step 4: Soft restart a service (container restart)
What we are doing: We perform a soft restart of inventory-manager to reload the container without deleting the pod. This is useful for clearing transient issues (e.g., leaked memory or a hung worker) while preserving pod identity.
maglev@catalyst-center:~$ magctl service restart inventory-manager
What just happened:
`magctl service restart <service>` triggers a container restart inside the existing pod. Network identity, mounted volumes, and pod metadata remain intact. Lifecycle hooks in the container runtime may run (preStop/postStart), and the orchestration layer reports the container restart event.
Real-world note: Soft restarts are preferred for minimal disruption; they avoid the overhead of pod recreation and help keep in-progress sessions if the service design supports it.
Verify:
maglev@catalyst-center:~$ magctl service status inventory-manager
Service: inventory-manager
State: running
Replicas: 1/1
Pod: inventory-manager-5f7c98d6b7-abcde
Node: node-1
Restarts: 1
Uptime: 00:00:30
Expected verification: the Restarts counter increments (shows 1 or more) and State is running. If State is not running, investigate further with the service logs.
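Rather than eyeballing the status output after a restart, you can script the check. A sketch: `is_running` is a hypothetical helper that reads `magctl service status` text on stdin; the `State:` line format is assumed to match the sample above.

```shell
# Hypothetical helper: succeed (exit 0) if the status text reports State: running.
is_running() { grep -q 'State: *running'; }

# Live usage (sketch), polling until the service comes back:
#   until magctl service status inventory-manager | is_running; do sleep 5; done

# Demo against sample status text:
printf 'Service: inventory-manager\nState: running\n' | is_running && echo "service is running"
```

Wrapping the check in a helper keeps the polling loop readable and makes the condition easy to reuse after both soft and hard restarts.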
Step 5: Hard restart (pod delete & recreate) — use with caution
What we are doing: We perform a hard restart (pod delete + recreate) of inventory-manager when a soft restart did not resolve the issue or when the pod is stuck. This is more disruptive — non-persistent in-pod data is lost.
maglev@catalyst-center:~$ magctl service restart -d inventory-manager
What just happened:
`magctl service restart -d <service>` deletes the pod and requests the orchestration system to create a fresh pod instance. Because the container and its ephemeral filesystem are recreated, any non-persistent state inside the container vanishes. This often clears corrupted in-memory state or broken container images, at the cost of losing in-pod ephemeral data.
Warning: a hard restart deletes and re-creates the pod; non-persistent storage and application data inside the container are lost!
Verify:
maglev@catalyst-center:~$ magctl service status inventory-manager
Service: inventory-manager
State: running
Replicas: 1/1
Pod: inventory-manager-6d8a1c3f2b-fghij
Node: node-2
Restarts: 0
Uptime: 00:00:15
Expected verification: Pod identifier changed (new pod name), Restarts may be 0 for the new pod, and State should return to running. If it does not, check logs and underlying dependencies (e.g., postgres).
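The pod-name change is the proof that a hard restart actually replaced the pod. A sketch: `pod_name` is a hypothetical helper that extracts the `Pod:` field from `magctl service status` output (the field layout is assumed to match the samples above).

```shell
# Hypothetical helper: print the pod name from status text on stdin.
pod_name() { awk -F': *' '$1 == "Pod" {print $2}'; }

# Live usage (sketch): capture the pod name before and after the hard restart.
#   before=$(magctl service status inventory-manager | pod_name)
#   magctl service restart -d inventory-manager
#   after=$(magctl service status inventory-manager | pod_name)
#   [ "$before" != "$after" ] && echo "pod was recreated"

# Demo against a sample status line from this lesson:
printf 'Pod: inventory-manager-6d8a1c3f2b-fghij\n' | pod_name
```

Comparing before/after names is more reliable than the Restarts counter here, since a fresh pod legitimately starts back at 0 restarts.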
Verification Checklist
- Check 1: Confirm you can SSH to Catalyst Center and run `magctl` — verify with `whoami` and `hostname`.
- Check 2: Validate that the appstack status shows all critical services running — verify with `magctl appstack status`.
- Check 3: Confirm you can retrieve and follow service logs for inventory-manager — verify with `magctl service logs -r inventory-manager` and `magctl service logs -rf inventory-manager`.
- Check 4: After a restart (soft or hard), confirm the service returns to running and the pod identity reflects the expected action — verify with `magctl service status <service-name>`.
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| Unable to SSH on port 2222 | Wrong SSH port, or access blocked by a firewall | Ensure SSH uses `-p 2222` and that management ACLs allow access to the Catalyst Center IP on TCP/2222 |
| `magctl appstack status` shows a service in pending | Pod scheduling issues or resource constraints | Check node capacity, then run `magctl service status <service>` and review the service logs to determine why the pod is pending |
| Log output shows repeated timeouts with devices | Downstream device/network reachability problems | Validate network reachability to devices, and inspect provisioning workflow logs (provisioning-service, task-service) |
| After hard restart, application data missing | Service relied on ephemeral in-pod storage | Restore from persistent storage or re-run provisioning tasks; avoid using hard restart for stateful services without backups |
Key Takeaways
- `magctl` is the primary operational tool for inspecting Catalyst Center microservices; it exposes status, logs, and restart controls that map to pod/container lifecycle operations.
- Always check service logs before restarting; logs reveal root-cause symptoms (dependency errors, timeouts) and help you choose between a soft and a hard restart.
- A soft restart restarts the container in place; a hard restart deletes and recreates the pod — this can cause loss of non-persistent in-pod data.
- In production, respect service dependencies (inventory → postgres, provisioning → orchestration → spf-service). Restarting a dependent service without checking those relationships can cause cascading failures.
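One way to respect those dependencies operationally is to walk a chain in dependency order (dependencies first) as a dry run before executing anything. A sketch; the `restart_chain` helper and the example order are illustrative, not a prescribed Catalyst Center procedure:

```shell
# Hypothetical dry-run helper: print the restart commands in the order given.
restart_chain() {
  for svc in "$@"; do
    echo "magctl service restart $svc"   # echoed only; drop the echo to execute
  done
}

# Dependencies first: postgres before inventory before inventory-manager.
restart_chain postgres inventory inventory-manager
```

Reviewing the printed plan before removing the `echo` is a cheap guard against restarting services out of order and triggering the cascading failures described above.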
Important: Think of namespaces as separate virtual clusters inside your physical cluster, and pods as the containers doing the work. Knowing where to look (service status, logs, restart behavior) parallels knowing which circuit breaker to check in a power distribution system — it isolates failures with minimal collateral impact.
End of Lesson 1 of 5 — "Catalyst Center Service Architecture".