Lesson 1 of 5

Catalyst Center Service Architecture

Objective

Understand the Catalyst Center microservices architecture, how core services relate to each other, and the primary operational tooling for service inspection, logging, and restarts. This lesson teaches the practical CLI workflow (via SSH and magctl) to inspect service health, view logs, and restart services safely — skills used in production to recover failing microservices, validate dependency issues, and minimize downtime during troubleshooting.

In production, Catalyst Center runs as a set of microservices (Kubernetes pods/containers). Knowing how to check service health, follow logs, and perform controlled restarts is essential for maintaining Inventory, Provisioning, and other critical automation workflows. Example: when a device provisioning run fails, an engineer uses magctl to trace the failure across provisioning-service → task-service → orchestration-engine and restart the appropriate service without impacting unrelated workloads.



Key Concepts (theory and protocol behavior)

  • Microservices & Pods

    • Theory: A pod is a group of one or more containers that share network and storage. Multiple pods implementing a service are grouped under a Kubernetes Service abstraction.
    • Practical: When you inspect a service via magctl, you are typically inspecting pods (containers) running that service.
  • Namespaces (isolation)

    • Theory: A namespace provides logical isolation within a cluster — like having separate virtual clusters inside a single physical cluster.
    • Practical: Catalyst Center maps application stacks to namespaces; this affects resource scoping and troubleshooting boundaries.
  • Nodes & High Availability

    • Theory: Nodes are worker machines that host pods. In an HA cluster, pods of a namespace are spread across nodes to tolerate node failures.
    • Practical: A service “running” can still be unhealthy if its pod is repeatedly rescheduled across nodes — logs and restart counts reveal that.
  • maglev vs magctl

    • Theory: maglev is the SSH user and includes some tooling; magctl is the primary command used to monitor and manipulate system services (similar to kubectl but tailored to this environment).
    • Practical: You SSH as maglev to run magctl appstack status, inspect logs, and restart services.
  • Service lifecycle & restarts

    • Theory: A soft restart restarts the container while keeping the pod object; a hard restart deletes and recreates the pod (non-persistent container storage is lost).
    • Practical: Use soft restarts for routine service reloads; use hard restarts only when a full pod replacement is required and you accept state loss.

Step-by-step configuration (operational steps)

Step 1: SSH into the Catalyst Center management node

What we are doing: We establish an SSH session to Catalyst Center as the maglev user on TCP port 2222. This gives us the CLI environment where magctl runs; SSH is the management-plane access path for these operations.

ssh maglev@<catalyst-center-ip> -p 2222

What just happened:

  • The SSH client initiated a TCP connection to the Catalyst Center IP on port 2222.
  • Upon successful authentication as user maglev, you receive a shell where magctl and other management tools are available. This step is required because magctl is intended to be executed from the Catalyst Center management environment.

Real-world note: In production, SSH to management endpoints is often restricted by bastion hosts and MFA. Ensure proper audit logging and least-privilege access.
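To avoid retyping the non-standard port, the connection details can be pinned in the SSH client configuration. A minimal client-side fragment; the host alias and address below are placeholders for your deployment:

```
# ~/.ssh/config — client-side fragment; alias and address are examples
Host catalyst-center
    HostName 198.51.100.10   # replace with your Catalyst Center IP
    User maglev
    Port 2222
```

With this in place, ssh catalyst-center opens the same session as the full command above.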

Verify:

maglev@catalyst-center:~$ whoami
maglev
maglev@catalyst-center:~$ hostname
catalyst-center
maglev@catalyst-center:~$ 

Expected output above shows you are on the Catalyst Center host as maglev and can proceed to use magctl.


Step 2: Check overall appstack status

What we are doing: Query the application stack for the health of its microservices using magctl appstack status. This establishes a baseline: whether each service's Current replica count matches its Desired count, and whether any service is degraded.

maglev@catalyst-center:~$ magctl appstack status

What just happened:

  • magctl appstack status requests the orchestration layer for the desired and current state of each application service.
  • The command reports service-level health (running, pending, crashed), allowing you to identify problematic services at a glance.

Verify:

maglev@catalyst-center:~$ magctl appstack status
APPSTACK: catalyst-center
Service Name             Desired  Current  Status
postgres                 1        1        running
inventory                1        1        running
inventory-manager        1        1        running
provisioning-service     1        1        running
task-service             1        1        running
spf-service              1        1        running

Expected output: A table showing each service with Desired and Current replica counts and a Status column. Any mismatch or non-running status indicates a problem.
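Mismatches are easy to miss by eye in a long table, so a small filter helps. A sketch run against a saved snapshot of the table; the pending row is invented for illustration, and the column layout is assumed to match the sample above:

```shell
# Save a status snapshot; in production this would come from:
#   magctl appstack status > /tmp/appstack.txt
cat > /tmp/appstack.txt <<'EOF'
Service Name             Desired  Current  Status
postgres                 1        1        running
inventory                1        1        running
provisioning-service     1        0        pending
task-service             1        1        running
EOF

# Skip the header row, then print any service whose Current count
# lags Desired or whose Status column is not "running".
awk 'NR > 1 && ($2 != $3 || $4 != "running") {print $1, $4}' /tmp/appstack.txt
```

Here the filter prints only the problem row (provisioning-service pending), which is the row you would investigate first.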


Step 3: Retrieve service logs (raw, tail, and follow)

What we are doing: We fetch logs for a service (example: inventory-manager) to inspect recent events and error traces. We demonstrate raw full logs, tailing the last N lines, and following live logs.

maglev@catalyst-center:~$ magctl service logs -r inventory-manager
maglev@catalyst-center:~$ magctl service logs -r inventory-manager | tail -n 50
maglev@catalyst-center:~$ magctl service logs -rf inventory-manager

What just happened:

  • magctl service logs -r <service> prints raw logs for the requested service.
  • Piping to tail -n 50 limits output to the most recent 50 lines for focused troubleshooting.
  • magctl service logs -rf <service> follows logs in real time (like tail -f), useful when reproducing an issue.

Real-world note: Follow logs during repro to capture transient errors that disappear after a restart.

Verify:

maglev@catalyst-center:~$ magctl service logs -r inventory-manager
2025-03-15T09:12:23Z INFO  [inventory-manager] Starting inventory-manager v1.12.3
2025-03-15T09:12:25Z INFO  [inventory-manager] Connected to postgres at postgres:5432
2025-03-15T09:14:07Z WARN  [inventory-manager] Device sync delayed: retrying in 10s
2025-03-15T09:14:17Z ERROR [inventory-manager] Failed to parse device response from 10.0.0.45: timeout
2025-03-15T09:15:01Z INFO  [inventory-manager] Processing backlog: 12 devices

Or tail last 50 lines:

maglev@catalyst-center:~$ magctl service logs -r inventory-manager | tail -n 50
2025-03-15T09:14:07Z WARN  [inventory-manager] Device sync delayed: retrying in 10s
2025-03-15T09:14:17Z ERROR [inventory-manager] Failed to parse device response from 10.0.0.45: timeout
2025-03-15T09:15:01Z INFO  [inventory-manager] Processing backlog: 12 devices

Follow live logs (shows streaming lines; sample snapshot):

maglev@catalyst-center:~$ magctl service logs -rf inventory-manager
2025-03-15T09:15:02Z INFO  [inventory-manager] Sync worker started
2025-03-15T09:15:04Z INFO  [inventory-manager] Device 10.0.0.22: sync complete
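When a log is long, filtering by severity narrows it quickly. A sketch using a saved copy of the sample lines above; in production the input would be piped from magctl service logs -r instead of read from a file:

```shell
# Reuse the sample log lines shown above as a local file.
cat > /tmp/inv-mgr.log <<'EOF'
2025-03-15T09:12:23Z INFO  [inventory-manager] Starting inventory-manager v1.12.3
2025-03-15T09:14:07Z WARN  [inventory-manager] Device sync delayed: retrying in 10s
2025-03-15T09:14:17Z ERROR [inventory-manager] Failed to parse device response from 10.0.0.45: timeout
2025-03-15T09:15:01Z INFO  [inventory-manager] Processing backlog: 12 devices
EOF

# Keep only WARN and ERROR lines; in production:
#   magctl service logs -r inventory-manager | grep -E ' (WARN|ERROR) '
grep -E ' (WARN|ERROR) ' /tmp/inv-mgr.log
```

This prints the two WARN/ERROR lines and drops the INFO noise, which is usually enough to spot the failing dependency or device.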

Step 4: Soft restart a service (container restart)

What we are doing: We perform a soft restart of inventory-manager to reload the container without deleting the pod. This is useful for clearing transient problems (e.g., memory growth from a leak) while preserving pod identity.

maglev@catalyst-center:~$ magctl service restart inventory-manager

What just happened:

  • magctl service restart <service> triggers a container restart inside the existing pod. Network identity, mounted volumes, and pod metadata remain intact. The life-cycle hooks in the container runtime may run (preStop/postStart), and the orchestration layer reports the container restart event.

Real-world note: Soft restarts are preferred for minimal disruption; they avoid the overhead of pod recreation and help keep in-progress sessions if the service design supports it.

Verify:

maglev@catalyst-center:~$ magctl service status inventory-manager
Service: inventory-manager
State: running
Replicas: 1/1
Pod: inventory-manager-5f7c98d6b7-abcde
Node: node-1
Restarts: 1
Uptime: 00:00:30

Expected verification: the Restarts counter has incremented (1 or greater) and State is running. If State is not running, investigate further with the service logs.
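Rather than checking status once, operators often poll until the state settles. A generic retry loop, sketched with a stub function standing in for the real magctl status query; the stub and its timing are invented for illustration:

```shell
# Stub standing in for something like:
#   magctl service status inventory-manager | awk '/^State:/ {print $2}'
# It pretends the service needs two polls before reporting running.
check_state() {
    if [ "$1" -lt 3 ]; then echo "starting"; else echo "running"; fi
}

state="unknown"
for attempt in 1 2 3 4 5; do
    state=$(check_state "$attempt")
    echo "poll $attempt: $state"
    [ "$state" = "running" ] && break   # stop as soon as it is healthy
    sleep 1
done
echo "final state: $state"              # prints "final state: running"
```

The same loop works unchanged for any service: swap the stub for the real status command and adjust the attempt count and sleep interval to your tolerance.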


Step 5: Hard restart (pod delete & recreate) — use with caution

What we are doing: We perform a hard restart (pod delete + recreate) of inventory-manager when a soft restart did not resolve the issue or when the pod is stuck. This is more disruptive — non-persistent in-pod data is lost.

maglev@catalyst-center:~$ magctl service restart -d inventory-manager

What just happened:

  • magctl service restart -d <service> deletes the pod and requests the orchestration system to create a fresh pod instance. Because the container and ephemeral filesystem are recreated, any non-persistent state inside the container vanishes. This often clears corrupted in-memory state or broken container images, at the cost of losing in-pod ephemeral data.

Warning: a hard restart deletes and re-creates the pod, so non-persistent storage and any application data held inside the container are lost.

Verify:

maglev@catalyst-center:~$ magctl service status inventory-manager
Service: inventory-manager
State: running
Replicas: 1/1
Pod: inventory-manager-6d8a1c3f2b-fghij
Node: node-2
Restarts: 0
Uptime: 00:00:15

Expected verification: Pod identifier changed (new pod name), Restarts may be 0 for the new pod, and State should return to running. If it does not, check logs and underlying dependencies (e.g., postgres).
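A scripted way to confirm the pod was actually replaced is to compare pod names before and after; the values here are the sample identifiers from the Step 4 and Step 5 verification outputs:

```shell
# Pod names captured before and after the hard restart. In production
# each value would come from something like:
#   magctl service status inventory-manager | awk '/^Pod:/ {print $2}'
pod_before="inventory-manager-5f7c98d6b7-abcde"
pod_after="inventory-manager-6d8a1c3f2b-fghij"

if [ "$pod_before" != "$pod_after" ]; then
    echo "pod recreated: $pod_after"
else
    echo "WARNING: pod name unchanged; the hard restart may not have run"
fi
```

A changed suffix confirms the orchestrator built a fresh pod rather than restarting the container in place.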


Verification Checklist

  • Check 1: Confirm you can SSH to Catalyst Center and run magctl — verify with whoami and hostname.
  • Check 2: Validate the appstack status shows all critical services running — verify with magctl appstack status.
  • Check 3: Confirm you can retrieve and follow service logs for inventory-manager — verify with magctl service logs -r inventory-manager and magctl service logs -rf inventory-manager.
  • Check 4: After a restart (soft or hard), confirm service returns to running and pod identity reflects the expected action — verify with magctl service status <service-name>.

Common Mistakes

Symptom | Cause | Fix
Unable to SSH on port 2222 | Wrong SSH port, or access blocked by a firewall | Connect with ssh -p 2222 and ensure management ACLs allow TCP/2222 to the Catalyst Center IP
magctl appstack status shows a service as pending | Pod scheduling issues or resource constraints | Check node capacity, then run magctl service status <service> and review the service logs to find why the pod is pending
Log output shows repeated timeouts to devices | Downstream device or network reachability problems | Validate network reachability to the devices and inspect the provisioning workflow logs (provisioning-service, task-service)
Application data missing after a hard restart | The service relied on ephemeral in-pod storage | Restore from persistent storage or re-run provisioning tasks; avoid hard-restarting stateful services without backups

Key Takeaways

  • magctl is the primary operational tool for inspecting Catalyst Center microservices; it exposes status, logs, and restart controls that map to pod/container lifecycle operations.
  • Always check service logs before restarting; logs reveal root-cause symptoms (dependency errors, timeouts) and help choose soft vs. hard restart.
  • Soft restart reboots the container in-place; hard restart deletes and recreates the pod — this can cause loss of non-persistent in-pod data.
  • In production, respect service dependencies (inventory → postgres, provisioning → orchestration → spf-service). Restarting a dependent service without checking those relationships can cause cascading failures.
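The dependency chains in the last point can be checked mechanically: coreutils tsort turns "depends-on" pairs into a safe order (dependencies before dependents). The pairs below are a sketch of the relationships named in this lesson, with "orchestration" written as orchestration-engine:

```shell
# Each line is "dependency dependent": the left-hand service must be
# healthy before the right-hand one. Pairs reflect inventory -> postgres
# and provisioning -> orchestration -> spf-service from the takeaways.
cat > /tmp/deps.txt <<'EOF'
postgres inventory
spf-service orchestration-engine
orchestration-engine provisioning-service
EOF

# tsort emits a topological order: restart services in this order to
# ensure every dependency is back up before its dependents.
tsort /tmp/deps.txt
```

In the output, postgres always precedes inventory, and spf-service precedes orchestration-engine, which precedes provisioning-service; tsort also fails loudly if the pairs ever form a cycle.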

Important: Think of namespaces as separate virtual clusters inside your physical cluster, and pods as the containers doing the work. Knowing where to look (service status, logs, restart behavior) parallels knowing which circuit breaker to check in a power distribution system — it isolates failures with minimal collateral impact.


End of Lesson 1 of 5 — "Catalyst Center Service Architecture".