SD-WAN Controller Architecture
Introduction
Catalyst SD-WAN is built on a controller-based architecture that separates the management, control, and data planes into distinct components. Understanding how these controllers work together — and how they communicate with the WAN Edge routers — is the foundation for every other topic in this course.
In this lesson, we break down the three core controllers: vManage, vBond, and vSmart. You will learn what role each one plays, how they establish secure connections with WAN Edge devices, and how policy updates travel from the controller all the way into the router's data plane. By the end, you will be able to describe the controller communication flow, verify controller connectivity, and understand the internal processing path that makes SD-WAN policy enforcement possible.
Key Concepts
The Three Controllers
Every Catalyst SD-WAN fabric relies on three controller types working in concert. Each has a specific responsibility, and all three must be healthy for the overlay to function.
| Controller | Primary Role | Key Responsibility |
|---|---|---|
| vManage | Management Plane | Centralized GUI and API-based management, monitoring, configuration templates, software upgrades, and disaster recovery orchestration |
| vBond | Orchestration Plane | Initial authentication and orchestration; introduces WAN Edge devices to vManage and vSmart; validates serial numbers |
| vSmart | Control Plane | Distributes routing information and centralized policies to WAN Edge routers via the Overlay Management Protocol (OMP) |
Key Daemons on Controllers
The controllers run several important SD-WAN daemons that handle specific functions:
- vDaemon — The core daemon responsible for establishing and maintaining DTLS/TLS control connections between the WAN Edge and the controllers. It processes hello packets and tracks how many vSmart and vManage instances the device is connected to.
- OMPd — The Overlay Management Protocol daemon that manages the OMP TCP sessions used to exchange routing information and policies.
- FTMd — Facilitates tunnel management functions.
- Confd — A configuration daemon that stores and manages the device configuration database.
- TTMd — Handles tunnel and transport management tasks.
Control Connection Security
All control-plane communication between WAN Edge devices and the controllers is encrypted using DTLS (Datagram Transport Layer Security) or TLS (Transport Layer Security). The OMP session itself runs as a cleartext TCP session, but it is encapsulated inside the DTLS or TLS control connection, ensuring end-to-end encryption for all policy and routing exchanges.
How It Works
Controller Discovery and Authentication
When a WAN Edge router boots, it must first locate and authenticate with the controllers. The process begins with vBond:
- The WAN Edge contacts vBond using its configured organization name and a certificate-based identity.
- vBond validates the device's serial number against its authorized serial list.
- Once validated, vBond provides the WAN Edge with the IP addresses of the available vManage and vSmart controllers.
- The WAN Edge then establishes DTLS or TLS control connections directly to vManage and vSmart.
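The discovery sequence above can be sketched as a small simulation. This is only an illustrative model: the `VBond` class, the `bring_up_edge` function, and the dictionary layout are hypothetical names invented for this example and do not correspond to any Cisco API.

```python
from dataclasses import dataclass


@dataclass
class VBond:
    """Hypothetical stand-in for a vBond orchestrator; not a Cisco API."""
    org_name: str
    authorized_serials: set
    controllers: dict  # assumed shape: {"vmanage": [ip, ...], "vsmart": [ip, ...]}

    def authenticate(self, org_name: str, serial: str) -> dict:
        # The edge presents its organization name and identity.
        if org_name != self.org_name:
            raise PermissionError("organization name mismatch")
        # vBond checks the serial number against its authorized list.
        if serial not in self.authorized_serials:
            raise PermissionError(f"serial {serial} is not authorized")
        # On success, vBond returns the controller addresses.
        return self.controllers


def bring_up_edge(vbond: VBond, org_name: str, serial: str) -> list:
    """Return the control connections the edge would open after vBond approval."""
    controllers = vbond.authenticate(org_name, serial)
    # A real edge would open DTLS or TLS sessions here; this sketch only lists targets.
    return [f"{role}:{ip}" for role in ("vmanage", "vsmart") for ip in controllers[role]]
```

The point of the model is the ordering: no connection to vManage or vSmart is attempted until vBond has accepted both the organization name and the serial number.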
The vBond orchestrator maintains a valid vSmart list that tracks all authorized controller serial numbers. This list must be consistent across all vBond instances. You can verify it with:
show orchestrator valid-vsmart
The total count in this output should equal the number of vManage nodes plus the number of vSmart nodes in the fabric.
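As a quick sanity check, the count rule can be expressed in a few lines. This sketch assumes the serial numbers have been copied by hand from the command output into a list; the function name `check_valid_vsmart_count` is hypothetical.

```python
def check_valid_vsmart_count(listed_serials, vmanage_nodes, vsmart_nodes):
    """Compare the listed serial count with the expected controller total.

    listed_serials: serial numbers copied from the command output (assumed input).
    Returns a tuple (matches, expected_total).
    """
    expected_total = vmanage_nodes + vsmart_nodes
    # Duplicate entries are counted once.
    return len(set(listed_serials)) == expected_total, expected_total
```

For example, a fabric with one vManage node and two vSmart nodes should list three serial numbers.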
OMP Centralized Policy Update Flow
One of the most important processes to understand is how a centralized policy update travels from vSmart into the WAN Edge data plane. This is a multi-step process that touches several internal components of the IOS-XE architecture.
Step 1 — DTLS/TLS Session to vDaemon
The WAN Edge maintains encrypted control connections (DTLS or TLS) to the controllers. All inbound control traffic — including OMP updates — arrives over these encrypted tunnels and is handed to the vDaemon process running in the Linux kernel space.
Step 2 — vSmart Sends Policy Update Over OMP
vSmart transmits the centralized policy update over the OMP TCP session. On the WAN Edge, the OMP TCP packets are decapsulated from the DTLS/TLS control connection by vDaemon and passed to OMPd for processing.
Step 3 — OMPd Processes and Hands Off to Confd
The OMP daemon receives the policy in XML format. It prunes any unsupported configurations and commits the relevant policy elements to Confd, the configuration database daemon.
Step 4 — FPMd and FMAN Processing
The Forwarding Policy Manager Daemon (FPMd) handles configuration callbacks from Confd, parses the configuration, populates its internal data structures, and pushes the information further down the stack. The update flows through two layers of the Forwarding Manager:
- FMAN-RP (Route Processor level) — Acts as a pass-through layer that bridges the control plane and the data plane.
- FMAN-FP (Forwarding Processor level) — Sets up object dependencies between objects and programs the data plane.
Step 5 — Policy Installed into the Data Plane (QFP)
The policy is ultimately installed into the Quantum Flow Processor (QFP), also known as the Cisco Packet Processor (CPP). The QFP uses the Feature Invocation Array (FIA) for traffic processing, which implements packet processing features including data policies, Application-Aware Routing (AAR) policies, ACLs, QoS marking, and security policies.
Key Point: The same architecture applies across all IOS-XE platforms. The Route Processor (RP) runs the Linux kernel and handles control-plane processes and inter-process communication. The RP programs the data plane. The data plane (QFP) uses the FIA for all transit traffic processing.
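To keep the order of the five steps straight while troubleshooting, it can help to write the path down as data. The sketch below is a memory aid only: the stage names come from this lesson, while `POLICY_PATH`, `upstream_of`, and `downstream_of` are hypothetical helper names.

```python
# Order in which a centralized policy update is processed on the WAN Edge,
# as described in Steps 1 through 5.
POLICY_PATH = ["vDaemon", "OMPd", "Confd", "FPMd", "FMAN-RP", "FMAN-FP", "QFP"]


def upstream_of(stage):
    """Stages the policy must pass through before it reaches `stage`."""
    return POLICY_PATH[:POLICY_PATH.index(stage)]


def downstream_of(stage):
    """Stages that still have to process the policy after `stage`."""
    return POLICY_PATH[POLICY_PATH.index(stage) + 1:]
```

If a policy is present in FPMd but missing in the QFP, `downstream_of("FPMd")` lists the layers left to inspect: FMAN-RP, FMAN-FP, and the QFP itself.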
Policy Programming Verification
When verifying that a policy has been properly programmed, you need to understand a few essential terms:
| Term | Meaning |
|---|---|
| AOM | Asynchronous Object Manager — a control-plane mechanism that allows processes to continue with other tasks without waiting for inter-process communication operations to finish |
| AOM State: Done | The object has been successfully programmed — this is the expected good state |
| AOM State: Pending | The object is still waiting to be programmed — this indicates a potential problem |
| Class-group | Equivalent to a policy in SD-WAN terminology |
| Class | Equivalent to a policy sequence |
The complete policy chain should be verifiable from OMP, through FPMd, to FMAN, and finally into the QFP data plane. If any link in that chain shows a "Pending" AOM state, the policy has not been fully installed.
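The "any link in the chain" rule can be sketched as a simple scan. This assumes the state for each stage has already been collected by hand from the relevant show commands; the function name `first_incomplete` and the input layout are hypothetical.

```python
def first_incomplete(chain):
    """Return the first stage whose recorded state is not "Done", or None.

    chain: ordered list of (stage_name, state) pairs, collected manually.
    """
    for stage, state in chain:
        if state.strip().lower() != "done":
            return stage
    return None
```

A result of `None` means every recorded object reached the expected "Done" state; any other result names the stage where investigation should start.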
Hello Mechanism and Controller Counting
The vDaemon process uses hello packets to maintain awareness of controller availability. Each hello exchange reports:
- The current vSmart count and the count received from vBond
- The current vManage count and the count received from vBond
- The number of vSmart controllers currently connected versus the maximum allowed
If there is a discrepancy between the controller count reported by vBond and the count the WAN Edge actually sees, it can trigger connectivity issues such as BFD session flaps.
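The comparison behind the first two bullets can be modeled as follows. This is a sketch under assumptions: the `HelloCounts` fields and the `find_discrepancies` function are hypothetical names, and the values would have to be read manually from the device.

```python
from dataclasses import dataclass


@dataclass
class HelloCounts:
    """Counts seen in one hello exchange (hypothetical container)."""
    current_vsmart: int   # vSmart count the edge currently holds
    vbond_vsmart: int     # vSmart count received from vBond
    current_vmanage: int  # vManage count the edge currently holds
    vbond_vmanage: int    # vManage count received from vBond


def find_discrepancies(counts):
    """List the controller types whose local and vBond-reported counts differ."""
    issues = []
    if counts.current_vsmart != counts.vbond_vsmart:
        issues.append("vSmart")
    if counts.current_vmanage != counts.vbond_vmanage:
        issues.append("vManage")
    return issues
```

An empty list means the edge and vBond agree on both counts.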
Configuration Example
Verifying Controller Connections
To check the local control-plane properties and see the status of each transport interface's controller connections:
show sdwan control local-properties
Sample output shows each interface, its local and public IP addresses, the number of controllers connected, color, and uptime:
GigabitEthernet0/0/0 192.168.3.2 12426 192.168.3.2 :: 12426 2/1 mpls up 2 yes/yes/no No/No 4:23:26:05 0:00:33:32
In this output, 2/1 indicates 2 vSmart controllers and 1 vManage are connected on this transport.
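If you collect this output from many devices, a small parser can pull out the controller counts. The sketch below assumes the column positions match the single sample row shown above; real output may vary by software release, so treat the field indexes as assumptions. The function name is hypothetical.

```python
def parse_local_properties_row(row):
    """Extract interface, color, state, and controller counts from one row.

    Field positions are inferred from the sample row in this lesson only.
    """
    fields = row.split()
    # The seventh field holds "<vSmart count>/<vManage count>", e.g. "2/1".
    vsmart, vmanage = (int(n) for n in fields[6].split("/"))
    return {
        "interface": fields[0],
        "color": fields[7],
        "state": fields[8],
        "vsmart": vsmart,
        "vmanage": vmanage,
    }
```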
Verifying BFD Sessions
To confirm the status and uptime of BFD sessions to remote sites:
show sdwan bfd sessions
1.1.1.10 50 up mpls mpls 192.168.4.2 10.2.1.5 12346 ipsec 7 1000 0:00:01:28 1
A very low uptime (such as 0:00:01:28 above) can indicate a recent flap that warrants investigation.
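Spotting low uptimes by eye gets tedious on large fabrics. The following sketch converts the `d:hh:mm:ss` uptime into seconds and flags sessions below a threshold. The 300-second default is an arbitrary assumption for illustration, the field positions are inferred from the sample row above, and both function names are hypothetical.

```python
def uptime_seconds(uptime):
    """Convert a 'days:hours:minutes:seconds' string into seconds."""
    days, hours, minutes, seconds = (int(part) for part in uptime.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def recently_flapped(row, threshold_s=300):
    """True if the session is up but its uptime is below threshold_s.

    Assumes the state is the 3rd field and the uptime the 12th, as in the sample.
    """
    fields = row.split()
    return fields[2] == "up" and uptime_seconds(fields[11]) < threshold_s
```

With the sample row, the uptime works out to 88 seconds, so the session would be flagged for a closer look.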
Debugging Controller Communication
When you need to trace vDaemon activity to understand controller hello exchanges and troubleshoot connectivity issues:
debug platform software sdwan vdaemon all
show logging process vdaemon internal
The debug output reveals the hello exchange details, including how many vSmart and vManage controllers the device knows about versus how many it is actually connected to. Look for lines showing the controller counts:
Current VSmart count 2, new VSmart count 2, Current VManage count 1, new VManage count 0
Current vSmarts connected to 2, max-controllers 2
Current Valid vsmart count 3, new Valid vSmart count 3
If the "new VManage count" drops to 0 while the "Current VManage count" is 1, this signals that vBond is reporting a different controller count than what the edge device currently sees, which can cause the device to re-evaluate its connections.
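When the debug log is long, a regular expression can pick out the count lines and report mismatches. This sketch assumes the log line format matches the sample shown above exactly; `COUNT_RE` and `count_mismatches` are hypothetical names.

```python
import re

# Pattern built from the sample log line in this lesson.
COUNT_RE = re.compile(
    r"Current VSmart count (\d+), new VSmart count (\d+), "
    r"Current VManage count (\d+), new VManage count (\d+)"
)


def count_mismatches(log_text):
    """Return (controller, current, new) tuples where the two counts differ."""
    mismatches = []
    for match in COUNT_RE.finditer(log_text):
        cur_vs, new_vs, cur_vm, new_vm = map(int, match.groups())
        if cur_vs != new_vs:
            mismatches.append(("vSmart", cur_vs, new_vs))
        if cur_vm != new_vm:
            mismatches.append(("vManage", cur_vm, new_vm))
    return mismatches
```

Run against the sample line, this reports a single vManage mismatch (current 1, new 0), which is the condition described above.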
Real-World Application
Disaster Recovery and Cluster Management
In production deployments, vManage is often deployed as a cluster for high availability. Disaster recovery (DR) configurations use primary and secondary data centers, each running vManage nodes. Before any DR failover, you must verify:
- All nodes in the cluster report a state of "Ready"
- All services (statistics-db, application-server, messaging-server, configuration-db) are reachable and show true
- The valid vSmart serial list is identical across all vBond orchestrators
- Replication status between primary and secondary data centers shows "success"
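The four pre-failover checks can be combined into one checklist function. This is a sketch only: it assumes the cluster data has already been gathered into plain dictionaries, and the function name `dr_precheck` and the input layout are hypothetical rather than any vManage API.

```python
REQUIRED_SERVICES = (
    "statistics-db",
    "application-server",
    "messaging-server",
    "configuration-db",
)


def dr_precheck(nodes, vbond_serial_lists, replication_status):
    """Return a list of failed checks; an empty list means all checks passed.

    nodes: {node_name: {"state": str, "services": {service_name: bool}}}
    vbond_serial_lists: {vbond_name: iterable of valid vSmart serials}
    replication_status: status string reported for replication
    """
    failures = []
    for name, info in nodes.items():
        if info["state"] != "Ready":
            failures.append(f"{name}: state is {info['state']}")
        for service in REQUIRED_SERVICES:
            if not info["services"].get(service, False):
                failures.append(f"{name}: {service} is not reachable")
    # Every vBond must hold the same set of serial numbers.
    if len({frozenset(serials) for serials in vbond_serial_lists.values()}) > 1:
        failures.append("valid vSmart serial lists differ between vBonds")
    if replication_status != "success":
        failures.append(f"replication status is {replication_status}")
    return failures
```

Returning every failure, rather than stopping at the first, gives a complete picture before a failover decision is made.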
Preventing Control-Plane Disruptions
A known issue in earlier software versions involved BFD flaps caused by PAT (Port Address Translation) on control-plane connections. When multiple WAN Edge devices share a single public IP through PAT, port changes can disrupt BFD sessions. The recommended approaches are:
- Use NAT with dedicated public IPs instead of PAT for control-plane connections (though this requires one public IP per edge device)
- On private-color TLOCs, configure no carrier and no port-hop under the tunnel interface to prevent unnecessary port changes
Best Practice: Apply the same troubleshooting tools and methodology across all IOS-XE platforms. Whether you are working on a physical ISR, ASR, or Catalyst 8000, the internal architecture (RP, QFP, FMAN, AOM) follows the same pattern. Master the verification flow once and you can apply it everywhere.
Design Considerations
- Deploy a minimum of two vSmart controllers for redundancy; the max-controllers setting on WAN Edge devices governs how many vSmarts can be active simultaneously.
- Place vBond orchestrators in locations reachable by all WAN Edge devices, since vBond is always the first point of contact.
- Monitor AOM states proactively — a "Pending" state that persists indicates a policy programming failure that will leave the data plane out of sync with the intended configuration.
Summary
- Three controllers form the foundation of Catalyst SD-WAN: vManage (management), vBond (orchestration), and vSmart (control plane via OMP).
- Control connections between WAN Edge and controllers use DTLS or TLS encryption; OMP runs as a TCP session encapsulated inside these secure tunnels.
- Policy updates flow from vSmart over OMP to vDaemon and OMPd, then through Confd, FPMd, FMAN-RP, FMAN-FP, and finally into the QFP data plane where the Feature Invocation Array (FIA) enforces them.
- AOM verification is essential: a "Done" state confirms successful policy programming, while "Pending" signals a problem that needs investigation.
- All IOS-XE platforms share the same architecture, meaning the same troubleshooting tools and verification steps apply regardless of the hardware platform.
In the next lesson, we will explore how OMP routes and policies are exchanged in detail, including TLOC routes, service routes, and the policy constructs that control traffic flow across the SD-WAN fabric.