Lesson 6 of 7

SD-WAN Scalability and Design

Objective

In this lesson you will learn SD‑WAN scalability and design patterns: why full‑mesh, hub‑and‑spoke, and regional‑hub topologies are selected, how they affect control and data plane scale, and how to implement and verify a simple hub‑spoke + regional‑hub design on IOS routers to demonstrate the routing and policy consequences. This matters in production because tunnel counts, control plane state, and security inspection all impact CPU, memory, and failure domains — making the right topology essential for predictable performance in large enterprises.

Real-world scenario: A global enterprise has 200 branch sites and three regional data centers. Direct full‑mesh tunnels between every branch would overwhelm device resources and complicate inspection policies. Instead, the network team must choose regional hubs to reduce tunnel counts and centralize security inspection while preserving optimal SaaS performance where needed.


Quick Recap

We use the same topology introduced in Lesson 1. No new devices are added in this lesson — we focus on topology choices and routing/design changes. The devices in this lesson are:

  • Controller (management/controller)
  • WAN‑HUB (global/wide area hub)
  • REGIONAL‑HUB (regional concentrator)
  • BRANCH1 (edge branch)

ASCII topology with exact IP addresses on every interface:

                                Controller
                                (CTRL)
                         +---------------------+
                         | Gi0/0 192.168.100.1 |
                         +----------+----------+
                                    |
                                    | 192.168.100.0/24
                                    |
                         +----------+----------+
                         |      WAN-HUB        |
                         | Gi0/2 192.168.100.2 |
                         | Gi0/0 10.0.0.1/30   |
                         | Gi0/1 10.0.1.1/30   |
                         +----+-------------+--+
                              |             |
              10.0.0.0/30     |             | 10.0.1.0/30
                              |             |
                       +-------+-------+ +---+------------+
                       | REGIONAL-HUB  | |    BRANCH1     |
                       | Gi0/0 10.0.0.2| | Gi0/1 10.0.1.2 |
                       +---------------+ +----------------+

Device IP addressing summary:

Device         Interface   IP Address          Role
Controller     Gi0/0       192.168.100.1/24    Controller/management
WAN-HUB        Gi0/2       192.168.100.2/24    Controller link
WAN-HUB        Gi0/0       10.0.0.1/30         Link to REGIONAL-HUB
WAN-HUB        Gi0/1       10.0.1.1/30         Link to BRANCH1
REGIONAL-HUB   Gi0/0       10.0.0.2/30         Regional concentrator
BRANCH1        Gi0/1       10.0.1.2/30         Branch edge

Tip: For examples and credentials use lab.nhprep.com and password Lab@123 where a domain or password is required.


Key Concepts

Before touching the CLI, understand these principles — theory + practical implications:

  • Mesh complexity (O(N^2) tunnels): In a full mesh, every site maintains a tunnel to every other site. The number of tunnels grows as N*(N-1)/2. Protocol-level impact: each tunnel consumes CPU, RAM (state), DTLS/TLS sessions for control/crypto, and increases state to be maintained during churn. In production this matters because an increase in tunnels can saturate CPU and cause control instability.

  • Hub‑and‑spoke concentration: Branches point traffic to a hub. This reduces tunnel count dramatically (O(N)) and centralizes stateful services (NGFW, IPS, URL filtering). Packet flow: branch → hub (inspection/peering) → destination. In production this is used when a central security posture or regional peering is required.

  • Regional hubs (hierarchical design): Use multiple hubs by geography. This partitions the network into clusters. Benefit: reduces intercontinental hairpinning while keeping per-device scale within limits. Think of it like multiple “mini‑meshes” with central points of concentration.

  • Controller vs data plane separation: Controllers provide policy, orchestration and may push route policies, but data plane tunnels carry user traffic. Scalability decisions affect both: more devices => more control sessions to manage, and more tunnels => more data plane resource usage.

  • Security inspection cost: NGFW/IPS/URL‑filtering require memory and CPU. When you centralize inspection at a hub, the hub must be sized to handle peak concurrent sessions and throughput. In practice, design for sustained traffic plus bursts and use regional hubs to distribute load.

Analogy: Think of full mesh like every colleague calling every colleague individually — the phone system overloads quickly. Hub‑and‑spoke is like everyone calling a receptionist (hub) who connects or handles the call.
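The scaling difference between these topologies can be put in concrete numbers. A minimal Python sketch, using the 200-branch, three-region scenario from the opening (function names are illustrative, not part of any SD-WAN API):

```python
# Compare tunnel counts for the three topologies discussed above.

def full_mesh_tunnels(n):
    """Every site peers with every other site: N*(N-1)/2 tunnels."""
    return n * (n - 1) // 2

def hub_spoke_tunnels(branches):
    """Each branch holds exactly one tunnel to the single hub: O(N)."""
    return branches

def regional_hub_tunnels(branches, hubs):
    """Branches tunnel to their regional hub; the hubs form a small mesh."""
    return branches + full_mesh_tunnels(hubs)

sites = 200
print(full_mesh_tunnels(sites))        # 19900 tunnels - unmanageable
print(hub_spoke_tunnels(sites))        # 200 tunnels, one failure domain
print(regional_hub_tunnels(sites, 3))  # 203 tunnels across three regions
```

The jump from 19,900 tunnels to roughly 200 is why hub concentration is the default choice at this scale; the three extra hub-to-hub tunnels are a negligible price for splitting the failure domain by region.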


Step-by-step configuration

Each step below shows commands, explains why, and gives verification output you should see. All configuration examples use IOS-style commands.

Step 1: Configure basic interface addressing and loopbacks

What we are doing: Assign IP addresses to physical interfaces and create a loopback for stable router ID and management. This matters because stable IDs and correct IP addressing are prerequisites for routing adjacencies and for controller discovery.

WAN-HUB# configure terminal
WAN-HUB(config)# hostname WAN-HUB
WAN-HUB(config)# interface GigabitEthernet0/2
WAN-HUB(config-if)# ip address 192.168.100.2 255.255.255.0
WAN-HUB(config-if)# no shutdown
WAN-HUB(config-if)# exit
WAN-HUB(config)# interface GigabitEthernet0/0
WAN-HUB(config-if)# ip address 10.0.0.1 255.255.255.252
WAN-HUB(config-if)# no shutdown
WAN-HUB(config-if)# exit
WAN-HUB(config)# interface GigabitEthernet0/1
WAN-HUB(config-if)# ip address 10.0.1.1 255.255.255.252
WAN-HUB(config-if)# no shutdown
WAN-HUB(config-if)# exit
WAN-HUB(config)# interface Loopback0
WAN-HUB(config-if)# ip address 1.1.1.1 255.255.255.255
WAN-HUB(config-if)# exit
WAN-HUB(config)# end

What just happened: Each interface now has an IP and is administratively up; the loopback provides a stable identifier (useful for routing protocols and controller registration). At the protocol level, OSPF or BGP (configured later) will advertise the loopback as the router‑ID / reachability anchor. In SD‑WAN architectures, this loopback often represents the device ID used by the control plane.

Real-world note: Use loopbacks for stable router IDs; physical interfaces can flap and you will want your control plane identity to remain stable.

Verify:

WAN-HUB# show ip interface brief
Interface                  IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0        10.0.0.1        YES manual up                    up
GigabitEthernet0/1        10.0.1.1        YES manual up                    up
GigabitEthernet0/2        192.168.100.2   YES manual up                    up
Loopback0                 1.1.1.1         YES manual up                    up

Step 2: Establish OSPF area 0 to demonstrate reachability and full‑mesh routing behavior

What we are doing: Enable OSPF on the hub, regional hub, and branch link IPs. OSPF simulates the overlay routing distribution and allows you to observe adjacency formation and route propagation across hubs and branches.

WAN-HUB# configure terminal
WAN-HUB(config)# router ospf 1
WAN-HUB(config-router)# router-id 1.1.1.1
WAN-HUB(config-router)# network 1.1.1.1 0.0.0.0 area 0
WAN-HUB(config-router)# network 10.0.0.0 0.0.0.3 area 0
WAN-HUB(config-router)# network 10.0.1.0 0.0.0.3 area 0
WAN-HUB(config-router)# network 192.168.100.0 0.0.0.255 area 0
WAN-HUB(config-router)# end

Repeat equivalent OSPF commands on REGIONAL‑HUB and BRANCH1 with router-ids 2.2.2.2 and 3.3.3.3 and their respective networks.

What just happened: OSPF starts sending Hello packets on all interfaces in the specified networks (default Hello interval 10s on broadcast networks). Adjacent routers that share a common area and interface parameters will form neighbor relationships and exchange LSAs, populating the link‑state database and building routes. This is a stand‑in for the SD‑WAN overlay where control plane learning distributes routes to endpoints.

Real-world note: In SD‑WAN, control plane overlays often handle routing abstractions; here OSPF lets you visualize route propagation and scaling effects (LSA count, SPF frequency).

Verify:

WAN-HUB# show ip ospf neighbor
Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2          1    FULL/DR         00:00:34    10.0.0.2        GigabitEthernet0/0
3.3.3.3          1    FULL/BDR        00:00:36    10.0.1.2        GigabitEthernet0/1
WAN-HUB# show ip route
Codes: C - connected, O - OSPF, S - static
C    10.0.0.0/30 is directly connected, GigabitEthernet0/0
C    10.0.1.0/30 is directly connected, GigabitEthernet0/1
C    192.168.100.0/24 is directly connected, GigabitEthernet0/2
O    2.2.2.2/32 [110/2] via 10.0.0.2, 00:00:36, GigabitEthernet0/0
O    3.3.3.3/32 [110/2] via 10.0.1.2, 00:00:34, GigabitEthernet0/1

Step 3: Implement hub‑and‑spoke routing (static default at branch pointing to hub)

What we are doing: Configure a default route on BRANCH1 that points to its hub (WAN‑HUB). This concentrates internet/centralized services traffic at the hub and illustrates the traffic path that enforces security/inspection.

BRANCH1# configure terminal
BRANCH1(config)# ip route 0.0.0.0 0.0.0.0 10.0.1.1
BRANCH1(config)# end

What just happened: Branch1 now forwards any traffic for unknown destinations to the hub (next hop 10.0.1.1). In production, this is how branches send user traffic to centralized security stacks or regional hubs. It reduces the need to build direct tunnels to every destination and simplifies policy enforcement.

Real-world note: If you centralize too much traffic you risk creating a hub bottleneck. Use regional hubs to distribute load geographically.

Verify:

BRANCH1# show ip route
Codes: C - connected, O - OSPF, S - static
Gateway of last resort is 10.0.1.1 to network 0.0.0.0
S*   0.0.0.0/0 [1/0] via 10.0.1.1
C    10.0.1.0/30 is directly connected, GigabitEthernet0/1
O    1.1.1.1/32 [110/2] via 10.0.1.1, 00:02:12, GigabitEthernet0/1
O    10.0.0.0/30 [110/2] via 10.0.1.1, 00:02:12, GigabitEthernet0/1
BRANCH1# show ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "static", distance 1, metric 0, candidate default path
  Routing Descriptor Blocks:
  * 10.0.1.1

Step 4: Simulate regional hub aggregation (OSPF summary or redistribution to reduce LSA churn)

What we are doing: On the regional hub, summarize or redistribute internal prefixes to limit the growth of LSAs propagated toward the central hub. This shrinks the link-state database that WAN-HUB must hold and makes SPF runs cheaper.

REGIONAL-HUB# configure terminal
REGIONAL-HUB(config)# router ospf 1
REGIONAL-HUB(config-router)# area 0 range 10.0.0.0 255.255.255.0
REGIONAL-HUB(config-router)# end

What just happened: The area range command tells the router to advertise one summary for the covered block instead of each component route. Note that area range only takes effect on an Area Border Router advertising into an adjacent area; in this single-area lab it demonstrates the syntax, while in a real multi-area design it is what keeps each hub from carrying every branch's specific routes. Fewer and smaller LSAs mean lower CPU and memory cost during SPF runs.

Real-world note: Summarization is a key tool when scaling routing domains. Use it to keep control plane state within device capacity.
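The effect of summarization can also be illustrated off-box. A short sketch using Python's standard ipaddress module (the address block is illustrative; real summary boundaries follow your addressing plan):

```python
import ipaddress

# Sixty-four contiguous /30 point-to-point links inside a regional block.
link_nets = [ipaddress.ip_network(f"10.0.0.{i * 4}/30") for i in range(64)]

# Contiguous /30s collapse into one supernet, analogous to an OSPF
# "area range" replacing many specific LSAs with a single summary LSA.
summary = list(ipaddress.collapse_addresses(link_nets))
print(len(link_nets), "->", len(summary), summary[0])  # 64 -> 1 10.0.0.0/24
```

Sixty-four routes become one: that is the reduction in control-plane state the hub no longer has to track or re-run SPF against when an individual link flaps.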

Verify:

REGIONAL-HUB# show ip ospf database router
            OSPF Router with ID (2.2.2.2) (Process ID 1)

                Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
2.2.2.2         2.2.2.2         300         0x80000002 0x00AB   3
REGIONAL-HUB# show ip route
Codes: C - connected, O - OSPF, S - static
C    10.0.0.0/30 is directly connected, GigabitEthernet0/0
O    1.1.1.1/32 [110/2] via 10.0.0.1, 00:01:02, GigabitEthernet0/0

Step 5: Apply a simple access‑list on the hub to illustrate centralized security policy

What we are doing: Configure an IP access‑list on WAN‑HUB to drop obvious malicious traffic (e.g., block Telnet) to simulate a minimal centralized security posture and show how hub policing may be used.

WAN-HUB# configure terminal
WAN-HUB(config)# ip access-list extended HUB-SECURITY
WAN-HUB(config-ext-nacl)# deny tcp any any eq 23
WAN-HUB(config-ext-nacl)# permit ip any any
WAN-HUB(config-ext-nacl)# exit
WAN-HUB(config)# interface GigabitEthernet0/1
WAN-HUB(config-if)# ip access-group HUB-SECURITY in
WAN-HUB(config-if)# end

What just happened: The ACL denies Telnet (TCP 23) on traffic entering Gi0/1 (from branch). Applied at the hub, this provides a simple example of central policy enforcement. In production, stateful NGFWs/IPS provide far richer inspection (application awareness, user identity, URL filtering), but ACLs remain useful for quick filters or as a first‑drop mechanism.

Real-world note: Stateful inspection and URL filtering require more resources than a simple ACL. When centralizing inspection, ensure the hub platform supports NGFW/IPS capacity consistent with expected throughput.
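A back-of-the-envelope hub sizing check can be sketched as follows. The per-branch figures below are hypothetical assumptions for illustration, not vendor guidance:

```python
# Rough hub sizing: aggregate branch demand plus burst headroom.
branches_per_region = 70      # hypothetical: ~200 sites over 3 regions
avg_mbps_per_branch = 20      # hypothetical sustained average per branch
burst_factor = 1.5            # headroom multiplier for traffic peaks
sessions_per_branch = 2000    # hypothetical concurrent sessions per branch

required_gbps = branches_per_region * avg_mbps_per_branch * burst_factor / 1000
required_sessions = branches_per_region * sessions_per_branch

print(f"NGFW throughput needed: {required_gbps:.1f} Gbps")  # 2.1 Gbps
print(f"Concurrent sessions:    {required_sessions}")       # 140000
```

Whatever figures you plug in, the point is the same: size the hub's inspection platform against the aggregated, bursty load of every branch behind it, not against a single branch's traffic.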

Verify:

WAN-HUB# show access-lists HUB-SECURITY
Extended IP access list HUB-SECURITY
    deny tcp any any eq 23
    permit ip any any

Verification Checklist

  • Check 1: OSPF adjacencies are formed — verify with show ip ospf neighbor on WAN‑HUB and expect neighbor entries for 2.2.2.2 and 3.3.3.3.
  • Check 2: Branch default route points to the hub — verify with show ip route on BRANCH1 and expect S 0.0.0.0/0 via 10.0.1.1.
  • Check 3: Regional summarization is in effect — verify with show ip ospf database on REGIONAL‑HUB and expect summarized LSAs or reduced LSA counts.
  • Check 4: Hub ACL is present and applied — verify with show access-lists HUB-SECURITY and show ip interface GigabitEthernet0/1 to confirm the ACL is inbound.

Common Mistakes

  • Symptom: OSPF neighbors never reach FULL
    Cause: Mismatched area or network statement, or interface down
    Fix: Verify with show ip interface brief; ensure the OSPF network statements cover the interface and the area is consistent on both sides

  • Symptom: Branch traffic bypasses the hub (direct internet)
    Cause: Missing default route on the branch or incorrect next hop
    Fix: Configure ip route 0.0.0.0 0.0.0.0 <hub-ip> and verify with show ip route

  • Symptom: Hub CPU spikes under load
    Cause: Centralized inspection without proper sizing
    Fix: Distribute inspection to regional hubs or resize the hub platform; summarize routes to reduce SPF churn

  • Symptom: ACL blocks unintended traffic
    Cause: Overly broad ACL entries or wrong direction
    Fix: Check show access-lists and show ip interface to confirm ACL entries and direction; test with targeted pings

Key Takeaways

  • Full mesh creates O(N^2) tunnel/state growth; it is simple but does not scale for hundreds of sites.
  • Hub‑and‑spoke reduces tunnel counts and centralizes security, but hubs must be sized to handle aggregated traffic and inspection.
  • Regional hubs provide a practical compromise — they localize traffic and inspection, balancing performance and manageability.
  • Always plan controller/data‑plane separation and control‑plane state (LSAs, sessions) when designing for scale: summarize and limit what each device must process.

Final real-world reminder: In production SD‑WAN deployments, choose topology and hub sizing based on measured concurrent sessions and throughput requirements. Design for growth and failure — regional distribution of services reduces single points of failure and keeps control plane churn manageable.