
SD-Access Fabric Deployment: Design to Day-2 Operations

Admin
March 26, 2026
SD-Access, SDA Fabric, LISP VXLAN, Catalyst Center, Campus Network


Introduction

You have read the validated design guides, watched every fundamentals overview, and spun up a proof-of-concept in the lab. Now the question is: where do you actually start when it is time to deploy SD-Access in a production environment with thousands of endpoints, hundreds of switches, and multiple business units sharing the same physical infrastructure? SD-Access deployment is not a single step -- it is a structured journey that begins with careful requirements gathering, moves through deliberate design decisions, and continues well past the initial turn-up into ongoing day-2 operations.

This article walks you through that entire journey. We will use a realistic enterprise scenario -- a large manufacturing organization with 25,000 users, 45,000 concurrent devices, 2,100 access switches, and 70 remote branch offices -- to illustrate every decision point. Whether you are planning your first SD-Access fabric site or scaling an existing deployment to enterprise proportions, the principles, scalability limits, and operational patterns covered here apply directly to your environment.

By the end of this guide you will understand how to collect and size your requirements, carve fabric sites, choose underlay and overlay options, connect to upstream services through a fusion firewall, plan a phased migration, and avoid the most common pitfalls that derail real-world SDA fabric design projects. For hands-on practice with these concepts, explore the SD-Access Fabric Lab course on NHPREP.

What Are the Key Requirements for SD-Access Deployment?

Before you provision a single switch, you need a complete picture of what the fabric must support. SD-Access introduces scalability considerations at two levels: deployment-wide and site-level. Missing either one leads to rework later.

Deployment-Wide Requirements (Catalyst Center)

At the broadest level, your Catalyst Center cluster (formerly DNA Center) must accommodate:

  • Total number of endpoints -- both concurrent and transient devices across every fabric site.
  • Total number of network devices -- switches, controllers, and access points under management.
  • Total number of interfaces -- every physical and logical interface counts toward Catalyst Center limits (the deployment-wide limit is 1.5 million physical plus logical interfaces).
  • IP pools and Layer 2 overlays -- each IP pool creates two logical interfaces (an SVI and a LISP tunnel) on every switch in the fabric site where it is provisioned.

Site-Level Requirements

At each individual fabric site you need to quantify:

  • Concurrent endpoints -- IPv4, IPv6, wired, and wireless counts separately.
  • Round-trip latency to controllers (Catalyst Center, ISE, WLC).
  • IP pools and Layer 2 handoffs required at the site.
  • Physical fabric device count -- the number of switches that will hold Edge, Border, or Control Plane roles.

All of these scalability limits are documented in the Catalyst Center data sheet, but applying abstract numbers to a real design for the first time is challenging. That is exactly why walking through a concrete scenario is so valuable.

Pro Tip: Once you have collected high-level requirements -- number of sites, number of endpoints, Catalyst Center sizing, ISE nodes, wired vs. wireless split -- feed them into the SD-Access Design Tool available at cs.co/sda-design-tool. It generates a high-level design document automatically.

A Real-World Scenario

Consider a large manufacturing organization -- we will call it ACME -- undergoing a legacy network refresh. The main campus has three sub-areas interconnected via dark fibre in a ring topology:

Parameter            Value
Users                25,000
Concurrent devices   45,000
Access switches      2,100 (WS-C2960X) in 1,300 cabinets
Wireless APs         5,200 (AIR-CAP3702I)
VLANs                700 (user and device segmentation)
L3 boundary          Distribution layer
Segmentation         MPLS, DC firewall enforcement
Remote sites         70 small offices via MPLS WAN

Each remote site has a single switch and one or two APs. Two on-site active/active data centres provide application hosting, internet access, and public cloud peering.

How Do You Carve Fabric Sites in an SD-Access Deployment?

A fabric site is an instance of an SD-Access fabric -- a collection of Edge Node switches that share the same set of Control Plane (CP) and Border Node (BN) switches. It is the fundamental building block of every deployment.

What Defines a Fabric Site?

Sites are typically defined by geographic location, but geography is not the only factor. You should also consider:

  1. Endpoint scale -- each platform has a maximum number of endpoints it can handle in CP and BN roles.
  2. Failure domain scoping -- all Edge Nodes share the same CP and BN nodes; if all CP or BN nodes fail, the site fails.
  3. Underlay connectivity attributes -- mixing links with different MTU or multicast capabilities forces the lowest common denominator.

Sites are interconnected by a Transit that carries VRF and SGT information in VXLAN headers between Border Nodes.

Endpoint Scale Limits

The CP node keeps information about every site endpoint in RAM and uses CPU to process that data (including wireless roaming events). The BN keeps all endpoint information in TCAM as host routes.

Platform                                CP Node Limit (Endpoints)   BN Node Limit (Host Routes)
C9300/L                                 16,000                      16,000 (/32 or /128)
C9500-32C / C9500-48Y4C / C9500-24Y4C   80,000                      150,000 (/32 or /128)

If an endpoint has multiple IP addresses (for example, one IPv4 and several IPv6 addresses), each address counts as an individual entry on the Border Node.

Failure Domain Considerations

Several important behaviors are tied to the site boundary:

  • Configuration elements such as VRF, VLAN, multicast settings, wireless policies, and default switchport policies are applied at the site level to all fabric switches simultaneously.
  • A fabric site runs a single instance of underlay IGP and overlay LISP and appears as a single BGP AS externally.
  • A site with fabric wireless can have a maximum of 2 CP nodes.
  • During a total CP failure, no new endpoints can be onboarded and roaming events will not work. However, existing traffic flows remain cached for 24 hours.

Pro Tip: Some configuration changes can be scoped to a subset of switches using Fabric Zones, which are child sites of a parent fabric site. Edge nodes, Extended Nodes, and Policy Extended Nodes can be assigned to Fabric Zones for more granular IP pool provisioning.

Underlay Connectivity Attributes

Avoid mixing links with different MTU or multicast support within a single fabric site. For example, if part of your site uses dark fibre links supporting 9,000-byte MTU and multicast while another part relies on a radio link limited to 1,500-byte MTU with no multicast, you will be forced to the lowest common denominator -- effectively crippling the VXLAN overlay.

When Should You Use a Single Fabric Site vs. Multiple Sites?

This is one of the most consequential SD-Access deployment decisions. The general guidance is: build a large single fabric site within a single geographic area until you hit one of these triggers:

  • You reach the fabric device limit (1,200 logical switches for Catalyst Center XL) or the endpoint limit (approximately 100,000 EPs).
  • Links between parts of your fabric site cannot support an increased MTU (at least 1,550 bytes, ideally 9,000) or cannot be multicast-enabled.
  • Part of your fabric site needs to remain online even if the rest of the site is offline (independent failure domain).
  • Part of your fabric site needs to provide Direct Internet Access (DIA) for overlay users.

Applying This to ACME

With 2,100 access switches in 1,300 switch cabinets, ACME cannot build a single fabric site because 1,300 exceeds the 1,200-device limit per site. The solution is three fabric sites in the main campus, aligned with the three geographic sub-areas.

This multi-site approach introduces trade-offs:

  • No seamless wireless roaming across sites because an IP subnet can exist in only one site.
  • Each site needs its own set of WLCs and BN/CP nodes.
  • Additional switching hardware is required for SDA Transit CP nodes.

Handling 70 Small Remote Sites

For the 70 remote MPLS sites, there are two main options:

Attribute               Individual Sites (FIAB)                   Stretched Site
Device SD-Access role   Fabric in a Box (FIAB)                    Edge Nodes on-site, CP+BN in central location
Management overhead     High -- manage 70 sites individually      Low -- all changes on a single site
Survivability           High -- each site runs its own CP/BN      Low -- shared CP/BN nodes
Flexibility             High -- DIA and unique routing per site   Low -- single egress point at central location

For ACME, the remote sites do not have local server resources or DIA and access everything through the centralized data centre. The MPLS carrier supports MTU above 1,550 bytes, and the small branch sites do not require overlay multicast or Layer 2 flooding. A single "stretched" fabric site is the right choice -- VXLAN tunnels run over MPLS between site Edge Nodes and the central BN/CP switches.

Pro Tip: When using MPLS as the SD-Access underlay for a stretched site, remember that VXLAN adds 50 bytes of overhead. If the WAN MTU cannot be increased, use ip tcp adjust-mss to avoid fragmentation. This only helps with TCP traffic.

What External Dependencies Does SD-Access Deployment Require?

Before you spin up your first fabric site, you need several external services in place. All of them reside outside the fabric site and require only Layer 3 IP connectivity to fabric devices.

Required Services

  1. Catalyst Center -- the automation engine for SD-Access. It provisions underlay and overlay configurations, manages device lifecycle, and provides assurance dashboards.
  2. DHCP / DNS -- required if you intend to provide these services to endpoints connecting through the fabric.
  3. Cisco ISE -- required if you want to authenticate and authorize users or devices. One Catalyst Center cluster can integrate with only a single ISE cluster.
  4. Wireless LAN Controller (WLC) -- required for wireless access. A WLC can enable fabric-enabled wireless for a single site only.
  5. Fusion device -- typically a firewall, used to implement VRF route-leaking and enforce security policy at the leaking point.

Latency Requirements

Dependency                          Maximum RTT
Catalyst Center to fabric devices   200 ms
ISE to fabric devices               100 ms
Fabric WLC to fabric APs            20 ms (deploy on-site)

The 20 ms RTT requirement for the WLC-to-AP path essentially mandates placing the wireless controller on-site or very close to the fabric site.

Catalyst Center Deployment

If high availability is required, deploy a 3-node cluster. Avoid splitting the three cluster nodes across two separate locations -- if you lose one location, a single node in the surviving site will shut down automatically.

For disaster recovery, deploy Catalyst Center in 1:1 or 3:3 mode. As of mid-2025, virtual Catalyst Center appliances (AWS or ESXi) do not have native HA or DR capabilities.

Deployment Mode     Failure Detection   Failover Time   Failback
HA Cluster          5 minutes           7-13 minutes    Automatic
Disaster Recovery   3 minutes           15-30 minutes   Manual

How Should You Design the SD-Access Underlay?

The underlay is the IP transport that carries all VXLAN-encapsulated overlay traffic. Getting it right is foundational.

What the Underlay Needs

For each SD-Access BN, CP, and Edge Node, you must:

  1. Configure a Loopback0 interface with a /32 address.
  2. Set increased MTU to accommodate the 50-byte VXLAN header overhead (minimum 1,550 bytes; 9,000 bytes within campus is ideal).
  3. Set VTP mode to transparent (vtp mode transparent) and enable multicast routing (ip multicast-routing).
  4. Configure point-to-point routed links between each switch in the topology.
  5. Enable a routing protocol so every switch can reach every other switch's Loopback0.
  6. Enable PIM sparse-mode on each point-to-point link and Loopback0, and configure anycast ASM RP on CP/BN nodes.
  7. Configure SNMP and SSH credentials.
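The checklist above amounts to only a handful of CLI lines per device. Here is a minimal sketch for one Edge Node, assuming OSPFv2 as the IGP; all addresses, interface names, and the OSPF area are illustrative placeholders, not values Catalyst Center generates:

```
! Global settings (values are examples)
system mtu 9100
vtp mode transparent
ip multicast-routing
!
! RLOC loopback -- the /32 every other fabric node must be able to reach
interface Loopback0
 ip address 10.0.0.11 255.255.255.255
 ip pim sparse-mode
!
! Routed point-to-point uplink toward the border layer
interface TenGigabitEthernet1/1/1
 no switchport
 ip address 10.0.1.1 255.255.255.254
 ip pim sparse-mode
!
router ospf 1
 router-id 10.0.0.11
 network 10.0.0.0 0.0.255.255 area 0
```

The same template repeats on every BN, CP, and Edge Node with unique addressing, which is why a CLI template (or LAN Automation) beats hand-typing it 1,300 times.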

LAN Automation vs. Manual (DIY) Underlay

You have two paths to build the underlay:

Feature                                         LAN Automation                                  Manual (DIY)
Approach                                        Turnkey automation                              CLI template or manual CLI
Routing protocol                                IS-IS (single Level-2 area)                     Any (most organizations deploy OSPFv2)
IPv4 address allocation                         Separate pools for loopbacks and P2P (2.3.5+)   Fully flexible
Multicast configuration                         Yes                                             Yes
BFD configuration                               Yes                                             Yes
STP configuration                               Yes                                             Yes
MTU configuration                               Yes                                             Yes
Customization (MACsec, BFD timers, IGP areas)   No                                              Yes

OSPF or IS-IS for the SD-Access Underlay?

LISP requires a /32 host route for the destination VTEP Loopback0 to be present in the forwarding table. The maximum tested and supported number of Catalyst 9000 switches in a single link-state protocol area is 250. More than 250 switches requires a multi-area deployment.

This is where the OSPF vs. IS-IS decision matters:

  • IS-IS Level-1 areas filter all inter-area prefixes (including Loopback0 host routes) and inject a default route instead. This breaks LISP lookups.
  • OSPF areas allow inter-area routes by default, making multi-area design straightforward.

The solutions for IS-IS are to implement multi-area design with Level-2 to Level-1 route leaking, or simply to use OSPF multi-area design instead. For ACME, OSPFv2 was chosen because the IT team had extensive OSPF experience and was not comfortable with IS-IS manual deployment at this scale.

Pro Tip: One of the most common struggles during SDA deployments involves underlay routing design. If you are not experienced with IS-IS, deploying it for the first time during a production fabric rollout adds unnecessary risk. Use the IGP your team knows well.

Underlay Multicast

Multicast in the underlay is no longer optional. It is required for:

  • Layer 2 flooding (broadcasts) in user overlays -- most deployments need this.
  • Layer 2 border functionality -- most deployments use this.
  • Multicast support in overlays.

For underlay RP placement:

  • Configure anycast RPs on BN/CP nodes.
  • Use a separate Loopback interface (not Loopback0) for the RP source.
  • Set up MSDP between the two Border Nodes / RPs.
  • Configure static RPs -- do not use BSR or Auto-RP.
  • Enable PIM sparse-mode on all point-to-point links and Loopback interfaces.
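As a sketch, the anycast RP and MSDP pieces look like this on each BN/CP node. All addresses are hypothetical: 10.0.0.100 is the shared anycast RP address, and 10.0.0.2 is the other node's unique Loopback0:

```
! Shared anycast RP address -- identical on both BN/CP nodes
interface Loopback1
 ip address 10.0.0.100 255.255.255.255
 ip pim sparse-mode
!
! Static RP pointing at the anycast address -- no BSR or Auto-RP
ip pim rp-address 10.0.0.100
!
! MSDP between the two RPs, sourced from each node's unique Loopback0
ip msdp peer 10.0.0.2 connect-source Loopback0
ip msdp originator-id Loopback0
```

Because both nodes answer to 10.0.0.100, sources and receivers simply use the closest RP, and MSDP keeps the two active-source tables in sync.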

How Does the SD-Access Overlay Work in Practice?

The overlay is where endpoints live, segmented into Virtual Networks (VNs) and protected by Group-Based Policy (the rebranded TrustSec). Two overlay dimensions matter: unicast and multicast.

Overlay Unicast

Broadcasts are suppressed by default in SD-Access, which means you can safely create large subnets -- 10,000 hosts in a single IP pool is perfectly fine. Avoid migrating existing subnets "as is" into the fabric; doing so is one of the fastest ways to hit the subnet limit in Catalyst Center.

The limits are significant:

  • The sum of subnets and pure L2 overlays cannot exceed 1,000 per fabric site with Catalyst Center XL (200 and 600 in smaller appliance models).
  • Each IP pool creates 2 logical interfaces (SVI + LISP tunnel) on every switch in the fabric site. With 700 IP pools across 1,300 stacked switches, that is 700 × 2 × 1,300 = 1,820,000 logical interfaces; adding roughly 100,800 physical ports (2,100 switches × 48 ports) yields approximately 1,920,800 interfaces -- well above the 1.5 million deployment-wide limit.
  • Consolidating to 100 IP pools in the same 1,300-switch fabric produces 100 × 2 × 1,300 = 260,000 logical interfaces, or roughly 360,800 in total -- comfortably within limits.

Overlay Multicast

Overlay multicast requires a multicast-enabled underlay (head-end replication should be avoided for large fabrics). Key design points:

  • Overlay multicast is enabled per Virtual Network (VRF), not per IP pool, and requires a dedicated IP pool per multicast-enabled VN.
  • Both internal and external RPs are supported. Use an external RP when possible.
  • Multicast route-leaking is not supported on the Catalyst 9000 platform. If you have sources in one VN and receivers in another, use an external RP and perform route-leaking outside the fabric (for example, on the fusion device).
  • SD-Access supports all multicast flow variations: ASM, SSM (concurrently), and sources/receivers placed inside or outside the fabric.

Deployment Mode        Underlay Multicast   Best For                     IPv6 Support
Head-End Replication   Not required         Fewer Edge Nodes             IPv4 and IPv6
Native Multicast       Required             Large number of Edge Nodes   IPv4 only

How Do You Connect SD-Access to Upstream Networks Through a Fusion Firewall?

The fusion firewall is where VRF route-leaking happens and security policy is enforced between Virtual Networks. Getting this connectivity right is critical, especially with active/active Border Nodes.

The Problem

With two active Border Nodes, each BN registers itself as an active gateway and advertises fabric subnets via BGP with identical AS-PATH length. The firewall receives two equal routes via two next-hops but typically installs only one. Half the traffic arrives at the firewall via the "wrong" interface and gets dropped.

Solution 1: Active/Passive Border Nodes

Make the Border Nodes active/passive:

  1. Configure the primary BN to have a better LISP priority (lower is better; default is 10) as the fabric exit.
  2. Configure the secondary BN to add AS-PATH prepend when advertising fabric subnets to the firewall.
  3. Adjust underlay IGP metric to prefer the primary BN as the exit point.
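Step 2 can be sketched on the secondary Border Node as follows; the AS number, VRF name, and firewall neighbor address are placeholders for your own values:

```
! Prepend our own AS twice so the firewall prefers the primary BN's path
route-map PREPEND-TO-FW permit 10
 set as-path prepend 65001 65001
!
router bgp 65001
 address-family ipv4 vrf CAMPUS
  neighbor 192.0.2.1 route-map PREPEND-TO-FW out
```

With the prepend in place, the firewall sees two routes with unequal AS-PATH lengths and deterministically installs the primary BN as next-hop, falling back to the secondary only when the primary's advertisement disappears.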

Solution 2: ECMP on the Firewall

Configure Equal-Cost Multipathing (ECMP) on the firewall so both next-hops are installed in the forwarding table. Every mainstream firewall vendor supports this. FTD firewalls support ECMP from version 6.5 onward. This approach requires coordination with the firewall team and careful attention to multicast-ECMP interaction.

Solution 3: Intermediate L3 Hop

Insert another L3 hop (typically a stacked switch) between the Border Nodes and the firewall pair, presenting a single interface to the firewall per VRF. This creates a single logical point of failure, requires extra hardware, and adds operational complexity.

Solution 4: Stack Border Nodes

Stack the Border Nodes to present a single interface to the firewall. Avoid this approach. It creates a single point of failure (especially with collocated CP/BN roles), hardware changes require a StackWise Virtual reboot (fabric outage), and there is no ISSU for SVL in SD-Access.

IS-IS Routing Loop Avoidance

If you use IS-IS as the underlay routing protocol, there is a subtle routing-loop risk. LAN Automation configures default-information originate under IS-IS, which behaves like OSPF's default-information originate always: it originates a default route regardless of whether one exists in the routing table.

If one Border Node loses its upstream connection but continues advertising a default route into IS-IS, traffic destined for that BN gets sent back to the distribution layer, which follows the default route from the other BN, creating a loop. The solution is to retain the inter-border link and advertise it in the IS-IS domain.

General Fusion Firewall Best Practices

  • Configure BFD to the fusion device to speed up BGP convergence.
  • Research your firewall vendor's HA and Graceful Restart implementation to ensure BFD does not trigger a BGP adjacency drop during firewall failover.
  • Catalyst Center provisions iBGP peering between BNs in the underlay. Configure bgp neighbor fall-over on that peering to speed up upstream BGP convergence.
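The last two recommendations can be sketched as below. Interface names, addresses, and BFD timers are examples; confirm your firewall's BFD and Graceful Restart behavior before enabling this in production:

```
! BFD on the VRF handoff interface toward the fusion firewall
interface TenGigabitEthernet1/0/48
 bfd interval 300 min_rx 300 multiplier 3
!
router bgp 65001
 ! eBGP to the fusion firewall: tie session state to BFD
 neighbor 192.0.2.1 fall-over bfd
 ! iBGP peering between the two Border Nodes: route-based fall-over
 neighbor 10.0.0.2 fall-over
```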

What Is the SD-Access Transit and When Do You Need It?

The SD-Access Transit enables fabric sites to communicate with each other using VXLAN tunnels between Border Nodes over a plain IP network. It is essential for multi-site deployments that need end-to-end macro- and micro-segmentation.

Transit Requirements

  • MTU greater than 1,550 bytes between Border Nodes.
  • Dedicated Transit Control Plane nodes -- these are separate devices with IP reachability to every fabric site's Border Nodes but are not in the data forwarding path.
  • Multicast in the transit network (if overlay multicast is required).

VXLAN is used because it carries both VRF and SGT in the header over a plain IP network. The transit network only needs to provide IP connectivity between BN Loopback0 interfaces.

LISP Pub/Sub vs. LISP/BGP

The control plane architecture for both fabric sites and the transit can use either LISP Pub/Sub or LISP/BGP:

Feature              LISP Pub/Sub                                       LISP/BGP
Available since      Catalyst Center 2.2.3.x, IOS-XE 17.6.x             Original architecture
Control Plane load   Lower                                              Higher
Convergence          Faster                                             Slower
iBGP between BNs     Not required                                       Required per-VN peering
External Border      Requires default route (0.0.0.0/0) from upstream   Standard BGP
Transit CP nodes     Up to 4                                            Up to 2

All sites connected via SDA Transit must use the same CP architecture (either Pub/Sub or LISP/BGP across all sites). For greenfield deployments, deploy LISP Pub/Sub.

LISP Pub/Sub also enables several advanced features: Dynamic Default Border, Backup Internet, and automated route leaking using LISP Extranet.

Colocating Border and Control Plane Roles

The BN stresses TCAM (host routes), while the CP stresses CPU and RAM (endpoint database, wireless roaming). It is safe to colocate BN and CP roles on the same switch up to approximately 50,000 endpoints on C9500H or above, even in wireless-heavy environments. You can split the roles for architectural reasons (fault isolation, network modularity) rather than for scale. Avoid using routing platforms (C8K) as Control Plane or Border Nodes if possible.

What Is the SD-Access Deployment Migration Strategy?

A phased migration is critical to managing risk. Here is the proven project flow for a large-scale deployment.

Phase M1: Build the Management Stack

Deploy Catalyst Center as the automation engine. If HA is required, deploy a 3-node cluster. If disaster recovery is needed, deploy in 1:1 or 3:3 mode. Do not split cluster nodes across two locations.

Phase M2: Integrate with Existing ISE

One Catalyst Center cluster integrates with a single ISE cluster. Reuse existing authentication flows and add new SD-Access-specific authorization profiles.

Warning: Changing the already-integrated ISE cluster requires removing all SD-Access fabric sites in Catalyst Center or disabling authentication on all switchports. Plan this integration carefully from the start.

Phase M3: Deploy Parallel Core

  1. Deploy new core switches in parallel to the existing ones.
  2. Add the new switches to Catalyst Center and enable BN + CP roles.
  3. Configure required VNs (VRFs) in Catalyst Center and assign to the new fabric site.
  4. Configure BGP peerings for underlay and new VNs between Border Nodes and the fusion firewall.

At this point, test everything before migrating any access switches:

  • Create all final-state subnets and anycast gateways.
  • Test every endpoint class -- authentication, multicast, exotic use-cases like PC imaging and Wake-on-LAN.
  • Test fabric failover (shutdown a border, unplug links).

Border configuration does not change whether the fabric has 2 Edge Nodes or 200 Edge Nodes.

Phase M4: Migrate Access -- Per Distribution Block

For each distribution block (building):

  1. Deploy a new Catalyst 9000 switch in parallel with the old one.
  2. Configure routed point-to-point links, Loopback0, OSPFv2, MTU above 1,550, PIM sparse-mode, multicast RP, SNMP, and SSH credentials.
  3. Discover the new switch in Catalyst Center and assign the Edge Node role.
  4. Assign switchports to user VLANs (if not using dynamic ISE-based authentication).
  5. Re-patch endpoints to the new switch.
  6. Continue until all access switches in the block are replaced.
  7. Remove MPLS configuration from distribution switches.
  8. Remove stacking configuration from distribution switches.
  9. Once all distribution blocks are migrated, remove legacy core switches.

Phase M5: Migrate Wireless

Keep wireless "as is" during the switching rollout. Seamless roaming is not possible between fabric-enabled wireless and centrally-switched SSIDs because they have different termination points. Finalize the switching migration first, block by block. Then convert wireless to Fabric-Enabled Wireless (FEW) if there are RF gaps. If there are no RF gaps, wait until the full switching migration is completed.

You can mix Fabric-Enabled Wireless (FEW) and Over-the-Top (OTT) modes on the same AP and even within the same site.

Pro Tip: If multicast is required on an OTT SSID, the AP pool in INFRA_VN needs to be multicast-enabled via a CLI template that adds ip pim sparse-mode under the AP pool SVI.
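A one-line CLI template is enough for this; the SVI number below is hypothetical and must match the VLAN Catalyst Center assigned to the AP pool:

```
! AP pool SVI in INFRA_VN -- enable PIM so multicast works on OTT SSIDs
interface Vlan1023
 ip pim sparse-mode
```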

How Do You Handle Layer 2 Borders and Broadcast Traffic?

Some migration scenarios require stretching VLANs between the fabric and the traditional network. SD-Access provides two Layer 2 border models.

Gateway Inside the Fabric

Use this when endpoints have static IP addresses (such as CCTV cameras):

  1. Create an anycast gateway inside the fabric.
  2. Shut down the corresponding SVI in the traditional network.
  3. Configure the BN with a Layer 2 handoff (gateway inside fabric) -- this is the L2 BN.
  4. Optionally, configure an external VLAN ID if it does not match the fabric VLAN ID.
  5. Allow the VLAN on the trunk between the traditional network and the L2 BN.

Cameras on the old network use the SVI on the L2 BN to reach the fabric. A maximum of 6,000 endpoints can be connected outside the fabric via this method.

Gateway Outside the Fabric

Use this for endpoints that do not use IP (Profinet, BACnet, Modbus, and other industrial protocols relying on MAC-layer communication) or for overlapping IP addresses in multi-tenancy scenarios. This model uses a dedicated L2 BN and requires Layer 2 flooding to be enabled for the stretched segment.

Broadcast Traffic (Layer 2 Flooding)

Layer 2 flooding is disabled by default in SD-Access. This is what enables large Layer 2 segments without broadcast storms. When you need L2 flooding:

  • It is automatically enabled for segments stretched via an L2 Border Node with gateways outside the fabric.
  • It can be manually enabled per subnet.
  • It floods Ethernet broadcast and link-local multicast (TTL=1) in the overlay.
  • It requires multicast in the underlay.

Put hosts that need L2 flooding into a separate VLAN/VNI. Do not enable L2 flooding on main VLANs with conventional endpoints.

Warning: STP BPDUs are not tunneled inside the fabric, but broadcasts are. If the same VLAN appears on two L2 BN handoffs, you will create a Layer 2 forwarding loop. Dual-homing from a single L2 BN is supported (with STP blocking the redundant path). Multi-chassis EtherChannel from stacked L2 BNs (StackWise or StackWise Virtual) is also supported.

What Are the Day-2 Operational Considerations for SD-Access?

Once the fabric is live, ongoing operations require attention to microsegmentation policy, VXLAN fragmentation, and ISE dependency management.

Microsegmentation with Group-Based Policy

Group-Based Policy (formerly TrustSec) enables micro-segmentation using Security Group ACLs (SGACLs). Key operational facts:

  • SGACLs only apply to unicast traffic. Broadcast and multicast traffic is not filtered.
  • Traffic to and from IPv6 link-local addresses is not filtered by Group-Based Policy.
  • Policy can be managed on either Catalyst Center or ISE. Managing on ISE provides multi-matrix support and per-site default permit/deny, while Catalyst Center offers a simpler single-matrix view.

Default Deny -- Handle with Care

Enabling a default deny policy is the end-goal for most zero-trust deployments, but it requires careful preparation:

  • LAN Automation sessions prior to version 2.3.7.4 enable GBP enforcement CLI (cts role-based enforcement) on fabric switch routed uplinks. Enabling "default deny" without first disabling GBP enforcement on fabric routed ports will break management connectivity for all switches in the fabric.
  • For existing LAN Automation deployments, use a one-off CLI template to remove that enforcement line.
  • From Catalyst Center 2.3.7.6 onward, you can selectively disable GBP enforcement for INFRA_VN AP and EX pools via the GUI.

ISE Dependency for SGACLs

SGACLs expire and get refreshed every 24 hours by re-downloading from ISE via RADIUS. If all ISE PSN nodes are unavailable for more than 24 hours and your network has a default deny policy:

  • All permissive SGACLs are withdrawn, and the default deny takes effect for all traffic until ISE returns -- this is fail-close mode.
  • For fail-open behavior, configure catch-all permissive SGACLs via CLI template. Static SGACLs will activate when dynamic SGACLs time out.

VXLAN Fragmentation

Per RFC 7348, VTEPs must not fragment VXLAN packets. The VXLAN overhead is 50 bytes (8 bytes VXLAN + 8 bytes UDP + 20 bytes IP + 14 bytes Ethernet). The solutions:

  • Within the campus: Increase link MTU to at least 1,550 bytes (ideally 9,000).
  • Over the WAN: Use ip tcp adjust-mss with a value of 1300. This command can be applied per VLAN (pushed to all Edge Nodes) or per BN (pushed to all L3 VRF handoff interfaces). Note that it only helps with TCP traffic.

    ip tcp adjust-mss 1300

Switchport Authentication Policy

SD-Access supports three authentication modes at deployment:

  • Closed authentication -- 802.1X + MAB (IBNS 2.0 template). No DHCP or ARP before authentication.
  • Open authentication -- 802.1X + MAB. Even if authentication fails, the endpoint is still allowed.
  • None -- no authentication; all ports are statically configured.

You can start with "None" and transition to stronger authentication later. The phased approach from visibility-only (MAB, open mode) to full control (802.1X, closed mode) minimizes disruption.

Frequently Asked Questions

How many endpoints can a single SD-Access fabric site support?

It depends on hardware. A C9300/L supports up to 16,000 endpoints as CP and 16,000 host routes as BN. The C9500-32C, C9500-48Y4C, and C9500-24Y4C support up to 80,000 endpoints as CP and 150,000 host routes as BN. The practical site limit is approximately 100,000 endpoints on high-end platforms.

Can I use OSPF instead of IS-IS for the SD-Access underlay?

Yes. LAN Automation uses IS-IS (single Level-2 area) by default, but manual underlay builds can use any routing protocol. Most organizations deploy OSPFv2. OSPF is particularly advantageous in multi-area designs because it allows inter-area routes by default, whereas IS-IS Level-1 areas filter inter-area prefixes and inject a default route instead, which breaks LISP lookups without additional route-leaking configuration.

What happens if the Control Plane nodes fail completely?

During a total CP failure, no new endpoints can be onboarded into the fabric and wireless roaming events will not work. However, existing traffic flows will remain cached and forwarded for up to 24 hours. This gives operations teams a substantial window to restore CP functionality before end-user impact becomes widespread.

Should I colocate Border Node and Control Plane roles on the same switch?

Colocation is safe up to approximately 50,000 endpoints on C9500H or higher platforms, even in wireless-heavy environments. The BN primarily stresses TCAM (host routes), while the CP primarily stresses CPU and RAM (endpoint database, wireless roaming). You may choose to separate the roles for architectural reasons like fault isolation or network modularity, but scale alone does not mandate separation below that threshold.

What is the difference between a "stretched" fabric site and individual fabric sites for remote offices?

A stretched site places Edge Nodes at remote locations while sharing centralized CP and BN nodes. It offers lower management overhead but lower survivability and flexibility. Individual sites (Fabric in a Box) provide higher survivability and flexibility but require managing each site independently. You can mix and match based on requirements.

How do I avoid breaking management connectivity when enabling default deny?

LAN Automation sessions prior to Catalyst Center 2.3.7.4 enable GBP enforcement on fabric switch routed uplinks in the underlay. If you enable a default deny policy without first disabling this enforcement, management connectivity to all fabric switches will break. Use a one-off CLI template to remove the enforcement configuration from affected interfaces before enabling default deny. Starting with Catalyst Center 2.3.7.6, you can selectively disable GBP enforcement for INFRA_VN AP and EX pools through the GUI.

Conclusion

SD-Access fabric deployment is a structured, multi-phase endeavor that rewards careful planning and disciplined execution. The key takeaways from this guide:

  1. Size before you build -- collect endpoint counts, device counts, and interface limits before making any design decisions. Use the SD-Access Design Tool to generate your high-level design.
  2. Carve fabric sites deliberately -- base your site boundaries on endpoint scale, failure domain requirements, and underlay connectivity attributes, not just geography.
  3. Choose your underlay IGP wisely -- use the routing protocol your team knows. Multi-area OSPF is straightforward; IS-IS requires route-leaking in multi-area designs.
  4. Test at the border before migrating access -- stand up your Border/CP nodes, configure all VNs and subnets, and validate every endpoint class before touching the first access switch.
  5. Migrate in phases -- wired switching first (block by block), wireless second, authentication policy third. Trying to transform everything at once is the single most common cause of project failure.
  6. Plan for day-2 from day-1 -- understand SGACL lifecycle (24-hour ISE refresh), default deny implications, and VXLAN fragmentation handling before you go live.

To build hands-on skills with SD-Access fabric provisioning, explore the SD-Access Fabric Lab course on NHPREP, where you can practice site design, underlay configuration, overlay provisioning, and policy enforcement in a guided lab environment.