Application-Aware Routing and SLA
Introduction
One of the most powerful capabilities in Catalyst SD-WAN is its ability to make intelligent forwarding decisions based on real-time network conditions. Traditional routing protocols choose paths based on static metrics like bandwidth or hop count, but they have no visibility into what is actually happening on the wire — packet loss, latency, and jitter can silently degrade application performance without triggering a route change.
Application-Aware Routing (AAR) solves this problem. It continuously monitors the health of every data plane tunnel using BFD (Bidirectional Forwarding Detection) probes and compares measured performance against administrator-defined SLA (Service Level Agreement) thresholds. When a tunnel fails to meet the SLA for a given application, AAR automatically steers traffic to a tunnel that does — all without manual intervention.
In this lesson, you will learn:
- How AAR measures tunnel performance using BFD statistics
- How SLA classes define acceptable loss, latency, and jitter thresholds
- The complete decision flowchart AAR follows when selecting tunnels
- How AAR interacts with Data Policy when both match a packet
- How to troubleshoot AAR and Data Policy from the vSmart controller
- The role of preferred color, backup preferred color, and the remote preferred color feature
Key Concepts
BFD-Based Tunnel Monitoring
AAR relies on BFD Hello packets exchanged across every data plane tunnel to collect real-time statistics. These Hello packets provide continuous measurements of three critical metrics:
| Metric | What It Measures |
|---|---|
| Packet Loss | Percentage of BFD packets lost on the tunnel |
| Latency | Round-trip delay measured by BFD probes |
| Jitter | Variation in latency over time |
These measurements feed directly into the SLA evaluation engine that determines whether a tunnel is healthy enough to carry a given application's traffic.
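As a rough illustration of the three metrics in the table, the sketch below summarizes a list of probe results. The function name `summarize_probes`, the use of `None` for a lost probe, and the jitter formula (mean absolute difference between consecutive round-trip times) are assumptions made for this example, not a description of how the router computes its statistics internally.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results for one tunnel.

    rtts_ms holds one entry per probe sent: the round-trip time in
    milliseconds, or None when no reply was received (hypothetical encoding).
    """
    sent = len(rtts_ms)
    received = [r for r in rtts_ms if r is not None]

    # Packet loss: share of probes that never came back.
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0

    # Latency: average round-trip time of the probes that did come back.
    latency_ms = mean(received) if received else None

    # Jitter (assumed formula): average change between consecutive RTTs.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(deltas) if deltas else 0.0

    return {"loss_pct": loss_pct, "latency_ms": latency_ms, "jitter_ms": jitter_ms}


sample = [20.0, 22.0, None, 21.0, 25.0]  # one lost probe out of five
summary = summarize_probes(sample)
print(summary)
```

With the sample above, one of five probes is lost, so the loss is 20% and the average latency of the four replies is 22 ms.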
SLA Classes
An SLA class is a policy object that defines the maximum acceptable values for loss, latency, and jitter on a tunnel. For example, you might define an SLA class for voice traffic that requires less than 1% loss and less than 150 ms latency. AAR compares the measured BFD statistics against these thresholds to decide which tunnels qualify.
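The comparison can be pictured with a small model. This is a minimal sketch, not device configuration: the `SlaClass` type, the 30 ms jitter threshold, the tunnel colors, and the sample measurements are all hypothetical, and treating a value equal to the threshold as compliant is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaClass:
    """Hypothetical model of an SLA class: maximum acceptable values."""
    name: str
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float

    def is_met_by(self, loss_pct, latency_ms, jitter_ms):
        # A tunnel qualifies only if every metric is within its threshold.
        return (
            loss_pct <= self.max_loss_pct
            and latency_ms <= self.max_latency_ms
            and jitter_ms <= self.max_jitter_ms
        )


# Voice example from the text: 1% loss, 150 ms latency (jitter value assumed).
voice = SlaClass("VOICE", max_loss_pct=1.0, max_latency_ms=150.0, max_jitter_ms=30.0)

# Measured (loss %, latency ms, jitter ms) per tunnel color; sample data only.
measured = {
    "mpls": (0.2, 40.0, 5.0),
    "biz-internet": (2.5, 60.0, 12.0),
}

qualifying = [color for color, stats in measured.items() if voice.is_met_by(*stats)]
print(qualifying)
```

Here only the `mpls` tunnel qualifies, because the `biz-internet` tunnel exceeds the loss threshold even though its latency and jitter are acceptable.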
Sliding Window Statistics
AAR does not make decisions based on a single measurement. Instead, it uses a sliding window mechanism to collect and age out statistics over time. Understanding how this works is essential to predicting AAR behavior.
| Parameter | Default Value | Description |
|---|---|---|
| Polling Interval | 10 minutes (600 seconds) | Duration of each statistics collection window |
| BFD Hello Interval | 1 second | How often BFD Hello packets are sent |
| Packets per Window | 600 | Total BFD Hellos per polling interval (600s / 1s) |
| Multiplier | 6 | Number of buckets used to calculate the mean |
| Total Retention | 1 hour | How long statistics are retained (10 min x 6) |
| Number of Buckets | 6 (indexed 0-5) | Bucket 0 = newest, Bucket 5 = oldest |
Every 10 minutes, the newest statistics are placed in bucket 0, the contents of bucket 5 are discarded, and the remaining statistics shift into the next higher-numbered bucket. This means AAR always has up to one hour of historical data to work with.
Important: The multiplier determines how many buckets are actually used to calculate the mean loss and latency. If the multiplier is set to 3, only buckets 0, 1, and 2 are used. AAR always retains all six buckets, but the multiplier controls how many participate in the calculation.
At the end of each polling interval (every 10 minutes by default), AAR calculates the mean loss and latency across the buckets selected by the multiplier and compares the result to the configured SLA. If the calculated values satisfy the SLA, no action is taken. If they do not, AAR calculates a new tunnel path for the affected traffic.
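The bucket rotation and the role of the multiplier can be sketched in a few lines. This is an illustrative model only: the `TunnelStats` class and its method names are hypothetical, and each bucket is reduced to a single (loss, latency) pair per polling interval.

```python
from collections import deque


class TunnelStats:
    """Hypothetical model of the sliding-window buckets for one tunnel."""

    def __init__(self, multiplier=6, num_buckets=6):
        if not 1 <= multiplier <= num_buckets:
            raise ValueError("multiplier must be between 1 and num_buckets")
        self.multiplier = multiplier
        # Index 0 is the newest bucket. With maxlen set, appendleft() on a
        # full deque silently discards the oldest bucket from the right.
        self.buckets = deque(maxlen=num_buckets)

    def close_interval(self, loss_pct, latency_ms):
        """Store the statistics of a polling interval that just ended."""
        self.buckets.appendleft((loss_pct, latency_ms))

    def mean(self):
        """Mean (loss, latency) over the newest `multiplier` buckets."""
        used = list(self.buckets)[: self.multiplier]
        if not used:
            return None
        n = len(used)
        return (sum(b[0] for b in used) / n, sum(b[1] for b in used) / n)


stats = TunnelStats(multiplier=3)
for loss, latency in [(0.0, 40.0), (0.0, 45.0), (0.0, 45.0), (3.0, 60.0)]:
    stats.close_interval(loss, latency)

print(stats.mean())  # only the three newest intervals are averaged
```

All four intervals remain stored, but with a multiplier of 3 the oldest one (0% loss, 40 ms) does not take part in the mean.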
How It Works
AAR Tunnel Selection Flowchart
When a packet matches an AAR sequence, the router follows a precise decision tree to determine which tunnel to use. Here is the complete logic:
1. Is an SLA class configured for this sequence?
   - If no, traffic is forwarded using ECMP across all tunnels that meet the default SLA class and all colors.
   - If yes, proceed to step 2.
2. Do any tunnels meet the configured SLA?
   - If yes, check whether a preferred color is configured.
   - If no, check whether strict mode is configured. If strict is set, the packet is dropped. If strict is not set, check for fallback options.
3. Is a preferred color configured?
   - If yes, check whether the preferred color tunnel is up.
     - If the preferred color is up, ECMP across tunnels meeting SLA on the preferred colors.
     - If the preferred color is down, ECMP across tunnels meeting SLA on all colors.
   - If no, ECMP across tunnels meeting SLA on all colors.
4. Is remote-preferred-color configured?
   - If yes, ECMP across tunnels meeting SLA on the remote preferred colors.
   - If no, continue to fallback evaluation.
5. Fallback when no tunnels meet SLA (non-strict):
   - If backup-preferred-color is configured and that color is up, use it.
   - If fallback-to-best-path is configured, send packets using the best of worst (BOW) tunnel: the tunnel with the least-bad performance even though it does not meet the SLA.
   - If neither fallback option is available, traffic may be dropped depending on the strict setting.
Key Point: The remote-preferred-color feature can coexist with the local preferred color. When fallback-to-best-path is active and some tunnels meet the best-of-worst criteria, remote-preferred-color is respected for BOW tunnel selection. However, remote-color-restrict is ignored in BOW scenarios because the intent is to forward traffic rather than drop it. Remote-color-restrict only applies in non-BOW scenarios.
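The local part of the decision tree above can be expressed as a short function. This is a simplified sketch under several assumptions: an SLA class is already configured for the sequence, remote-preferred-color and remote-color-restrict are left out, the `Tunnel` type and its `score` field (lower is better) are hypothetical, and an empty result stands for "no tunnel selected by AAR".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunnel:
    color: str
    up: bool
    meets_sla: bool
    score: float  # hypothetical badness score used for best-of-worst


def select_tunnels(tunnels, preferred=(), backup_preferred=None,
                   strict=False, fallback_to_best_path=False):
    """Return the tunnels to load-balance (ECMP) across; [] means none."""
    up = [t for t in tunnels if t.up]
    compliant = [t for t in up if t.meets_sla]

    if compliant:
        # Prefer SLA-compliant tunnels on a preferred color; otherwise use
        # SLA-compliant tunnels on all colors.
        on_preferred = [t for t in compliant if t.color in preferred]
        return on_preferred or compliant

    if strict:
        return []  # strict mode: drop rather than use a non-compliant tunnel

    if backup_preferred is not None:
        backup = [t for t in up if t.color == backup_preferred]
        if backup:
            return backup

    if fallback_to_best_path and up:
        # Best of worst: the least-bad tunnel, even though it misses the SLA.
        return [min(up, key=lambda t: t.score)]

    return []  # this sketch leaves the no-fallback case as "nothing selected"


healthy = [
    Tunnel("mpls", True, True, 1.0),
    Tunnel("biz-internet", True, True, 2.0),
    Tunnel("lte", True, False, 5.0),
]
degraded = [
    Tunnel("mpls", True, False, 3.0),
    Tunnel("lte", True, False, 5.0),
]

print([t.color for t in select_tunnels(healthy, preferred=("mpls",))])
print([t.color for t in select_tunnels(degraded, strict=True)])
print([t.color for t in select_tunnels(degraded, fallback_to_best_path=True)])
```

The first call keeps traffic on the preferred `mpls` color, the second drops it because strict mode is set and nothing meets the SLA, and the third falls back to the least-degraded tunnel.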
Interaction Between AAR and Data Policy
In many deployments, a packet may match both an AAR app-route policy and a Data Policy simultaneously. Understanding the priority between them is critical. The guiding principle is: Data Policy makes the final decision, with consideration for AAR SLA.
The processing flow works as follows:
- An incoming packet is evaluated against both the App-Route Policy and the Data Policy.
- If the App-Route Policy finds a path matching the SLA, and a Data Policy also matches, the Data Policy path decision takes precedence.
- If AAR strict mode is configured but no tunnel meets the SLA, and Data Policy specifies a local TLOC with strict, the packet path is determined by routing due to the TLOC being down.
- If the App-Route Policy has no SLA match and strict is not configured, AAR evaluates the default SLA class before any drop decision.
- If Data Policy specifies a local or remote TLOC action, that action overrides the AAR path selection.
This interaction ensures that administrative intent expressed through Data Policy always wins, while AAR provides intelligent SLA-based steering when Data Policy does not explicitly override the path.
Configuration Example
Troubleshooting AAR and Data Policies
Troubleshooting AAR follows a structured workflow from the vSmart controller. Below are the key verification steps and commands.
Step 1: Check policy commit changes
show configuration commit changes <number>
This confirms that the most recent policy push was applied successfully and shows what changed.
Step 2: Verify OMP peering between WAN Edge and vSmart
show omp peers <system-ip>
OMP peering must be healthy for the vSmart to push AAR and Data Policies to the WAN Edge routers.
Step 3: Check AAR and Data Policy assignment and direction
show support omp peer peer-ip <system-ip> | include -pol
This displays which policies are assigned to a given peer and in which direction (inbound or outbound).
Step 4: Verify policy translation from vManage UI to CLI on vSmart
show run policy list <name>
show run policy data-policy <name>
show run policy app-route-policy <name>
show run apply-policy site-list data-policy <name>
show run apply-policy site-list app-route-policy <name>
These commands confirm that the policy definition created in the vManage GUI was correctly translated into the CLI representation on the vSmart controller.
Step 5: Check policy-to-XML translation (crafting)
show support omp peer peer-ip <system-ip>
This validates that the policy was successfully crafted into XML format for distribution to the WAN Edge devices via OMP.
Best Practice: When troubleshooting AAR, always start from the vSmart perspective. The workflow is similar to control policy troubleshooting — verify the commit, check OMP peering, confirm policy assignment, validate CLI translation, and finally inspect the XML crafting.
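To keep the five steps in order during a troubleshooting session, the commands can be rendered into a checklist. The helper below only fills the placeholders of the commands listed above with values you supply; it does not connect to any device, and the function name, parameter names, and sample values are hypothetical.

```python
def vsmart_checklist(system_ip, commit_number, list_name,
                     data_policy, app_route_policy):
    """Return (step title, commands) pairs in the order described above."""
    return [
        ("Check policy commit changes", [
            f"show configuration commit changes {commit_number}",
        ]),
        ("Verify OMP peering between WAN Edge and vSmart", [
            f"show omp peers {system_ip}",
        ]),
        ("Check AAR and Data Policy assignment and direction", [
            f"show support omp peer peer-ip {system_ip} | include -pol",
        ]),
        ("Verify policy translation from vManage UI to CLI on vSmart", [
            f"show run policy list {list_name}",
            f"show run policy data-policy {data_policy}",
            f"show run policy app-route-policy {app_route_policy}",
            f"show run apply-policy site-list data-policy {data_policy}",
            f"show run apply-policy site-list app-route-policy {app_route_policy}",
        ]),
        ("Check policy-to-XML translation (crafting)", [
            f"show support omp peer peer-ip {system_ip}",
        ]),
    ]


# Sample values only; substitute the ones from your own overlay.
steps = vsmart_checklist("10.0.0.1", 12, "BRANCH-SITES", "DP-BRANCH", "AAR-VOICE")
for number, (title, commands) in enumerate(steps, start=1):
    print(f"Step {number}: {title}")
    for command in commands:
        print(f"  {command}")
```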
Real-World Application
Common Deployment Scenarios
Multi-ISP Environments: AAR is most valuable when a site has multiple WAN transports — for example, MPLS and broadband Internet. Voice and video traffic can be assigned strict SLA classes that keep them on the low-latency MPLS path, while bulk data uses the broadband link. If MPLS degrades, AAR automatically moves sensitive traffic to the Internet path provided it meets the SLA.
Preferred Color Steering: Organizations often want certain applications to prefer specific transports. Using preferred color with AAR, you can direct business-critical SaaS traffic over a direct Internet link while keeping internal ERP traffic on MPLS — with automatic failover if the preferred path degrades.
Best-of-Worst Fallback: In scenarios where all tunnels are experiencing degradation, enabling fallback-to-best-path ensures that traffic still flows over the least-degraded tunnel rather than being dropped. This is particularly important for sites with limited WAN diversity.
Design Considerations
- SLA Thresholds: Set SLA thresholds based on actual application requirements. Overly aggressive thresholds cause unnecessary path changes (flapping), while overly relaxed thresholds fail to protect application quality.
- Multiplier Tuning: A lower multiplier (for example, 3 instead of 6) makes AAR more responsive to recent conditions since fewer historical buckets are averaged. A higher multiplier provides more stability but slower reaction to changes.
- Strict vs. Non-Strict: Use strict mode only for applications where sending traffic over a degraded path is worse than dropping it entirely. For most traffic, non-strict with fallback-to-best-path is the safer choice.
- Policy Overlap: When deploying both AAR and Data Policies, remember that Data Policy always has the final say. Design your policies so they complement each other rather than conflict.
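The multiplier trade-off can be made concrete with a small worked example. The loss history and the 1.5% threshold below are invented for illustration, and `mean_loss` is a hypothetical helper: a burst of loss in the newest polling interval pushes the three-bucket mean over the threshold, while the six-bucket mean still passes.

```python
def mean_loss(buckets_newest_first, multiplier):
    """Mean loss over the newest `multiplier` buckets (index 0 = newest)."""
    used = buckets_newest_first[:multiplier]
    return sum(used) / len(used)


# Sample data: 6% loss in the most recent interval, none before that.
loss_history = [6.0, 0.0, 0.0, 0.0, 0.0, 0.0]
sla_max_loss = 1.5  # hypothetical SLA loss threshold in percent

for multiplier in (3, 6):
    average = mean_loss(loss_history, multiplier)
    verdict = "meets SLA" if average <= sla_max_loss else "violates SLA"
    print(f"multiplier={multiplier}: mean loss {average:.1f}% -> {verdict}")
```

With a multiplier of 3 the mean is 2.0% and the tunnel is flagged right away; with a multiplier of 6 the mean is 1.0% and the same burst is absorbed, which is the stability-versus-responsiveness trade-off described in the Multiplier Tuning bullet.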
Summary
- Application-Aware Routing uses BFD Hello packets to continuously measure packet loss, latency, and jitter across all data plane tunnels, comparing results against configured SLA classes.
- Statistics are collected in six sliding-window buckets over a one-hour period, with the multiplier controlling how many buckets are used for the mean calculation.
- AAR follows a detailed tunnel selection flowchart that evaluates SLA compliance, preferred colors, remote preferred colors, strict mode, and fallback-to-best-path before making a forwarding decision.
- When both AAR and Data Policy match a packet, Data Policy makes the final decision with consideration for the AAR SLA evaluation.
- Troubleshooting AAR follows the same structured workflow as control policies — starting from policy commit verification on vSmart, through OMP peering checks, policy assignment validation, and CLI-to-XML translation inspection.
In the next lesson, we will explore how SD-WAN leverages these policies alongside additional WAN optimization features to maximize usable bandwidth across multiple ISP links.