State Management and Drift Detection
Objective
In this lesson you will learn how to manage Terraform state for IOS‑XE devices and detect/remediate configuration drift using device-native YANG outputs (NETCONF/RESTCONF-style CLI) and Terraform state commands. This matters in production because drift between the declared infrastructure (Terraform state) and the actual device configuration causes outages, security gaps, and compliance failures. Real-world scenario: an automation pipeline applies standardized interface and management configurations to hundreds of Catalyst 9000 switches; later, a manual hotfix changes an interface on one switch — you must detect that drift and reconcile it back to the desired state without disrupting service.
Quick Recap
Use the same topology from Lesson 1. No new devices are added in this lesson — we operate from the Terraform host (automation runner) against your IOS‑XE devices (Catalyst 9K family running IOS XE). The device outputs we will use are the YANG-modeled CLI outputs available via the device CLI:
- `show run | format netconf-xml`
- `show run | format restconf-json`
- `show interfaces` (YANG/JSON-like operational output)
Note: Throughout this lesson we rely on the device CLI YANG output formats demonstrated previously. These are the canonical source of truth for drift detection when working with model-driven automation.
ASCII topology (reference topology; no new IPs added here):
[Terraform Host] 198.51.100.10 ---mgmt--- [Catalyst9K-1] 198.51.100.11 (Mgmt)
(Management connectivity only; Terraform host talks to devices' management plane)
Key Concepts
- Terraform state: Terraform maintains a record (the state file) that maps declared resources to real-world objects. In network automation the state reflects device resources (interfaces, VLANs, etc.) as Terraform last applied them. Treat the state as the authoritative mapping that Terraform uses to compute plans.
- Drift: Any change made on-device outside Terraform (manual CLI change, another tool) creates divergence between the current device config and Terraform state. Terraform detects drift when `terraform plan` shows differences between the provider's read state and the local desired configuration.
- YANG-model outputs: IOS‑XE exposes configuration and operational data in YANG-modeled formats. CLI conversions like `show run | format netconf-xml` and `show run | format restconf-json` produce machine-friendly representations used for comparison.
- Detection flow: Pull device config/state (NETCONF/RESTCONF or CLI-format), compare to Terraform's view (state file or `terraform plan`), then choose remediation: either push a corrective Terraform apply, or perform a controlled on-device change and then reconcile with `terraform import` or `terraform state` manipulation.
- Confirm before change: In production, use candidate datastores, change windows, and confirm-commit patterns (NETCONF supports these) to avoid accidental disruption. When using Terraform, leverage planning and approval steps to ensure safe remediation.
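The detection flow can be sketched in a few lines of Python. This is a minimal illustration, not part of any provider: the function name `find_drift` and the two sample dictionaries are hypothetical, and a real pipeline would populate them from the device export and from Terraform.

```python
def find_drift(desired: dict, actual: dict) -> dict:
    """Return {attribute: (desired_value, actual_value)} for every mismatch."""
    drift = {}
    for key, want in desired.items():
        have = actual.get(key)
        if have != want:
            drift[key] = (want, have)
    return drift


# Hypothetical values: what Terraform declares vs. what the device reports.
desired = {"description": "Uplink-to-Core", "enabled": True}
actual = {"description": "Old-Desc", "enabled": True}

for attr, (want, have) in find_drift(desired, actual).items():
    print(f"drift on {attr}: desired={want!r} actual={have!r}")
```

Only attributes present in `desired` are checked, which mirrors how Terraform ignores settings it does not manage.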
Step-by-step configuration
Step 1: Export the device configuration in NETCONF (YANG/XML) format
What we are doing: Retrieve the device configuration in a YANG-modeled XML format so an automation pipeline (or a human) can compare the device’s live configuration against Terraform state. This format is directly mappable to the native YANG models the device supports.
enable
show run | format netconf-xml
What just happened:
- `enable` elevates to privileged EXEC so you can run show commands that access the full configuration.
- `show run | format netconf-xml` outputs the running configuration converted into NETCONF/XML format based on the device's native YANG models. This representation omits non-modeled, vendor-opaque CLI lines and returns the configuration as data nodes aligned to YANG, which is ideal for automated diffs.
Real-world note: In production, the NETCONF/XML output is consumed by orchestration tools that expect YANG-modeled data; using it reduces parsing errors compared to free-form CLI.
Verify:
show run | format netconf-xml
Expected sample output (excerpt):
<config>
<interfaces xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-interfaces">
<interface>
<name>GigabitEthernet1/0/1</name>
<description>Uplink-to-Core</description>
<enabled>true</enabled>
<ipv4>
<address>
<ip>198.51.100.11</ip>
<netmask>255.255.255.0</netmask>
</address>
</ipv4>
</interface>
</interfaces>
...
</config>
(Full XML returned by device will include all YANG-modeled nodes present in the running config.)
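To compare this XML against Terraform, a script first has to flatten it into plain values. The sketch below uses Python's standard `xml.etree.ElementTree`; the namespace URI and element names are copied from the sample excerpt above and may differ on your software release, and `interfaces_from_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample excerpt; verify it against your device output.
NS = {"ios": "http://cisco.com/ns/yang/Cisco-IOS-XE-interfaces"}

SAMPLE_XML = """
<config>
  <interfaces xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-interfaces">
    <interface>
      <name>GigabitEthernet1/0/1</name>
      <description>Uplink-to-Core</description>
      <enabled>true</enabled>
    </interface>
  </interfaces>
</config>
"""


def interfaces_from_xml(xml_text: str) -> dict:
    """Map interface name -> selected config leaves from a NETCONF-style export."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for intf in root.findall("ios:interfaces/ios:interface", NS):
        name = intf.findtext("ios:name", namespaces=NS)
        result[name] = {
            "description": intf.findtext("ios:description", default="", namespaces=NS),
            "enabled": intf.findtext("ios:enabled", default="", namespaces=NS) == "true",
        }
    return result


print(interfaces_from_xml(SAMPLE_XML))
```

Every element under `<interfaces>` inherits the default namespace, so each path step needs the `ios:` prefix.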
Step 2: Export the device configuration in RESTCONF (JSON) format
What we are doing: Produce the configuration in JSON (RESTCONF-style) to make it easy to compare with Terraform state or tooling that expects JSON. JSON is easier to diff and integrate into scripting languages.
enable
show run | format restconf-json
What just happened:
- The `show run | format restconf-json` CLI converts modeled parts of the running configuration to JSON consistent with RESTCONF export schemas. This is particularly useful when your comparison tooling or drift detection pipeline expects JSON for diffing.
Real-world note: Many automation stacks and dashboards prefer JSON because it integrates directly with Elasticsearch, Splunk, or Python scripts for automated comparison and alerting.
Verify:
show run | format restconf-json
Expected sample output (excerpt):
{
"interfaces": {
"interface": [
{
"name": "GigabitEthernet1/0/1",
"description": "Uplink-to-Core",
"enabled": true,
"ipv4": {
"address": [
{
"ip": "198.51.100.11",
"netmask": "255.255.255.0"
}
]
}
}
]
},
...
}
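One way to diff two JSON exports is to serialize both with sorted keys and run a line diff over the result. The sketch below uses only the standard library; the `baseline` and `current` documents are hypothetical snapshots shaped like the excerpt above, and `json_diff` is an illustrative helper name.

```python
import difflib
import json


def canonical(doc: dict) -> list:
    """Serialize with sorted keys so key order never shows up as a difference."""
    return json.dumps(doc, indent=2, sort_keys=True).splitlines()


def json_diff(before: dict, after: dict) -> list:
    """Return unified-diff lines between two exports (empty list if identical)."""
    return list(
        difflib.unified_diff(
            canonical(before), canonical(after), "baseline", "current", lineterm=""
        )
    )


# Hypothetical snapshots: the approved baseline and a later export from the device.
baseline = {"interfaces": {"interface": [
    {"name": "GigabitEthernet1/0/1", "description": "Uplink-to-Core", "enabled": True}
]}}
current = {"interfaces": {"interface": [
    {"name": "GigabitEthernet1/0/1", "description": "Hotfix-temp", "enabled": True}
]}}

print("\n".join(json_diff(baseline, current)))
```

List order is still significant in this approach, so sort interface lists by name first if the device does not return them in a stable order.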
Step 3: Capture operational interface state (YANG/JSON output)
What we are doing: Retrieve operational state for interfaces (administrative/operational status, MAC, speed, statistics) to detect drift that affects operational behavior — changes to admin down/up or speed mismatches are as important as config drift.
enable
show interfaces
What just happened:
- `show interfaces` displays interface operational data. When the device provides YANG/JSON IETF encoded output (as in IOS‑XE), you will see structured fields like `if-index`, `phys-address`, `last-change`, and `speed`. These operational state values let you detect changes not visible in the configuration (for example, a device might be administratively up but oper-down due to a transceiver fault).
Real-world note: Some drift is purely operational (broken optics, duplex mismatch) rather than configuration-based — including operational data in your checks reduces false negatives.
Verify:
show interfaces
Expected sample output (JSON-like excerpt):
{
"interfaces": [
{
"name": "GigabitEthernet1/0/1",
"if-index": 1,
"phys-address": "00:50:56:bf:77:ea",
"last-change": "2025-06-04T16:48:26.49+00:00",
"oper-status": "if-oper-state-ready",
"speed": "1000000000",
"statistics": {
"in-octets": 123456789,
"out-octets": 987654321
}
}
]
}
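A simple operational check can flag interfaces whose `oper-status` is not the ready value. The sketch below assumes the structure shown in the excerpt above; the ready value is copied from that excerpt, while the second interface and its status string are invented for illustration.

```python
# Value taken from the sample excerpt; confirm the exact string on your platform.
READY = "if-oper-state-ready"


def not_ready(oper_data: dict) -> list:
    """Return the names of interfaces whose oper-status is not the ready value."""
    return [
        intf["name"]
        for intf in oper_data.get("interfaces", [])
        if intf.get("oper-status") != READY
    ]


# Hypothetical operational snapshot with one healthy and one faulty interface.
snapshot = {
    "interfaces": [
        {"name": "GigabitEthernet1/0/1", "oper-status": "if-oper-state-ready"},
        {"name": "GigabitEthernet1/0/2", "oper-status": "if-oper-state-lower-layer-down"},
    ]
}

print(not_ready(snapshot))
```

Reporting these separately from configuration drift keeps a faulty optic from being mistaken for an unauthorized change.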
Step 4: Compare device outputs to Terraform state and detect drift
What we are doing: Use Terraform state inspection and plan operations to detect differences between desired state (Terraform configs) and the provider’s view of the device. When Terraform’s provider reads device state, it constructs a plan showing diffs — this is how drift is detected.
(These commands are run on the automation host/CI runner where your Terraform configs live.)
# List tracked resources in the current state
terraform state list
# Pull current state (local or remote) for inspection
terraform state pull > current_state.json
# Generate a plan comparing current config files to the provider's read state
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json
What just happened:
- `terraform state list` enumerates resources Terraform currently tracks.
- `terraform state pull` outputs the raw state JSON so you can compare it to device exports.
- `terraform plan` asks the provider to read live device state and then calculates what changes are required to reach the declared configuration. If the provider detects differences between the live device and Terraform's desired configuration, the plan will contain change actions; that is drift.
Real-world note: In production, run `terraform plan` in a read-only, scheduled pipeline to detect drift and trigger alerts. Never `apply` drift remediation without an approval gate.
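A scheduled check can rely on `terraform plan -detailed-exitcode`, which exits with 0 when the plan is empty, 1 on error, and 2 when changes are present. The wrapper below is a minimal sketch; the function names and the returned labels are hypothetical, and it assumes `terraform` is on the runner's PATH and the working directory is already initialized.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Translate the -detailed-exitcode result into a pipeline status label."""
    if code == 0:
        return "in-sync"   # plan succeeded, no changes
    if code == 2:
        return "drift"     # plan succeeded, changes are pending
    return "error"         # plan failed (exit code 1 or anything unexpected)


def check_drift(workdir: str) -> str:
    """Run a non-interactive plan in workdir and classify the outcome."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(proc.returncode)
```

Exit code 2 also fires when the HCL itself has unapplied edits, so run the check against the committed, already-applied configuration to keep the signal specific to device drift.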
Verify:
terraform plan -out=tfplan
terraform show tfplan
Expected plan excerpt when drift exists:
# Example plan output indicating drift on an interface description
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
# cisco_device_interface.Gi1_0_1 will be updated in-place
~ resource "cisco_device_interface" "Gi1_0_1" {
~ description = "Old-Desc" -> "Uplink-to-Core"
...
}
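The JSON form of the plan (`terraform show -json tfplan`) is easier to process than the text above. It contains a `resource_changes` list whose entries carry an `address` and a `change` object with `actions`, `before`, and `after`. The sketch below summarizes that list; the sample plan document, including the resource type and attribute names, is hypothetical.

```python
def changed_resources(plan: dict) -> dict:
    """Summarize every resource whose planned action is something other than no-op."""
    summary = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if change.get("actions") == ["no-op"]:
            continue
        before = change.get("before") or {}   # null when the resource is being created
        after = change.get("after") or {}     # null when the resource is being deleted
        attrs = {
            key: (before.get(key), after.get(key))
            for key in sorted(set(before) | set(after))
            if before.get(key) != after.get(key)
        }
        summary[rc["address"]] = {"actions": change.get("actions"), "attributes": attrs}
    return summary


# Hypothetical excerpt of a plan JSON document (json.load the real file instead).
sample_plan = {
    "resource_changes": [
        {
            "address": "cisco_device_interface.Gi1_0_1",
            "change": {
                "actions": ["update"],
                "before": {"name": "GigabitEthernet1/0/1", "description": "Old-Desc"},
                "after": {"name": "GigabitEthernet1/0/1", "description": "Uplink-to-Core"},
            },
        },
        {
            "address": "cisco_device_interface.Gi1_0_2",
            "change": {"actions": ["no-op"], "before": {}, "after": {}},
        },
    ]
}

print(changed_resources(sample_plan))
```

Here `before` is the value the provider read from the device and `after` is the value Terraform would write, so the tuple reads as (live, desired).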
Step 5: Remediate drift safely (apply or import as appropriate)
What we are doing: Decide whether to reconcile by letting Terraform apply the desired configuration (preferred) or to adopt the manual change into Terraform. In most production workflows you will remediate by running `terraform apply` after review. If the manual change is legitimate and should become the new desired state, update the HCL to match it. `terraform import` is needed only when the changed object is not yet tracked in state, because Terraform refuses to import into an address it already manages.
# Apply corrective changes (after review/approval)
terraform apply tfplan
# Alternatively, if the device change was intentional and you want Terraform to adopt it:
terraform import cisco_device_interface.Gi1_0_1 <device-unique-id>
What just happened:
- `terraform apply` pushes the changes computed in the plan to the device using the provider. The provider performs API/CLI operations to converge the device to the declared config.
- `terraform import` tells Terraform to map a real-world resource into state so Terraform begins managing it; after import, update your local HCL to match the live resource and run `terraform plan` to confirm no drift.
Real-world note: Importing is a one-time operation to bring externally-created resources into Terraform management. Always update HCL after import and verify with a plan.
Verify:
terraform show
# AND on the device:
enable
show run | format restconf-json
Expected outcome after apply:
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
# Device RESTCONF JSON excerpt now matches Terraform desired config:
{
"interfaces": {
"interface": [
{
"name": "GigabitEthernet1/0/1",
"description": "Uplink-to-Core",
...
}
]
}
}
Verification Checklist
- Check 1: Device config exported in NETCONF/XML; verify by running `show run | format netconf-xml` and confirming key nodes exist (e.g., interface name and ip).
- Check 2: Device config exported in RESTCONF/JSON; verify by running `show run | format restconf-json` and confirming JSON keys (`interfaces.interface[]`).
- Check 3: Terraform detects drift; run `terraform plan` and verify it lists updates (`~ update in-place`) if there is drift.
- Check 4: Remediation successful; after `terraform apply`, re-run `show run | format restconf-json` and ensure device JSON matches Terraform desired config.
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| `terraform plan` shows no changes, but device has different interface description | Terraform provider cannot read modeled config because the device was returning non-modeled CLI lines | Use `show run \| …` |
| Device `show interfaces` shows oper-status down while config is up | Operational problem (cabling, SFP, VLAN mismatch), not configuration drift | Investigate interface physical/DFE/optics; include `show interfaces` operational fields (`last-change`, `phys-address`) in your checks to identify non-config drift. |
| `terraform apply` attempts to change settings unexpectedly | Local HCL does not match team's intended desired state (stale HCL) | Run `terraform plan` and review diffs. Communicate with team, update HCL if the device change should be the new desired state, or roll the device back via Terraform after approval. |
| After `terraform import`, `terraform plan` shows changes | Imported resource attributes not represented in HCL | Update the HCL resource block to include the current attributes found on the device, then run `terraform plan` and reconcile. |
Key Takeaways
- Always use YANG‑modeled device outputs (`show run | format netconf-xml` / `show run | format restconf-json`) as canonical device state when detecting drift; these map directly to provider data models.
- Detect drift by running `terraform plan` regularly in a read-only pipeline; compare device exports (NETCONF/RESTCONF) with Terraform state when deeper inspection is needed.
- Include operational state (from `show interfaces`/YANG outputs) in your checks; not all problems are configuration drift, and some are physical/operational.
- When remediating, prefer `terraform apply` after review. Use `terraform import` when you need Terraform to adopt a legitimate manual change, and always reconcile HCL afterward.
Tip: Automate scheduled drift checks that pull `show run | format restconf-json` and `terraform state pull` and compute diffs in your CI/CD system. Alert human operators for approval before automatic remediation in production.
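Such a check could be sketched as follows. The state JSON printed by `terraform state pull` holds a `resources` list, each entry with `type`, `name`, and `instances[].attributes`; the device side follows the RESTCONF-style excerpt from Step 2. The sample documents, the resource type, and the assumption that the provider stores `name` and `description` attributes are all hypothetical, and resources using `count` or `for_each` would need per-instance keys that this sketch omits.

```python
def state_attributes(state: dict) -> dict:
    """Index managed resource attributes by 'type.name' from pulled state JSON."""
    index = {}
    for res in state.get("resources", []):
        if res.get("mode") != "managed":
            continue
        for inst in res.get("instances", []):
            index[f'{res["type"]}.{res["name"]}'] = inst.get("attributes", {})
    return index


def state_vs_device(state: dict, export: dict, fields=("description",)) -> dict:
    """Report resources whose recorded attributes differ from the device export."""
    live = {i["name"]: i for i in export.get("interfaces", {}).get("interface", [])}
    report = {}
    for address, attrs in state_attributes(state).items():
        intf = live.get(attrs.get("name"))
        if intf is None:
            report[address] = "missing on device"
            continue
        diffs = {f: (attrs.get(f), intf.get(f)) for f in fields if attrs.get(f) != intf.get(f)}
        if diffs:
            report[address] = diffs
    return report


# Hypothetical inputs: pulled state and a device export with a manual change.
state = {"version": 4, "resources": [{
    "mode": "managed", "type": "cisco_device_interface", "name": "Gi1_0_1",
    "instances": [{"attributes": {"name": "GigabitEthernet1/0/1",
                                  "description": "Uplink-to-Core"}}],
}]}
export = {"interfaces": {"interface": [
    {"name": "GigabitEthernet1/0/1", "description": "Old-Desc"}
]}}

print(state_vs_device(state, export))
```

A non-empty report is the trigger for alerting an operator; the remediation itself should still go through the reviewed plan and apply path.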