Lesson 4 of 6

State Management and Drift Detection

Objective

In this lesson you will learn how to manage Terraform state for IOS‑XE devices and how to detect and remediate configuration drift using the device's YANG-modeled CLI output (NETCONF-style XML and RESTCONF-style JSON) together with Terraform state commands. This matters in production because drift between the declared infrastructure (Terraform configuration and state) and the actual device configuration can cause outages, security gaps, and compliance failures. Real-world scenario: an automation pipeline applies standardized interface and management configurations to hundreds of Catalyst 9000 switches; later, a manual hotfix changes an interface on one switch. You must detect that drift and reconcile it back to the desired state without disrupting service.

Quick Recap

Use the same topology from Lesson 1. No new devices are added in this lesson — we operate from the Terraform host (automation runner) against your IOS‑XE devices (Catalyst 9K family running IOS XE). The device outputs we will use are the YANG-modeled CLI outputs available via the device CLI:

  • show run | format netconf-xml
  • show run | format restconf-json
  • show interfaces (YANG/JSON-like operational output)

Note: Throughout this lesson we rely on the device CLI YANG output formats demonstrated previously. These are the canonical source of truth for drift detection when working with model-driven automation.

ASCII topology (reference topology; no new IPs added here):

[Terraform Host] 198.51.100.10 ---mgmt--- [Catalyst9K-1] 198.51.100.11 (Mgmt)

(Management connectivity only; Terraform host talks to devices' management plane)

Key Concepts

  • Terraform state: Terraform maintains a record (the state file) that maps declared resources to real-world objects. In network automation the state reflects device resources (interfaces, VLANs, etc.) as Terraform last applied them. Treat the state as the authoritative mapping that Terraform uses to compute plans.
  • Drift: Any change made on-device outside Terraform (manual CLI change, another tool) creates divergence between the current device config and Terraform state. Terraform surfaces drift when terraform plan refreshes each resource through the provider and reports differences between what the provider reads from the device and the configuration declared in HCL.
  • YANG-model outputs: IOS‑XE exposes configuration and operational data in YANG-modeled formats. CLI conversions like show run | format netconf-xml and show run | format restconf-json produce machine-friendly representations used for comparison.
  • Detection flow: Pull device config/state (NETCONF/RESTCONF or CLI-format), compare to Terraform’s view (state file or terraform plan), then choose remediation — either push a corrective Terraform apply or perform a controlled on-device change and then terraform import/terraform state manipulation.
  • Confirm before change: In production, use candidate datastores, change windows, and confirm-commit patterns (NETCONF supports these) to avoid accidental disruption. When using Terraform, leverage planning and approval steps to ensure safe remediation.
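
The detection flow above can be sketched in a few lines of Python. This is a minimal illustration, not how any Terraform provider works internally: desired and actual are hypothetical flat dictionaries of attribute values, and adopt is a hypothetical set of attributes whose on-device value the team has agreed to accept.

```python
def plan_remediation(desired, actual, adopt=frozenset()):
    """Return (attribute, action, detail) tuples for every drifted attribute.

    desired: attribute -> value declared in Terraform (hypothetical flat dict)
    actual:  attribute -> value read from the device (hypothetical flat dict)
    adopt:   attributes whose on-device value should become the desired state
    """
    actions = []
    for attr in sorted(set(desired) | set(actual)):
        want, have = desired.get(attr), actual.get(attr)
        if want == have:
            continue  # no drift for this attribute
        if attr in adopt:
            # Legitimate manual change: update HCL / import instead of overwriting.
            actions.append((attr, "adopt", f"update HCL to {have!r}"))
        else:
            # Unapproved change: a reviewed terraform apply restores the declared value.
            actions.append((attr, "apply", f"restore {want!r} (device has {have!r})"))
    return actions


desired = {"description": "Uplink-to-Core", "enabled": True}
actual = {"description": "Old-Desc", "enabled": True}
for attr, action, detail in plan_remediation(desired, actual):
    print(f"{attr}: {action} -> {detail}")
```

In practice the decision to adopt or overwrite is made by a human reviewer; the adopt set only stands in for that approval step.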

Step-by-step configuration

Step 1: Export the device configuration in NETCONF (YANG/XML) format

What we are doing: Retrieve the device configuration in a YANG-modeled XML format so an automation pipeline (or a human) can compare the device’s live configuration against Terraform state. This format is directly mappable to the native YANG models the device supports.

enable
show run | format netconf-xml

What just happened:

  • enable elevates to privileged EXEC so you can run show commands that access full configuration.
  • show run | format netconf-xml outputs the running configuration converted into NETCONF/XML format based on the device’s native YANG models. This representation omits non-modeled, vendor-opaque CLI lines and returns the configuration as data nodes aligned to YANG, which makes it well suited to automated diffs.

Real-world note: In production, the NETCONF/XML output is consumed by orchestration tools that expect YANG-modeled data; using it reduces parsing errors compared to free-form CLI.

Verify:

show run | format netconf-xml

Expected sample output (excerpt):

<config>
  <interfaces xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-interfaces">
    <interface>
      <name>GigabitEthernet1/0/1</name>
      <description>Uplink-to-Core</description>
      <enabled>true</enabled>
      <ipv4>
        <address>
          <ip>198.51.100.11</ip>
          <netmask>255.255.255.0</netmask>
        </address>
      </ipv4>
    </interface>
  </interfaces>
  ...
</config>

(The full XML returned by the device includes all YANG-modeled nodes present in the running config; exact element names and namespaces depend on the models the device uses.)
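
As a rough sketch of how a pipeline might consume this export, the following Python uses only the standard library to pull interface names and descriptions out of the XML. The namespace is copied from the sample excerpt above and is an assumption; check the namespace your device actually returns before relying on it.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the sample excerpt; real devices may use a different one.
NS = {"if": "http://cisco.com/ns/yang/Cisco-IOS-XE-interfaces"}

SAMPLE = """<config>
  <interfaces xmlns="http://cisco.com/ns/yang/Cisco-IOS-XE-interfaces">
    <interface>
      <name>GigabitEthernet1/0/1</name>
      <description>Uplink-to-Core</description>
      <enabled>true</enabled>
    </interface>
  </interfaces>
</config>"""


def interface_descriptions(xml_text):
    """Map interface name -> description from a NETCONF-style XML export."""
    root = ET.fromstring(xml_text)
    result = {}
    for intf in root.findall("if:interfaces/if:interface", NS):
        name = intf.findtext("if:name", default="", namespaces=NS)
        desc = intf.findtext("if:description", default="", namespaces=NS)
        result[name] = desc
    return result


print(interface_descriptions(SAMPLE))
```

Resolving elements through an explicit namespace map avoids brittle string matching on tag names when the device adds or reorders nodes.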


Step 2: Export the device configuration in RESTCONF (JSON) format

What we are doing: Produce the configuration in JSON (RESTCONF-style) to make it easy to compare with Terraform state or tooling that expects JSON. JSON is easier to diff and integrate into scripting languages.

enable
show run | format restconf-json

What just happened:

  • The show run | format restconf-json CLI converts modeled parts of the running configuration to JSON consistent with RESTCONF export schemas. This is particularly useful when your comparison tooling or drift detection pipeline expects JSON for diffing.

Real-world note: Many automation stacks and dashboards prefer JSON because it integrates directly with Elasticsearch, Splunk, or Python scripts for automated comparison and alerting.

Verify:

show run | format restconf-json

Expected sample output (excerpt):

{
  "interfaces": {
    "interface": [
      {
        "name": "GigabitEthernet1/0/1",
        "description": "Uplink-to-Core",
        "enabled": true,
        "ipv4": {
          "address": [
            {
              "ip": "198.51.100.11",
              "netmask": "255.255.255.0"
            }
          ]
        }
      }
    ]
  },
  ...
}
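
One way to make such JSON easy to diff is to flatten it into path/value pairs. The sketch below is an illustration under assumptions: it keys list entries by their "name" field (as in the sample excerpt) so that reordering interfaces is not reported as drift, and the two documents are hand-written stand-ins for a device export and the desired configuration.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested JSON into {path: value}; list items are keyed by 'name' when present."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}/{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            # Prefer the list key "name" over the position so ordering does not matter.
            label = item.get("name", index) if isinstance(item, dict) else index
            flat.update(flatten(item, f"{prefix}[{label}]"))
    else:
        flat[prefix] = node
    return flat


# Hand-written stand-ins for a device export and the desired configuration.
device = json.loads('{"interfaces": {"interface": [{"name": "GigabitEthernet1/0/1", '
                    '"description": "Old-Desc", "enabled": true}]}}')
wanted = json.loads('{"interfaces": {"interface": [{"name": "GigabitEthernet1/0/1", '
                    '"description": "Uplink-to-Core", "enabled": true}]}}')

flat_device, flat_wanted = flatten(device), flatten(wanted)
changed = sorted(
    path for path in flat_device.keys() | flat_wanted.keys()
    if flat_device.get(path) != flat_wanted.get(path)
)
print(changed)
```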

Step 3: Capture operational interface state (YANG/JSON output)

What we are doing: Retrieve operational state for interfaces (administrative/operational status, MAC, speed, statistics) to detect drift that affects operational behavior — changes to admin down/up or speed mismatches are as important as config drift.

enable
show interfaces

What just happened:

  • show interfaces displays interface operational data. On the CLI this is plain text; the structured fields shown below (if-index, phys-address, last-change, oper-status, speed) come from the device's YANG-modeled operational data, which is typically retrieved over NETCONF or RESTCONF. These operational values let you detect changes that are not visible in the configuration (for example, an interface might be administratively up but operationally down due to a transceiver fault).

Real-world note: Some drift is purely operational (broken optics, duplex mismatch) rather than configuration-based — including operational data in your checks reduces false negatives.

Verify:

show interfaces

Expected sample output (JSON-like excerpt):

{
  "interfaces": [
    {
      "name": "GigabitEthernet1/0/1",
      "if-index": 1,
      "phys-address": "00:50:56:bf:77:ea",
      "last-change": "2025-06-04T16:48:26.49+00:00",
      "oper-status": "if-oper-state-ready",
      "speed": "1000000000",
      "statistics": {
        "in-octets": 123456789,
        "out-octets": 987654321
      }
    }
  ]
}
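
A check over this operational data might look like the sketch below. It assumes the field names from the sample excerpt above; expected_speed is a hypothetical lookup of the speed you intend each interface to run at, and the second sample entry is invented for illustration rather than captured from a device.

```python
READY = "if-oper-state-ready"  # value taken from the sample excerpt


def operational_findings(interfaces, expected_speed):
    """Return (interface, field, observed value) for each operational anomaly."""
    findings = []
    for intf in interfaces:
        name = intf["name"]
        if intf.get("oper-status") != READY:
            findings.append((name, "oper-status", intf.get("oper-status")))
        want = expected_speed.get(name)
        # Compare as strings because the excerpt reports speed as a string.
        if want is not None and str(intf.get("speed")) != str(want):
            findings.append((name, "speed", intf.get("speed")))
    return findings


sample = [
    {"name": "GigabitEthernet1/0/1", "oper-status": READY, "speed": "1000000000"},
    # Illustrative faulty entry; the status string is a placeholder value.
    {"name": "GigabitEthernet1/0/2", "oper-status": "not-ready-placeholder", "speed": "100000000"},
]
expected = {"GigabitEthernet1/0/1": "1000000000", "GigabitEthernet1/0/2": "1000000000"}

for finding in operational_findings(sample, expected):
    print(finding)
```

Findings from this kind of check point to physical or operational faults, so they belong in alerting rather than in a Terraform remediation run.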

Step 4: Compare device outputs to Terraform state and detect drift

What we are doing: Use Terraform state inspection and plan operations to detect differences between desired state (Terraform configs) and the provider’s view of the device. When Terraform’s provider reads device state, it constructs a plan showing diffs — this is how drift is detected.

(These commands are run on the automation host/CI runner where your Terraform configs live.)

# List tracked resources in the current state
terraform state list

# Pull current state (local or remote) for inspection
terraform state pull > current_state.json

# Generate a plan comparing current config files to the provider's read state
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json

What just happened:

  • terraform state list enumerates resources Terraform currently tracks.
  • terraform state pull outputs the raw state JSON so you can compare it to device exports.
  • terraform plan asks the provider to read live device state and then calculates what changes are required to reach the declared configuration. If the plan contains change actions even though the HCL has not changed since the last apply, the device has diverged from the declared configuration: that is drift.

Real-world note: In production, run terraform plan in a read-only, scheduled pipeline to detect drift and trigger alerts. Never apply drift remediation without an approval gate.
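
A scheduled drift check can key off the exit code of terraform plan -detailed-exitcode, which returns 0 for an empty diff, 1 for an error, and 2 when changes are present. The wrapper below is a minimal sketch: the status strings and the example path are hypothetical, and it only plans, never applies.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_STATUS = {
    0: "in-sync",          # plan succeeded, empty diff
    1: "error",            # plan failed
    2: "changes-present",  # plan succeeded, non-empty diff (possible drift)
}


def classify_plan_exit(code):
    """Translate a terraform plan exit code into a pipeline status string."""
    return PLAN_STATUS.get(code, "error")


def check_drift(workdir):
    """Run a plan only (nothing is applied) and return (status, plan text)."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(proc.returncode), proc.stdout


# Hypothetical usage from a scheduled job:
#   status, text = check_drift("/srv/terraform/campus")
#   if status == "changes-present": notify operators and wait for approval
print(classify_plan_exit(2))
```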

Verify:

terraform plan -out=tfplan
terraform show tfplan

Expected plan excerpt when drift exists:

# Example plan output indicating drift on an interface description
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # cisco_device_interface.gi1_0_1 will be updated in-place
  ~ resource "cisco_device_interface" "gi1_0_1" {
      ~ description = "Old-Desc" -> "Uplink-to-Core"
      ...
    }
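
For alerting, the machine-readable plan from terraform show -json is easier to process than the text above. The sketch below reads the resource_changes array of that JSON and lists resources whose planned actions are anything other than no-op or read. The sample document is a trimmed, hand-written stand-in for tfplan.json, and the resource addresses in it are hypothetical.

```python
def changed_resources(plan):
    """Return (address, actions, changed attribute names) for each modified resource."""
    results = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        if not set(actions) - {"no-op", "read"}:
            continue  # nothing would be modified for this resource
        # before/after are null in the JSON for create and delete actions.
        before = change.get("before") or {}
        after = change.get("after") or {}
        attrs = sorted(k for k in set(before) | set(after) if before.get(k) != after.get(k))
        results.append((rc["address"], actions, attrs))
    return results


# Trimmed stand-in for the output of `terraform show -json tfplan`.
SAMPLE_PLAN = {
    "resource_changes": [
        {
            "address": "cisco_device_interface.gi1_0_1",  # hypothetical address
            "change": {
                "actions": ["update"],
                "before": {"description": "Old-Desc", "enabled": True},
                "after": {"description": "Uplink-to-Core", "enabled": True},
            },
        },
        {
            "address": "cisco_device_interface.gi1_0_2",  # hypothetical address
            "change": {"actions": ["no-op"], "before": {}, "after": {}},
        },
    ]
}

for address, actions, attrs in changed_resources(SAMPLE_PLAN):
    print(address, actions, attrs)
```

With a real plan file, load tfplan.json with json.load and pass the resulting dictionary to changed_resources.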

Step 5: Remediate drift safely (apply or import as appropriate)

What we are doing: Decide whether to reconcile by letting Terraform apply the desired configuration (preferred) or to adopt the manual change into Terraform (import/state edit). In most production workflows you will remediate by running terraform apply after review. If the manual change is legitimate and should become the new desired state, use terraform import and update configs.

# Apply corrective changes (after review/approval)
terraform apply tfplan

# Alternatively, if the device change was intentional and you want Terraform to adopt it:
terraform import cisco_device_interface.gi1_0_1 <device-unique-id>

What just happened:

  • terraform apply pushes the changes computed in the plan to the device using the provider. The provider performs API/CLI operations to converge the device to the declared config.
  • terraform import tells Terraform to map a real-world resource into state so Terraform begins managing it; after import, update your local HCL to match the live resource and run terraform plan to confirm no drift.

Real-world note: Importing is a one-time operation to bring externally-created resources into Terraform management. Always update HCL after import and verify with a plan.

Verify:

terraform show
# AND on the device:
enable
show run | format restconf-json

Expected outcome after apply:

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

# Device RESTCONF JSON excerpt now matches Terraform desired config:
{
  "interfaces": {
    "interface": [
      {
        "name": "GigabitEthernet1/0/1",
        "description": "Uplink-to-Core",
        ...
      }
    ]
  }
}

Verification Checklist

  • Check 1: Device config exported in NETCONF/XML. Verify by running show run | format netconf-xml and confirming key nodes exist (e.g., interface name and ip).
  • Check 2: Device config exported in RESTCONF/JSON. Verify by running show run | format restconf-json and confirming JSON keys (interfaces.interface[]).
  • Check 3: Terraform detects drift. Run terraform plan and verify it lists updates (~ update in-place) if there is drift.
  • Check 4: Remediation successful. After terraform apply, re-run show run | format restconf-json and ensure device JSON matches Terraform desired config.

Common Mistakes

Symptom: terraform plan shows no changes, but device has different interface description
Cause: Terraform provider cannot read modeled config because the device was returning non-modeled CLI lines
Fix: Use `show run

Symptom: Device show interfaces shows oper-status down while config is up
Cause: Operational problem (cabling, SFP, VLAN mismatch), not configuration drift
Fix: Investigate interface physical/DFE/optics; include show interfaces operational fields (last-change, phys-address) in your checks to identify non-config drift.

Symptom: terraform apply attempts to change settings unexpectedly
Cause: Local HCL does not match team’s intended desired state (stale HCL)
Fix: Run terraform plan and review diffs. Communicate with team, update HCL if the device change should be the new desired state, or roll the device back via Terraform after approval.

Symptom: After terraform import, terraform plan shows changes
Cause: Imported resource attributes not represented in HCL
Fix: Update the HCL resource block to include the current attributes found on the device, then run terraform plan and reconcile.

Key Takeaways

  • Always use YANG‑modeled device outputs (show run | format netconf-xml / show run | format restconf-json) as canonical device state when detecting drift, because these map directly to provider data models.
  • Detect drift by running terraform plan regularly in a read-only pipeline; compare device exports (NETCONF/RESTCONF) with Terraform state when deeper inspection is needed.
  • Include operational state (from show interfaces/YANG outputs) in your checks — not all problems are configuration drift; some are physical/operational.
  • When remediating, prefer terraform apply after review. Use terraform import when you need Terraform to adopt a legitimate manual change, and always reconcile HCL afterward.

Tip: Automate scheduled drift checks that pull show run | format restconf-json and terraform state pull and compute diffs in your CI/CD system. Alert human operators for approval before automatic remediation in production.
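
A scheduled comparison along those lines could be sketched as follows. The top-level layout (resources, instances, attributes) follows the version 4 state file format, but the attribute names inside attributes depend entirely on the provider schema; name and description here are assumptions, as is the resource type cisco_device_interface used in this lesson's examples. Both documents are hand-written stand-ins.

```python
def state_attributes(state, resource_type):
    """Yield the attributes dict of every managed instance of resource_type."""
    for res in state.get("resources", []):
        if res.get("mode") == "managed" and res.get("type") == resource_type:
            for inst in res.get("instances", []):
                yield inst.get("attributes", {})


def description_drift(state, device, resource_type="cisco_device_interface"):
    """Compare interface descriptions in Terraform state with a device JSON export."""
    on_device = {i["name"]: i.get("description") for i in device["interfaces"]["interface"]}
    drift = {}
    for attrs in state_attributes(state, resource_type):
        name = attrs.get("name")  # assumed attribute name
        if name in on_device and on_device[name] != attrs.get("description"):
            drift[name] = {"state": attrs.get("description"), "device": on_device[name]}
    return drift


# Stand-in for `terraform state pull` output (version 4 layout, trimmed).
STATE = {
    "version": 4,
    "resources": [{
        "mode": "managed",
        "type": "cisco_device_interface",
        "name": "gi1_0_1",
        "instances": [{"attributes": {"name": "GigabitEthernet1/0/1",
                                      "description": "Uplink-to-Core"}}],
    }],
}
# Stand-in for the device's RESTCONF-style JSON export.
DEVICE = {"interfaces": {"interface": [
    {"name": "GigabitEthernet1/0/1", "description": "Old-Desc"},
]}}

print(description_drift(STATE, DEVICE))
```

Because state only records what Terraform saw at the last apply or refresh, treat a mismatch here as a prompt to run terraform plan rather than as proof of drift on its own.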