Lesson 4 of 7

Ansible with NETCONF

Objective

In this lesson you will use Ansible's NETCONF modules — netconf_get and netconf_config — to perform model-driven configuration and retrieval on Nexus devices in our lab fabric. You will learn why NETCONF matters in production (it provides a structured, schema-driven API over SSH), how Ansible invokes NETCONF to retrieve and change device state, and how to verify changes. In production networks this approach is used when you need consistent, auditable configuration changes across many switches (for example, creating VLANs and overlay state in a VXLAN EVPN fabric) without parsing CLI text.

Quick Recap

We continue using the same VXLAN EVPN test fabric introduced in Lesson 1. No new devices are added for this lesson; we will target the leaf and spine switches listed below.

ASCII topology (abbreviated view with management IPs used for Ansible/NETCONF):

              +------------+          +------------+
              |     S1     |          |     S2     |
              |   mgmt:    |          |   mgmt:    |
              | 10.15.1.11 |          | 10.15.1.12 |
              +------------+          +------------+
                    |                       |
            (each leaf uplinks to both spines)
                    |                       |
      +------------+    +------------+    +------------+
      |  L1 (leaf) |    |  L2 (leaf) |    |  L3 (leaf) |
      |   mgmt:    |    |   mgmt:    |    |   mgmt:    |
      | 10.15.1.13 |    | 10.15.1.14 |    | 10.15.1.15 |
      +------------+    +------------+    +------------+

Device table (management IPs used by Ansible/NETCONF):

Device | Role  | Management IP
-------+-------+--------------
S1     | Spine | 10.15.1.11
S2     | Spine | 10.15.1.12
L1     | Leaf  | 10.15.1.13
L2     | Leaf  | 10.15.1.14
L3     | Leaf  | 10.15.1.15

All Ansible examples in this lesson run from your control workstation and target the devices above.

Key Concepts (theory before hands-on)

  • NETCONF vs CLI: NETCONF is a protocol that uses an RPC-style, XML-encoded, schema-driven exchange over an encrypted transport (usually SSH). Instead of sending unstructured CLI text, NETCONF exchanges structured data (using YANG models) so the control system knows exactly what state it’s changing.

    • In production, this reduces brittle text parsing and increases idempotence and validation before commit.
  • Model-driven automation: With NETCONF, you request or push a data model (YANG). When you push configuration with NETCONF, the device validates against the model and applies changes atomically (depending on device capabilities). Think of NETCONF as speaking directly to the device’s data model — like sending a form with explicit fields rather than free-form text.

  • Ansible netconf_get / netconf_config: Ansible bridges to NETCONF via modules that wrap NETCONF RPCs. netconf_get performs data retrieval (get/get-config), while netconf_config pushes XML configuration. In production, use netconf_get for inventory or state checks and netconf_config to perform controlled changes.

  • Transport and connectivity: NETCONF typically runs over SSH; port 830 is the IANA-assigned default for NETCONF over SSH, though some devices also accept NETCONF on standard SSH sessions. Ensure SSH reachability, valid credentials, and that the device’s NETCONF server is enabled.

  • Idempotence and verification: Even when pushing model-driven data, always verify with netconf_get. In production, include pre-change get, change, then post-change get in your automation pipeline to ensure desired state and to enable automated rollback on failure.

Tip: Think of NETCONF as a bank transaction API for device state — you submit structured changes and the device can accept, reject, or validate them before committing.
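Before any of the hands-on steps will work, the NETCONF server mentioned above must actually be running on each switch. As a sketch — exact syntax can vary by NX-OS release, so verify against your platform documentation — enabling it on NX-OS looks like:

```text
switch# configure terminal
switch(config)# feature netconf
switch(config)# exit
switch# show feature | include netconf
```

If the feature shows as enabled, the device will accept NETCONF sessions on its management IP.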


Step-by-step configuration

Step 1: Prepare the control host and install Ansible collections

What we are doing: Create a Python/Ansible environment on the control workstation and install the Cisco NX-OS Ansible collections. This ensures the Ansible runtime and the vendor collections required for NX-OS NETCONF interactions are present.

# Create an isolated Python environment (example using pyenv/virtualenv - see your workstation docs)
% pyenv install 3.9.11
% pyenv virtualenv 3.9.11 ansible
% mkdir my_ansible_dir
% cd my_ansible_dir
% pyenv local ansible

# Install Ansible and the NX-OS / DCNM collections
% pip install ansible
% ansible-galaxy collection install cisco.nxos cisco.dcnm

What just happened: The virtual environment isolates Python and Ansible versions so automation runs are reproducible. Installing the cisco.nxos collection provides NX-OS specific modules and documentation; cisco.dcnm is included because many DC fabrics use both NX-OS and DCNM integrations. These collections provide the context and helper modules Ansible may use when interacting with NX-OS devices over NETCONF.

Real-world note: In production CI/CD pipelines, collections are installed in build containers or pinned in requirements files to guarantee consistent behavior across jobs.
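The pinning mentioned above can be sketched with a collections requirements file (the version numbers below are placeholders, not recommendations — pin to whatever you have tested):

```yaml
# requirements.yml — pin collection versions for reproducible automation runs
collections:
  - name: cisco.nxos
    version: "5.1.0"     # placeholder version
  - name: cisco.dcnm
    version: "3.4.0"     # placeholder version
```

Install the pinned set with `ansible-galaxy collection install -r requirements.yml`, so every build container resolves the same module code.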

Verify:

% ansible --version
ansible [core 2.x.x]
  config file = /home/user/ansible/ansible.cfg
  configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/user/.pyenv/versions/ansible/lib/python3.9/site-packages/ansible
  collections path = /home/user/.ansible/collections:/usr/share/ansible/collections

% ansible-galaxy collection list
# /home/user/.ansible/collections/ansible_collections
Collection    Version
cisco.nxos    X.Y.Z
cisco.dcnm    X.Y.Z

(The output shows Ansible is installed and the Cisco collections are available. Versions will vary; ensure collections are present.)


Step 2: Create a NETCONF-capable Ansible inventory

What we are doing: Define an inventory file specifying NETCONF as the connection method and the exact management IPs from the topology. This tells Ansible to connect using the NETCONF transport to each host.

# inventory.yml
---
all:
  vars:
    ansible_connection: ansible.netcommon.netconf
    ansible_user: "lab.nxprep"       # control user example (use lab credentials in your environment)
    ansible_password: "Lab@123"
    ansible_network_os: cisco.nxos.nxos
  children:
    spines:
      hosts:
        10.15.1.11:
        10.15.1.12:
    leafs:
      hosts:
        10.15.1.13:
        10.15.1.14:
        10.15.1.15:

What just happened: The inventory maps each device by management IP and instructs Ansible to use the ansible.netcommon.netconf connection plugin. Setting ansible_network_os helps some modules determine device-specific behavior. Using management IPs (10.15.1.11–15) allows Ansible to open NETCONF sessions to each device instead of using CLI/text parsing.

Real-world note: In production, credentials are handled by a vault or secrets manager rather than being in plain text in an inventory. This example uses Lab@123 for clarity.
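One hedged way to keep the password out of the inventory is an Ansible Vault variable file; the variable name vault_netconf_password below is illustrative:

```yaml
# group_vars/all.yml — reference an encrypted variable instead of plaintext
ansible_user: "lab.nxprep"
ansible_password: "{{ vault_netconf_password }}"
```

The encrypted value would live in a file created with `ansible-vault create group_vars/all/vault.yml` and be supplied at run time via `--ask-vault-pass` or a vault password file.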

Verify:

% cat inventory.yml
---
all:
  vars:
    ansible_connection: ansible.netcommon.netconf
    ansible_user: "lab.nxprep"
    ansible_password: "Lab@123"
    ansible_network_os: cisco.nxos.nxos
  children:
    spines:
      hosts:
        10.15.1.11:
        10.15.1.12:
    leafs:
      hosts:
        10.15.1.13:
        10.15.1.14:
        10.15.1.15:

Step 3: Retrieve device state with netconf_get (pre-change audit)

What we are doing: Use netconf_get to retrieve the current VLAN or interface YANG state from a leaf (L1 at 10.15.1.13). This pre-change audit confirms existing state and shows how NETCONF returns structured data.

# playbook: netconf_get_vlans.yml
- name: NETCONF get VLAN state from leaf
  hosts: 10.15.1.13
  gather_facts: no
  tasks:
    - name: Retrieve VLAN configuration via NETCONF
      ansible.netcommon.netconf_get:
        filter: '<filter><vlans xmlns="http://cisco.com/ns/yang/cisco-nx-os-device"><vlan></vlan></vlans></filter>'
      register: vlan_state
    - name: Show VLAN state
      debug:
        var: vlan_state

What just happened: The playbook opens a NETCONF session to 10.15.1.13 and issues a <get> using the provided XML filter to limit returned data to VLAN information (using the device YANG namespace). The data comes back as structured XML parsed by Ansible, which you can then inspect in your automation logic.

Real-world note: Always do a targeted get with a precise filter in production to reduce load on devices and to make automation deterministic.
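If you would rather not string-match on raw XML in later tasks, netconf_get can also return a parsed copy of the data. A sketch — the display option and the exact return keys depend on your ansible.netcommon version:

```yaml
- name: Retrieve VLAN data and also parse it to JSON
  ansible.netcommon.netconf_get:
    filter: '<filter><vlans xmlns="http://cisco.com/ns/yang/cisco-nx-os-device"><vlan></vlan></vlans></filter>'
    display: json
  register: vlan_json

- name: Inspect the parsed structure
  debug:
    var: vlan_json.output
```

With a parsed structure, later tasks can address individual fields instead of searching XML text.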

Verify:

% ansible-playbook -i inventory.yml netconf_get_vlans.yml

PLAY [NETCONF get VLAN state from leaf] ****************************************

TASK [Retrieve VLAN configuration via NETCONF] *********************************
ok: [10.15.1.13] => {
    "changed": false,
    "rpc": {
        "filter": "<filter><vlans xmlns=\"http://cisco.com/ns/yang/cisco-nx-os-device\"><vlan></vlan></vlans></filter>"
    },
    "response": "<data><vlans xmlns=\"http://cisco.com/ns/yang/cisco-nx-os-device\"><vlan><vlan-id>10</vlan-id><name>Users</name></vlan></vlans></data>"
}

TASK [Show VLAN state] **********************************************************
ok: [10.15.1.13] => {
    "vlan_state": {
        "changed": false,
        "response": "<data><vlans xmlns=\"http://cisco.com/ns/yang/cisco-nx-os-device\"><vlan><vlan-id>10</vlan-id><name>Users</name></vlan></vlans></data>"
    }
}

PLAY RECAP **********************************************************************
10.15.1.13                 : ok=2    changed=0    unreachable=0    failed=0

(The response contains the VLAN data in XML. Exact tags/namespaces depend on device YANG models.)


Step 4: Push configuration with netconf_config (create VLANs)

What we are doing: Use netconf_config to create VLAN 100 ("Overlay") and VLAN 101 ("Web Servers") on the leaf devices via NETCONF. This shows how to express configuration as model-driven XML and apply it without using CLI text.

# playbook: netconf_config_vlans.yml
- name: Push VLANs via NETCONF
  hosts: leafs
  gather_facts: no
  tasks:
    - name: Configure VLANs using NETCONF
      ansible.netcommon.netconf_config:
        target: running
        config: |
          <config>
            <vlans xmlns="http://cisco.com/ns/yang/cisco-nx-os-device">
              <vlan>
                <vlan-id>100</vlan-id>
                <name>Overlay</name>
              </vlan>
              <vlan>
                <vlan-id>101</vlan-id>
                <name>Web Servers</name>
              </vlan>
            </vlans>
          </config>

What just happened: The playbook opens NETCONF sessions to each leaf (10.15.1.13–15) and submits an XML <edit-config> payload to the running datastore. The device’s NETCONF server receives structured instructions to create VLAN 100 and 101 and validates them against its schema before committing.

Real-world note: Using target: running applies changes immediately. In production, you might push to a candidate datastore and perform validation before commit when devices support it.
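Where the device advertises the candidate capability, the same change can be staged, validated, and committed in one task. A sketch — NX-OS support for the candidate datastore varies by release, so treat this as illustrative:

```yaml
- name: Stage VLAN change in candidate, validate, then commit
  ansible.netcommon.netconf_config:
    target: candidate
    validate: true
    commit: true
    config: |
      <config>
        <vlans xmlns="http://cisco.com/ns/yang/cisco-nx-os-device">
          <vlan>
            <vlan-id>100</vlan-id>
            <name>Overlay</name>
          </vlan>
        </vlans>
      </config>
```

If validation fails, the change never reaches the running datastore, which is exactly the safety property you want in production.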

Verify:

% ansible-playbook -i inventory.yml netconf_config_vlans.yml

PLAY [Push VLANs via NETCONF] ***************************************************

TASK [Configure VLANs using NETCONF] ********************************************
changed: [10.15.1.13]
changed: [10.15.1.14]
changed: [10.15.1.15]

PLAY RECAP **********************************************************************
10.15.1.13                 : ok=1    changed=1    unreachable=0    failed=0
10.15.1.14                 : ok=1    changed=1    unreachable=0    failed=0
10.15.1.15                 : ok=1    changed=1    unreachable=0    failed=0

(Ansible reports changed when the device accepted and applied configuration.)


Step 5: Post-change verification with netconf_get (confirm changes)

What we are doing: Re-run a netconf_get to retrieve VLANs and confirm that VLANs 100 and 101 are present on the leaves. This is the post-change verification step every production change should include.

# playbook: netconf_get_vlans_post.yml
- name: NETCONF get VLAN state after change
  hosts: leafs
  gather_facts: no
  tasks:
    - name: Retrieve VLAN configuration via NETCONF
      ansible.netcommon.netconf_get:
        filter: '<filter><vlans xmlns="http://cisco.com/ns/yang/cisco-nx-os-device"><vlan></vlan></vlans></filter>'
      register: vlan_state_post
    - name: Show VLAN state
      debug:
        var: vlan_state_post.response

What just happened: The playbook fetched VLAN data from each leaf. The debug output will include the XML containing VLAN 100 and 101 if the prior netconf_config succeeded and the device returned the expected state.

Real-world note: Always include automated post-change asserts in your automation pipeline. If an expected element is missing, treat the run as failed and trigger rollback or alerting.
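A minimal post-change assert could look like the task below; the return key holding the XML (response here, to match the output shown in this lesson) may differ by module version, e.g. stdout:

```yaml
- name: Fail the run if expected VLANs are absent
  ansible.builtin.assert:
    that:
      - "'<vlan-id>100</vlan-id>' in vlan_state_post.response"
      - "'<vlan-id>101</vlan-id>' in vlan_state_post.response"
    fail_msg: "Expected VLANs missing on {{ inventory_hostname }} — flag for rollback"
```

Because assert fails the host's run when a condition is false, the PLAY RECAP itself becomes your pass/fail signal for the change window.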

Verify:

% ansible-playbook -i inventory.yml netconf_get_vlans_post.yml

PLAY [NETCONF get VLAN state after change] **************************************

TASK [Retrieve VLAN configuration via NETCONF] *********************************
ok: [10.15.1.13] => {
    "changed": false,
    "response": "<data><vlans xmlns=\"http://cisco.com/ns/yang/cisco-nx-os-device\"><vlan><vlan-id>10</vlan-id><name>Users</name></vlan><vlan><vlan-id>100</vlan-id><name>Overlay</name></vlan><vlan><vlan-id>101</vlan-id><name>Web Servers</name></vlan></vlans></data>"
}
ok: [10.15.1.14] => {
    "changed": false,
    "response": "<data><vlans xmlns=\"http://cisco.com/ns/yang/cisco-nx-os-device\"><vlan><vlan-id>100</vlan-id><name>Overlay</name></vlan><vlan><vlan-id>101</vlan-id><name>Web Servers</name></vlan></vlans></data>"
}
ok: [10.15.1.15] => {
    "changed": false,
    "response": "<data><vlans xmlns=\"http://cisco.com/ns/yang/cisco-nx-os-device\"><vlan><vlan-id>100</vlan-id><name>Overlay</name></vlan><vlan><vlan-id>101</vlan-id><name>Web Servers</name></vlan></vlans></data>"
}

TASK [Show VLAN state] **********************************************************
ok: [10.15.1.13] => {
    "vlan_state_post.response": "<data><vlans xmlns=\"http://cisco.com/ns/yang/cisco-nx-os-device\">...</vlans></data>"
}
ok: [10.15.1.14] => {
    "vlan_state_post.response": "<data><vlans xmlns=\"http://cisco.com/ns/yang/cisco-nx-os-device\">...</vlans></data>"
}
ok: [10.15.1.15] => {
    "vlan_state_post.response": "<data><vlans xmlns=\"http://cisco.com/ns/yang/cisco-nx-os-device\">...</vlans></data>"
}

PLAY RECAP **********************************************************************
10.15.1.13                 : ok=2    changed=0    unreachable=0    failed=0
10.15.1.14                 : ok=2    changed=0    unreachable=0    failed=0
10.15.1.15                 : ok=2    changed=0    unreachable=0    failed=0

(Responses show the presence of VLAN 100 and 101 in the XML payload.)


Verification Checklist

  • Check 1: Ansible and Cisco collections installed — verify with ansible --version and ansible-galaxy collection list.
  • Check 2: Inventory configured to use NETCONF and contains management IPs 10.15.1.11–15 — verify by cat inventory.yml.
  • Check 3: Pre-change state retrieved via netconf_get shows current VLANs — verify with the netconf_get_vlans.yml output.
  • Check 4: After netconf_config, netconf_get shows VLAN 100 and 101 on leaves 10.15.1.13–15.

Common Mistakes

Symptom: Ansible fails to connect: "Unable to open NETCONF session"
Cause:   Inventory uses the wrong connection plugin (e.g., network_cli), or NETCONF is not enabled on the device
Fix:     Set ansible_connection: ansible.netcommon.netconf in the inventory and ensure the device's NETCONF server is enabled

Symptom: Playbook tasks show "failed" with an authentication error
Cause:   Wrong username/password in the inventory
Fix:     Verify credentials and use a vault or secrets manager; for this lab, use user lab.nxprep and password Lab@123 if configured

Symptom: netconf_get returns empty data or no VLAN entries
Cause:   The XML filter is incorrect or uses the wrong YANG namespace
Fix:     Use the device-specific YANG namespace; start with a broader filter to discover namespaces, then narrow down

Symptom: netconf_config reports changed: false when a change was expected
Cause:   The payload matches existing state (idempotence), or the wrong target datastore was used
Fix:     Confirm the desired state actually differs from the device state, check the target datastore, and inspect returned RPC errors for validation issues

Symptom: Firewall or network blocking NETCONF sessions
Cause:   The SSH/NETCONF port is blocked between the control host and the device
Fix:     Verify reachability (ping/SSH), ensure the NETCONF/SSH ports are allowed, and check management-plane ACLs
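For the wrong-namespace symptom, one discovery approach is an unfiltered get-config against a single lab device, then inspecting the returned namespace URIs. This is heavy on real devices, so scope it to one host; the return key (stdout) may differ by module version:

```yaml
- name: Broad get-config to discover YANG namespaces (lab use only)
  ansible.netcommon.netconf_get:
    source: running
  register: full_state

- name: Show the raw XML to find namespace URIs
  debug:
    var: full_state.stdout
```

Once you spot the xmlns values the device actually uses, narrow your filters back down for production runs.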

Key Takeaways

  • NETCONF is a model-driven, structured protocol over SSH that provides reliable, schema-validated configuration and retrieval compared to free-form CLI parsing.
  • Ansible’s netconf_get and netconf_config let you perform pre-change audit, push structured changes, and verify post-change state in an idempotent manner.
  • Always design automation with pre-change gets and post-change validation. Treat device responses as authoritative and program defensively (validate, verify, and enable rollback where supported).
  • In production, secrets should be stored securely (vaults, secrets managers), and NETCONF payloads should be scoped with precise filters to avoid unnecessary device load.

Warning: When converting existing CLI-based workflows to NETCONF, validate that the device YANG models cover the state you need — not all CLI commands map one-to-one to YANG elements. Always test in staging before production.


This completes Lesson 4: "Ansible with NETCONF". In the next lesson we will integrate these model-driven playbooks into a CI/CD pipeline to perform staged deployments from Git-managed intent.