Multi-Device Automation Scripts
Objective
This lesson teaches you how to build and run multi-device Python automation scripts that configure several IOS XE devices concurrently. You will learn how to structure an inventory, use threading (concurrency) to push configuration to multiple devices, add robust error handling and retries, and implement logging so you can audit what happened. In production, these techniques let you provision hundreds or thousands of switches consistently and quickly — reducing human error and mean time to deploy.
Real-world scenario: An operations team must deploy a standardized management VLAN and NTP configuration to dozens of access switches after a maintenance window. Using a multi-device Python script with concurrency and logging turns an hours-long manual job into a repeatable, auditable task.
Quick Recap
Refer to the topology used in Lesson 1 for connectivity. This lesson does not add new devices or new IP addresses; it operates against the same IOS XE devices from Lesson 1. Example hostnames used in scripts are:
- sw1.lab.nhprep.com
- sw2.lab.nhprep.com
- sw3.lab.nhprep.com
(We use hostnames in examples so you can substitute the corresponding management IPs from your lab inventory.)
Topology (reference)
Use the same physical/logical topology from Lesson 1. This lesson targets the management plane of the devices (their management IPs reachable from your automation workstation).
Device Table
| Device | Role | Management FQDN |
|---|---|---|
| sw1 | Access switch | sw1.lab.nhprep.com |
| sw2 | Access switch | sw2.lab.nhprep.com |
| sw3 | Access switch | sw3.lab.nhprep.com |
Key Concepts — theory before hands-on
- Concurrency (Threading): Running multiple configuration operations concurrently reduces total elapsed time. Python threads suit network automation because the work is I/O-bound: the interpreter releases the GIL while a thread waits on the network, so concurrent pushes overlap their wait time. In production, this means much faster mass changes.
- Idempotence & Templates: Use configuration templates so repeated runs cause no adverse effects. Think of templates like a recipe — the same ingredients produce the same cake.
- Error handling & retries: Network devices can fail transiently. Implement retries with exponential backoff and capture device error responses to avoid partial deployments.
- Logging & Auditing: Detailed logs (timestamp, device, action, result) are essential for post-change troubleshooting and compliance. In production, logs feed ticketing and change management.
- Verification loops: Always verify the change with explicit show commands after pushing config — automation must include verification to be trustworthy.
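The concurrency gain can be sketched with a stand-in for the device call (fake_push and the 0.2-second delay below are illustrative, not a real RESTCONF request):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_push(device, delay=0.2):
    # Simulate a network I/O-bound call (e.g., a RESTCONF request).
    time.sleep(delay)
    return f"{device}: done"

devices = ["sw1", "sw2", "sw3"]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(devices)) as ex:
    results = list(ex.map(fake_push, devices))
elapsed = time.monotonic() - start

# Serial execution would take ~0.6 s; threaded execution takes
# roughly the time of the slowest single task (~0.2 s).
print(results, round(elapsed, 1))
```

The same pattern is what the later scripts use: the total run time approaches that of the slowest device rather than the sum of all devices.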
Step-by-step configuration
Step 1: Create an inventory and configuration template
What we are doing: Define which devices to target and what configuration to apply. A structured inventory (YAML or JSON) and a parameterized text template keep the script generic and reusable.
! No device CLI commands for this step. The following are shell/Python files created on the automation host.
Create inventory.json:
{
"devices": [
{"host": "sw1.lab.nhprep.com", "username": "admin", "password": "Lab@123"},
{"host": "sw2.lab.nhprep.com", "username": "admin", "password": "Lab@123"},
{"host": "sw3.lab.nhprep.com", "username": "admin", "password": "Lab@123"}
]
}
Create mgmt_template.txt (simple config to ensure the management VLAN and NTP; note the $-style placeholders, which are what Python's string.Template in the scripts below expects):
hostname $hostname
vlan 10
 name MGMT
interface Vlan10
 ip address $mgmt_ip 255.255.255.0
!
ntp server 198.18.1.1
What just happened: You created a device list with credentials and a configuration template. The template variables (hostname, mgmt_ip) let one template serve many devices. This matters because templates ensure consistency across multiple devices and support idempotence when the automation engine renders them.
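As a sketch of the rendering step: Python's built-in string.Template substitutes $-style placeholders (Jinja-style {{ }} braces would require the jinja2 library instead). The inline template here is a shortened stand-in for mgmt_template.txt:

```python
from string import Template

# Shortened inline copy of the template, using $-style placeholders.
tmpl = Template(
    "hostname $hostname\n"
    "interface Vlan10\n"
    " ip address $mgmt_ip 255.255.255.0\n"
)

# One template, many devices: only the variables change per device.
rendered = tmpl.substitute(hostname="sw1", mgmt_ip="192.0.2.10")
print(rendered)
```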
Real-world note: Never store plaintext credentials in production inventory. Use vaults or secure credential stores. Here we use plain text for lab simplicity.
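A lightweight step in that direction (a sketch, not a full vault integration) is to keep passwords out of the inventory file entirely and inject them from an environment variable at run time; the variable name LAB_DEVICE_PASSWORD is an arbitrary choice for this example:

```python
import os

# Inventory entries now carry no secrets; only host and username.
devices = [{"host": "sw1.lab.nhprep.com", "username": "admin"}]

# Read the shared lab password from the environment. The fallback value
# is for lab convenience only; in production, fail hard if it is unset.
password = os.environ.get("LAB_DEVICE_PASSWORD", "Lab@123")

for dev in devices:
    dev["password"] = password  # merged in memory only, never written to disk
```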
Verify:
! On the automation workstation:
cat inventory.json
Expected output:
{
"devices": [
{"host": "sw1.lab.nhprep.com", "username": "admin", "password": "Lab@123"},
{"host": "sw2.lab.nhprep.com", "username": "admin", "password": "Lab@123"},
{"host": "sw3.lab.nhprep.com", "username": "admin", "password": "Lab@123"}
]
}
Step 2: Build a simple single-threaded push script (baseline)
What we are doing: Create a baseline Python script that reads the inventory, renders the template per device, and pushes config serially using RESTCONF (one HTTP PUT per device). A baseline script establishes functionality before adding concurrency.
! No device CLI commands for this step.
Create push_baseline.py:
#!/usr/bin/env python3
import json

import requests
from string import Template

with open('inventory.json') as f:
    inv = json.load(f)

# mgmt_template.txt must use $-style placeholders ($hostname, $mgmt_ip):
# string.Template does not understand Jinja-style {{ }} braces.
template = Template(open('mgmt_template.txt').read())

for dev in inv['devices']:
    payload = template.substitute(hostname=dev['host'].split('.')[0],
                                  mgmt_ip='192.0.2.10')  # lab value; in practice, per-device from inventory
    url = f"https://{dev['host']}/restconf/data/Cisco-IOS-XE-native:native"
    r = requests.put(url, data=payload, auth=(dev['username'], dev['password']),
                     headers={'Content-Type': 'application/yang-data+json'},
                     verify=False)  # lab only: skips TLS certificate validation
    print(dev['host'], r.status_code, r.text)
What just happened: The script reads the inventory, fills the template for each device, and issues a RESTCONF PUT against the native YANG path. The HTTP status code indicates success (200/201/204) or failure. Starting single-threaded simplifies debugging and confirms the RESTCONF path is correct before you scale.
Real-world note: A strict RESTCONF implementation expects a JSON (yang-data+json) body that matches the device's YANG model, not CLI-style text; treat the payload here as a lab simplification. Endpoints and YANG paths must match the device's model version, so always test against a single device before scaling.
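For reference, a strict yang-data+json payload targeting just the hostname leaf might look like the sketch below; the exact structure depends on the Cisco-IOS-XE-native model version on your device, so treat it as illustrative:

```python
import json

# Hypothetical yang-data+json body for the hostname leaf of the native model.
payload = {"Cisco-IOS-XE-native:native": {"hostname": "sw1"}}
body = json.dumps(payload)

# With requests, this would be sent along the lines of:
# requests.patch(url, data=body,
#                headers={"Content-Type": "application/yang-data+json"},
#                auth=(user, pwd), timeout=10, verify=False)
print(body)
```

A PATCH merges the leaf into existing config, whereas a PUT on the full native path replaces everything beneath it, which is why testing on one device first matters.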
Verify:
! On automation host, run:
python3 push_baseline.py
Expected output (example successful run):
sw1.lab.nhprep.com 200 {"message":"Configuration applied"}
sw2.lab.nhprep.com 200 {"message":"Configuration applied"}
sw3.lab.nhprep.com 200 {"message":"Configuration applied"}
Step 3: Add threading for concurrency and timeouts
What we are doing: Convert the baseline into a concurrent script using ThreadPoolExecutor to push to devices in parallel, and set per-request timeouts to avoid hung threads.
! No device CLI commands for this step.
Create push_concurrent.py:
#!/usr/bin/env python3
import json

import requests
from string import Template
from concurrent.futures import ThreadPoolExecutor, as_completed

with open('inventory.json') as f:
    inv = json.load(f)

template = Template(open('mgmt_template.txt').read())

def push_config(dev):
    payload = template.substitute(hostname=dev['host'].split('.')[0],
                                  mgmt_ip='192.0.2.10')
    url = f"https://{dev['host']}/restconf/data/Cisco-IOS-XE-native:native"
    try:
        r = requests.put(url, data=payload, auth=(dev['username'], dev['password']),
                         headers={'Content-Type': 'application/yang-data+json'},
                         timeout=10, verify=False)
        return (dev['host'], r.status_code, r.text)
    except requests.exceptions.RequestException as e:
        return (dev['host'], 'ERROR', str(e))

with ThreadPoolExecutor(max_workers=5) as ex:
    futures = [ex.submit(push_config, d) for d in inv['devices']]
    for fut in as_completed(futures):
        print(fut.result())
What just happened: You introduced concurrency via a thread pool, so multiple devices are configured in parallel. Each request has a 10-second timeout to prevent a blocked thread. This design reduces overall run time and prevents the script from stalling on an unreachable device.
Real-world note: Choose max_workers based on your workstation and network device CPU to avoid overwhelming devices or the network control plane.
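One way to apply that advice (a sketch with illustrative numbers): cap the pool relative to the inventory size, and put an overall deadline on the whole batch via as_completed's timeout parameter:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed, TimeoutError

def push_config(dev):
    return (dev, "SUCCESS")  # stand-in for the real RESTCONF push

devices = [f"sw{n}" for n in range(1, 8)]
max_workers = min(5, len(devices))  # illustrative cap; tune per environment

results = []
with ThreadPoolExecutor(max_workers=max_workers) as ex:
    futures = [ex.submit(push_config, d) for d in devices]
    try:
        # Stop collecting results if the whole batch exceeds 60 seconds.
        for fut in as_completed(futures, timeout=60):
            results.append(fut.result())
    except TimeoutError:
        results.append(("batch", "TIMED_OUT"))
```

The per-request timeout bounds each device; the batch timeout bounds the whole change window, which is useful when a script runs inside a maintenance window.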
Verify:
! On automation host:
python3 push_concurrent.py
Expected output (example):
('sw2.lab.nhprep.com', 200, '{"message":"Configuration applied"}')
('sw1.lab.nhprep.com', 200, '{"message":"Configuration applied"}')
('sw3.lab.nhprep.com', 200, '{"message":"Configuration applied"}')
Step 4: Implement error handling, retries, and structured logging
What we are doing: Enhance reliability by retrying transient failures and writing structured logs (timestamp, device, outcome) to a file. This gives you both robustness and an audit trail.
! No device CLI commands for this step.
Create push_robust.py:
#!/usr/bin/env python3
import json
import logging
import time

import requests
from string import Template
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(filename='push.log', level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

with open('inventory.json') as f:
    inv = json.load(f)

template = Template(open('mgmt_template.txt').read())

def push_config_with_retries(dev, retries=3, backoff=2):
    payload = template.substitute(hostname=dev['host'].split('.')[0],
                                  mgmt_ip='192.0.2.10')
    url = f"https://{dev['host']}/restconf/data/Cisco-IOS-XE-native:native"
    for attempt in range(1, retries + 1):
        try:
            r = requests.put(url, data=payload, auth=(dev['username'], dev['password']),
                             headers={'Content-Type': 'application/yang-data+json'},
                             timeout=10, verify=False)
            if r.status_code in (200, 201, 204):
                logging.info(f"{dev['host']} SUCCESS status={r.status_code}")
                return (dev['host'], 'SUCCESS', r.status_code, r.text)
            logging.warning(f"{dev['host']} BAD_STATUS status={r.status_code} body={r.text}")
            if r.status_code < 500:
                # 4xx errors are not transient; retrying will not help.
                return (dev['host'], 'FAILED', r.status_code, r.text)
        except requests.exceptions.RequestException as e:
            logging.error(f"{dev['host']} EXC {e}")
        time.sleep(backoff ** attempt)  # exponential backoff: 2 s, 4 s, 8 s
    logging.error(f"{dev['host']} FAILED after {retries} attempts")
    return (dev['host'], 'FAILED', None, None)

with ThreadPoolExecutor(max_workers=5) as ex:
    futures = [ex.submit(push_config_with_retries, d) for d in inv['devices']]
    for fut in as_completed(futures):
        print(fut.result())
What just happened: The script now retries transient errors with exponential backoff, logs all important events to push.log, and returns structured tuples for easy downstream processing. Logs are crucial for troubleshooting and change auditing.
Real-world note: Logging to centralized systems (syslog, ELK, or SIEM) is recommended for enterprise visibility and compliance.
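If you do forward logs to a central system, JSON-lines entries are much easier to parse than free text. A minimal sketch (the field names here are arbitrary choices, not a standard schema):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "device": getattr(record, "device", None),
            "msg": record.getMessage(),
        })

logger = logging.getLogger("push")
handler = logging.FileHandler("push.jsonl")
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The 'extra' dict attaches the device name as a structured field.
logger.info("SUCCESS status=200", extra={"device": "sw1.lab.nhprep.com"})
```

A log shipper (syslog forwarder, Filebeat, etc.) can then index these lines by device and level without any custom parsing.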
Verify:
! Run the script:
python3 push_robust.py
! Inspect the log:
cat push.log
Expected push_robust.py stdout:
('sw1.lab.nhprep.com', 'SUCCESS', 200, '{"message":"Configuration applied"}')
('sw2.lab.nhprep.com', 'SUCCESS', 200, '{"message":"Configuration applied"}')
('sw3.lab.nhprep.com', 'SUCCESS', 200, '{"message":"Configuration applied"}')
Expected push.log entries:
2026-04-02 12:00:01,234 INFO sw1.lab.nhprep.com SUCCESS status=200
2026-04-02 12:00:01,567 INFO sw2.lab.nhprep.com SUCCESS status=200
2026-04-02 12:00:02,001 INFO sw3.lab.nhprep.com SUCCESS status=200
Step 5: Verify device configuration with show commands and collect output
What we are doing: After pushing config, run verification show commands on each device to confirm the desired state. This script uses SSH (or a RESTCONF GET) to pull verification output and logs it for auditing.
! Example verification via device CLI (SSH). The following are commands you run on the network devices to verify configuration.
Verification show commands (run on each device, or via automation that executes them), each followed by its expected output:

show running-config | include hostname
hostname sw1

show running-config | section interface Vlan10
interface Vlan10
 ip address 192.0.2.10 255.255.255.0

show ntp status
Clock is synchronized, stratum 2, reference is 198.18.1.1
What just happened: These show commands confirm the hostname, management VLAN interface, and NTP status. The outputs demonstrate that the template rendered correctly and the devices are synchronized to the designated NTP server.
Real-world note: Verification must be part of your automation pipeline. Automated rollbacks or alerting should trigger if verification fails.
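A simple way to make verification machine-checkable (a sketch; the sample output stands in for text you would collect over SSH or RESTCONF) is to diff the expected lines against the device's show output:

```python
def verify_device(show_output, expected_lines):
    """Return the expected lines that are missing from the show output."""
    return [line for line in expected_lines if line not in show_output]

# Stand-in for output collected from sw1 after the push.
sample_output = """hostname sw1
interface Vlan10
 ip address 192.0.2.10 255.255.255.0
"""

expected = ["hostname sw1", "ip address 192.0.2.10 255.255.255.0"]
missing = verify_device(sample_output, expected)

if missing:
    print("VERIFY FAILED:", missing)  # trigger an alert or rollback here
else:
    print("VERIFY OK")
```

Returning the missing lines (rather than a bare pass/fail) makes the failure log actionable: the operator sees exactly which configuration did not land.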
Verify (automation host approach):
! Example automated verification using RESTCONF GET for the hostname (conceptual):
curl -k -u admin:Lab@123 https://sw1.lab.nhprep.com/restconf/data/Cisco-IOS-XE-native:native/hostname
Expected output (conceptual JSON):
{"hostname":"sw1"}
Verification Checklist
- Check 1: Configuration templates rendered correctly — verify with show running-config | include hostname on each device.
- Check 2: Management interface exists with the expected IP — verify with show running-config | section interface Vlan10.
- Check 3: NTP is configured and the device is synchronized — verify with show ntp status.
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| Script times out on a subset of devices | Wrong hostname or DNS not resolving lab hostnames | Use management IPs or ensure DNS resolves hostnames; verify with ping from automation host |
| HTTP/RESTCONF 401 Unauthorized | Wrong credentials in inventory | Update inventory credentials or use a secure credential store; test with one device first |
| Partial deployment (some devices configured, others not) | No retries and transient network glitches | Add retries/backoff and logging; re-run script with idempotent templates |
| Logs show "SSL certificate verify failed" | Devices use self-signed certs and verify=True | For lab, use verify=False in requests; in production, manage proper CA-signed certs |
Key Takeaways
- Use templates and an inventory to make multi-device changes repeatable and idempotent — this prevents configuration drift.
- Concurrency (threading) speeds up mass deployments; tune thread counts to match your environment and device capabilities.
- Always implement retries, timeouts, and structured logging — this provides resilience and auditability.
- Automation must include verification steps; if verification fails, automation should either rollback or alert operators for manual intervention.
Final real-world insight: In production networks, automation is only as good as its safety net — testing, verification, logging, and secure credential handling are non-negotiable. Start small, verify, then scale your multi-device automation.