Lesson 5 of 6

AI-Powered Code Review

Objective

In this lesson you will use AI-style automated checks to review Infrastructure-as-Code (IaC) artifacts for security issues, policy violations, and best-practice problems before they reach the network. This reduces human error, enforces consistent naming and defaults, and prevents risky configuration from being pushed into production. In real production networks, this is used as a pre-change gate in CI/CD pipelines to catch issues such as incorrect subnet attributes, improper multicast or ARP settings, or inconsistent default suffixes before the controller (APIC) or device configuration is modified.

Topology (quick reference)

Refer to the topology used in Lesson 1. This lesson does not add new physical devices — we operate against the IaC artifacts that define the APIC tenant and the device fabric.

ASCII topology (only interfaces shown here that are referenced by the IaC artifacts and review):

+------------------------------+          +----------------------+
| L3OUT1                       |          | d2d (leaf)           |
| Eth1/1: 1.1.1.1/24           |<--BD1--->| lo250: 10.250.250.1  |
| Eth1/2: fd00:0:abcd:1::1/64  |  subnet  |                      |
+------------------------------+          +----------------------+
        Bridge-domain BD1 subnets: 1.1.1.1/24 and fd00:0:abcd:1::1/64

Important: The IP addresses and interface names in the diagram are taken exactly from the IaC artifacts used in this lab: 1.1.1.1/24, fd00:0:abcd:1::1/64, and anycast/pim address 10.250.250.1 assigned to fabric loopback lo250.

Device Table

Device               Role
APIC (IaC target)    Controller consuming IaC
d2d                  Leaf switch referenced by IaC
L3OUT1               External L3 connectivity referenced by IaC

Quick Recap

  • The IaC artifacts we are reviewing define tenants and bridge domains (BDs), VRFs, and L3outs.
  • The example artifacts include BD1 with IPv4 1.1.1.1/24 and IPv6 fd00:0:abcd:1::1/64, VRF: VRF1, and fabric settings such as anycast gateway 10.250.250.1.
  • The environment uses a validation tool (nac-validate) and supports programmable rule checks (rules can be written as Python classes). We will use those to run automated checks and an AI-like reviewer (a small automated check script) to flag security and best-practice issues.

Key Concepts (before hands-on)

  • Schema validation: Schema checks verify that the IaC files conform to the expected data model (fields, types, required keys). This is the first line of defense: syntactic validation. In practice, schemas prevent malformed objects from being applied to the controller.
  • Policy rules (semantic checks): Beyond schema, semantic rules ensure settings meet organizational policy (for example, disallowing arp_flooding on production BDs). These are typically enforced by custom rule classes or a rule engine.
  • AI-style linting: Think of AI linting as an intelligent reviewer that combines schema results, policy rules, and heuristic checks (naming consistency, suspicious MAC addresses, or contradictory flags). It produces prioritized findings for operators.
  • Pre-change vs post-change validation: Pre-change validation runs in CI/CD before any device change. Post-change validation verifies the network state matches the intended IaC after deployment (not covered deeply here, but important in production workflows).
  • Why this matters: In production, a single misconfigured bridge-domain flag (for example enabling broad multicast flood) can cause widespread traffic black-holing or excessive CPU on devices. Automated reviews catch such mistakes earlier and faster than manual inspection.
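To make the schema-validation idea concrete, here is a minimal sketch of a required-keys/type check over a parsed tenants document. This is purely illustrative; the actual nac-validate tool uses its own schema engine, and the checks shown are assumptions about what such a schema would require:

```python
# Minimal sketch of a schema-style check over parsed IaC data.
# Illustrative only -- nac-validate uses its own schema engine.

def schema_errors(doc: dict) -> list[str]:
    """Return a list of schema violations for a parsed tenants file."""
    errors = []
    tenants = doc.get("apic", {}).get("tenants")
    if not isinstance(tenants, list):
        return ["apic.tenants must be a list of tenant objects"]
    for t in tenants:
        if "name" not in t:
            errors.append("tenant is missing required key 'name'")
        for bd in t.get("bridge_domains", []):
            if "name" not in bd:
                errors.append("bridge_domain is missing required key 'name'")
            if not isinstance(bd.get("arp_flooding", False), bool):
                errors.append("arp_flooding must be a boolean")
    return errors

# A well-formed document produces no errors:
doc = {"apic": {"tenants": [{"name": "ABC",
                             "bridge_domains": [{"name": "BD1",
                                                 "arp_flooding": False}]}]}}
print(schema_errors(doc))  # []
```

Note that this catches purely structural problems (missing keys, wrong types); semantic policy checks come later in this lesson.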

Step-by-step configuration

Step 1: Prepare the IaC artifacts (defaults and tenant)

What we are doing: Create the IaC YAML files that describe the tenant, bridge-domain BD1, and fabric settings. This gives us the content the validator and AI-reviewer will examine. Accurate, canonical YAML is essential because schema and rule checks operate on these files.

YAML files (save as defaults.yaml and tenants.yaml). These are rewritten, simplified examples using the exact values from the reference material:

# defaults.yaml
apic:
  tenants:
    bridge_domains:
      name_suffix: _bd
      unicast_routing: false
# tenants.yaml
apic:
  tenants:
    - name: ABC
      bridge_domains:
        - name: BD1
          alias: ABC_BD1
          mac: 00:22:BD:F8:19:FE
          virtual_mac: 00:23:BD:F8:19:12
          ep_move_detection: true
          arp_flooding: false
          ip_dataplane_learning: false
          limit_ip_learn_to_subnets: false
          multi_destination_flooding: encap-flood
          unknown_unicast: proxy
          unknown_ipv4_multicast: flood
          unknown_ipv6_multicast: flood
          unicast_routing: true
          clear_remote_mac_entries: true
          advertise_host_routes: true
          l3_multicast: false
          multicast_arp_drop: false
          vrf: VRF1
          nd_interface_policy: "ND_INTF_POL1"
          subnets:
            - ip: 1.1.1.1/24
              description: My Desc
              primary_ip: true
              public: false
              shared: false
              virtual: false
              igmp_querier: true
              nd_ra_prefix: true
              no_default_gateway: false
            - ip: fd00:0:abcd:1::1/64
              description: My IPv6 Desc
              primary_ip: true
              public: true
              shared: false
              virtual: false
              igmp_querier: true
              nd_ra_prefix: true
              no_default_gateway: false
          l3outs:
            - L3OUT1
          dhcp_labels:
            - dhcp_relay_policy: DHCP-RELAY1

What just happened: You have defined the tenant and BD in YAML, including both IPv4 and IPv6 subnets and flags such as arp_flooding and igmp_querier. These fields are the input to schema and rule checks.

Real-world note: Using defaults (like name_suffix) prevents inconsistent object names across multiple teams. In production, defaults.yaml is commonly shared in a Git repository to ensure consistency.
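To see how a shared default like name_suffix yields consistent object names, consider this small sketch. How the real tooling composes names is tool-specific; the suffix logic shown here is an assumption used for illustration:

```python
# Sketch: applying a default name_suffix to bridge-domain names.
# The real tooling's naming rules may differ; this shows the idea only.

def apply_name_suffix(bd_name: str, name_suffix: str) -> str:
    """Append the shared suffix unless it is already present."""
    if bd_name.endswith(name_suffix):
        return bd_name
    return bd_name + name_suffix

print(apply_name_suffix("BD1", "_bd"))     # BD1_bd
print(apply_name_suffix("BD1_bd", "_bd"))  # BD1_bd (idempotent)
```

The idempotency check matters: if a team has already suffixed a name by hand, applying defaults should not double the suffix.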

Verify:

nac-validate -s .schema ./tenants.yaml

Expected output (complete):

nac-validate -s .schema ./tenants.yaml
Checking file: ./tenants.yaml
Schema validation: PASS
Files checked: 1
Errors: 0
Warnings: 0
Validation complete: All schema checks passed.

Step 2: Run a pre-change semantic rule check (automated policy checks)

What we are doing: Execute the policy rule engine to run semantic checks that go beyond schema (for example: checking that arp_flooding is false in production, ensuring unicast_routing is true when advertise_host_routes is true, and confirming that the IPv6 subnet is marked public if required). This detects policy and security issues.

Create a simple Python-based rule checker file ai_review.py (example rule-based scanner). Save this file next to the YAML:

# ai_review.py (sample rule-based scanner; run with python ai_review.py tenants.yaml)
import sys, yaml

def load(file):
    with open(file) as f:
        return yaml.safe_load(f)

def check_tenant(doc):
    findings = []
    tenants = doc.get('apic', {}).get('tenants', [])
    for t in tenants:
        for bd in t.get('bridge_domains', []):
            if bd.get('arp_flooding') is True:
                findings.append(("HIGH", f"ARP flooding is enabled on {bd.get('name')}; this can cause broadcast storms"))
            if bd.get('unicast_routing') is False and bd.get('advertise_host_routes'):
                findings.append(("MEDIUM", "advertise_host_routes is true while unicast_routing is false"))
            for s in bd.get('subnets', []):
                if s.get('public') and not s.get('primary_ip'):
                    findings.append(("LOW", "Public subnet without primary_ip marked"))
    return findings

if __name__ == '__main__':
    doc = load(sys.argv[1])
    findings = check_tenant(doc)
    if not findings:
        print("AI-REVIEW: No findings. Config looks healthy.")
    else:
        print("AI-REVIEW: Findings:")
        for sev, msg in findings:
            print(f"{sev}: {msg}")

Run the AI-style review:

python ai_review.py tenants.yaml

What just happened: The Python script scanned the tenant YAML and applied a small set of policy checks. These are examples of the kind of semantic rules you would encode as Python classes in a real validation pipeline. In production, these rules are more comprehensive and include organizational policies, security baselines, and live-device expectations.

Real-world note: Rule checks are often implemented as part of CI (pre-merge or pre-deploy). They flag problems early and are versioned alongside IaC.
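The recap mentioned that rules can be written as Python classes. A class-based version of two of the checks above might look like the following sketch; the Rule base class and its interface are illustrative, not the real nac-validate API:

```python
# Sketch of a class-based rule engine. The Rule interface below is
# illustrative and is NOT the actual nac-validate rule API.

class Rule:
    severity = "LOW"

    def check(self, bd: dict) -> list[str]:
        """Return a list of finding messages for one bridge domain."""
        raise NotImplementedError

class NoArpFlooding(Rule):
    severity = "HIGH"

    def check(self, bd):
        if bd.get("arp_flooding") is True:
            return [f"ARP flooding enabled on {bd.get('name')}"]
        return []

class HostRoutesNeedRouting(Rule):
    severity = "MEDIUM"

    def check(self, bd):
        if bd.get("advertise_host_routes") and not bd.get("unicast_routing"):
            return ["advertise_host_routes requires unicast_routing"]
        return []

def run_rules(bd: dict, rules: list) -> list:
    """Apply every rule to one BD and collect (severity, message) findings."""
    findings = []
    for rule in rules:
        for msg in rule.check(bd):
            findings.append((rule.severity, msg))
    return findings

bd = {"name": "BD1", "arp_flooding": False,
      "advertise_host_routes": True, "unicast_routing": True}
print(run_rules(bd, [NoArpFlooding(), HostRoutesNeedRouting()]))  # []
```

Packaging each policy as a class makes the rule set easy to version, unit-test, and extend alongside the IaC repository.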

Verify:

python ai_review.py tenants.yaml

Expected output (complete):

AI-REVIEW: No findings. Config looks healthy.

(If there are findings, the output lists them by severity and message. You must act on HIGH/CRITICAL findings before deployment.)
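In CI, acting on HIGH/CRITICAL findings usually means failing the job. A simple severity gate (a sketch; the severity labels match the scanner above) could return a non-zero exit code whenever blocking findings exist:

```python
# Sketch: fail a CI job (non-zero exit code) on blocking findings.
import sys

BLOCKING = {"HIGH", "CRITICAL"}

def gate(findings: list) -> int:
    """Return a process exit code: 1 if any blocking finding, else 0."""
    blocking = [(sev, msg) for sev, msg in findings if sev in BLOCKING]
    for sev, msg in blocking:
        print(f"BLOCKING {sev}: {msg}")
    return 1 if blocking else 0

# Example: one HIGH finding blocks the pipeline, LOW findings do not.
code = gate([("HIGH", "ARP flooding enabled on BD1"),
             ("LOW", "cosmetic naming issue")])
print(code)  # 1
```

In a real pipeline you would call `sys.exit(gate(findings))` at the end of the scanner so the CI system sees the failure.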

Step 3: Simulate an AI suggestion and apply a recommended fix

What we are doing: Suppose the AI review finds a semantic mismatch: advertise_host_routes is set to true while unicast_routing is false in defaults (an inconsistent policy). We will update defaults.yaml to set unicast_routing: true to align the policy. This demonstrates how AI-driven checks guide safe edits.

Modify defaults.yaml (change unicast_routing to true):

# defaults.yaml (updated)
apic:
  tenants:
    bridge_domains:
      name_suffix: _bd
      unicast_routing: true

What just happened: Enabling unicast_routing in defaults aligns the default behavior of new bridge-domains with advertise_host_routes and VRF-based routing. Protocol-level implication: when unicast routing is enabled in the BD, the BD will allow host routes to be installed and integrated into L3 forwarding; without it, host routes are not handled as expected.

Real-world note: In data center deployments where application subnets are expected to be routed, setting unicast_routing true by default avoids accidental L2-only domains that cannot reach external networks.

Verify:

nac-validate -s .schema ./defaults.yaml

Expected output (complete):

nac-validate -s .schema ./defaults.yaml
Checking file: ./defaults.yaml
Schema validation: PASS
Files checked: 1
Errors: 0
Warnings: 0
Validation complete: All schema checks passed.

Then re-run the AI-style check against the combined, merged view (simulated by merging defaults into the tenant definition and running ai_review again). For the lab, re-run ai_review against tenants.yaml to confirm no semantic issues remain from the original mismatch:

python ai_review.py tenants.yaml

Expected output (complete):

AI-REVIEW: No findings. Config looks healthy.
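The "merge defaults into the tenant" step mentioned above can be sketched as a recursive dictionary merge in which explicit BD values win over defaults. The merge semantics shown here are an assumption for illustration; the real tooling may resolve defaults differently:

```python
# Sketch: merge default BD settings into an explicit BD definition.
# Explicit values take precedence; the real tooling's precedence may differ.

def merge_defaults(defaults: dict, explicit: dict) -> dict:
    """Recursively overlay explicit settings on top of defaults."""
    merged = dict(defaults)
    for key, value in explicit.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = value
    return merged

bd_defaults = {"unicast_routing": True}   # from the updated defaults.yaml
bd = {"name": "BD1", "advertise_host_routes": True}
print(merge_defaults(bd_defaults, bd))
# {'unicast_routing': True, 'name': 'BD1', 'advertise_host_routes': True}
```

After the merge, advertise_host_routes and unicast_routing are consistent, so the semantic rule from Step 2 no longer fires.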

Step 4: Produce a final AI-review report and prepare for commit

What we are doing: Generate a final report that lists any items that need human attention, summarizes the changes, and is placed in the Git commit message or CI artifact. This provides traceability and a clear remediation path for reviewers.

Create a small textual report file report.txt (example output of the pipeline):

AI REVIEW REPORT
Date: 2025-01-01T12:00:00Z
Files checked: tenants.yaml, defaults.yaml
Schema: PASS
Semantic rules: PASS
Recommendations:
  - Ensure any physical L3-out device config matches 1.1.1.1/24 and IPv6 fd00:0:abcd:1::1/64 configuration shown in tenants.yaml
  - Confirm anycast gateway 10.250.250.1 is configured on fabric loopback lo250 on all relevant devices
Conclusion: No blocking findings. Safe to proceed to staged deployment.

Save and inspect the report:

cat report.txt

What just happened: The report consolidates schema and semantic results for auditors or CI. In production, this file would be archived alongside the merge commit and included in the CI job artifacts.

Real-world note: The line "Confirm anycast gateway 10.250.250.1 is configured on fabric loopback lo250" is a cross-check between IaC intent and device configuration. Pre-deploy checks often include a device-state comparator that will confirm the live switch loopbacks match the IaC anycast value.
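A device-state comparator of the kind described here can be sketched as follows. The live_state dictionary stands in for whatever a real collector (an API client or CLI scraper) would return; that data source is an assumption of this sketch:

```python
# Sketch: compare IaC anycast intent against collected device state.
# live_state is a stand-in for data gathered from live devices.

def compare_anycast(intended_ip: str, live_state: dict) -> list[str]:
    """Report devices whose lo250 address differs from the IaC intent."""
    mismatches = []
    for device, loopbacks in live_state.items():
        actual = loopbacks.get("lo250")
        if actual != intended_ip:
            mismatches.append(f"{device}: lo250 is {actual}, expected {intended_ip}")
    return mismatches

live_state = {"d2d": {"lo250": "10.250.250.1"}}
print(compare_anycast("10.250.250.1", live_state))  # []
```

An empty result means the live fabric matches the IaC intent; any mismatch line would feed directly into the report's recommendations.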

Verify:

cat report.txt

Expected output (complete):

AI REVIEW REPORT
Date: 2025-01-01T12:00:00Z
Files checked: tenants.yaml, defaults.yaml
Schema: PASS
Semantic rules: PASS
Recommendations:
  - Ensure any physical L3-out device config matches 1.1.1.1/24 and IPv6 fd00:0:abcd:1::1/64 configuration shown in tenants.yaml
  - Confirm anycast gateway 10.250.250.1 is configured on fabric loopback lo250 on all relevant devices
Conclusion: No blocking findings. Safe to proceed to staged deployment.

Verification Checklist

  • Check 1: Schema validation passes for every IaC file — verify with:

    nac-validate -s .schema ./tenants.yaml
    

    Expected: "Schema validation: PASS" and "Errors: 0".

  • Check 2: Semantic (AI-style) checks produce zero HIGH findings — verify with:

    python ai_review.py tenants.yaml
    

    Expected: "AI-REVIEW: No findings. Config looks healthy."

  • Check 3: The final report contains the anycast and subnet cross-check suggestions — verify with:

    cat report.txt
    

    Expected: Report text containing the anycast and subnet lines shown above.

Common Mistakes

Symptom: Schema validation fails with "missing required key 'bridge_domains'"
  Cause: The YAML structure does not match the expected schema (wrong indentation or a missing top-level key).
  Fix: Reformat the YAML to match the schema; ensure the apic:, tenants:, and bridge_domains: keys exist with correct indentation and types.

Symptom: AI-review flags "advertise_host_routes true while unicast_routing false"
  Cause: Defaults or BD settings are inconsistent; host routes cannot be advertised unless unicast routing is enabled.
  Fix: Set unicast_routing: true in defaults.yaml or in the BD definition.

Symptom: Report warns "anycast gateway mismatch"
  Cause: The IaC declares pim_anycast_ip: 10.250.250.1 but the device loopback is unconfigured or set to a different value.
  Fix: Update device loopback lo250 to 10.250.250.1 on the fabric devices, or correct the IaC if the IaC is wrong.

Symptom: Public IPv6 subnet flagged but primary_ip is false
  Cause: The IPv6 subnet is marked public but not set as primary in the IaC, causing inconsistent deployment behavior.
  Fix: Set primary_ip: true for the IPv6 subnet, or change public: false per policy.

Key Takeaways

  • Use schema validation (nac-validate) as the first automated gate; it prevents syntactic errors from ever reaching devices.
  • Implement semantic rules (Python rule classes or a rule engine) to capture organizational policy and security expectations; these detect dangerous combinations of flags before deployment.
  • AI-style reviews are most effective when they provide actionable findings (severity, explanation, remediation). Treat them as a guidance layer — always require human sign-off for HIGH/CRITICAL findings.
  • In production, tie these checks into CI/CD: run them on PRs, fail merges on HIGH issues, and archive review reports for compliance and auditability.

Final note: Think of IaC validation as the network equivalent of static code analysis in software development — it finds subtle, logic-level problems early. In enterprise deployments where uptime and security matter, automated pre-change validation is essential to move faster with confidence.