Lesson 7 of 7

ISE Performance Tuning

Objective

This is Lesson 7 of 7 for Lab 34: ISE Deployment — Planning to Production.

In this lesson you will learn how to tune Cisco ISE for performance by optimizing profiling probes, logging and accounting, and purge (retention) policies. We explain why each setting matters in production, give concrete configuration examples using the automation artifacts referenced in the course material, and show how to verify the changes. In a real enterprise, these settings determine whether ISE can handle peak TPS (transactions per second), keep useful logs for compliance, and avoid CPU/disk overloads that impact authentication and authorization.

Real-world scenario: In a large campus with thousands of mobile and IoT endpoints, profiling probes and verbose logging can quickly swamp an ISE deployment. Tuning probes and retention policies reduces CPU/TPS load and limits database growth so authentication latency remains low and forensic logs are retained only as required by policy.

Quick Recap

This lesson builds on the topology and nodes you configured earlier in Lesson 1. No new physical devices or IP addresses are added in this lesson — we work with the existing ISE nodes and the automation playbooks introduced previously (iseee.*.yaml). All actions target the ISE control plane and its configuration management.

Key Concepts

  • Profiling probes: Probes collect endpoint telemetry (DHCP, RADIUS, SNMP, NetFlow) to classify devices. Excessive probes or overly broad sampling increases CPU and TPS. In production you tune probe sampling and enable only the necessary probes for your use case.
    • Protocol behavior: probes generate transactions to the ISE profiling engine and write classification events to the Policy Service Node (PSN). This contributes to TPS.
  • Logging & Accounting: Accounting and audit logging generate continuous writes to disk/DB. Accounting events are required for RADIUS accounting and troubleshooting; audit logs are required for compliance. Retain only what is necessary to control disk usage and query performance.
    • Packet flow: RADIUS accounting packets arrive at the PSN, are processed and written to the DB — each accounting entry is a TPS cost.
  • Purge (retention) policies: These control how long logs and tracking records are kept. Long retention requires larger DB and more I/O; short retention may violate compliance. Choose retention based on TPS capacity and legal requirements.
  • Controlled restart and certificate operations: In ISE 3.3 and later, controlled application restarts can be scheduled per node; older versions required full node reboots for some certificate changes. Restart-time improvements in recent ISE versions shrink planned maintenance windows.
    • Observed behavior: restart times improved from ~20 minutes (ISE 3.2) to ~5.5 minutes (ISE 3.4) when using optimized restart commands.
  • Automation (ISE Eternal Evaluation playbooks): Use automation artifacts (iseee.*.yaml) for repeatable configuration: certificates, provisioning, patching, purge, and backup. Automation reduces human error and allows consistent tuning across nodes.

Tip: Think of profiling probes as sensors — more sensors give better visibility but also consume processing capacity. Tune sensors to the minimum set that achieves required visibility.


Step-by-step configuration

Each step includes the exact commands/artifacts referenced in the course material, explanation of why they matter, and verification.

Step 1: Disable DTLS requirement for network device authentication (reduce CPU/TPS for devices that don't need DTLS)

What we are doing: We modify the network device authentication settings so that DTLS is not forced on devices that cannot use it. Disabling the DTLS requirement for legacy or non-DTLS-capable devices reduces CPU overhead on the ISE node and can decrease TLS/DTLS session churn.

# Example automation snippet from deployment playbook (iseee.deploy.yaml)
network_device:
  - name: lab-mr46-1
    description: "Access switch in building A"
    profileName: Cisco
    authenticationSettings:
      dtlsRequired: false
      enableKeyWrap: false
      enableMultiSecret: 'false'
      keyEncryptionKey: ''
      keyInputFormat: ASCII

What just happened: The playbook fragment sets dtlsRequired: false for the network device entry. This tells ISE not to require DTLS for secure communication with this authenticator. On devices that support only RADIUS/TLS without DTLS, avoiding DTLS reduces processing and potential DTLS handshake failures that can otherwise cause retries and additional TPS.

Real-world note: In production, only disable DTLS for devices that cannot support it — for high-security integrations (pxGrid, posture), you typically want DTLS enabled.

Verify:

# Run the deploy playbook, then gather facts for verification
iseee.deploy.yaml
iseee.facts.yaml

# Expected snippet from iseee.facts.yaml (verification output)
network_devices:
  - name: lab-mr46-1
    profileName: Cisco
    authenticationSettings:
      dtlsRequired: false
      enableKeyWrap: false

Step 2: Reduce profiling probe sampling and disable unused probes

What we are doing: Tune profiling to sample fewer events or disable probes that are not needed (e.g., NetFlow when DHCP and RADIUS are sufficient). This reduces classification transactions and lowers TPS on the profiling engine.

# Example playbook file names that control configuration and profiling
iseee.configure.yaml
# Within the playbook you would disable or tune probes; this example is the identifier used in the material
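
For concreteness, a profiling-tuning fragment might look like the sketch below. The probe names match the verification output later in this step, but the exact keys (profiling_probes, sampleRate) are illustrative assumptions, not documented iseee.configure.yaml options:

```yaml
# Hypothetical profiling section for iseee.configure.yaml (keys are illustrative)
profiling_probes:
  DHCP:
    enabled: true
    sampleRate: 10     # classify from a subset of DHCP events
  RADIUS:
    enabled: true
    sampleRate: 100    # RADIUS attributes already arrive at the PSN
  NetFlow:
    enabled: false     # DHCP + RADIUS give sufficient visibility here
  SNMP:
    enabled: false     # avoid periodic SNMP polling load
```

The intent is the minimum probe set that still classifies your endpoint mix; re-enable probes one at a time if classification coverage drops.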

What just happened: Using the automation artifact (iseee.configure.yaml) we apply profiling configuration changes centrally. By lowering probe sampling rates or disabling specific probes, fewer telemetry events are processed per second by ISE, lowering CPU and improving classification latency for critical flows.

Real-world note: Start in Monitor Mode when tuning profiling so you see the effect without impacting access. Move to Low-Impact or Closed Mode only after validating your probe configuration.

Verify:

# After applying configuration, fetch current profiling probe summary (via automation facts)
iseee.facts.yaml

# Expected verification lines
profiling_probes:
  DHCP: enabled, sampleRate: 10
  RADIUS: enabled, sampleRate: 100
  NetFlow: disabled
  SNMP: disabled

Step 3: Configure log retention (purge) to control DB growth

What we are doing: Configure retention/purge policies so that accounting and debug logs are retained only for the required period (e.g., 90 days for accounting, 7 days for debug). Proper purge policies prevent unbounded DB growth and improve query performance.

# Use the existing automation artifacts to set retention; example artifact names from the material:
iseee.purge.yaml
iseee.backup.yaml
# These artifacts are used to configure log retention and backup schedules
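
As a sketch, the purge artifact might carry settings like the following; the keys mirror the verification output below but are illustrative assumptions, not documented iseee.purge.yaml options. The comments include back-of-envelope sizing to show why retention matters:

```yaml
# Hypothetical retention settings for iseee.purge.yaml (keys are illustrative)
# Sizing example: 100 accounting events/s x 86,400 s/day x ~1 KB/record ≈ 8.6 GB/day,
# so 90-day retention ≈ 780 GB of accounting data before compression.
retention:
  accounting_days: 90   # RADIUS accounting for troubleshooting
  audit_days: 180       # audit trail for compliance
  debug_days: 7         # debug logs are high-volume; keep them short
purge_schedule: "daily at 03:00 UTC"
```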

What just happened: The purge automation applies configured retention settings to the ISE database and log system. This removes old records beyond the retention threshold on a scheduled basis and ensures backups capture the retained dataset without unnecessary bloat.

Real-world note: Balance compliance requirements with capacity — if your legal policy requires multi-year retention, plan additional storage nodes or external log archival (e.g., SIEM).

Verify:

# Query configured retention via automation fact output
iseee.facts.yaml

# Expected output snippet
retention:
  accounting_days: 90
  audit_days: 180
  debug_days: 7
last_purge_run: 2025-03-15T03:00:00Z

Step 4: Schedule controlled application restart to apply certificate or node changes

What we are doing: Use controlled application stop and reload to minimize downtime when applying certificate changes or other updates. This applies to ISE 3.3+ where restarts can be scheduled per node, reducing the blast radius.

# Commands referenced in the material for a controlled restart
application stop ise
reload

What just happened: application stop ise gracefully stops ISE application processes, allowing stateful services to close. reload restarts the node. In modern ISE releases the controlled restart minimizes downtime; the restart time improved meaningfully between versions (for example, ~16 minutes in 3.3 and ~5.5 minutes in 3.4 when using optimized restart sequences).

Real-world note: When installing new admin certificates, older ISE versions required all nodes to reboot; from 3.3 the reboot can be scheduled per node, so plan a rolling restart across nodes.
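
A rolling schedule can be captured alongside the playbooks. The format below is a hypothetical sketch (not a documented iseee artifact), with windows sized from the ~5.5-minute restart estimate plus margin; the extra node names are illustrative:

```yaml
# Hypothetical rolling-restart plan (illustrative format and node names)
rolling_restart:
  - node: ise-node-1
    window_start: "2025-03-15T03:00:00Z"
  - node: ise-node-2
    window_start: "2025-03-15T03:30:00Z"  # begin only after ise-node-1 reports services running
  - node: ise-node-3
    window_start: "2025-03-15T04:00:00Z"
```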

Verify:

# Example verification steps (run after restart completes)
# Check node status via automation facts
iseee.facts.yaml

# Expected output
node_status:
  - name: ise-node-1
    services: all
    last_restart: 2025-03-15T03:10:00Z
restart_time_estimate: 5.5 minutes

Step 5: Optimize policy sets and network device groups to reduce unnecessary policy evaluations

What we are doing: Reduce the policy evaluation load by organizing devices into Network Device Groups and by ordering policy sets so that common matches are evaluated early. This decreases the number of rules each request must traverse, lowering CPU and TPS use.

# Use automation artifacts to manage network device groups and policy set assignments
iseee.deploy.yaml
iseee.configure.yaml
# Example fragment (identifier from course material)
network_device_groups:
  - name: campus-access
    devices:
      - lab-mr46-1
policy_sets:
  - name: corporate-wired
    order: 1
  - name: guest
    order: 10

What just happened: Devices are grouped logically and policy sets are reordered so that high-volume, well-known traffic maps to fast, simple policy branches. This reduces the number of policy evaluations per authentication request and speeds up the authorization decision path.

Real-world note: In production, maintain a playbook that documents how devices are moved between groups during rollout to avoid sudden spikes in policy evaluation.
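
One lightweight way to document such moves is a tracked record kept next to the automation artifacts; this format is purely illustrative:

```yaml
# Hypothetical group-move log (illustrative; not an iseee artifact)
group_moves:
  - device: lab-mr46-1
    from_group: staging
    to_group: campus-access
    changed: 2025-03-15
    validated_in: monitor-mode   # confirmed policy hits before enforcing
```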

Verify:

# Fetch policy set order and group membership via automation facts
iseee.facts.yaml

# Expected output snippet
network_device_groups:
  - name: campus-access
    members: [lab-mr46-1]
policy_sets:
  - name: corporate-wired
    order: 1
  - name: guest
    order: 10

Verification Checklist

  • Check 1: DTLS requirement is disabled for legacy devices — verify with iseee.facts.yaml that dtlsRequired: false for each target device.
  • Check 2: Profiling probes set to expected sampling or disabled — verify probe status and sampleRates in iseee.facts.yaml.
  • Check 3: Retention/purge settings applied and last purge timestamp is recent — verify retention values and last_purge_run in iseee.facts.yaml.
  • Check 4: Controlled restart completed without service loss — verify node status services: all and last_restart timestamp.
  • Check 5: Network Device Groups and Policy Sets ordering reflect optimization — verify groups and policy set ordering in iseee.facts.yaml.

Common Mistakes

  • Symptom: Profiling engine CPU high with many classification delays. Cause: all probes enabled with high sampling rates. Fix: disable non-essential probes and lower sampleRate; start in Monitor Mode, then tune.
  • Symptom: Disk usage grows rapidly after rollout. Cause: default long retention for audit/accounting logs. Fix: apply purge policies with appropriate retention (e.g., 90 days for accounting); archive older logs to a SIEM.
  • Symptom: RADIUS authentication latency spikes after a certificate update. Cause: forced full cluster reboot or nodes out of sync. Fix: use a controlled application restart (scheduled per node) and verify restart times before full rollout.
  • Symptom: Network devices fail to connect for pxGrid/TLS. Cause: DTLS/TLS mismatch or missing certificates. Fix: verify device authentication settings (dtlsRequired) and ensure system and pxGrid certificates are installed and trusted.
  • Symptom: Policy evaluation slow for high-volume devices. Cause: poor policy set order causing extensive rule traversal. Fix: reorder policy sets and use Network Device Groups so frequent cases match earlier.

Key Takeaways

  • Profiling probes and logging are powerful for visibility but are also the largest contributors to TPS and CPU load; tune sample rates and disable unnecessary probes.
  • Purge/retention policies are essential to control DB growth — align retention with compliance and capacity planning.
  • Controlled application restarts and automation artifacts (iseee.*.yaml) reduce downtime and human error during configuration and certificate operations.
  • Organize devices into Network Device Groups and optimize policy set order to reduce per-request policy evaluation overhead.
  • Always validate changes in Monitor Mode and use automation to apply consistent, repeatable configuration across nodes.

Final note: Performance tuning is iterative — monitor TPS, CPU, and disk metrics after each change, and maintain a playbook for rollback and emergency procedures. Use automation artifacts to keep your deployment predictable and reproducible.

