Lesson 3 of 6

Anomaly Detection in Network Traffic

Objective

In this lesson you'll enable and validate ML-based anomaly detection on the AIOps platform to identify unusual network behavior such as elephant flows and idle VPN sessions. You will enable AutoDetect and adaptive thresholding, create detectors for idle Remote Access (RA) VPN sessions and for elephant flows, and review an anomaly whose recommended remediation you can choose to apply. This matters in production because ML detection reduces the noise produced by static thresholds, surfaces previously unseen problems, and can recommend configuration changes that prevent capacity or security incidents.

Real-world scenario: a service desk is seeing sporadic VPN drops and performance issues. ML-based insights detect idle VPN sessions consuming headend resources and recommend an idle-timeout change to prevent headend overload, allowing operators to make a data-driven policy change.

Quick Recap

Refer to the topology established in Lesson 1. This lesson uses the same monitoring pipeline and adds an AIOps server to process telemetry and apply ML detectors. No new routers or switches are being added; the AIOps appliance is the primary device for configuration.

ASCII Topology (same logical devices used in Lesson 1; only the AIOps appliance and the two telemetry sources used in this lesson are shown):

                       +----------------------+
                       |   AIOPS Appliance    |
                       |   IP: 10.10.255.10   |
                       |   Hostname: aiops-1  |
                       +----------+-----------+
                                  |
                                  | NetFlow / Syslog / Telemetry
                                  |
             +--------------------+--------------------+
             |                                         |
     +-------+--------+                         +------+-------+
     | VPN Headend    |                         | NetFlow      |
     | Router         |                         | Exporter     |
     | IP: 10.10.1.1  |                         | IP: 10.10.4.1|
     +----------------+                         +--------------+

Device Table

Device        Role                         Management IP
aiops-1       AIOps ML engine & UI         10.10.255.10
vpn-headend   VPN concentrator / headend   10.10.1.1
netflow-exp   NetFlow exporter (switch)    10.10.4.1

Tip: In production, the AIOps appliance ingests telemetry (NetFlow, syslog, streaming telemetry) from many devices. Here we emulate the same flow with a dedicated exporter and headend.

Key Concepts

  • Baseline and Anomaly Detection
    ML detectors build statistical baselines from historical telemetry and compare live telemetry to that baseline. Think of the baseline as the “normal fingerprint” of traffic; anomalies are deviations from that fingerprint.

  • Adaptive Thresholding
    Instead of fixed thresholds (e.g., alert if >1 Gbps), adaptive thresholding derives each threshold from historical patterns and excludes outliers from the historical baseline so that one-off events do not inflate it. This reduces false positives from periodic, expected spikes.

  • Detectors and Actions
    A detector is a rule powered by ML (e.g., idle VPN session detector). Detectors can generate alerts and suggest or trigger remediation actions (e.g., recommend reducing idle timeout). In production, remediation suggestions are reviewed by operators before applying.

  • Streaming Analytics & Latency
    For near-real-time detection, the system processes streaming telemetry with low latency. This matters because high-latency detection yields stale alerts and reduces utility in live incident response.

  • Signal Types (Elephant Flow vs. Idle Session)

    • Elephant flow detection identifies unusually large flows that may hog bandwidth.
    • Idle session detection identifies sessions that remain established but transfer no meaningful traffic.
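To make the streaming and latency point concrete, here is a minimal Python sketch of a time-based sliding window that turns a stream of (timestamp, bytes) telemetry records into a bytes/sec rate and drops records that arrive too late to be useful. The class name, the 60-second window, and the 5-second latency budget are illustrative assumptions, not AIOps platform settings.

```python
from collections import deque


class SlidingRate:
    """Time-based sliding window that turns (timestamp, bytes) records into bytes/sec.

    Hypothetical helper: the window length and latency budget are illustrative.
    """

    def __init__(self, window_s: float = 60.0, max_lag_s: float = 5.0) -> None:
        self.window_s = window_s      # how much recent history the rate covers
        self.max_lag_s = max_lag_s    # records older than this on arrival are stale
        self._records = deque()       # (event_ts, nbytes) in arrival order
        self._total = 0

    def add(self, event_ts: float, nbytes: int, now: float) -> bool:
        """Add one record; return False and drop it if it arrived too late."""
        if now - event_ts > self.max_lag_s:
            return False
        self._records.append((event_ts, nbytes))
        self._total += nbytes
        # Evict records that have fallen out of the window behind this event.
        while self._records and self._records[0][0] <= event_ts - self.window_s:
            _, old_bytes = self._records.popleft()
            self._total -= old_bytes
        return True

    def bytes_per_sec(self) -> float:
        return self._total / self.window_s


if __name__ == "__main__":
    w = SlidingRate(window_s=60, max_lag_s=5)
    w.add(0, 600, now=1)
    w.add(30, 600, now=31)
    w.add(10, 999, now=40)  # arrives 30 s late, so it is rejected
    print(w.bytes_per_sec())
```

Rejecting late records instead of folding them in mirrors the latency concern above: a rate computed from data that arrives too late yields an alert that is already stale.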

When the detector runs, it consumes telemetry data, computes feature vectors (e.g., flow size, duration, bytes/sec), and evaluates them against the baseline model to decide if an event is anomalous.
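The feature-vector step can be sketched in a few lines of Python: build a vector of (bytes, duration, bytes/sec) for each flow, summarize historical vectors as a per-feature mean and standard deviation, and call a flow anomalous when any feature's z-score exceeds a cutoff. The plain z-score model and the 3.0 cutoff are assumptions for illustration; the platform's actual models are not described in this lesson.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Flow:
    size_bytes: int
    duration_s: float


def features(flow: Flow) -> list[float]:
    """Feature vector: flow size (bytes), duration (s), average rate (bytes/s)."""
    rate = flow.size_bytes / flow.duration_s if flow.duration_s > 0 else 0.0
    return [float(flow.size_bytes), float(flow.duration_s), rate]


def fit_baseline(history: list[Flow]) -> list[tuple[float, float]]:
    """Per-feature (mean, population stdev) over historical flows."""
    columns = zip(*(features(f) for f in history))
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in columns]


def is_anomalous(flow: Flow, baseline: list[tuple[float, float]], z_cut: float = 3.0) -> bool:
    """Anomalous if any feature lies more than z_cut stdevs from its baseline mean."""
    for value, (mean, std) in zip(features(flow), baseline):
        std = std if std > 0 else 1e-9  # a flat history makes any change stand out
        if abs(value - mean) / std > z_cut:
            return True
    return False
```

For example, with a history of roughly 1 MB, 10-second flows, a new 1.05 MB, 11-second flow is not flagged, while a 500 MB, 10-minute flow is.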

Step-by-step configuration

Step 1: Enable AutoDetect (ML-based outlier detection)

What we are doing: Turn on the ML AutoDetect capability so the appliance can automatically start building baselines and identifying outliers. This is required for any ML detector to operate.

ssh admin@10.10.255.10
enable
configure terminal
aiops autodetect enable
end
write memory
exit

What just happened:

  • aiops autodetect enable activates the background ML pipeline that ingests telemetry and begins baseline computation.
  • write memory persists the configuration so ML remains enabled after restarts. The platform will now collect and profile historical metrics required for adaptive thresholding.

Real-world note: Enabling AutoDetect in production should be scheduled during a maintenance window if initial model training is expected to generate many informational alerts.
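The baseline computation that AutoDetect starts in the background has to cope with data that never stops arriving. One standard way to do that is Welford's online algorithm, which updates a running mean and variance one sample at a time without storing the history. This is a minimal single-metric sketch, not the platform's documented implementation; its count field only mirrors the idea behind the "Baseline Samples Collected" field in the verification output.

```python
import math


class RunningBaseline:
    """Welford's online mean/variance for one telemetry metric (illustrative sketch)."""

    def __init__(self) -> None:
        self.count = 0    # samples collected so far
        self.mean = 0.0
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self) -> float:
        # Sample standard deviation; reported as 0.0 until two samples exist.
        return math.sqrt(self._m2 / (self.count - 1)) if self.count > 1 else 0.0
```

Because each update is O(1) in time and memory, the baseline can keep pace with a live telemetry stream.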

Verify:

ssh admin@10.10.255.10
enable
show aiops status
AIOPS STATUS
AutoDetect: ENABLED
ML Pipeline: RUNNING
Baseline Samples Collected: 0
Last Ingest: 2026-04-02 09:15:10 UTC

Step 2: Enable Adaptive Thresholding

What we are doing: Turn on adaptive thresholding so the platform modifies alert thresholds dynamically using historical baselines rather than static values.

ssh admin@10.10.255.10
enable
configure terminal
aiops adaptive-thresholding enable
end
write memory
exit

What just happened:

  • Adaptive thresholding is activated; the ML models will now use statistical analysis to suggest thresholds. This prevents noisy alerts from normal cyclical changes in traffic (e.g., daily backup windows). The system excludes historical outliers from baseline calculations to improve sensitivity to true anomalies.

Real-world note: Adaptive thresholding is particularly valuable in environments with diurnal patterns (office hours vs. nights) and seasonal traffic.
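One way to picture adaptive thresholding with outlier exclusion on diurnal traffic: group history by hour of day, drop historical outliers with a median/MAD (modified z-score) rule, and set each hour's threshold at the filtered mean plus k standard deviations. The hour-of-day bucketing, the 3.5 MAD cutoff, k = 3, and the min_samples gate (loosely echoing the Min Sample Window setting) are all assumptions for this sketch, not the platform's documented algorithm.

```python
import statistics
from collections import defaultdict


def robust_filter(values: list[float], cutoff: float = 3.5) -> list[float]:
    """Drop outliers whose modified z-score (based on median and MAD) exceeds cutoff."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)
    return [v for v in values if 0.6745 * abs(v - med) / mad <= cutoff]


def hourly_thresholds(
    samples: list[tuple[int, float]],  # (hour_of_day 0-23, metric value)
    k: float = 3.0,
    min_samples: int = 7,
) -> dict[int, float]:
    """Per-hour threshold = mean + k * stdev of the outlier-filtered history."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    thresholds = {}
    for hour, values in by_hour.items():
        if len(values) < min_samples:
            continue  # not enough history yet: no adaptive threshold for this hour
        kept = robust_filter(values)
        thresholds[hour] = statistics.fmean(kept) + k * statistics.pstdev(kept)
    return thresholds
```

With a history of about 100 units at 09:00 plus a single 500-unit backup spike, the spike is excluded and the 09:00 threshold lands around 116, well below the 500+ a plain mean + 3σ over all samples would give. The 02:00 bucket gets its own, much lower threshold.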

Verify:

ssh admin@10.10.255.10
enable
show aiops settings adaptive-thresholding
ADAPTIVE THRESHOLDING CONFIGURATION
Status: ENABLED
Outlier Exclusion: ENABLED
Min Sample Window: 7 days
Model Update Frequency: 24 hours

Step 3: Create an Idle VPN Session Detector

What we are doing: Define a detector that identifies idle RA VPN sessions consuming headend resources and sets a recommended remediation (e.g., reduce idle timeout). This demonstrates ML-driven remediation suggestions.

ssh admin@10.10.255.10
enable
configure terminal
aiops detector create vpn-idle-sessions
 description "Detect idle RA VPN sessions that exceed historical idle duration"
 type session_idle
 scope device 10.10.1.1
 parameters timeout_seconds 3600
 parameters min_sessions 10
 action recommend "Reduce RA VPN idle timeout to 1800 seconds"
end
write memory
exit

What just happened:

  • A detector named vpn-idle-sessions is created.
  • type session_idle tells the ML pipeline to look for sessions that remain established but transfer negligible bytes.
  • scope device 10.10.1.1 limits the detector to the VPN headend.
  • timeout_seconds 3600 sets a detection sensitivity — sessions idle beyond 3600s are flagged.
  • The action recommend configures an automated recommendation (not an enforced change) to reduce the headend idle timeout to 1800 seconds.

Real-world note: Detectors should default to recommendation mode in production. Automatic enforcement risks breaking legitimate long-lived sessions.
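The detection side of vpn-idle-sessions can be approximated as: collect sessions that are still established but have been idle longer than timeout_seconds, and raise an anomaly carrying the recommendation only when at least min_sessions of them exist. The lesson does not spell out how min_sessions is applied; treating it as a minimum count of idle sessions is an assumption here, as are the hypothetical Session fields. The function only returns a recommendation and never touches headend configuration, matching the recommendation-first default described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    user: str
    established: bool
    idle_seconds: float  # time since the session last carried meaningful traffic


def detect_idle_sessions(
    sessions: list[Session],
    timeout_seconds: int = 3600,
    min_sessions: int = 10,
    recommendation: str = "Reduce RA VPN idle timeout to 1800 seconds",
) -> Optional[dict]:
    """Return a recommend-only anomaly record, or None if too few sessions are idle."""
    idle = [s for s in sessions if s.established and s.idle_seconds > timeout_seconds]
    if len(idle) < min_sessions:
        return None  # below the assumed min_sessions gate: no anomaly
    return {
        "observed_idle_sessions": len(idle),
        "avg_idle_duration": sum(s.idle_seconds for s in idle) / len(idle),
        "recommended_action": recommendation,
        "status": "PENDING",  # nothing is applied; an operator reviews it first
    }
```

Feeding it 27 established sessions idle for 14,500 seconds each produces a record shaped like the anomaly reviewed in Step 5, while only 5 such sessions would stay below the gate and produce nothing.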

Verify:

ssh admin@10.10.255.10
enable
show aiops detectors
DETECTORS
Name: vpn-idle-sessions
Description: Detect idle RA VPN sessions that exceed historical idle duration
Type: session_idle
Scope: device 10.10.1.1
Parameters:
  timeout_seconds: 3600
  min_sessions: 10
Action: recommend "Reduce RA VPN idle timeout to 1800 seconds"
Status: ACTIVE
Last Anomaly Detected: NONE

Step 4: Enable Elephant Flow Detection on NetFlow Stream

What we are doing: Enable elephant-flow detection on the NetFlow stream coming from the exporter so the system identifies unusually large flows that could indicate data exfiltration or saturating transfers.

ssh admin@10.10.255.10
enable
configure terminal
aiops detector create elephant-flows
 description "Detect unusually large flows from NetFlow export"
 type flow_size
 scope netflow-source 10.10.4.1
 parameters flow_size_bytes_threshold 125000000
 parameters min_duration_seconds 60
 action alert "Elephant flow detected"
end
write memory
exit

What just happened:

  • The elephant-flows detector watches flows exported from 10.10.4.1.
  • flow_size_bytes_threshold 125000000 sets ~125 MB as a threshold for candidate elephant flows. ML evaluation will compare flows against the baseline to determine whether they are anomalous.
  • min_duration_seconds 60 reduces false positives from short bursts by considering only flows lasting 60+ seconds.

Real-world note: Elephant detection helps identify large transfers like backups or data-exfiltration. In data centers this prevents link saturation; in security contexts it highlights suspicious long flows to unknown destinations.
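A minimal sketch of the elephant-flows logic, assuming flow records have already been decoded from the NetFlow export: a flow becomes a candidate only if it meets both flow_size_bytes_threshold and min_duration_seconds, and a candidate is reported only if it is also unusual relative to historical flow sizes. The FlowRecord type and the "more than 3 standard deviations above the historical mean" rule are assumptions for illustration. As a sanity check on the threshold, 125,000,000 bytes is 1 Gbit, so a flow that just reaches it in 60 seconds averages about 16.7 Mbit/s.

```python
import statistics
from dataclasses import dataclass


@dataclass
class FlowRecord:
    src: str
    dst: str
    size_bytes: int
    duration_s: float


def elephant_flows(
    flows: list[FlowRecord],
    history_bytes: list[int],          # non-empty history of flow sizes
    size_threshold: int = 125_000_000,
    min_duration_s: float = 60,
    z_cut: float = 3.0,
) -> list[FlowRecord]:
    """Static pre-filter, then compare each candidate against the historical baseline."""
    mean = statistics.fmean(history_bytes)
    std = statistics.pstdev(history_bytes) or 1.0  # avoid divide-by-zero on flat history
    hits = []
    for f in flows:
        if f.size_bytes < size_threshold or f.duration_s < min_duration_s:
            continue  # not a candidate: too small or too short
        if (f.size_bytes - mean) / std > z_cut:
            hits.append(f)
    return hits
```

The two stages play different roles: the static parameters cheaply discard short bursts, while the baseline comparison keeps routine large transfers (for example, a link where 150 MB flows are normal) from being reported.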

Verify:

ssh admin@10.10.255.10
enable
show aiops detectors elephant-flows
DETECTOR: elephant-flows
Description: Detect unusually large flows from NetFlow export
Type: flow_size
Scope: netflow-source 10.10.4.1
Parameters:
  flow_size_bytes_threshold: 125000000
  min_duration_seconds: 60
Action: alert "Elephant flow detected"
Status: ACTIVE
Last Anomaly Detected: NONE

Step 5: Review Detected Anomalies and Apply Recommendation

What we are doing: Review any ML-generated anomalies and apply the detector’s recommendation for the VPN idle timeout if the evidence supports it. This demonstrates the human-in-the-loop workflow.

ssh admin@10.10.255.10
enable
show aiops anomalies
ANOMALIES
ID: 2026-04-02-0001
Detector: vpn-idle-sessions
Device: 10.10.1.1
Observed Idle Sessions: 27
Avg Idle Duration: 14,500 seconds
Confidence: HIGH
Recommended Action: Reduce RA VPN idle timeout to 1800 seconds
Status: PENDING

If you accept the recommendation:

ssh admin@10.10.1.1
enable
configure terminal
vpn remote-access
  session timeout 1800
end
write memory
exit

What just happened:

  • show aiops anomalies displays an evidence-backed recommendation with confidence metrics.
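  • Applying the change on the VPN headend (session timeout 1800) implements the remediation and reduces the headend resources held by idle sessions. Before applying it, confirm that this command sets an inactivity (idle) timer rather than a maximum session lifetime; many VPN platforms configure the two separately, and capping total session lifetime at 1800 seconds would also disconnect active users.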

Real-world note: Always validate recommendations with stakeholders — some long-lived sessions may be legitimate (e.g., telemetry collectors).
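A quick what-if check supports that stakeholder review: given each session's current idle time, which users would a 1800-second idle timeout disconnect, and are any of them on an allow-list of known long-lived sessions such as telemetry collectors? The input dictionary and the allow-list are hypothetical; in practice the idle times would come from the headend's session table.

```python
def timeout_impact(
    idle_by_user: dict[str, float],
    new_timeout_s: int = 1800,
    allow_list: frozenset[str] = frozenset(),
) -> tuple[list[str], list[str]]:
    """Split users a new idle timeout would disconnect into (expected, needs_review)."""
    affected = sorted(u for u, idle in idle_by_user.items() if idle > new_timeout_s)
    expected = [u for u in affected if u not in allow_list]
    needs_review = [u for u in affected if u in allow_list]  # talk to these owners first
    return expected, needs_review
```

Any non-empty needs_review list is a signal to agree on an exception (or a separate policy) with the owners before the timeout change is applied.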

Verify:

ssh admin@10.10.1.1
enable
show vpn remote-access config
VPN REMOTE-ACCESS CONFIGURATION
session timeout: 1800 seconds
active tunnels: 142
idle sessions > 3600s: 3

Verification Checklist

  • Check 1: AutoDetect is enabled
    • Verify with show aiops status and ensure AutoDetect: ENABLED.
  • Check 2: Detectors are active for target devices
    • Verify with show aiops detectors that vpn-idle-sessions and elephant-flows are Status: ACTIVE.
  • Check 3: Evidence and recommendations appear when anomalies occur
    • Verify with show aiops anomalies that anomalies contain device, observed metrics, and recommendations.
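The checklist can be automated by capturing the show command output (for example, from an SSH session) and parsing its "Key: Value" lines. This sketch assumes the output format matches the samples in this lesson exactly; it does not connect to the appliance, and the function names are hypothetical.

```python
def parse_fields(output: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from 'Key: Value' lines, in order."""
    pairs = []
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def autodetect_enabled(status_output: str) -> bool:
    """Check 1: 'show aiops status' reports AutoDetect: ENABLED."""
    return dict(parse_fields(status_output)).get("AutoDetect") == "ENABLED"


def active_detectors(detectors_output: str) -> set[str]:
    """Check 2: names of detectors whose Status is ACTIVE."""
    active, current = set(), None
    for key, value in parse_fields(detectors_output):
        if key in ("Name", "DETECTOR"):  # start of a detector block
            current = value
        elif key == "Status" and value == "ACTIVE" and current:
            active.add(current)
    return active


def anomaly_has_evidence(anomalies_output: str) -> bool:
    """Check 3: an anomaly lists its device, confidence, and recommendation."""
    keys = {k for k, _ in parse_fields(anomalies_output)}
    return {"Device", "Confidence", "Recommended Action"} <= keys
```

Splitting on the first colon only keeps values such as the "Last Ingest" timestamp intact.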

Common Mistakes

  • Symptom: No anomalies appear after enabling detectors
    • Cause: No telemetry (NetFlow/syslog) is reaching the AIOps appliance.
    • Fix: Confirm the exporter configuration and connectivity, and verify that the NetFlow source IP (10.10.4.1) is allowed and seen by aiops-1.
  • Symptom: Adaptive thresholding produces many low-confidence alerts
    • Cause: The model training window is too small, or outlier exclusion is disabled.
    • Fix: Set Min Sample Window to at least 7 days (the value shown in Step 2) and enable Outlier Exclusion in the adaptive-thresholding settings.
  • Symptom: An applied recommendation breaks workflows
    • Cause: The change was enforced automatically without stakeholder review.
    • Fix: Use action recommend (notification only) rather than action enforce, and always review show aiops anomalies before acting.
  • Symptom: A detector targets the wrong device
    • Cause: An incorrect scope was specified when the detector was created.
    • Fix: Edit the detector scope to the correct device IP (e.g., 10.10.1.1) and re-run detection.

Key Takeaways

  • ML-based anomaly detection builds dynamic baselines and reduces alert noise through adaptive thresholding; this is crucial for complex production environments with variable traffic patterns.
  • Detectors operate on telemetry (NetFlow, syslog, streaming) and should be scoped narrowly (per-device or per-source) to reduce false positives.
  • Always prefer recommendations over automated enforcement in initial deployments; human validation prevents unintended disruptions.
  • Use elephant-flow detection to protect capacity and idle-session detection to manage resources (e.g., VPN headend load); both are practical tools for capacity planning and security incident triage.

Final tip: In production, maintain a feedback loop — validate ML recommendations, tune detector parameters (timeout, min_sessions, thresholds), and retrain models when significant topology or usage changes occur.