Lesson 2 of 6

ML-Based Threat Detection

Objective

In this lesson you will enable and validate ML-Based Threat Detection across the NHPREP lab environment. We will deploy machine-learning models that detect malware, DDoS, and lateral movement, configure alert thresholds, and verify detection using synthetic test traffic. This matters in production because ML models detect subtle, multi-dimensional anomalies (like low-and-slow lateral movement or volumetric DDoS) faster than static signatures. Real-world scenario: an enterprise uses ML-assisted detection to identify idle VPN sessions consuming headend resources and to detect lateral movement after a compromised workstation connects to an internal file share.

Quick Recap

Refer to the topology you configured in Lesson 1. This lesson uses the same devices and addressing from Lesson 1 and adds one analytics node (the ML Analytics Manager). No other devices or IPs are added in this lesson.

Note: If you did not complete Lesson 1, stop and complete it first — the ML models require sensors and network flows already exported from the perimeter firewall and VPN headend.

Key Concepts

  • Behavioral vs Signature Detection — ML detection models look at statistical patterns (flows per second, bytes per session, user access patterns) rather than only pattern signatures. In production, this is important because attackers often mutate payloads to evade signatures but still exhibit abnormal behavior.
  • Feature Extraction & Model Inference — Packets and flows are summarized into features (e.g., bytes/packet, session duration, destination diversity). The analytics manager scores each flow/session against trained models and produces an anomaly score used to trigger alerts.
  • Model Lifecycle & Deployment — Models are trained offline or auto-trained on historical telemetry (AutoDetect). Production systems deploy models to inference nodes/sensors; pushing a model changes real-time detection behavior across the environment.
  • Thresholding & Alerting — ML outputs continuous scores; thresholds convert scores into actionable alerts. Adaptive thresholding learns a baseline from historical telemetry and excludes known anomalies from that baseline, so thresholds stay calibrated and false positives are reduced. In production, thresholds are tuned per service to balance noise vs sensitivity.
  • Correlation for Root Cause — ML systems correlate anomalies (e.g., DDoS spikes + VPN headend imbalance + idle RA VPN sessions) to provide recommended remediations — e.g., adjust timeouts or scale headend capacity.
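
To make the feature-extraction and scoring idea concrete, here is a minimal Python sketch. The feature names (bytes per packet, duration, destination diversity) come from the concepts above, but the flow record layout and the simple z-score "model" are illustrative assumptions, not the actual NHPREP analytics pipeline.

```python
# Sketch: summarize a flow into features, then score one feature
# against a learned baseline. Field names and the z-score model
# are assumptions for illustration only.
from statistics import mean, stdev

def extract_features(flow):
    """Summarize one flow record into features a model could score."""
    return {
        "bytes_per_packet": flow["bytes"] / max(flow["packets"], 1),
        "duration_s": flow["duration_s"],
        "dest_diversity": len(set(flow["destinations"])),
    }

def anomaly_score(value, baseline):
    """Z-score style deviation from the baseline (0 = perfectly normal)."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(value - mu) / sigma if sigma else 0.0

# One hypothetical flow: unusually large packets compared to baseline
flow = {"bytes": 90000, "packets": 60, "duration_s": 300,
        "destinations": ["10.0.0.105", "10.0.0.106", "10.0.0.107"]}
feats = extract_features(flow)
score = anomaly_score(feats["bytes_per_packet"], baseline=[600, 620, 590, 610])
```

A real inference node would score many features jointly with a trained model; the point here is only that flows become numeric features and features become a continuous anomaly score.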

Topology (reference from Lesson 1)

(Topology is unchanged; ML Analytics Manager is added logically in the management network)

Because the topology from Lesson 1 already defines the exact IPs and interfaces, refer to that diagram; no new IPs are introduced here.

Step-by-step configuration

Step 1: Enable ML Threat Detection on the Analytics Manager

What we are doing: We enable the ML threat detection service on the analytics manager node so it can receive telemetry, run inference, and produce alerts. This step is the control-plane activation that allows models to be loaded and scores to be generated.

configure terminal
analytics manager enable ml-threat-detection
analytics manager set organization NHPREP
analytics manager set admin-password Lab@123
exit

What just happened:

  • analytics manager enable ml-threat-detection starts the ML inference engine on the manager; it accepts telemetry and prepares to run models.
  • analytics manager set organization NHPREP assigns the tenant context so detections are tagged correctly.
  • analytics manager set admin-password Lab@123 sets the management password for accessing the analytics UI/API. Enabling the engine is a prerequisite before deploying models to inference nodes.

Real-world note: In production, enabling the manager is usually performed in maintenance windows because it can trigger synchronization to many sensors.

Verify:

show analytics manager status
Analytics Manager Status
------------------------
State: RUNNING
ML Engine: ENABLED
Organization: NHPREP
Admin User: admin
Last Sync: 2026-04-02T09:12:05Z
Active Models: 0
Inference Nodes Registered: 0
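
If you automate this check, the key/value layout of the output above is easy to parse. The following Python sketch assumes the output format shown in the sample; the parsing approach is an illustration, not a vendor-provided API.

```python
# Sketch: parse `show analytics manager status` style output into a dict,
# e.g. for a monitoring script. Assumes the "Key: Value" layout shown above.
def parse_status(text):
    """Turn 'Key: Value' lines into a dictionary, skipping other lines."""
    status = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            status[key.strip()] = value.strip()
    return status

sample = """State: RUNNING
ML Engine: ENABLED
Organization: NHPREP
Active Models: 0"""
status = parse_status(sample)
```

A script could then alert when `status["ML Engine"]` is not `ENABLED` before proceeding to model deployment.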

Step 2: Register and deploy an inference node (sensor)

What we are doing: We register a sensor (inference node) on the analytics manager so that the manager can push models and receive telemetry from that sensor. This node will perform real-time inference close to traffic sources.

configure terminal
analytics node register node-id SENSOR-01 ip-address 10.0.0.50 interface GigabitEthernet0/1
analytics node set policy inference-mode real-time
exit

What just happened:

  • analytics node register tells the manager there is a node named SENSOR-01 at 10.0.0.50 and reachable via the specified interface; this establishes a management channel.
  • analytics node set policy inference-mode real-time configures the node to perform inference on live telemetry rather than batch processing. The node will now be eligible to receive models.

Real-world note: In production, place inference nodes near telemetry sources (e.g., inside core or at key firewall clusters) to reduce telemetry latency.

Verify:

show analytics nodes
Node ID: SENSOR-01
IP Address: 10.0.0.50
Interface: GigabitEthernet0/1
Status: REGISTERED
Inference Mode: real-time
Models Deployed: 0
Last Checkin: 2026-04-02T09:14:20Z

Step 3: Deploy ML models for malware, DDoS, and lateral movement

What we are doing: We deploy three model types to the registered sensor: malware behavioral model, DDoS detection model, and lateral movement model. Deploying models enables the sensor to produce per-session and per-host scores.

configure terminal
analytics model upload name ml-malware-v1 model-file ml-malware-v1.bin
analytics model upload name ml-ddos-v1 model-file ml-ddos-v1.bin
analytics model upload name ml-lateral-v1 model-file ml-lateral-v1.bin
analytics model deploy name ml-malware-v1 target SENSOR-01
analytics model deploy name ml-ddos-v1 target SENSOR-01
analytics model deploy name ml-lateral-v1 target SENSOR-01
exit

What just happened:

  • analytics model upload registers model artifacts on the manager (the binary is the trained model).
  • analytics model deploy pushes each model to SENSOR-01; the sensor will load models into memory and begin scoring incoming flows/sessions. After deployment, the sensor will emit anomaly scores for matching telemetry.

Real-world note: Model deployment can be staged (canary first) to limit impact; start with a single sensor then roll out cluster-wide.
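
The canary pattern in the note above can be sketched as a small control loop. The `deploy` and `fp_rate` callables and the 5% false-positive cutoff are hypothetical placeholders, not real analytics-manager APIs.

```python
# Sketch: staged (canary) model rollout. deploy() and fp_rate() are
# assumed hooks into your deployment tooling; the 0.05 cutoff is an
# illustrative tuning choice.
def staged_rollout(model, sensors, deploy, fp_rate, max_fp=0.05):
    """Deploy to one canary sensor first; continue only if it stays quiet."""
    canary, rest = sensors[0], sensors[1:]
    deploy(model, canary)
    if fp_rate(model, canary) > max_fp:
        return [canary]          # stop: canary too noisy, tune before rollout
    for s in rest:
        deploy(model, s)
    return sensors               # full fleet deployed

deployed = []
result = staged_rollout(
    "ml-malware-v1",
    ["SENSOR-01", "SENSOR-02", "SENSOR-03"],
    deploy=lambda m, s: deployed.append(s),
    fp_rate=lambda m, s: 0.01,   # simulated: canary produced few false positives
)
```

In this lab there is only SENSOR-01, so the whole rollout is effectively a canary; the pattern matters once you operate a fleet.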

Verify:

show analytics model status
Model Name: ml-malware-v1
Version: 1
Status: DEPLOYED
Targets: SENSOR-01
Last Deployed: 2026-04-02T09:20:12Z

Model Name: ml-ddos-v1
Version: 1
Status: DEPLOYED
Targets: SENSOR-01
Last Deployed: 2026-04-02T09:20:13Z

Model Name: ml-lateral-v1
Version: 1
Status: DEPLOYED
Targets: SENSOR-01
Last Deployed: 2026-04-02T09:20:14Z

Step 4: Configure adaptive thresholding and alert rules

What we are doing: We configure adaptive thresholds for each model to reduce false positives and create alert rules that generate incidents when model scores exceed thresholds. This converts continuous model outputs into actionable alerts.

configure terminal
analytics threshold set model ml-malware-v1 threshold 0.85 adaptive true
analytics threshold set model ml-ddos-v1 threshold 0.75 adaptive true
analytics threshold set model ml-lateral-v1 threshold 0.80 adaptive true

analytics alert rule create name ALERT-MALWARE severity high match model ml-malware-v1 score >= 0.85 notify admin
analytics alert rule create name ALERT-DDOS severity critical match model ml-ddos-v1 score >= 0.75 notify admin
analytics alert rule create name ALERT-LATERAL severity high match model ml-lateral-v1 score >= 0.80 notify admin
exit

What just happened:

  • analytics threshold set establishes the score thresholds and enables adaptive behavior so thresholds auto-adjust based on historical baselines, reducing noise.
  • analytics alert rule create creates correlation/alerting logic. When a model score satisfies the condition, an alert with specified severity is generated and the admin is notified.

Real-world note: Adaptive thresholds are valuable in dynamic environments (e.g., seasonal traffic changes). Always monitor the first days after enabling to tune sensitivity.
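
One common way adaptive thresholding is implemented is an exponentially weighted moving baseline that excludes known anomalies, with the configured threshold acting as a floor. This sketch is an assumption about the mechanism, not the vendor's actual algorithm; `alpha` and `margin` are illustrative parameters.

```python
# Sketch: EWMA-based adaptive threshold. Scores at or above the floor are
# treated as historical anomalies and excluded from the baseline so they
# do not inflate future thresholds. Parameters are illustrative.
def adaptive_threshold(scores, floor=0.75, alpha=0.1, margin=0.15):
    """Return the floor, or the EWMA baseline plus a margin if higher."""
    baseline = scores[0]
    for s in scores[1:]:
        if s < floor:                    # exclude historical anomalies
            baseline = alpha * s + (1 - alpha) * baseline
    return max(floor, baseline + margin)

# Mostly-benign score history with one anomaly spike (0.95) that is ignored
history = [0.20, 0.25, 0.22, 0.95, 0.18, 0.21]
threshold = adaptive_threshold(history)
```

With a quiet baseline like this, the configured floor (0.75, matching the ml-ddos-v1 setting above) dominates; in a noisier environment the learned baseline plus margin would raise the effective threshold.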

Verify:

show analytics threshold
Model: ml-malware-v1 Threshold: 0.85 Adaptive: true
Model: ml-ddos-v1 Threshold: 0.75 Adaptive: true
Model: ml-lateral-v1 Threshold: 0.80 Adaptive: true

show analytics alert rules
Rule Name: ALERT-MALWARE Severity: high Condition: model ml-malware-v1 score >= 0.85 Actions: notify admin
Rule Name: ALERT-DDOS Severity: critical Condition: model ml-ddos-v1 score >= 0.75 Actions: notify admin
Rule Name: ALERT-LATERAL Severity: high Condition: model ml-lateral-v1 score >= 0.80 Actions: notify admin
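
The rule evaluation itself is simple to reason about: each (model, score) observation is matched against the rule table, and any satisfied rule fires an alert. The dictionary-based rule structure below is an illustrative stand-in for however the manager stores rules internally.

```python
# Sketch: evaluate model scores against alert rules, mirroring the three
# rules configured above. The rule representation is an assumption.
RULES = [
    {"name": "ALERT-MALWARE", "model": "ml-malware-v1", "severity": "high",     "min_score": 0.85},
    {"name": "ALERT-DDOS",    "model": "ml-ddos-v1",    "severity": "critical", "min_score": 0.75},
    {"name": "ALERT-LATERAL", "model": "ml-lateral-v1", "severity": "high",     "min_score": 0.80},
]

def evaluate(model, score, rules=RULES):
    """Return the alert rules fired by one (model, score) observation."""
    return [r for r in rules if r["model"] == model and score >= r["min_score"]]

fired = evaluate("ml-ddos-v1", 0.97)   # matches ALERT-DDOS
quiet = evaluate("ml-malware-v1", 0.50)  # below threshold: no alert
```

This also shows why per-model thresholds matter: a 0.80 score fires for the lateral-movement model but would be ignored for the malware model.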

Step 5: Validate detection with test traffic and review incidents

What we are doing: We generate controlled test traffic that simulates malware beaconing, a short DDoS spike, and lateral movement, then confirm that alerts/incidents are created with explanatory context. Testing verifies both model behavior and the alert pipeline.

test traffic simulate malware type beaconing target 10.0.0.200 duration 300s
test traffic simulate ddos type syn-flood target 10.0.0.10 duration 60s rate 10000pps
test traffic simulate lateral type smb-auth target-clients 10.0.0.101-10.0.0.110 duration 180s

What just happened:

  • test traffic simulate starts synthetic traffic patterns that mimic real threats. The manager receives telemetry from SENSOR-01, the models score the activity, and alerts are generated when thresholds are crossed.
  • The incident record includes suggested root cause and recommended remediation (e.g., increase VPN timeout or investigate endpoint).

Real-world note: Always run synthetic tests in a lab or controlled segment; avoid creating DDoS on production without safeguards.
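
To see what the malware model is keying on, consider the beaconing case: connections at near-constant intervals to one endpoint. The detection heuristic below (low coefficient of variation of inter-arrival times) is one plausible approach, not the deployed model; the 60 s interval matches the simulated traffic above and the jitter tolerance is an assumption.

```python
# Sketch: flag beaconing when inter-arrival times are unusually regular
# (low coefficient of variation). Heuristic and tolerance are illustrative.
from statistics import mean, pstdev

def looks_like_beaconing(timestamps, max_jitter=0.1):
    """True when gaps between events are regular to within max_jitter."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 3:
        return False                     # too few samples to judge
    return pstdev(gaps) / mean(gaps) < max_jitter

# Connections every ~60 s (seconds since start), like the simulated beacon
beacon = [0, 60, 121, 180, 241, 300]
# Irregular, human-driven traffic for contrast
web_browsing = [0, 4, 90, 97, 300, 310]
```

A real model would combine this with payload size consistency, destination rarity, and other features before producing the 0.92 score shown in the incident below.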

Verify:

show analytics incidents recent
Incident ID: INC-20260402-0001
Model Triggered: ml-malware-v1
Score: 0.92
Severity: high
Detected On: 2026-04-02T09:35:21Z
Summary: High-scoring beaconing observed from host 10.0.0.200 -> external endpoint 203.0.113.45, regular 60s interval. Recommendation: Isolate host and run endpoint scan.

Incident ID: INC-20260402-0002
Model Triggered: ml-ddos-v1
Score: 0.97
Severity: critical
Detected On: 2026-04-02T09:36:05Z
Summary: SYN flood style volumetric spike to 10.0.0.10 observed; 10k pps sustained for 45s. Recommendation: Engage scrubbing or apply per-source rate-limiting on edge.

Incident ID: INC-20260402-0003
Model Triggered: ml-lateral-v1
Score: 0.88
Severity: high
Detected On: 2026-04-02T09:38:12Z
Summary: Lateral movement pattern detected — multiple SMB auth attempts from 10.0.0.103 to hosts 10.0.0.105-10.0.0.109 in short window. Recommendation: Quarantine source and review recent logins.

Verification Checklist

  • Check 1: Analytics Manager is running and ML engine enabled — verify with show analytics manager status and expect ML Engine: ENABLED.
  • Check 2: Sensor is registered and models are deployed — verify with show analytics nodes and show analytics model status; expect REGISTERED and DEPLOYED.
  • Check 3: Alerts were created for simulated threat traffic — verify with show analytics incidents recent; expect incident entries for malware, DDoS, and lateral movement with scores above thresholds.

Common Mistakes

Symptom: No models listed as DEPLOYED
Cause: Models were uploaded but not deployed, or deployment failed due to connectivity
Fix: Ensure the sensor is registered (show analytics nodes) and redeploy the models; verify management connectivity between manager and sensor

Symptom: Alerts generated but have very low confidence
Cause: Thresholds set too low, or adaptive thresholding not yet warmed up
Fix: Increase the threshold, or allow adaptive thresholds to train on baseline data for several days

Symptom: Too many false positives after deployment
Cause: Models are sensitive on initial rollout and not yet tuned to local traffic patterns
Fix: Enable adaptive thresholds, run a canary on a small set of sensors, and tune model parameters based on false-positive reports

Symptom: Sensor shows REGISTERED but Last Checkin is stale
Cause: Network or SSH/management channel blocked between manager and sensor
Fix: Check interface reachability and firewall policies; ensure the sensor management interface (GigabitEthernet0/1) is reachable

Key Takeaways

  • ML-based detection complements signature systems by identifying anomalies in behavior (e.g., beaconing, volumetric spikes, lateral access patterns) that signatures may miss.
  • Model deployment is a two-step process: upload trained models to the manager, then deploy them to inference nodes; the nodes perform real-time scoring.
  • Adaptive thresholding reduces noise by learning baselines — allow time for training and use staged rollouts in production environments.
  • Always validate detection with controlled tests and review incidents carefully; the ML system provides scores and recommended remediations, but operators must verify and act.

Important: ML systems augment operator workflows by surfacing likely issues and suggested fixes (for example, recommending VPN timeout changes for idle RA sessions), but human validation remains critical before automated remediation in production.