ML-Based Threat Detection
Objective
In this lesson you will enable and validate ML-Based Threat Detection across the NHPREP lab environment. We will deploy machine-learning models that detect malware, DDoS, and lateral movement, configure alert thresholds, and verify detection using synthetic test traffic. This matters in production because ML models detect subtle, multi-dimensional anomalies (like low-and-slow lateral movement or volumetric DDoS) faster than static signatures. Real-world scenario: an enterprise uses ML-assisted detection to identify idle VPN sessions consuming headend resources and to detect lateral movement after a compromised workstation connects to an internal file share.
Quick Recap
Refer to the topology you configured in Lesson 1. This lesson uses the same devices and addressing from Lesson 1 and adds one analytics node (the ML Analytics Manager). No other devices or IPs are added in this lesson.
Note: If you did not complete Lesson 1, stop and complete it first — the ML models require sensors and network flows already exported from the perimeter firewall and VPN headend.
Key Concepts
- Behavioral vs Signature Detection — ML detection models look at statistical patterns (flows per second, bytes per session, user access patterns) rather than only pattern signatures. In production, this is important because attackers often mutate payloads to evade signatures but still exhibit abnormal behavior.
- Feature Extraction & Model Inference — Packets and flows are summarized into features (e.g., bytes/packet, session duration, destination diversity). The analytics manager scores each flow/session against trained models and produces an anomaly score used to trigger alerts.
- Model Lifecycle & Deployment — Models are trained offline or auto-trained on historical telemetry (AutoDetect). Production systems deploy models to inference nodes/sensors; pushing a model changes real-time detection behavior across the environment.
- Thresholding & Alerting — ML outputs continuous scores; thresholds convert scores into actionable alerts. Adaptive thresholding excludes previously flagged anomalies from the learned baseline to avoid false positives. In production, thresholds are tuned per service to balance noise vs sensitivity.
- Correlation for Root Cause — ML systems correlate anomalies (e.g., DDoS spikes + VPN headend imbalance + idle RA VPN sessions) to provide recommended remediations — e.g., adjust timeouts or scale headend capacity.
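The feature-extraction and scoring ideas above can be sketched in a few lines of Python. This is an illustrative model of the pipeline, not the platform's actual implementation; the flow field names (`src`, `dst`, `bytes`, `packets`, `duration_s`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def extract_features(flows):
    """Summarize raw flow records into per-source feature vectors.

    Each flow is a dict with hypothetical fields: src, dst, bytes,
    packets, duration_s.
    """
    per_host = defaultdict(lambda: {"bytes": 0, "packets": 0,
                                    "dsts": set(), "durations": []})
    for f in flows:
        h = per_host[f["src"]]
        h["bytes"] += f["bytes"]
        h["packets"] += f["packets"]
        h["dsts"].add(f["dst"])
        h["durations"].append(f["duration_s"])
    return {
        src: {
            # The same features named in the concept above:
            "bytes_per_packet": h["bytes"] / max(h["packets"], 1),
            "dst_diversity": len(h["dsts"]),         # destination diversity
            "avg_duration_s": mean(h["durations"]),  # session duration
        }
        for src, h in per_host.items()
    }

# Five short flows from one workstation to five internal hosts
flows = [
    {"src": "10.0.0.103", "dst": f"10.0.0.{d}", "bytes": 400,
     "packets": 4, "duration_s": 0.2}
    for d in range(105, 110)
]
feats = extract_features(flows)
print(feats["10.0.0.103"]["dst_diversity"])  # 5 distinct targets
```

In a real deployment these feature vectors are what the inference node scores against the trained models; the CLI steps below never expose them directly.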
Topology (reference from Lesson 1)
(Topology is unchanged; ML Analytics Manager is added logically in the management network)
Because Lesson 1 already defines the exact IPs and interfaces, refer to that diagram. No new IPs are introduced in this lesson.
Step-by-step configuration
Step 1: Enable ML Threat Detection on the Analytics Manager
What we are doing: We enable the ML threat detection service on the analytics manager node so it can receive telemetry, run inference, and produce alerts. This step is the control-plane activation that allows models to be loaded and scores to be generated.
configure terminal
analytics manager enable ml-threat-detection
analytics manager set organization NHPREP
analytics manager set admin-password Lab@123
exit
What just happened:
- `analytics manager enable ml-threat-detection` starts the ML inference engine on the manager; it accepts telemetry and prepares to run models.
- `analytics manager set organization NHPREP` assigns the tenant context so detections are tagged correctly.
- `analytics manager set admin-password Lab@123` sets the management password for accessing the analytics UI/API. Enabling the engine is a prerequisite before deploying models to inference nodes.
Real-world note: In production, enabling the manager is usually performed in maintenance windows because it can trigger synchronization to many sensors.
Verify:
show analytics manager status
Analytics Manager Status
------------------------
State: RUNNING
ML Engine: ENABLED
Organization: NHPREP
Admin User: admin
Last Sync: 2026-04-02T09:12:05Z
Active Models: 0
Inference Nodes Registered: 0
Step 2: Register and deploy an inference node (sensor)
What we are doing: We register a sensor (inference node) on the analytics manager so that the manager can push models and receive telemetry from that sensor. This node will perform real-time inference close to traffic sources.
configure terminal
analytics node register node-id SENSOR-01 ip-address 10.0.0.50 interface GigabitEthernet0/1
analytics node set policy inference-mode real-time
exit
What just happened:
- `analytics node register` tells the manager there is a node named SENSOR-01 at 10.0.0.50, reachable via the specified interface; this establishes a management channel.
- `analytics node set policy inference-mode real-time` configures the node to perform inference on live telemetry rather than batch processing. The node is now eligible to receive models.
Real-world note: In production, place inference nodes near telemetry sources (e.g., inside core or at key firewall clusters) to reduce telemetry latency.
Verify:
show analytics nodes
Node ID: SENSOR-01
IP Address: 10.0.0.50
Interface: GigabitEthernet0/1
Status: REGISTERED
Inference Mode: real-time
Models Deployed: 0
Last Checkin: 2026-04-02T09:14:20Z
Step 3: Deploy ML models for malware, DDoS, and lateral movement
What we are doing: We deploy three model types to the registered sensor: malware behavioral model, DDoS detection model, and lateral movement model. Deploying models enables the sensor to produce per-session and per-host scores.
configure terminal
analytics model upload name ml-malware-v1 model-file ml-malware-v1.bin
analytics model upload name ml-ddos-v1 model-file ml-ddos-v1.bin
analytics model upload name ml-lateral-v1 model-file ml-lateral-v1.bin
analytics model deploy name ml-malware-v1 target SENSOR-01
analytics model deploy name ml-ddos-v1 target SENSOR-01
analytics model deploy name ml-lateral-v1 target SENSOR-01
exit
What just happened:
- `analytics model upload` registers the model artifacts on the manager (each binary is a trained model).
- `analytics model deploy` pushes each model to SENSOR-01; the sensor loads the models into memory and begins scoring incoming flows/sessions. After deployment, the sensor emits anomaly scores for matching telemetry.
Real-world note: Model deployment can be staged (canary first) to limit impact; start with a single sensor then roll out cluster-wide.
Verify:
show analytics model status
Model Name: ml-malware-v1
Version: 1
Status: DEPLOYED
Targets: SENSOR-01
Last Deployed: 2026-04-02T09:20:12Z
Model Name: ml-ddos-v1
Version: 1
Status: DEPLOYED
Targets: SENSOR-01
Last Deployed: 2026-04-02T09:20:13Z
Model Name: ml-lateral-v1
Version: 1
Status: DEPLOYED
Targets: SENSOR-01
Last Deployed: 2026-04-02T09:20:14Z
Step 4: Configure adaptive thresholding and alert rules
What we are doing: We configure adaptive thresholds for each model to reduce false positives and create alert rules that generate incidents when model scores exceed thresholds. This converts continuous model outputs into actionable alerts.
configure terminal
analytics threshold set model ml-malware-v1 threshold 0.85 adaptive true
analytics threshold set model ml-ddos-v1 threshold 0.75 adaptive true
analytics threshold set model ml-lateral-v1 threshold 0.80 adaptive true
analytics alert rule create name ALERT-MALWARE severity high match model ml-malware-v1 score >= 0.85 notify admin
analytics alert rule create name ALERT-DDOS severity critical match model ml-ddos-v1 score >= 0.75 notify admin
analytics alert rule create name ALERT-LATERAL severity high match model ml-lateral-v1 score >= 0.80 notify admin
exit
What just happened:
- `analytics threshold set` establishes the score thresholds and enables adaptive behavior so thresholds auto-adjust to historical baselines, reducing noise.
- `analytics alert rule create` creates the correlation/alerting logic. When a model score satisfies the condition, an alert with the specified severity is generated and the admin is notified.
Real-world note: Adaptive thresholds are valuable in dynamic environments (e.g., seasonal traffic changes). Always monitor the first days after enabling to tune sensitivity.
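The adaptive behavior enabled above can be sketched as follows: the effective threshold tracks a rolling baseline of non-anomalous scores, and scores that fire an alert never feed back into that baseline. This is a simplified illustration of the idea, not the vendor's implementation; the class name and parameters are hypothetical.

```python
from collections import deque

class AdaptiveThreshold:
    """Sketch of adaptive thresholding: the alert threshold floats above
    a rolling baseline of normal scores, never below a configured floor.
    Anomalous scores are excluded from the baseline so past incidents
    don't inflate it.
    """
    def __init__(self, floor, window=100, margin=0.10):
        self.floor = floor            # configured minimum (e.g., 0.85)
        self.margin = margin          # headroom above the observed baseline
        self.history = deque(maxlen=window)

    @property
    def threshold(self):
        if not self.history:
            return self.floor         # cold start: use the configured floor
        baseline = max(self.history)  # highest score seen in normal traffic
        return max(self.floor, baseline + self.margin)

    def observe(self, score):
        """Return True if the score should raise an alert."""
        alert = score >= self.threshold
        if not alert:                 # only normal scores feed the baseline
            self.history.append(score)
        return alert

t = AdaptiveThreshold(floor=0.85)
for s in [0.40, 0.55, 0.62]:
    t.observe(s)                      # normal traffic warms the baseline
print(t.observe(0.92))                # True: above floor and baseline
```

The warm-up period this implies is exactly why the note above recommends monitoring the first days after enabling before trusting the tuning.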
Verify:
show analytics threshold
Model: ml-malware-v1 Threshold: 0.85 Adaptive: true
Model: ml-ddos-v1 Threshold: 0.75 Adaptive: true
Model: ml-lateral-v1 Threshold: 0.80 Adaptive: true
show analytics alert rules
Rule Name: ALERT-MALWARE Severity: high Condition: model ml-malware-v1 score >= 0.85 Actions: notify admin
Rule Name: ALERT-DDOS Severity: critical Condition: model ml-ddos-v1 score >= 0.75 Actions: notify admin
Rule Name: ALERT-LATERAL Severity: high Condition: model ml-lateral-v1 score >= 0.80 Actions: notify admin
Step 5: Validate detection with test traffic and review incidents
What we are doing: We generate controlled test traffic that simulates malware beaconing, a short DDoS spike, and lateral movement, then confirm that alerts/incidents are created with explanatory context. Testing verifies model behavior and alert pipeline.
test traffic simulate malware type beaconing target 10.0.0.200 duration 300s
test traffic simulate ddos type syn-flood target 10.0.0.10 duration 60s rate 10000pps
test traffic simulate lateral type smb-auth target-clients 10.0.0.101-10.0.0.110 duration 180s
What just happened:
- `test traffic simulate` starts synthetic traffic patterns that mimic real threats. The manager receives telemetry from SENSOR-01, the models score the activity, and alerts are generated when thresholds are crossed.
- The incident record includes a suggested root cause and recommended remediation (e.g., increase VPN timeout or investigate the endpoint).
Real-world note: Always run synthetic tests in a lab or controlled segment; avoid creating DDoS on production without safeguards.
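The beaconing pattern the malware simulation produces (one callback every 60 s) is detectable because true beaconing has near-constant inter-arrival intervals. A low coefficient of variation (stdev/mean of the intervals) maps to a high score. This is an illustrative heuristic, not the deployed model:

```python
from statistics import mean, pstdev

def beaconing_score(timestamps):
    """Score connection-time regularity on [0, 1]: perfectly periodic
    callbacks score near 1.0, bursty human-driven traffic scores low.
    Timestamps are seconds since an arbitrary epoch.
    """
    if len(timestamps) < 3:
        return 0.0                    # too few events to judge periodicity
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    cv = pstdev(intervals) / mean(intervals)   # 0 == perfectly periodic
    return max(0.0, 1.0 - cv)

# Simulated beacon: one callback every 60 s with small jitter
beacon = [i * 60.0 + j
          for i, j in zip(range(6), [0, 0.4, -0.3, 0.2, -0.1, 0.3])]
print(beaconing_score(beacon))        # close to 1.0 for periodic traffic
```

Note how this matches the incident summary below: the malware model keys on the "regular 60s interval", not on any payload signature.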
Verify:
show analytics incidents recent
Incident ID: INC-20260402-0001
Model Triggered: ml-malware-v1
Score: 0.92
Severity: high
Detected On: 2026-04-02T09:35:21Z
Summary: High-scoring beaconing observed from host 10.0.0.200 -> external endpoint 203.0.113.45, regular 60s interval. Recommendation: Isolate host and run endpoint scan.
Incident ID: INC-20260402-0002
Model Triggered: ml-ddos-v1
Score: 0.97
Severity: critical
Detected On: 2026-04-02T09:36:05Z
Summary: SYN flood style volumetric spike to 10.0.0.10 observed; 10k pps sustained for 45s. Recommendation: Engage scrubbing or apply per-source rate-limiting on edge.
Incident ID: INC-20260402-0003
Model Triggered: ml-lateral-v1
Score: 0.88
Severity: high
Detected On: 2026-04-02T09:38:12Z
Summary: Lateral movement pattern detected — multiple SMB auth attempts from 10.0.0.103 to hosts 10.0.0.105-10.0.0.109 in short window. Recommendation: Quarantine source and review recent logins.
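The lateral-movement pattern behind INC-20260402-0003 — one source authenticating to many distinct hosts in a short window — can be sketched as a destination fan-out check. The event format and threshold values here are hypothetical, chosen only to illustrate the detection logic:

```python
from collections import defaultdict

def lateral_movement_hosts(auth_events, window_s=120, fanout_threshold=4):
    """Flag sources that authenticate to many distinct internal hosts
    within a short window. Events are (timestamp_s, src, dst) tuples.
    """
    flagged = set()
    by_src = defaultdict(list)
    for ts, src, dst in sorted(auth_events):
        by_src[src].append((ts, dst))
    for src, events in by_src.items():
        # Slide a window starting at each event for this source
        for i, (start, _) in enumerate(events):
            dsts = {d for ts, d in events[i:] if ts - start <= window_s}
            if len(dsts) >= fanout_threshold:
                flagged.add(src)
                break
    return flagged

# 10.0.0.103 hits five hosts in 40 s; 10.0.0.101 makes one benign access
events = [(t * 10, "10.0.0.103", f"10.0.0.{105 + t}") for t in range(5)]
events += [(0, "10.0.0.101", "10.0.0.200")]
print(lateral_movement_hosts(events))  # {'10.0.0.103'}
```

A signature engine would see only individual, well-formed SMB authentications; the fan-out across destinations is what makes the behavior anomalous.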
Verification Checklist
- Check 1: Analytics Manager is running and ML engine enabled — verify with `show analytics manager status` and expect ML Engine: ENABLED.
- Check 2: Sensor is registered and models are deployed — verify with `show analytics nodes` and `show analytics model status`; expect REGISTERED and DEPLOYED.
- Check 3: Alerts were created for simulated threat traffic — verify with `show analytics incidents recent`; expect incident entries for malware, DDoS, and lateral movement with scores above thresholds.
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| No models listed as DEPLOYED | Models were uploaded but not deployed or deployment failed due to connectivity | Ensure sensor registered (show analytics nodes) and redeploy models; verify management connectivity between manager and sensor |
| Alerts generated but have very low confidence | Thresholds set too low or adaptive thresholding not warmed up | Increase threshold or allow adaptive thresholds to train on baseline data for several days |
| Too many false positives after deployment | Models are sensitive on initial rollout and were not tuned to local traffic patterns | Enable adaptive thresholds, run canary on a small set of sensors, and tune model parameters based on false-positive reports |
| Sensor shows REGISTERED but Last Checkin stale | Network or SSH/management channel blocked between manager and sensor | Check interface reachability and firewall policies; ensure sensor management interface (GigabitEthernet0/1) is reachable |
Key Takeaways
- ML-based detection complements signature systems by identifying anomalies in behavior (e.g., beaconing, volumetric spikes, lateral access patterns) that signatures may miss.
- Model deployment is a two-step process: upload trained models to the manager, then deploy them to inference nodes; the nodes perform real-time scoring.
- Adaptive thresholding reduces noise by learning baselines — allow time for training and use staged rollouts in production environments.
- Always validate detection with controlled tests and review incidents carefully; the ML system provides scores and recommended remediations, but operators must verify and act.
Important: ML systems augment operator workflows by surfacing likely issues and suggested fixes (for example, recommending VPN timeout changes for idle RA sessions), but human validation remains critical before automated remediation in production.