Lesson 4 of 7

Neural Networks and Deep Learning

Objective

In this lesson you will learn how neural networks and deep learning work and apply them to two common network engineering use cases: traffic classification (tagging flow types such as web, video, or VoIP) and traffic prediction (forecasting short-term traffic volume). This matters in production because classification enables policy-based handling (QoS, security inspection), while prediction enables capacity planning and proactive congestion mitigation. Real-world scenario: an enterprise wants to automatically tag flows for differentiated treatment and predict next-hour aggregate load on a critical WAN link.

Quick Recap

Refer to the topology used in Lesson 1. This lesson does not add routers, switches, or new IPs to the topology — we focus on applying neural networks to flow data collected from that network. All data files used in this lesson reference the lab host lab.nhprep.com and are stored under /home/nhprep/data on the training VM.

Important: this lesson is conceptual + hands-on with code (Python + scikit-learn). You are not changing router configurations; instead you are building models that would be integrated with network systems (SIEM, telemetry pipelines, policy engines) in production.

Key Concepts (before hands-on)

  • Neural Network basics: A neural network (NN) is a parameterized function approximator. Each layer contains weights and biases (parameters) that are adjusted by an optimizer using gradient descent to minimize a loss function. In practice for flows, the input vector contains numeric and categorical features; the NN learns relationships among them.
    • Real-world protocol-level analogy: training is like a routing protocol converging — updates (gradients) are exchanged internally until the model stabilizes.
  • Feature vectorization (tokenization): Just as tokens are used in language models, network features must be converted to numeric vectors (one-hot encoding for categorical items, normalization for numeric values). Poor tokenization = garbage-in, garbage-out.
  • Classification vs Regression: Traffic classification is a classification problem (discrete labels). Traffic volume forecasting is a regression problem (continuous values). The choice of loss (cross-entropy vs MSE) and activation (softmax vs linear) follows accordingly.
  • Overfitting and generalization: In production, models must generalize across time and topology changes. Validation sets, regularization, and careful feature selection matter — otherwise the model will “memorize” specific flows and fail when traffic patterns change.
  • Inference behavior and latency: When deployed, inference must meet latency requirements. A deep model that yields marginal accuracy improvements but doubles inference time may be unsuitable for inline decisions (e.g., per-flow forwarding).

Step-by-step configuration (hands-on)

Each step contains commands (Python scripts executed on the lab VM), why they matter, and verification output.

Step 1: Create a synthetic traffic dataset

What we are doing: Generate a small labeled dataset representing flow features (duration, bytes, packets, src_port, dst_port) and labels (web, video, voip). This simulates telemetry exported from your network monitoring system. Having a controlled dataset lets you experiment safely before using production telemetry.

python3 - << 'PY'
# create_dataset.py - creates a simple CSV dataset for lab
import csv
import os
import random

os.makedirs('/home/nhprep/data', exist_ok=True)  # ensure the data directory exists
labels = ['web','video','voip']
with open('/home/nhprep/data/traffic.csv','w',newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['duration_seconds','bytes','packets','src_port','dst_port','label'])
    for i in range(1000):
        lab = random.choices(labels, weights=[0.6,0.3,0.1])[0]
        if lab == 'web':
            dur = random.uniform(0.5, 300.0)
            b = int(dur * random.uniform(500, 2000))             # bytes roughly proportional
            p = int(b/800) + random.randint(1,10)
            dst = random.choice([80,443,8080])
        elif lab == 'video':
            dur = random.uniform(5.0, 3600.0)
            b = int(dur * random.uniform(50000, 200000))
            p = int(b/1200) + random.randint(10,100)
            dst = random.choice([1935,554,8000])
        else: # voip
            dur = random.uniform(0.1, 120.0)
            b = int(dur * random.uniform(30, 300))
            p = int(b/160) + random.randint(1,5)
            dst = random.choice([5060,5061,10000])
        src = random.randint(1024,65535)
        writer.writerow([f"{dur:.3f}", b, p, src, dst, lab])
print("Dataset created at /home/nhprep/data/traffic.csv - 1000 rows")
PY

What just happened: The one-shot Python script generated a CSV with 1,000 synthetic flows and labeled them according to simple heuristics. This represents a small telemetry export you might receive from flow collectors (NetFlow/IPFIX) in production.

Real-world note: Synthetic data helps you validate model pipelines before integrating with live telemetry. When you move to production data, expect cleaning steps for missing fields and sampling biases.
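
As a taste of what those cleaning steps look like, here is a minimal sketch (column names follow the lab CSV; the filter rules are illustrative assumptions, not production-ready logic) for handling missing fields and obviously invalid rows with pandas:

```python
# Sketch: basic telemetry cleaning before feeding a model (rules illustrative).
import pandas as pd

df = pd.DataFrame({
    'duration_seconds': [5.2, None, 0.3, 12.0],
    'bytes': [12000, 500, 80, None],
    'packets': [15, 2, 1, 9],
})

df = df.dropna(subset=['duration_seconds', 'bytes'])   # drop flows missing core fields
df = df[df['duration_seconds'] > 0]                    # remove zero/negative durations
df = df.drop_duplicates()                              # collapse duplicate exports
print(len(df))  # rows surviving cleaning
```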

Verify:

head -n 6 /home/nhprep/data/traffic.csv

Expected output:

duration_seconds,bytes,packets,src_port,dst_port,label
35.238,48352,61,49201,443,web
7.431,966185,806,23456,1935,video
0.357,57,1,50123,5060,voip
120.003,14040000,11698,34567,1935,video
2.143,3200,5,41234,80,web

(Values will differ on your VM; the dataset is randomly generated.)

Step 2: Preprocess and vectorize features

What we are doing: Load CSV, split train/test, normalize numeric features, and encode label values. Proper preprocessing is vital because neural nets are sensitive to feature scales (normalization improves convergence).

python3 - << 'PY'
# preprocess.py - loads CSV, scales numeric features, encodes labels
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
df = pd.read_csv('/home/nhprep/data/traffic.csv')
X = df[['duration_seconds','bytes','packets','src_port','dst_port']].values
y = df['label'].values
le = LabelEncoder()
y_enc = le.fit_transform(y)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_enc, test_size=0.2, random_state=42, stratify=y_enc)
import joblib
joblib.dump(scaler, '/home/nhprep/data/scaler.joblib')
joblib.dump(le, '/home/nhprep/data/labelenc.joblib')
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("Labels:", list(le.classes_))
PY

What just happened: The script standardized numeric values (zero mean, unit variance) and encoded labels into integers in alphabetical order (video->0, voip->1, web->2). Standardization ensures that large-scale features such as bytes do not dominate gradient updates. The scaler and label encoder are saved for reuse at inference time.

Real-world note: Always persist preprocessing objects (scalers, encoders) with your model to ensure consistent inference behavior against new telemetry streams.
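
One common way to guarantee that consistency (an alternative to saving the scaler separately, not the approach used in this lab's scripts) is to bundle preprocessing and model into a single scikit-learn Pipeline, so they are persisted and applied as one object:

```python
# Sketch: bundling scaler + classifier so they cannot drift apart at inference time.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X = np.random.rand(100, 5) * 1000   # synthetic stand-in for flow features
y = np.random.randint(0, 3, 100)    # synthetic encoded labels

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('mlp', MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=1)),
])
pipe.fit(X, y)              # scaling is fit and stored inside the same object
pred = pipe.predict(X[:3])  # raw features in, predictions out - no manual scaling
print(pred.shape)
```

Persisting the pipeline with `joblib.dump(pipe, ...)` then captures both stages in one artifact.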

Verify:

python3 - << 'PY'
import joblib
sc = joblib.load('/home/nhprep/data/scaler.joblib')
le = joblib.load('/home/nhprep/data/labelenc.joblib')
print("Scaler mean:", sc.mean_)
print("Label classes:", le.classes_)
PY

Expected output:

Scaler mean: [  8.51122042e+01   2.34567890e+06   3.45678901e+02   3.40214000e+04   4.32100000e+03]
Label classes: ['video' 'voip' 'web']

(Values illustrative; your means will reflect the synthetic data.)

Step 3: Train a simple neural network classifier

What we are doing: Train a multi-layer perceptron (MLP) on the preprocessed features to classify flows. MLPs are effective for tabular flow data and are inexpensive to train and run in production for per-flow classification.

python3 - << 'PY'
# train_mlp.py - trains a simple MLP classifier
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
import joblib
df = pd.read_csv('/home/nhprep/data/traffic.csv')
X = df[['duration_seconds','bytes','packets','src_port','dst_port']].values
y = df['label'].values
le = LabelEncoder(); y_enc = le.fit_transform(y)
scaler = StandardScaler(); X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_enc, test_size=0.2, random_state=42, stratify=y_enc)
mlp = MLPClassifier(hidden_layer_sizes=(64,32), activation='relu', solver='adam', max_iter=200, random_state=1)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Training complete.")
print("Accuracy on test set:", acc)
print("Classification report:")
print(classification_report(y_test, y_pred, target_names=le.classes_))
joblib.dump(mlp, '/home/nhprep/data/mlp_model.joblib')
joblib.dump(scaler, '/home/nhprep/data/scaler.joblib')
joblib.dump(le, '/home/nhprep/data/labelenc.joblib')
PY

What just happened: The MLPClassifier was trained with two hidden layers (64, 32) using the Adam optimizer. The network adjusted its internal parameters (weights/biases) via backpropagation to minimize classification loss. The model and preprocessing artifacts were saved for later inference.
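
The weight-update rule at the heart of that training loop can be shown with a one-parameter toy example (a hand-rolled sketch of gradient descent, not scikit-learn internals): each step moves the weight against the gradient of the loss.

```python
# Sketch: one-parameter gradient descent on the loss L(w) = (w - 3)^2.
# The gradient is dL/dw = 2*(w - 3); the minimum sits at w = 3.
w = 0.0
lr = 0.1                      # learning rate
for _ in range(50):
    grad = 2 * (w - 3)        # gradient of the loss at the current weight
    w -= lr * grad            # update: step against the gradient
print(round(w, 4))            # converges toward 3.0
```

Backpropagation does exactly this across thousands of weights at once, with the gradients computed layer by layer via the chain rule.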

Real-world note: In production, choose model complexity to balance accuracy and inference latency. For per-flow tagging, keep models small for line-rate inference.
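
A quick way to sanity-check whether a model fits a latency budget (a sketch using Python's `time` module; the numbers you get are machine-dependent and the synthetic data is an assumption) is to time a batched prediction:

```python
# Sketch: rough per-flow inference latency for a small MLP (machine-dependent).
import time
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(1000, 5)               # synthetic stand-in for scaled features
y = np.random.randint(0, 3, 1000)
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=50, random_state=1)
mlp.fit(X, y)

batch = np.random.rand(10000, 5)
start = time.perf_counter()
mlp.predict(batch)
elapsed = time.perf_counter() - start
print(f"{elapsed / len(batch) * 1e6:.2f} microseconds per flow (batched)")
```

Batched inference amortizes overhead; per-flow inline decisions will be slower per item, so measure in the deployment pattern you actually intend to use.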

Verify:

python3 - << 'PY'
import joblib
mlp = joblib.load('/home/nhprep/data/mlp_model.joblib')
scaler = joblib.load('/home/nhprep/data/scaler.joblib')
le = joblib.load('/home/nhprep/data/labelenc.joblib')
print("Model loaded. Hidden layer sizes:", mlp.hidden_layer_sizes)
print("Model classes:", mlp.classes_)
# Show a sample prediction
import numpy as np
sample = np.array([[10.0, 15000, 20, 52345, 443]])  # a sample web-like flow
sample_scaled = scaler.transform(sample)
pred = mlp.predict(sample_scaled)
print("Sample prediction (encoded):", pred)
print("Sample prediction (label):", le.inverse_transform(pred))
PY

Expected output:

Model loaded. Hidden layer sizes: (64, 32)
Model classes: [0 1 2]
Sample prediction (encoded): [2]
Sample prediction (label): ['web']

Step 4: Predict traffic class for new flows and simple traffic-volume prediction (rolling average baseline)

What we are doing: Demonstrate inference for new flows and show a simple short-term traffic prediction baseline (rolling average). In production, NN classifiers are combined with time-series models for volume forecasting; here we show both classification inference and a conservative prediction baseline.

python3 - << 'PY'
# inference_and_prediction.py - classifies new flows and computes a rolling-average volume forecast
import joblib, pandas as pd, numpy as np
mlp = joblib.load('/home/nhprep/data/mlp_model.joblib')
scaler = joblib.load('/home/nhprep/data/scaler.joblib')
le = joblib.load('/home/nhprep/data/labelenc.joblib')
# Simulate incoming batch of new flows
new_flows = pd.DataFrame([
    [5.2, 12000, 15, 45000, 443],
    [300.0, 12000000, 10000, 40000, 1935],
    [0.2, 80, 1, 52000, 5060]
], columns=['duration_seconds','bytes','packets','src_port','dst_port'])
Xn = scaler.transform(new_flows.values)
pred = mlp.predict(Xn)
print("Predictions:", list(pred))
print("Human labels:", list(le.inverse_transform(pred)))
# Rolling-average volume forecast (simplified): compute bytes per minute for past 5 intervals
past_volumes = [1200000, 1300000, 1250000, 1280000, 1320000]  # bytes/min historical
forecast = sum(past_volumes[-3:]) / 3.0  # last-3 average as a simple predictor
print("Rolling-average forecast (bytes/min):", forecast)
PY

What just happened: We performed inference using the saved MLP model and scaler on three new flows and printed predicted labels. For traffic volume forecasting we used a simple rolling average over recent intervals as a baseline predictor. In production you would replace this with an NN-based time-series model (RNN/LSTM/Transformer) if required, and evaluate latency/accuracy tradeoffs.

Real-world note: Baselines like rolling averages are robust and interpretable; use them as benchmarks before deploying complex deep models.
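
A slightly smarter baseline than the last-3 average is an exponentially weighted moving average (EWMA), which weights recent intervals more heavily; a minimal sketch using the same historical values as the lab script (the smoothing factor is an illustrative choice):

```python
# Sketch: EWMA forecast over the same historical bytes/min series as the lab script.
past_volumes = [1200000, 1300000, 1250000, 1280000, 1320000]

alpha = 0.5                # smoothing factor: higher = more weight on recent samples
ewma = past_volumes[0]
for v in past_volumes[1:]:
    ewma = alpha * v + (1 - alpha) * ewma
print(ewma)                # forecast for the next interval -> 1292500.0
```

EWMA reacts faster to trend changes than a plain rolling mean while remaining a one-line update per interval, which makes it cheap to run per-link at scale.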

Verify:

Re-run the heredoc above (it was executed inline; alternatively, save its body as inference_and_prediction.py and run python3 inference_and_prediction.py).

Expected output:

Predictions: [2, 0, 1]
Human labels: ['web', 'video', 'voip']
Rolling-average forecast (bytes/min): 1280000.0

Verification Checklist

  • Check 1: Dataset exists at /home/nhprep/data/traffic.csv. Verify with head -n 2 /home/nhprep/data/traffic.csv.
  • Check 2: Preprocessing objects saved (scaler.joblib, labelenc.joblib). Verify by loading and printing scaler.mean_.
  • Check 3: Model saved at /home/nhprep/data/mlp_model.joblib and produces correct sample inference (expected: web for the sample flow). Verify by running the sample prediction.

Common Mistakes

  • Symptom: Very low accuracy, or accuracy barely above random. Cause: Features not scaled; gradients dominated by large-value features. Fix: Standardize or normalize numeric features (StandardScaler) before training.
  • Symptom: Model fails to converge (loss not decreasing). Cause: Learning rate too high or insufficient iterations. Fix: Reduce the learning rate, increase max_iter/epochs, or change the optimizer.
  • Symptom: Predictions inconsistent between training and production. Cause: Different preprocessing at inference time. Fix: Persist and reuse the same scaler/encoder saved during training.
  • Symptom: Overfitting (excellent train accuracy, poor test accuracy). Cause: Model too complex, no validation, or dataset too small. Fix: Add regularization, reduce model size, use cross-validation, or collect more data.

Key Takeaways

  • A neural network is a parameterized function; training adjusts weights via gradient descent to minimize a loss — think of it like the network control plane optimizing path metrics over time.
  • Preprocessing (vectorization/tokenization) is critical: inconsistent or missing scaling/encoding leads to model failure in production.
  • For network use-cases, start with small models (MLP) and robust baselines (rolling averages) before moving to heavier models (LSTM/Transformer) — always weigh accuracy vs inference latency.
  • Persist preprocessing artifacts and model artifacts together; during deployment, ensure telemetry and inference services use the same scalers/encoders to guarantee identical behavior.

Tip: When moving from lab to production, pipeline the data ingestion (flow collector -> preprocessing -> inference -> policy engine) and add monitoring for model drift (accuracy decline), so you can retrain models proactively.
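
Drift monitoring can start very simply (a hypothetical sketch; the baseline, tolerance, and window values are assumptions, not part of this lab): compare accuracy on recent labeled windows against a deployment-time baseline and flag retraining when it drops.

```python
# Sketch: flag model drift when windowed accuracy falls below a baseline threshold.
baseline_accuracy = 0.95          # measured at deployment time (illustrative)
tolerance = 0.05                  # acceptable drop before retraining (assumption)

recent_windows = [0.94, 0.93, 0.88, 0.85]   # accuracy on recent labeled samples

for i, acc in enumerate(recent_windows):
    if baseline_accuracy - acc > tolerance:
        print(f"window {i}: accuracy {acc:.2f} - drift detected, schedule retraining")
```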

In the next lesson we'll build a simple LSTM for time-series capacity forecasting and show how to package the model as a microservice for integration with network automation controllers.