Lesson 4 of 7

Neural Networks and Deep Learning

Objective

In this lesson you will learn how neural networks and deep learning work and apply them to two common network engineering use cases: traffic classification (tagging flow types such as web, video, or VoIP) and traffic prediction (forecasting short-term traffic volume). This matters in production because classification enables policy-based handling (QoS, security inspection), while prediction enables capacity planning and proactive congestion mitigation. Real-world scenario: an enterprise wants to automatically tag flows for differentiated treatment and predict next-hour aggregate load on a critical WAN link.

Quick Recap

Refer to the topology used in Lesson 1. This lesson does not add routers, switches, or new IPs to the topology — we focus on applying neural networks to flow data collected from that network. All data files used in this lesson reference the lab host lab.nhprep.com and are stored under /home/nhprep/data on the training VM.

Important: this lesson is conceptual + hands-on with code (Python + scikit-learn). You are not changing router configurations; instead you are building models that would be integrated with network systems (SIEM, telemetry pipelines, policy engines) in production.

Key Concepts (before hands-on)

  • Neural Network basics: A neural network (NN) is a parameterized function approximator. Each layer contains weights and biases (parameters) that are adjusted by an optimizer using gradient descent to minimize a loss function. In practice for flows, the input vector contains numeric and categorical features; the NN learns relationships among them.
    • Real-world protocol-level analogy: training is like a routing protocol converging — updates (gradients) are exchanged internally until the model stabilizes.
  • Feature vectorization (tokenization): Just as tokens are used in language models, network features must be converted to numeric vectors (one-hot encoding for categorical items, normalization for numeric values). Poor tokenization = garbage-in, garbage-out.
  • Classification vs Regression: Traffic classification is a classification problem (discrete labels). Traffic volume forecasting is a regression problem (continuous values). The choice of loss (cross-entropy vs MSE) and activation (softmax vs linear) follows accordingly.
  • Overfitting and generalization: In production, models must generalize across time and topology changes. Validation sets, regularization, and careful feature selection matter — otherwise the model will “memorize” specific flows and fail when traffic patterns change.
  • Inference behavior and latency: When deployed, inference must meet latency requirements. A deep model that yields marginal accuracy improvements but doubles inference time may be unsuitable for inline decisions (e.g., per-flow forwarding).

Step-by-step configuration (hands-on)

Each step contains commands (Python scripts executed on the lab VM), why they matter, and verification output.

Step 1: Create a synthetic traffic dataset

What we are doing: Generate a small labeled dataset representing flow features (duration, bytes, packets, src_port, dst_port) and labels (web, video, voip). This simulates telemetry exported from your network monitoring system. Having a controlled dataset lets you experiment safely before using production telemetry.

python3 - << 'PY'
# create_dataset.py - creates a simple CSV dataset for lab
import csv
import os
import random

os.makedirs('/home/nhprep/data', exist_ok=True)  # ensure the data directory exists
labels = ['web','video','voip']
with open('/home/nhprep/data/traffic.csv','w',newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['duration_seconds','bytes','packets','src_port','dst_port','label'])
    for i in range(1000):
        lab = random.choices(labels, weights=[0.6,0.3,0.1])[0]
        if lab == 'web':
            dur = random.uniform(0.5, 300.0)
            b = int(dur * random.uniform(500, 2000))             # bytes roughly proportional
            p = int(b/800) + random.randint(1,10)
            dst = random.choice([80,443,8080])
        elif lab == 'video':
            dur = random.uniform(5.0, 3600.0)
            b = int(dur * random.uniform(50000, 200000))
            p = int(b/1200) + random.randint(10,100)
            dst = random.choice([1935,554,8000])
        else: # voip
            dur = random.uniform(0.1, 120.0)
            b = int(dur * random.uniform(30, 300))
            p = int(b/160) + random.randint(1,5)
            dst = random.choice([5060,5061,10000])
        src = random.randint(1024,65535)
        writer.writerow([f"{dur:.3f}", b, p, src, dst, lab])
print("Dataset created at /home/nhprep/data/traffic.csv - 1000 rows")
PY

What just happened: The one-shot Python script generated a CSV with 1,000 synthetic flows and labeled them according to simple heuristics. This represents a small telemetry export you might receive from flow collectors (NetFlow/IPFIX) in production.

Real-world note: Synthetic data helps you validate model pipelines before integrating with live telemetry. When you move to production data, expect cleaning steps for missing fields and sampling biases.
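
As a taste of what those cleaning steps look like, here is a minimal sketch (column names follow the lab CSV; the filter rules are illustrative assumptions, not production-ready logic) for handling missing fields and obviously invalid rows with pandas:

```python
# Sketch: basic telemetry cleaning before feeding a model (rules illustrative).
import pandas as pd

df = pd.DataFrame({
    'duration_seconds': [5.2, None, 0.3, 12.0],
    'bytes': [12000, 500, 80, None],
    'packets': [15, 2, 1, 9],
})

df = df.dropna(subset=['duration_seconds', 'bytes'])   # drop flows missing core fields
df = df[df['duration_seconds'] > 0]                    # remove zero/negative durations
df = df.drop_duplicates()                              # collapse duplicate exports
print(len(df))  # rows surviving cleaning
```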

Verify:

head -n 6 /home/nhprep/data/traffic.csv

Expected output:

duration_seconds,bytes,packets,src_port,dst_port,label
35.238,48352,61,49201,443,web
7.431,966185,806,23456,1935,video
0.357,57,1,50123,5060,voip
120.003,14040000,11698,34567,1935,video
2.143,3200,5,41234,80,web

(Values will differ on your VM; the dataset is randomly generated.)

Step 2: Preprocess and vectorize features

What we are doing: Load CSV, split train/test, normalize numeric features, and encode label values. Proper preprocessing is vital because neural nets are sensitive to feature scales (normalization improves convergence).

python3 - << 'PY'
# preprocess.py - loads CSV, scales numeric features, encodes labels
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
df = pd.read_csv('/home/nhprep/data/traffic.csv')
X = df[['duration_seconds','bytes','packets','src_port','dst_port']].values
y = df['label'].values
le = LabelEncoder()
y_enc = le.fit_transform(y)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_enc, test_size=0.2, random_state=42, stratify=y_enc)
import joblib
joblib.dump(scaler, '/home/nhprep/data/scaler.joblib')
joblib.dump(le, '/home/nhprep/data/labelenc.joblib')
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("Labels:", list(le.classes_))
PY

What just happened: The script standardized numeric values (zero mean, unit variance) and encoded labels into integers in alphabetical order (video->0, voip->1, web->2). Standardization ensures that large-scale features such as bytes do not dominate gradient updates. The scaler and label encoder are saved for reuse at inference time.

Real-world note: Always persist preprocessing objects (scalers, encoders) with your model to ensure consistent inference behavior against new telemetry streams.
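
One common way to guarantee that consistency (an alternative to saving the scaler separately, not the approach used in this lab's scripts) is to bundle preprocessing and model into a single scikit-learn Pipeline, so they are persisted and applied as one object:

```python
# Sketch: bundling scaler + classifier so they cannot drift apart at inference time.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X = np.random.rand(100, 5) * 1000   # synthetic stand-in for flow features
y = np.random.randint(0, 3, 100)    # synthetic encoded labels

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('mlp', MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=1)),
])
pipe.fit(X, y)              # scaling is fit and stored inside the same object
pred = pipe.predict(X[:3])  # raw features in, predictions out - no manual scaling
print(pred.shape)
```

Persisting the pipeline with `joblib.dump(pipe, ...)` then captures both stages in one artifact.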

Verify:

python3 - << 'PY'
import joblib
sc = joblib.load('/home/nhprep/data/scaler.joblib')
le = joblib.load('/home/nhprep/data/labelenc.joblib')
print("Scaler mean:", sc.mean_)
print("Label classes:", le.classes_)
PY

Expected output:

Scaler mean: [  8.51122042e+01   2.34567890e+06   3.45678901e+02   3.40214000e+04   4.32100000e+03]
Label classes: ['video' 'voip' 'web']

(Values illustrative; your means will reflect the synthetic data.)

Step 3: Train a simple neural network classifier

What we are doing: Train a multi-layer perceptron (MLP) on the preprocessed features to classify flows. MLPs are effective for tabular flow data and are inexpensive to train and run in production for per-flow classification.

python3 - << 'PY'
# train_mlp.py - trains a simple MLP classifier
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
import joblib
df = pd.read_csv('/home/nhprep/data/traffic.csv')
X = df[['duration_seconds','bytes','packets','src_port','dst_port']].values
y = df['label'].values
le = LabelEncoder(); y_enc = le.fit_transform(y)
scaler = StandardScaler(); X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_enc, test_size=0.2, random_state=42, stratify=y_enc)
mlp = MLPClassifier(hidden_layer_sizes=(64,32), activation='relu', solver='adam', max_iter=200, random_state=1)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Training complete.")
print("Accuracy on test set:", acc)
print("Classification report:")
print(classification_report(y_test, y_pred, target_names=le.classes_))
joblib.dump(mlp, '/home/nhprep/data/mlp_model.joblib')
joblib.dump(scaler, '/home/nhprep/data/scaler.joblib')
joblib.dump(le, '/home/nhprep/data/labelenc.joblib')
PY

What just happened: The MLPClassifier was trained with two hidden layers (64, 32) using the Adam optimizer. The network adjusted its internal parameters (weights/biases) via backpropagation to minimize classification loss. The model and preprocessing artifacts were saved for later inference.
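
The weight-update rule at the heart of that training loop can be shown with a one-parameter toy example (a hand-rolled sketch of gradient descent, not scikit-learn internals): each step moves the weight against the gradient of the loss.

```python
# Sketch: one-parameter gradient descent on the loss L(w) = (w - 3)^2.
# The gradient is dL/dw = 2*(w - 3); the minimum sits at w = 3.
w = 0.0
lr = 0.1                      # learning rate
for _ in range(50):
    grad = 2 * (w - 3)        # gradient of the loss at the current weight
    w -= lr * grad            # update: step against the gradient
print(round(w, 4))            # converges toward 3.0
```

Backpropagation does exactly this across thousands of weights at once, with the gradients computed layer by layer via the chain rule.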

Real-world note: In production, choose model complexity to balance accuracy and inference latency. For per-flow tagging, keep models small for line-rate inference.
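
A quick way to sanity-check whether a model fits a latency budget (a sketch using Python's `time` module; the numbers you get are machine-dependent and the synthetic data is an assumption) is to time a batched prediction:

```python
# Sketch: rough per-flow inference latency for a small MLP (machine-dependent).
import time
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.random.rand(1000, 5)               # synthetic stand-in for scaled features
y = np.random.randint(0, 3, 1000)
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=50, random_state=1)
mlp.fit(X, y)

batch = np.random.rand(10000, 5)
start = time.perf_counter()
mlp.predict(batch)
elapsed = time.perf_counter() - start
print(f"{elapsed / len(batch) * 1e6:.2f} microseconds per flow (batched)")
```

Batched inference amortizes overhead; per-flow inline decisions will be slower per item, so measure in the deployment pattern you actually intend to use.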

Verify:

python3 - << 'PY'
import joblib
mlp = joblib.load('/home/nhprep/data/mlp_model.joblib')
scaler = joblib.load('/home/nhprep/data/scaler.joblib')
le = joblib.load('/home/nhprep/data/labelenc.joblib')
print("Model loaded. Hidden layer sizes:", mlp.hidden_layer_sizes)
print("Model classes:", mlp.classes_)
# Show a sample prediction
import numpy as np
sample = np.array([[10.0, 15000, 20, 52345, 443]])  # a sample web-like flow
sample_scaled = scaler.transform(sample)
pred = mlp.predict(sample_scaled)
print("Sample prediction (encoded):", pred)
print("Sample prediction (label):", le.inverse_transform(pred))
PY

Expected output:

Model loaded. Hidden layer sizes: (64, 32)
Model classes: [0 1 2]
Sample prediction (encoded): [2]
Sample prediction (label): ['web']

Step 4: Predict traffic class for new flows and simple traffic-volume prediction (rolling average baseline)

What we are doing: Demonstrate inference for new flows and show a simple short-term traffic prediction baseline (rolling average). In production, NN classifiers are combined with time-series models for volume forecasting; here we show both classification inference and a conservative prediction baseline.

python3 - << 'PY'
# inference_and_prediction.py - classifies new flows and computes a rolling-average volume forecast
import joblib, pandas as pd, numpy as np
mlp = joblib.load('/home/nhprep/data/mlp_model.joblib')
scaler = joblib.load('/home/nhprep/data/scaler.joblib')
le = joblib.load('/home/nhprep/data/labelenc.joblib')
# Simulate incoming batch of new flows
new_flows = pd.DataFrame([
    [5.2, 12000, 15, 45000, 443],
    [300.0, 12000000, 10000, 40000, 1935],
    [0.2, 80, 1, 52000, 5060]
], columns=['duration_seconds','bytes','packets','src_port','dst_port'])
Xn = scaler.transform(new_flows.values)
pred = mlp.predict(Xn)
print("Predictions:", list(pred))
print("Human labels:", list(le.inverse_transform(pred)))
# Rolling-average volume forecast (simplified): compute bytes per minute for past 5 intervals
past_volumes = [1200000, 1300000, 1250000, 1280000, 1320000]  # bytes/min historical
forecast = sum(past_volumes[-3:]) / 3.0  # last-3 average as a simple predictor
print("Rolling-average forecast (bytes/min):", forecast)
PY

What just happened: We performed inference using the saved MLP model and scaler on three new flows and printed predicted labels. For traffic volume forecasting we used a simple rolling average over recent intervals as a baseline predictor. In production you would replace this with an NN-based time-series model (RNN/LSTM/Transformer) if required, and evaluate latency/accuracy tradeoffs.

Real-world note: Baselines like rolling averages are robust and interpretable; use them as benchmarks before deploying complex deep models.
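
A slightly smarter baseline than the last-3 average is an exponentially weighted moving average (EWMA), which weights recent intervals more heavily; a minimal sketch using the same historical values as the lab script (the smoothing factor is an illustrative choice):

```python
# Sketch: EWMA forecast over the same historical bytes/min series as the lab script.
past_volumes = [1200000, 1300000, 1250000, 1280000, 1320000]

alpha = 0.5                # smoothing factor: higher = more weight on recent samples
ewma = past_volumes[0]
for v in past_volumes[1:]:
    ewma = alpha * v + (1 - alpha) * ewma
print(ewma)                # forecast for the next interval -> 1292500.0
```

EWMA reacts faster to trend changes than a plain rolling mean while remaining a one-line update per interval, which makes it cheap to run per-link at scale.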

Verify:

Re-run the heredoc above (it was executed inline; alternatively, save its body as inference_and_prediction.py and run python3 inference_and_prediction.py).

Expected output:

Predictions: [2, 0, 1]
Human labels: ['web', 'video', 'voip']
Rolling-average forecast (bytes/min): 1280000.0

Verification Checklist

  • Check 1: Dataset exists at /home/nhprep/data/traffic.csv. Verify with head -n 2 /home/nhprep/data/traffic.csv.
  • Check 2: Preprocessing objects saved (scaler.joblib, labelenc.joblib). Verify by loading and printing scaler.mean_.
  • Check 3: Model saved at /home/nhprep/data/mlp_model.joblib and produces correct sample inference (expected: web for the sample flow). Verify by running the sample prediction.

Common Mistakes

  • Symptom: Very low accuracy, or accuracy barely above random. Cause: Features not scaled; gradients dominated by large-value features. Fix: Standardize or normalize numeric features (StandardScaler) before training.
  • Symptom: Model fails to converge (loss not decreasing). Cause: Learning rate too high or insufficient iterations. Fix: Reduce the learning rate, increase max_iter/epochs, or change the optimizer.
  • Symptom: Predictions inconsistent between training and production. Cause: Different preprocessing at inference time. Fix: Persist and reuse the same scaler/encoder saved during training.
  • Symptom: Overfitting (excellent train accuracy, poor test accuracy). Cause: Model too complex, no validation, or dataset too small. Fix: Add regularization, reduce model size, use cross-validation, or collect more data.

Key Takeaways

  • A neural network is a parameterized function; training adjusts weights via gradient descent to minimize a loss — think of it like the network control plane optimizing path metrics over time.
  • Preprocessing (vectorization/tokenization) is critical: inconsistent or missing scaling/encoding leads to model failure in production.
  • For network use-cases, start with small models (MLP) and robust baselines (rolling averages) before moving to heavier models (LSTM/Transformer) — always weigh accuracy vs inference latency.
  • Persist preprocessing artifacts and model artifacts together; during deployment, ensure telemetry and inference services use the same scalers/encoders to guarantee identical behavior.

Tip: When moving from lab to production, pipeline the data ingestion (flow collector -> preprocessing -> inference -> policy engine) and add monitoring for model drift (accuracy decline), so you can retrain models proactively.
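
Drift monitoring can start very simply (a hypothetical sketch; the baseline, tolerance, and window values are assumptions, not part of this lab): compare accuracy on recent labeled windows against a deployment-time baseline and flag retraining when it drops.

```python
# Sketch: flag model drift when windowed accuracy falls below a baseline threshold.
baseline_accuracy = 0.95          # measured at deployment time (illustrative)
tolerance = 0.05                  # acceptable drop before retraining (assumption)

recent_windows = [0.94, 0.93, 0.88, 0.85]   # accuracy on recent labeled samples

for i, acc in enumerate(recent_windows):
    if baseline_accuracy - acc > tolerance:
        print(f"window {i}: accuracy {acc:.2f} - drift detected, schedule retraining")
```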

In the next lesson we'll build a simple LSTM for time-series capacity forecasting and show how to package the model as a microservice for integration with network automation controllers.