Capacity Forecasting
Objective
In this lesson you will learn how to forecast capacity for bandwidth, CPU, and memory on network devices so you can right-size infrastructure before service degradation occurs. We will configure device telemetry basics (SNMP and NetFlow), collect baseline counters and system metrics, and use simple math to predict when resources will be exhausted. This matters in production: proactive capacity forecasting prevents outages, helps meet SLAs, and supports purchase and upgrade decisions for routers and switches in a real data center or campus network.
Real-world scenario: NHPREP operates a multi-domain campus network carrying tenant services. Engineering must know when links and routers will need upgrades to maintain SLA headroom for bursts and future growth. This lesson shows how to gather the raw data and convert it into a forecast and recommendation.
Quick Recap
Reference the topology from Lesson 1 (basic campus core, distribution, and a flow collector). This lesson does not add new physical devices; we will configure SNMP and NetFlow on the same routers from Lesson 1 and point exports to the existing flow collector/service in the topology.
Important: All examples use the lab domain lab.nhprep.com and password Lab@123 for any account/password examples.
ASCII topology (same as Lesson 1 — shown here for completeness; IPs are the management/collector addresses used for telemetry):
R1 (core router) connects to SW1 (distribution switch) and to R2 (edge router); the FlowCollector (collector.lab.nhprep.com) is reachable over the management network.
Simple ASCII diagram with management/collector IPs:
+------------------------------------+
|           FlowCollector            |
| mgmt:     192.0.2.10               |
| hostname: collector.lab.nhprep.com |
+------------------+-----------------+
                   |
                   | 192.0.2.0/24 (mgmt network)
                   |
          +--------+----------+
          |                   |
    +-----+-----+       +-----+-----+
    |    R1     |       |    R2     |
    |  (Core)   |       |  (Edge)   |
    |   mgmt:   |       |   mgmt:   |
    | 192.0.2.1 |       | 192.0.2.2 |
    +-----+-----+       +-----+-----+
          |                   |
    Gi0/0 | 10.0.0.1/30 Gi0/0 | 10.0.0.2/30
          +--------+----------+
                   |
          SW1 (Distribution)
      (access switches and leafs)
Interfaces shown include management IPs on the management network 192.0.2.0/24 and a data-plane link 10.0.0.0/30 between R1 and R2. Use these addresses when configuring telemetry and access.
Key Concepts
- SNMP polling: SNMP (Simple Network Management Protocol) is used to poll devices for counters (interface octets), CPU, and memory values. Polling frequency affects accuracy and load — too-fast polling increases device overhead; too-slow polling loses granularity. In production, poll intervals commonly range from 30s to 300s depending on tolerance for variance.
Protocol behavior: SNMP uses UDP by default (port 161 for queries, 162 for traps). When you configure an SNMP community or user, the collector queries OIDs such as ifInOctets/ifOutOctets to derive bandwidth.
- NetFlow (sFlow/Flexible NetFlow): NetFlow exports provide per-flow records (source/destination addresses, ports, bytes, packets) that are useful for traffic engineering and identifying top talkers. Exports are buffered on the device and sent to a collector; the export interval and packetization affect the bandwidth consumed by telemetry itself.
Protocol behavior: Flow records are exported over UDP (commonly) to the collector address and port; devices sample and aggregate flows by configured timeouts.
- System metrics (CPU, memory): Routers track CPU usage over intervals and memory pools per process. High CPU often correlates with control-plane tasks (routing updates, NetFlow processing) or software-based packet forwarding. Memory exhaustion causes instability and, in extreme cases, reloads.
In production: Use CPU 1-minute/5-minute averages and monitor spikes; sustained usage above designed headroom (for example >70% CPU for core routers) triggers upgrade planning.
- Baseline and trend forecasting: Capacity forecasting uses historical data points to calculate growth rates (linear or percentage-based). Combine current utilization, expected growth rate, and desired headroom (e.g., 30%) to estimate time-to-upgrade.
Example: if link utilization grows 10% per month and the current average is 40%, you can forecast when it will cross the 70% utilization threshold (30% headroom) and require an upgrade.
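Under a compound-growth assumption, the time to reach a utilization threshold follows directly from the growth formula. A minimal sketch in Python (the 40%, 70%, and 10%-per-month figures are the example inputs above, not measured values):

```python
import math

def months_to_threshold(current_pct: float, target_pct: float, monthly_growth: float) -> float:
    """Months until utilization grows from current_pct to target_pct
    at a compound monthly_growth rate (e.g. 0.10 for 10% per month)."""
    return math.log(target_pct / current_pct) / math.log(1.0 + monthly_growth)

# Example from the text: 40% today, 70% threshold, 10% monthly growth
print(round(months_to_threshold(40, 70, 0.10), 2))  # about 5.87 months
```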
Step-by-step configuration
Step 1: Configure SNMP for polling
What we are doing: Enable SNMPv2c on the router so an NMS can poll interface, CPU, and memory OIDs. SNMP is the primary lightweight telemetry mechanism for capacity counters in many environments.
R1# configure terminal
R1(config)# snmp-server community NHPREP_RO RO
R1(config)# snmp-server contact "NHPREP Network Operations"
R1(config)# snmp-server location "Lab Rack 1 - Core"
R1(config)# exit
What just happened:
snmp-server community NHPREP_RO RO created a read-only SNMP community named NHPREP_RO that the collector will use to poll OIDs. This allows the collector to query counters like ifInOctets/ifOutOctets and hrProcessorLoad. snmp-server contact and snmp-server location populate device metadata visible to the NMS and help operations quickly identify the device.
Real-world note: In production, prefer SNMPv3 with authentication and encryption. SNMPv2c is used here for lab simplicity; never use community strings in clear in untrusted networks.
Verify:
R1# show running-config | section snmp-server
snmp-server community NHPREP_RO RO
snmp-server contact NHPREP Network Operations
snmp-server location Lab Rack 1 - Core
Step 2: Configure NetFlow export to the collector
What we are doing: Enable Flexible NetFlow (standard Flow export example) to send flow records to the FlowCollector at 192.0.2.10. NetFlow gives per-flow visibility to compute top talkers and per-protocol utilization.
R1# configure terminal
R1(config)# flow exporter NHPREP_EXPORTER
R1(config-flow-exporter)# destination 192.0.2.10
R1(config-flow-exporter)# source GigabitEthernet0/1
R1(config-flow-exporter)# transport udp 2055
R1(config-flow-exporter)# exit
R1(config)# flow monitor NHPREP_MONITOR
R1(config-flow-monitor)# record netflow ipv4 original-input
R1(config-flow-monitor)# exporter NHPREP_EXPORTER
R1(config-flow-monitor)# exit
R1(config)# interface GigabitEthernet0/0
R1(config-if)# ip flow monitor NHPREP_MONITOR input
R1(config-if)# exit
R1(config)# exit
What just happened:
flow exporter NHPREP_EXPORTER defines the collector address, the source interface used for export packets, and the UDP port (2055). flow monitor NHPREP_MONITOR creates a monitor that records IPv4 flow keys and uses the previously defined exporter. ip flow monitor NHPREP_MONITOR input on the data interface applies the flow monitor to packets entering that interface so flows are captured.
Real-world note: Choose the source interface so the collector can reach the device; exporting over a congested control plane link can amplify problems — consider a dedicated management network.
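Before standing up a full analysis stack, you can confirm on the collector that export packets are arriving and carry a sane NetFlow v9 header. A diagnostic sketch in Python — not a full v9 template parser; the port matches the exporter configuration above:

```python
import socket
import struct

def parse_v9_header(data: bytes) -> dict:
    """Decode the fixed 20-byte NetFlow v9 export-packet header:
    version, record count, sysUptime, export timestamp, sequence, source ID."""
    version, count, uptime_ms, unix_secs, sequence, source_id = struct.unpack("!HHIIII", data[:20])
    return {"version": version, "count": count, "sysUptime_ms": uptime_ms,
            "unix_secs": unix_secs, "sequence": sequence, "source_id": source_id}

def receive_one_export(port: int = 2055) -> dict:
    """Block until one export packet arrives on the NetFlow port and return its header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        data, _addr = sock.recvfrom(65535)
        return parse_v9_header(data)
```

A version other than 9, or a record count of zero on every packet, points at an exporter misconfiguration rather than a traffic problem.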
Verify:
R1# show flow exporter
Flow Exporter: NHPREP_EXPORTER
Destination: 192.0.2.10
Source: GigabitEthernet0/1
Transport: UDP, 2055
Version: 9
Total flows exported: 0
Export packets: 0
Export errors: 0
R1# show flow monitor NHPREP_MONITOR cache
Flow Monitor: NHPREP_MONITOR
Cache size: 65536
Active flows: 0
Total flows added since last cleared: 0
Step 3: Collect baseline CPU, memory, and interface counters
What we are doing: Capture instantaneous system counters that form the baseline for forecasting. We will sample interface octet counters and system CPU/memory. These outputs are the raw data used for trend analysis.
R1# show processes cpu
CPU utilization for five seconds: 5%/2%; one minute: 3%; five minutes: 4%
R1# show memory statistics
Head Total(b) Used(b) Free(b) Lowest(b) Largest(b)
Processor 123456789 23456789 100000000 1234567 987654
I/O 2345678 1234567 1111111 22222 1111111
R1# show interfaces GigabitEthernet0/0
GigabitEthernet0/0 is up, line protocol is up
Hardware is Gigabit Ethernet, address is 0001.0001.0001
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 2/255
10 packets input, 1000 bytes
8 packets output, 800 bytes
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 output errors, 0 collisions, 0 interface resets
Last clearing of "show interface" counters never
What just happened:
show processes cpu returns CPU utilization over five-second, one-minute, and five-minute intervals; these figures indicate load trends and whether sustained high CPU exists. show memory statistics shows total, used, and free memory for the processor pools, which is necessary to detect memory pressure. show interfaces reports interface counters, including octet and packet counts; calculating the delta between samples over time yields bandwidth usage.
Real-world note: Collect these outputs periodically (for example every 5 minutes) and store them in a time-series database for trend analysis. One-off samples are insufficient to forecast growth.
Verify:
Re-run the three show commands above after one sampling interval (for example, 5 minutes later) and confirm the interface octet counters have advanced; the difference between the two snapshots is the raw input for the utilization math in Step 4.
Step 4: Calculate utilization and forecast growth
What we are doing: Use the sampled interface octet counters and time interval to compute actual bandwidth utilization, then apply an estimated growth rate to forecast when capacity will exceed desired headroom.
The lines below are not device configuration; they show the math applied to collected values. Example: assume two samples of interface octet counters taken 5 minutes apart.
! Sample 1 at T0 (bytes counters)
! show interfaces GigabitEthernet0/0 -> ifInOctets = 1000000, ifOutOctets = 2000000
! Sample 2 at T1 = T0 + 300 seconds (5 minutes)
! show interfaces GigabitEthernet0/0 -> ifInOctets = 6000000, ifOutOctets = 8000000
! Calculation (performed on the collector or manually):
! Bytes transferred during interval = (ifIn2 + ifOut2) - (ifIn1 + ifOut1)
! = (6000000 + 8000000) - (1000000 + 2000000) = 11000000 bytes
! Bits = 11000000 * 8 = 88000000 bits
! Average bandwidth (bps) = 88000000 / 300 = 293333.33 bps ≈ 293 kbps
! Utilization (%) on 1 Gbit link = (293333 / 1,000,000,000) * 100 = 0.0293% (negligible in this example)
What just happened:
- You derive bytes transferred over a time interval from two counter snapshots. Converting bytes to bits and dividing by interval gives average bits-per-second.
- The utilization percentage is average bps divided by interface bandwidth (BW). This baseline lets you project growth.
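The delta arithmetic above can run on the collector as a few lines of Python (the counter values are the example figures from this step):

```python
def avg_bps(in1: int, out1: int, in2: int, out2: int, interval_s: float) -> float:
    """Average bits/second between two (in, out) octet-counter snapshots."""
    byte_delta = (in2 + out2) - (in1 + out1)
    return byte_delta * 8 / interval_s

def utilization_pct(bps: float, link_bps: float) -> float:
    """Utilization as a percentage of link capacity."""
    return bps / link_bps * 100

bps = avg_bps(1_000_000, 2_000_000, 6_000_000, 8_000_000, 300)
print(round(bps))                                     # 293333 bps
print(round(utilization_pct(bps, 1_000_000_000), 4))  # 0.0293 (% of a 1 Gbit/s link)
```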
Real-world note: Use 64-bit counters if available for high-speed links to avoid counter rollover issues. Many modern devices support 64-bit ifHCInOctets/ifHCOutOctets.
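When only 32-bit counters are available, a seemingly negative delta usually means the counter wrapped. Modular arithmetic recovers the true delta, assuming at most one wrap per polling interval (a sketch):

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap at 2^32

def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octet delta between two reads, corrected for a single counter rollover."""
    return (curr - prev) % modulus

print(counter_delta(100, 600))                  # 500 (no wrap)
print(counter_delta(4_294_000_000, 1_500_000))  # 2467296 (counter wrapped once)
```

Shortening the poll interval only reduces the chance of multiple wraps per interval; 64-bit ifHC counters remove the problem entirely.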
Verify (example manual verification shown as output you would see on your collector/worksheet):
Collector calculation:
Sample interval: 300 seconds
Bytes delta: 11000000
Bits delta: 88000000
Average bps: 293333
Link capacity: 1000000000 bps
Utilization: 0.0293%
Step 5: Produce right-sizing recommendation
What we are doing: Combine baseline utilization, observed monthly growth rate, and chosen headroom to compute time-to-upgrade and give actionable recommendations (upgrade link, add link, or apply QoS).
No direct device configuration is involved — this step produces analysis output and a resulting instruction to operations.
Example recommendation calculation:
! Inputs:
! Current average utilization = 40% (from collector)
! Desired headroom threshold = 70% (when to upgrade)
! Observed growth rate = 10% per month (relative increase)
! Time formula (approximate): months_to_threshold = log(target/current) / log(1 + growth_rate)
! Compute:
! target = 70%
! current = 40%
! growth_rate = 0.10
! months_to_threshold = ln(0.70/0.40) / ln(1.10) ≈ ln(1.75)/ln(1.10) ≈ 0.5595 / 0.09531 ≈ 5.87 months
! Recommendation:
! - Upgrade capacity or add parallel link within 5-6 months.
! - In the interim, apply strict QoS to protect critical flows, and schedule the upgrade during maintenance window.
What just happened:
- We used an exponential growth model to estimate the time until utilization reaches the threshold. This yields a timeline for procurement and scheduling.
- The recommendation gives both a timeline and interim mitigations (QoS, traffic engineering).
Real-world note: Growth is rarely purely exponential or linear; always validate model against multiple weeks/months of data and incorporate business events (marketing campaigns, new tenant onboarding) into projections.
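One way to validate the assumed rate is to fit it to measured history: a least-squares fit of ln(utilization) against month index recovers the compound monthly growth rate. A sketch with hypothetical monthly averages (the history list is illustrative, not measured data):

```python
import math

def fit_monthly_growth(monthly_utilization: list[float]) -> float:
    """Compound monthly growth rate from a log-linear least-squares fit."""
    n = len(monthly_utilization)
    xs = list(range(n))
    ys = [math.log(u) for u in monthly_utilization]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of ln(utilization) vs. month index; exp(slope) - 1 is the monthly rate
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return math.exp(slope) - 1

# Hypothetical five months of average link utilization (%), compounding near 10%/month
history = [40.0, 44.2, 48.3, 53.4, 58.5]
print(round(fit_monthly_growth(history), 3))  # close to 0.10
```

If the fitted rate diverges sharply from your planning assumption, investigate before committing to a procurement timeline.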
Verify:
Analysis output (example):
Current utilization: 40%
Target threshold: 70%
Observed growth rate: 10% per month
Estimated time to exceed threshold: 5.9 months
Action: Schedule link upgrade/add capacity within 5 months. Apply QoS and traffic engineering immediately.
Verification Checklist
- Check 1: SNMP is reachable from the collector.
- How to verify: run an SNMP walk of sysDescr from the collector and confirm the device responds (SNMP configuration shown in Step 1).
- Check 2: NetFlow exporter is configured and shows the collector as destination.
- How to verify: show flow exporter displays destination 192.0.2.10 and UDP port 2055 (see Step 2 verification).
- Check 3: Baseline samples obtained — two interface counter snapshots separated by the sampling interval produce a consistent bytes delta.
- How to verify: show interfaces GigabitEthernet0/0 outputs counters; perform the delta math to compute bps (see Steps 3 and 4).
- Check 4: Forecast calculation yields an action plan.
- How to verify: confirm the arithmetic and timeline are reasonable and documented on the ticket.
Common Mistakes
| Symptom | Cause | Fix |
|---|---|---|
| Collector receives no SNMP data | SNMP community misconfigured or ACL blocking UDP/161 | Verify snmp-server community and confirm network ACLs allow UDP/161 from collector to device |
| NetFlow exporter shows 0 packets exported | Wrong source interface or collector unreachable | Check flow exporter destination and source interface; ensure the source interface can reach 192.0.2.10; verify route and connectivity |
| Interface counters wrap / produce negative deltas | Using 32-bit counters on high-speed links leading to rollover | Use 64-bit counters (ifHCInOctets/ifHCOutOctets) or shorten the polling interval to avoid rollover |
| Forecast timeline too short/long | Growth model uses noisy data or one-off spike skewing result | Use median/percentile over a longer history and exclude known outliers; repeat analysis with weekly/monthly baselines |
Key Takeaways
- Always collect consistent baseline samples (interface octets, CPU, memory) and store them in a time-series system before attempting forecasts — one-off snapshots are insufficient.
- SNMP and NetFlow provide complementary telemetry: SNMP for counters and system metrics; NetFlow for detailed traffic composition and top talkers.
- Forecasting applies a growth model (linear or exponential) to current utilization with a chosen headroom threshold; convert results into actionable timelines and interim mitigations (QoS, TE).
- In production, secure telemetry channels (SNMPv3, encrypted exporters, management-plane isolation) and be cautious about telemetry impact on device CPU; plan export sampling and poll intervals accordingly.
Tip: Think of forecasting like fuel gauging — you measure current consumption rate and predict how long fuel will last. The more accurate and frequent your measurements, the better your estimate.
This completes Lesson 4: Capacity Forecasting. In the next lesson we'll use the flow collector's top-talkers to create traffic steering policies to relieve identified bottlenecks.