Lesson 4 of 6

Capacity Forecasting

Objective

In this lesson you will learn how to forecast capacity for bandwidth, CPU, and memory on network devices so you can right-size infrastructure before service degradation occurs. We will configure device telemetry basics (SNMP and NetFlow), collect baseline counters and system metrics, and use simple math to predict when resources will be exhausted. This matters in production: proactive capacity forecasting prevents outages, helps meet SLAs, and supports purchase/upgrade decisions for routers and switches in a real data center or campus network.

Real-world scenario: NHPREP operates a multi-domain campus network carrying tenant services. Engineering must know when links and routers will need upgrades to maintain SLA headroom for bursts and future growth. This lesson shows how to gather the raw data and convert it into a forecast and recommendation.

Quick Recap

Reference the topology from Lesson 1 (basic campus core, distribution, and a flow collector). This lesson does not add new physical devices; we will configure SNMP and NetFlow on the same routers from Lesson 1 and point exports to the existing flow collector/service in the topology.

Important: All examples use the lab domain lab.nhprep.com and password Lab@123 for any account/password examples.

ASCII topology (same as Lesson 1 — shown here for completeness; IPs are the management/collector addresses used for telemetry):

R1 (Core Router) connects to SW1 (Distribution Switch) and to R2 (Edge); the FlowCollector (collector.lab.nhprep.com) is reachable from the management network.

Simple ASCII diagram with management/collector IPs:

                   +--------------------------------------+
                   |            FlowCollector             |
                   | management: 192.0.2.10               |
                   | hostname: collector.lab.nhprep.com   |
                   +------------------+-------------------+
                                      |
                                      | 192.0.2.0/24 (mgmt)
                                      |
              +-----------------------+-----------------------+
              |                                               |
      +-------+---------+                           +---------+-------+
      |       R1        |                           |       R2        |
      |      Core       |                           |      Edge       |
      | mgmt: 192.0.2.1 |                           | mgmt: 192.0.2.2 |
      +-------+---------+                           +---------+-------+
              |                                               |
        Gi0/0 | 10.0.0.1/30                             Gi0/0 | 10.0.0.2/30
              +-----------------------+-----------------------+
                                      |
                            SW1 (Distribution)
                        (access switches and leafs)

Interfaces shown include management IPs on the management network 192.0.2.0/24 and a data-plane link 10.0.0.0/30 between R1 and R2. Use these addresses when configuring telemetry and access.

Key Concepts

  • SNMP polling: SNMP (Simple Network Management Protocol) is used to poll devices for counters (interface octets), CPU, and memory values. Polling frequency affects accuracy and load — too-fast polling increases device overhead; too-slow polling loses granularity. In production, poll intervals commonly range from 30s to 300s depending on tolerance for variance.

    Protocol behavior: SNMP uses UDP by default (port 161 for queries, 162 for traps). When you configure an SNMP community or user, the collector queries OIDs such as ifInOctets/ifOutOctets to derive bandwidth.

  • NetFlow (and Flexible NetFlow): NetFlow exports provide per-flow records (source/destination addresses, ports, bytes, packets) that are useful for traffic engineering and identifying top talkers. Exports are buffered on the device and sent to a collector; the export interval and packetization affect the bandwidth the telemetry itself consumes. (sFlow is a sampling-based alternative with a similar collector model.)

    Protocol behavior: Flow records are exported over UDP (commonly) to the collector address and port; devices sample and aggregate flows by configured timeouts.
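To make the export format concrete, here is a minimal sketch that parses the fixed 20-byte NetFlow v9 packet header (fields per the v9 export format; all big-endian). It uses a synthetic packet rather than a live capture, so it is an illustration of the wire format, not a full collector:

```python
import struct

# NetFlow v9 export packet header: 20 bytes, all fields big-endian.
# Fields: version, record count, sysUptime (ms), export time (UNIX secs),
# flow sequence number, source ID.
V9_HEADER = struct.Struct(">HHIIII")

def parse_v9_header(packet: bytes) -> dict:
    """Parse the 20-byte NetFlow v9 header from a raw export packet."""
    version, count, uptime, unix_secs, seq, source_id = V9_HEADER.unpack_from(packet)
    return {
        "version": version,
        "count": count,
        "sysUptime_ms": uptime,
        "unix_secs": unix_secs,
        "sequence": seq,
        "source_id": source_id,
    }

# Build a synthetic header the way a device would emit it, then parse it back.
sample = V9_HEADER.pack(9, 2, 123456, 1700000000, 42, 1)
print(parse_v9_header(sample))
```

A real collector would read such packets from a UDP socket bound to the configured export port (2055 in this lesson) and then parse the template and data FlowSets that follow the header.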

  • System metrics (CPU, memory): Routers track CPU usage over intervals and memory pools per process. High CPU often correlates with control-plane tasks (routing updates, NetFlow processing) or packet forwarding in software. Memory exhaustion causes instability and reloads in extreme cases.

    In production: Use CPU 1-minute/5-minute averages and monitor spikes; sustained usage above designed headroom (for example >70% CPU for core routers) triggers upgrade planning.

  • Baseline and trend forecasting: Capacity forecasting uses historical data points to calculate growth rates (linear or percentage-based). Combine current utilization, expected growth rate, and desired headroom (e.g., 30%) to estimate time-to-upgrade.

    Example: If link utilization grows 10% per month and current average is 40%, you can forecast when it will exceed 70% headroom and need upgrade.

Step-by-step configuration

Step 1: Configure SNMP for polling

What we are doing: Enable SNMPv2c on the router so an NMS can poll interface, CPU, and memory OIDs. SNMP is the primary lightweight telemetry mechanism for capacity counters in many environments.

R1# configure terminal
R1(config)# snmp-server community NHPREP_RO RO
R1(config)# snmp-server contact "NHPREP Network Operations"
R1(config)# snmp-server location "Lab Rack 1 - Core"
R1(config)# exit

What just happened:

  • snmp-server community NHPREP_RO RO created a read-only SNMP community named NHPREP_RO that the collector will use to poll OIDs. This allows the collector to query counters like ifInOctets/ifOutOctets and hrProcessorLoad.
  • snmp-server contact and snmp-server location populate device metadata visible to the NMS and help operations quickly identify the device.

Real-world note: In production, prefer SNMPv3 with authentication and encryption. SNMPv2c is used here for lab simplicity; never send community strings in cleartext across untrusted networks.

Verify:

R1# show running-config | section snmp-server
snmp-server community NHPREP_RO RO
snmp-server contact NHPREP Network Operations
snmp-server location Lab Rack 1 - Core

Expected output:

snmp-server community NHPREP_RO RO
snmp-server contact NHPREP Network Operations
snmp-server location Lab Rack 1 - Core

Step 2: Configure NetFlow export to the collector

What we are doing: Enable Flexible NetFlow (standard Flow export example) to send flow records to the FlowCollector at 192.0.2.10. NetFlow gives per-flow visibility to compute top talkers and per-protocol utilization.

R1# configure terminal
R1(config)# flow exporter NHPREP_EXPORTER
R1(config-flow-exporter)# destination 192.0.2.10
R1(config-flow-exporter)# source GigabitEthernet0/1
R1(config-flow-exporter)# transport udp 2055
R1(config-flow-exporter)# exit
R1(config)# flow monitor NHPREP_MONITOR
R1(config-flow-monitor)# record netflow ipv4 original-input
R1(config-flow-monitor)# exporter NHPREP_EXPORTER
R1(config-flow-monitor)# exit
R1(config)# interface GigabitEthernet0/0
R1(config-if)# ip flow monitor NHPREP_MONITOR input
R1(config-if)# exit
R1(config)# exit

What just happened:

  • flow exporter NHPREP_EXPORTER defines the collector address, source interface used for export packets, and the UDP port (2055).
  • flow monitor NHPREP_MONITOR creates a monitor that records IPv4 flow keys and uses the previously defined exporter.
  • ip flow monitor NHPREP_MONITOR input on the data interface applies the flow monitor to packets entering that interface so flows are captured.

Real-world note: Choose the source interface so the collector can reach the device; exporting over a congested control plane link can amplify problems — consider a dedicated management network.

Verify:

R1# show flow exporter
Flow Exporter: NHPREP_EXPORTER
  Destination: 192.0.2.10
  Source: GigabitEthernet0/1
  Transport: UDP, 2055
  Version: 9
  Total flows exported: 0
  Export packets: 0
  Export errors: 0

R1# show flow monitor NHPREP_MONITOR cache
Flow Monitor: NHPREP_MONITOR
  Cache size: 65536
  Active flows: 0
  Total flows added since last cleared: 0

Expected output (complete):

Flow Exporter: NHPREP_EXPORTER
  Destination: 192.0.2.10
  Source: GigabitEthernet0/1
  Transport: UDP, 2055
  Version: 9
  Total flows exported: 0
  Export packets: 0
  Export errors: 0

Flow Monitor: NHPREP_MONITOR
  Cache size: 65536
  Active flows: 0
  Total flows added since last cleared: 0

Step 3: Collect baseline CPU, memory, and interface counters

What we are doing: Capture instantaneous system counters that form the baseline for forecasting. We will sample interface octet counters and system CPU/memory. These outputs are the raw data used for trend analysis.

R1# show processes cpu
CPU utilization for five seconds: 5%/2%; one minute: 3%; five minutes: 4%

R1# show memory statistics
Head       Total(b)      Used(b)      Free(b)      Lowest(b)   Largest(b)
Processor  123456789     23456789     100000000    1234567     987654
I/O        2345678       1234567      1111111      22222       1111111

R1# show interfaces GigabitEthernet0/0
GigabitEthernet0/0 is up, line protocol is up
  Hardware is Gigabit Ethernet, address is 0001.0001.0001
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 2/255
  10 packets input, 1000 bytes
  8 packets output, 800 bytes
  0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
  0 output errors, 0 collisions, 0 interface resets
  Last clearing of "show interface" counters never

What just happened:

  • show processes cpu reports CPU utilization averaged over the last five seconds, one minute, and five minutes; these figures indicate load trends and whether sustained high CPU exists.
  • show memory statistics shows total, used, and free memory for processor pools — necessary to detect memory pressure.
  • show interfaces reports interface counters including octet and packet counts; calculating delta over time yields bandwidth usage.

Real-world note: Collect these outputs periodically (for example every 5 minutes) and store them in a time-series database for trend analysis. One-off samples are insufficient to forecast growth.
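Periodic collection can be sketched in a few lines. This is a minimal illustration, not a production poller: get_octets is a hypothetical stand-in for an SNMP GET of ifHCInOctets/ifHCOutOctets, and the CSV file stands in for a real time-series store:

```python
import csv
import time

def get_octets(interface: str) -> tuple[int, int]:
    """Hypothetical counter source. In production this would be an SNMP GET
    of ifHCInOctets/ifHCOutOctets for the interface; stubbed here."""
    return 1_000_000, 2_000_000  # placeholder counter values

def sample_once(interface: str, path: str) -> None:
    """Append one timestamped counter sample to a CSV for later trend analysis."""
    in_octets, out_octets = get_octets(interface)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([int(time.time()), interface, in_octets, out_octets])

# In production, run sample_once on a 300-second schedule (cron, a systemd
# timer, or a poller daemon) and load the CSV into a time-series database.
sample_once("GigabitEthernet0/0", "baseline_samples.csv")
```

The point of the sketch is the shape of the data: each row is (timestamp, interface, in_octets, out_octets), which is exactly what the delta math in Step 4 consumes.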

Verify:

R1# show processes cpu
CPU utilization for five seconds: 5%/2%; one minute: 3%; five minutes: 4%

R1# show memory statistics
Head       Total(b)      Used(b)      Free(b)      Lowest(b)   Largest(b)
Processor  123456789     23456789     100000000    1234567     987654
I/O        2345678       1234567      1111111      22222       1111111

R1# show interfaces GigabitEthernet0/0
GigabitEthernet0/0 is up, line protocol is up
  Hardware is Gigabit Ethernet, address is 0001.0001.0001
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 2/255
  10 packets input, 1000 bytes
  8 packets output, 800 bytes
  0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
  0 output errors, 0 collisions, 0 interface resets
  Last clearing of "show interface" counters never

Step 4: Calculate utilization and forecast growth

What we are doing: Use the sampled interface octet counters and time interval to compute actual bandwidth utilization, then apply an estimated growth rate to forecast when capacity will exceed desired headroom.

The lines below are not device configuration; they show the math applied to collected values. Example: assume two samples of interface octet counters taken 5 minutes apart.

! Sample 1 at T0 (bytes counters)
! show interfaces GigabitEthernet0/0  -> ifInOctets = 1000000, ifOutOctets = 2000000

! Sample 2 at T1 = T0 + 300 seconds (5 minutes)
! show interfaces GigabitEthernet0/0  -> ifInOctets = 6000000, ifOutOctets = 8000000

! Calculation (performed on the collector or manually):
! Bytes transferred during interval = (ifIn2 + ifOut2) - (ifIn1 + ifOut1)
! = (6000000 + 8000000) - (1000000 + 2000000) = 11000000 bytes
! Bits = 11000000 * 8 = 88000000 bits
! Average bandwidth (bps) = 88000000 / 300 = 293333.33 bps ≈ 293 kbps
! Utilization (%) on 1 Gbit link = (293333 / 1,000,000,000) * 100 = 0.0293% (negligible in this example)

What just happened:

  • You derive bytes transferred over a time interval from two counter snapshots. Converting bytes to bits and dividing by interval gives average bits-per-second.
  • The utilization percentage is average bps divided by interface bandwidth (BW). This baseline lets you project growth.

Real-world note: Use 64-bit counters if available for high-speed links to avoid counter rollover issues. Many modern devices support 64-bit ifHCInOctets/ifHCOutOctets.
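The delta math above can be expressed as two small helpers (a sketch using the worked example's numbers; function names are illustrative):

```python
def average_bps(octets_t0: int, octets_t1: int, interval_s: int) -> float:
    """Average bits per second from two octet-counter snapshots."""
    return (octets_t1 - octets_t0) * 8 / interval_s

def utilization_pct(bps: float, link_bps: int) -> float:
    """Utilization as a percentage of link capacity."""
    return bps / link_bps * 100

# Values from the worked example: combined in+out octets, sampled 300 s apart.
t0 = 1_000_000 + 2_000_000
t1 = 6_000_000 + 8_000_000
bps = average_bps(t0, t1, 300)              # 293333.33... bps
util = utilization_pct(bps, 1_000_000_000)  # against a 1 Gbit/s link
print(f"{bps:.0f} bps, {util:.4f}% utilization")
```

On a real collector these two functions run over every consecutive pair of samples, producing the utilization time series that the forecast in Step 5 consumes.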

Verify (example manual verification shown as output you would see on your collector/worksheet):

Collector calculation:
  Sample interval: 300 seconds
  Bytes delta: 11000000
  Bits delta: 88000000
  Average bps: 293333
  Link capacity: 1000000000 bps
  Utilization: 0.0293%

Step 5: Produce right-sizing recommendation

What we are doing: Combine baseline utilization, observed monthly growth rate, and chosen headroom to compute time-to-upgrade and give actionable recommendations (upgrade link, add link, or apply QoS).

This step requires no direct device configuration — it is the output of analysis, turned into instructions for operations.

Example recommendation calculation:

! Inputs:
! Current average utilization = 40% (from collector)
! Desired headroom threshold = 70% (when to upgrade)
! Observed growth rate = 10% per month (relative increase)
! Time formula (approximate): months_to_threshold = log(target/current) / log(1 + growth_rate)

! Compute:
! target = 70%
! current = 40%
! growth_rate = 0.10

! months_to_threshold = ln(0.70/0.40) / ln(1.10) ≈ ln(1.75)/ln(1.10) ≈ 0.5595 / 0.09531 ≈ 5.87 months

! Recommendation:
! - Upgrade capacity or add parallel link within 5-6 months.
! - In the interim, apply strict QoS to protect critical flows, and schedule the upgrade during maintenance window.

What just happened:

  • We used an exponential (compound-growth) model to estimate the time until utilization reaches the threshold. This yields a timeline for procurement and scheduling.
  • The recommendation gives both a timeline and interim mitigations (QoS, traffic engineering).

Real-world note: Growth is rarely purely exponential or linear; always validate model against multiple weeks/months of data and incorporate business events (marketing campaigns, new tenant onboarding) into projections.
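The time-to-threshold formula can be sketched in a few lines of Python (function name is illustrative; inputs match the worked example):

```python
import math

def months_to_threshold(current_pct: float, target_pct: float,
                        growth_rate: float) -> float:
    """Months until utilization reaches target, assuming compound monthly
    growth. growth_rate is the monthly relative increase (0.10 = 10%/month)."""
    if current_pct >= target_pct:
        return 0.0  # already at or past the threshold
    return math.log(target_pct / current_pct) / math.log(1 + growth_rate)

# Worked example: 40% now, 70% upgrade threshold, 10% growth per month.
print(round(months_to_threshold(40, 70, 0.10), 2))  # ≈ 5.87
```

For a linear model, replace the logarithms with (target - current) / monthly_increase; run both against your data and prefer whichever fits the observed history better.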

Verify:

Analysis output (example):
  Current utilization: 40%
  Target threshold: 70%
  Observed growth rate: 10% per month
  Estimated time to exceed threshold: 5.9 months
  Action: Schedule link upgrade/add capacity within 5 months. Apply QoS and traffic engineering immediately.

Verification Checklist

  • Check 1: SNMP is reachable from the collector — verify by running an SNMP walk from the collector for sysDescr and comparing expected value.
    • How to verify: run an SNMP walk of sysDescr from the collector (for example, snmpwalk -v2c -c NHPREP_RO 192.0.2.1 sysDescr) and confirm the device description is returned.
  • Check 2: NetFlow exporter is configured and shows the collector as destination.
    • How to verify: show flow exporter displays destination 192.0.2.10 and UDP port 2055 (see Step 2 verification).
  • Check 3: Baseline samples obtained — ensure two interface counter snapshots separated by the sampling interval produce consistent bytes delta.
    • How to verify: show interfaces GigabitEthernet0/0 outputs counters; perform delta math to compute bps (see Step 3 and 4 verification).
  • Check 4: Forecast calculation yields an action plan — confirm arithmetic and timeline are reasonable and documented on the ticket.

Common Mistakes

  • Symptom: Collector receives no SNMP data
    Cause: SNMP community misconfigured, or an ACL blocking UDP/161
    Fix: Verify snmp-server community and confirm network ACLs allow UDP/161 from the collector to the device
  • Symptom: NetFlow exporter shows 0 packets exported
    Cause: Wrong source interface or collector unreachable
    Fix: Check the flow exporter destination and source interface; ensure the source interface can reach 192.0.2.10; verify route and connectivity
  • Symptom: Interface counters wrap / produce negative deltas
    Cause: 32-bit counters on high-speed links rolling over between polls
    Fix: Use 64-bit counters (ifHCInOctets/ifHCOutOctets) or shorten the polling interval to avoid rollover
  • Symptom: Forecast timeline too short/long
    Cause: Growth model fed noisy data, or a one-off spike skewing results
    Fix: Use median/percentile values over a longer history and exclude known outliers; repeat the analysis with weekly/monthly baselines
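For the counter-wrap symptom, a rollover-aware delta can be sketched as follows (a minimal illustration assuming at most one wrap between samples, which your polling interval must guarantee; the function name is illustrative):

```python
def counter_delta(prev: int, curr: int, width_bits: int = 32) -> int:
    """Delta between two counter readings, correcting for one wrap of a
    fixed-width counter (e.g. 32-bit ifInOctets). Assumes at most one wrap
    occurred between samples."""
    if curr >= prev:
        return curr - prev
    # Counter wrapped: add the counter's modulus (2^width_bits).
    return curr + (1 << width_bits) - prev

# Example: a 32-bit octet counter wrapped from near 2^32 back past zero.
print(counter_delta(4_294_000_000, 1_000_000))  # 1967296 bytes
```

The same function handles 64-bit ifHCInOctets/ifHCOutOctets with width_bits=64, though those counters wrap so rarely in practice that the correction almost never fires.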

Key Takeaways

  • Always collect consistent baseline samples (interface octets, CPU, memory) and store them in a time-series system before attempting forecasts — one-off snapshots are insufficient.
  • SNMP and NetFlow provide complementary telemetry: SNMP for counters and system metrics; NetFlow for detailed traffic composition and top talkers.
  • Forecasting applies a growth model (linear or exponential) to current utilization with a chosen headroom threshold; convert results into actionable timelines and interim mitigations (QoS, TE).
  • In production, secure telemetry channels (SNMPv3, encrypted exporters, management-plane isolation) and be cautious about telemetry impact on device CPU; plan export sampling and poll intervals accordingly.

Tip: Think of forecasting like fuel gauging — you measure current consumption rate and predict how long fuel will last. The more accurate and frequent your measurements, the better your estimate.

This completes Lesson 4: Capacity Forecasting. In the next lesson we'll use the flow collector's top-talkers to create traffic steering policies to relieve identified bottlenecks.