Lesson 3 of 5

AI Fundamentals: ML, Deep Learning, and LLMs

Introduction

Artificial intelligence is transforming the way networks are designed, operated, and troubleshot. As a network engineer, understanding the foundational layers of AI — from machine learning to deep learning to large language models — is no longer optional. These technologies are already embedded in the platforms and tools you work with every day, from wireless optimization to SD-WAN operations.

In this lesson, you will learn the core hierarchy of AI disciplines, understand how each layer builds on the one before it, explore what large language models are and how they work, and see how techniques like Retrieval Augmented Generation solve real problems in network operations. By the end, you will have a solid conceptual foundation to carry into the more hands-on lessons that follow.

Key Concepts

The AI Hierarchy

AI is not a single technology — it is a set of nested disciplines, each more specialized than the last. Understanding this hierarchy is critical before diving into any practical application.

  • Artificial Intelligence: The broadest category. AI encompasses the entire world of machine learning and deep learning
  • Machine Learning: A subset of AI in which the rules are not hard-coded into the program but are learned from data; ML enables systems to perform tasks without explicit programming
  • Deep Learning: A form of ML that uses neural networks to model and interpret complex patterns in large datasets
  • Generative AI: AI that generates content, including text, images, and video

Think of these as concentric circles. AI is the outermost ring. Inside it sits machine learning. Inside machine learning sits deep learning. And inside deep learning sits generative AI. Each inner layer is a more specialized subset of the one surrounding it.

Neural Networks and the Perceptron

Deep learning is built on neural networks, which are computational structures inspired by the human brain. The fundamental building block of a neural network is the perceptron (also called a neuron). Perceptrons connect to each other through parameters (analogous to synapses in the brain). The advances in high-density, high-performance GPUs have made it possible to train neural networks with billions and even trillions of parameters.

To put this in perspective: the human brain contains approximately 86 billion neurons and over 100 trillion synaptic connections. Modern AI models are approaching and in some cases exceeding these numbers in terms of raw parameter count, scaling from billions of parameters into the trillions.
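A single perceptron can be sketched in a few lines. This is a toy illustration, not a trained network: the weights and bias below are hand-picked so the perceptron behaves like a logical AND gate, rather than learned from data as they would be in a real neural network.

```python
# A single perceptron: a weighted sum of inputs passed through a step activation.
# The weights and bias are hand-picked for this example, not learned.

def perceptron(inputs, weights, bias):
    """Fire (return 1) if the weighted sum of inputs plus bias is positive."""
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if activation > 0 else 0

# Model a logical AND gate: fires only when both inputs are 1.
weights = [0.6, 0.6]
bias = -1.0

for a in (0, 1):
    for b in (0, 1):
        print(f"AND({a}, {b}) = {perceptron([a, b], weights, bias)}")
```

Deep learning stacks many layers of such units and tunes the weights (the parameters) automatically during training; the billions of parameters in modern models are exactly these learned weights.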

The Attention Mechanism

One of the most important breakthroughs in modern AI is the attention mechanism. Consider the sentence: "I swam across the river to get to the other bank." You have no problem interpreting the word "bank" as a riverbank, not a financial institution. A machine, however, needs help making that distinction. The attention mechanism adds contextual information to words in a sentence, allowing the model to understand meaning based on surrounding context. This mechanism is the foundation for transformer models, which power today's large language models.
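The core computation behind attention can be sketched as scaled dot-product attention. The word vectors below are random toy embeddings; in a real transformer they would come from the model's learned embedding layers.

```python
import numpy as np

# Scaled dot-product attention, the building block of transformers.
# The 3x4 input matrix stands in for three "words", each a 4-dim vector.

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # how relevant each key is to each query
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # context-weighted mix of the values

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))    # toy embeddings for three words
out = attention(x, x, x)       # self-attention: every word attends to every other
print(out.shape)               # (3, 4): each word vector now carries context
```

After self-attention, the vector for an ambiguous word like "bank" is blended with the vectors of nearby words such as "river", which is how the model resolves the meaning from context.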

Large Language Models (LLMs)

Large Language Models are designed to understand and generate content in human language. LLMs are trained on massive datasets through a process involving tokenization, vectorization, and fine-tuning.

Key characteristics of LLMs:

  • They answer queries using natural language, based on similarity search results
  • They provide advanced features like tool calling and function calling
  • They are trained on enormous datasets, with parameters ranging from billions to trillions
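The tokenization and vectorization steps mentioned above can be illustrated with a toy example. Real LLMs use subword tokenizers (such as BPE) with vocabularies of tens of thousands of tokens and learned embedding tables; the vocabulary and vectors below are invented for the sketch.

```python
# Toy tokenization and vectorization. Vocabulary and embeddings are invented;
# real models learn these during training.

vocab = {"the": 0, "router": 1, "drops": 2, "packets": 3, "<unk>": 4}

def tokenize(text):
    """Map each whitespace-separated word to an integer token ID."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

# Vectorization: each token ID indexes a row in an embedding table.
embedding_table = [
    [0.1, 0.3], [0.9, 0.2], [0.4, 0.8], [0.7, 0.5], [0.0, 0.0],
]

tokens = tokenize("The router drops packets")
vectors = [embedding_table[t] for t in tokens]
print(tokens)    # [0, 1, 2, 3]
print(vectors)
```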

LLMs come in two broad categories:

  • Closed source: ChatGPT, Claude, Gemini
  • Open source: Llama, Mistral, Mixtral, Phi, Orca, Gemma, Vicuna, Wizard, Zephyr, Dolphin

When selecting an LLM for your use case, you must consider data privacy, cost, and specific use cases. Not every problem needs the largest model, and not every environment can tolerate sending data to a cloud-hosted model.

How It Works

The Hallucination Problem

Generic LLM models have a fundamental limitation: they do not have access to real-time data, domain-specific data, or the latest data. When an LLM lacks the information needed to answer a question accurately, it may generate plausible-sounding but incorrect responses. This is known as hallucination, and it is one of the biggest challenges when applying LLMs to network engineering tasks where accuracy is critical.

For example, if you ask a generic LLM about the current state of your SD-WAN fabric, it has no way to know your topology, your policies, or your live telemetry. It will either refuse to answer or fabricate a response.

Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) solves the hallucination problem by allowing an LLM to access external sources of data. Instead of relying solely on its training data, a RAG-enabled system retrieves matches from a provided dataset and uses the LLM to create high-confidence content based on that real data.

The RAG workflow involves several steps:

  1. Data Collection — Configuration and operational data is collected from network management platforms (such as an SD-WAN Manager) using APIs
  2. Data Transformation — The raw data is parsed and transformed into a usable format
  3. Embedding Generation — The transformed data is converted into numerical representations (embeddings)
  4. Vector Database Storage — These embeddings are stored in a vector database for efficient similarity searching
  5. Query Processing — When a user asks a question, the system searches the vector database for relevant matches and feeds those matches to the LLM along with the question
  6. Response Generation — The LLM generates a response grounded in the actual retrieved data, dramatically reducing hallucination
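Steps 3 through 5 above can be sketched in miniature. Here a bag-of-words count stands in for a real embedding model, a Python list stands in for the vector database, and the documents and query are invented SD-WAN examples; a production system would use learned embeddings and a proper vector store.

```python
import math
from collections import Counter

# Minimal RAG retrieval sketch: "embed" documents, search by cosine
# similarity, and build the grounded prompt for the LLM.

def embed(text):
    """Bag-of-words stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "tunnel T1 between site A and site B is down due to a BFD timeout",
    "QoS policy maps voice traffic to the low-latency queue",
    "OMP routes from vSmart are installed with preference 100",
]
index = [(doc, embed(doc)) for doc in documents]   # the "vector database"

query = "why is the tunnel between site A and site B down?"
best_doc, _ = max(index, key=lambda item: cosine(embed(query), item[1]))

# Step 5: the retrieved match is fed to the LLM together with the question.
prompt = f"Context: {best_doc}\n\nQuestion: {query}"
print(best_doc)
```

Because the LLM now answers from the retrieved context rather than from memory alone, its response is grounded in the actual state of the network.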

LangChain as an Integration Framework

LangChain is a framework that streamlines the development of RAG solutions. Its key capabilities include:

  • Seamless integration of dataset imports, embedding models, vector databases, and LLMs
  • Modular design that enables easy swapping of applications, LLMs, and vector databases
  • Streamlined development of Retrieval Augmented Generation (RAG) solutions

LangChain acts as the glue between your data sources, your vector database, and your chosen LLM, making it significantly easier to build AI-powered network tools without writing everything from scratch.
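The modular "glue" pattern can be shown with plain Python. The classes below are illustrative stand-ins, not the real LangChain API: the point is that the chain only depends on the interfaces, so swapping the vector store or the LLM means swapping one object, not rewriting the pipeline.

```python
# Stand-in components illustrating the modular pattern; not LangChain's API.

class FakeVectorStore:
    def __init__(self, docs):
        self.docs = docs
    def search(self, query):
        # Trivial keyword match standing in for similarity search.
        return [d for d in self.docs if any(w in d for w in query.split())]

class FakeLLM:
    def generate(self, prompt):
        return f"[answer grounded in: {prompt}]"

class RagChain:
    """Pipeline that only knows the interfaces, not the implementations."""
    def __init__(self, store, llm):
        self.store, self.llm = store, llm
    def ask(self, question):
        context = " | ".join(self.store.search(question))
        return self.llm.generate(f"{context} :: {question}")

chain = RagChain(FakeVectorStore(["tunnel down", "policy ok"]), FakeLLM())
result = chain.ask("tunnel status")
print(result)
```

Replacing FakeVectorStore with a real vector database client, or FakeLLM with a hosted or local model, leaves RagChain untouched; this is the swap-ability the bullet points above describe.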

Configuration Example

While AI and ML concepts are not configured with CLI commands the way a router or switch would be, the practical application in networking involves interacting with management platforms through APIs and AI assistants. Here is an example of how LLM tool calling works in an SD-WAN context.

Conversational Network Troubleshooting with LLM Tool Calling

Consider a scenario where you need to troubleshoot a path issue in your SD-WAN fabric. Rather than manually navigating dashboards and running API calls, you interact with an AI assistant built on LangChain and an LLM:

User: Can you help run trace & analyze it?

Behind the scenes, the system performs the following:

  1. The LLM uses tool calling to interact with the SD-WAN Manager and start an NWPI (Network-Wide Path Insight) trace
  2. The SD-WAN Manager executes the trace and collects insights from the fabric
  3. The LLM receives the NWPI insights and provides a summarized analysis along with remediation suggestions

This workflow uses two LLM instances working together through LangChain — one handling the tool calling to start the NWPI trace and collect data, and another performing the analysis and generating human-readable remedy actions based on the NWPI insights.
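The two-step loop above can be sketched as follows. The function names, arguments, and returned data are hypothetical stand-ins: a real implementation would call the SD-WAN Manager's REST API and let an LLM with function calling decide which tool to invoke and with what arguments.

```python
# Sketch of the tool-calling workflow. All names and data are invented
# stand-ins for the SD-WAN Manager API and the two LLM roles.

def start_nwpi_trace(src_site, dst_site):
    """Stand-in for the API call that starts an NWPI trace (LLM #1's tool)."""
    return {"trace_id": "t-42", "src": src_site, "dst": dst_site}

def get_nwpi_insights(trace_id):
    """Stand-in for collecting the finished trace's insights."""
    return {"trace_id": trace_id, "loss_pct": 4.2, "path": ["edge1", "mpls", "edge2"]}

# LLM #1 would pick the tool and its arguments from the user's question.
trace = start_nwpi_trace("siteA", "siteB")
insights = get_nwpi_insights(trace["trace_id"])

# LLM #2 would summarize the insights and propose a remedy; here we just format.
summary = (f"Path {' -> '.join(insights['path'])} shows "
           f"{insights['loss_pct']}% loss on trace {insights['trace_id']}.")
print(summary)
```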

Real-World Application

AIOps for Day-2 Network Operations

AI and ML are already being applied to simplify day-2 network operations in several key areas:

  • Predictive Path Recommendations — AI models analyze traffic patterns and predict optimal paths before congestion occurs, helping mitigate issues before they impact users
  • Root Cause Analysis — When issues do occur, AI-powered analysis reduces Mean Time To Resolution (MTTR) by quickly identifying the underlying cause
  • Bandwidth Forecasting and Capacity Planning — AI/ML-based forecasting helps network teams plan for growth and optimize network and application performance for higher operational efficiency
  • AI Assistant for Networking — Interactive LLM-based AI assistants allow engineers to query network state and receive analysis in natural language
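To give a feel for the forecasting idea in the list above, here is a deliberately simple moving-average forecast. Production AIOps platforms use far more sophisticated ML models, and the utilization samples below are invented.

```python
# Toy bandwidth forecast via a simple moving average; sample data is invented.

def moving_average_forecast(samples, window=3):
    """Forecast the next value as the mean of the last `window` samples."""
    recent = samples[-window:]
    return sum(recent) / len(recent)

# Daily peak link utilization (Mbps) over the past week.
utilization = [410, 425, 440, 430, 455, 470, 465]
forecast = moving_average_forecast(utilization)
print(f"Forecast next-day peak: {forecast:.0f} Mbps")
```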

AI-Enhanced Wireless with RRM

AI-enhanced Radio Resource Management (RRM) is a practical example of AI/ML in production wireless networks. The system works as follows:

  1. Anonymized RF data is collected from Wave 2 and Wi-Fi 6/6E access points
  2. This data flows to the AI cloud where AI-enhanced RRM algorithms process it
  3. AI-based data and events are sent to the network controller
  4. Decisions are configured through automation
  5. The result is an optimized wireless experience with minimal manual intervention

In real-world deployments, AI-enhanced RRM has demonstrated strong results:

  • Initial convergence takes approximately 3 hours
  • Changes are typically made during nighttime maintenance windows
  • Network health has stayed above 85%, which is considered very good under load
  • When manual changes are introduced, the decrease in efficiency is easy to spot compared to the AI-optimized baseline

Best Practice: Let the AI-enhanced RRM system run uninterrupted for at least the initial convergence period before evaluating results. Manual overrides during this window will interfere with the learning process.

Design Considerations

When planning AI/ML integration into your network operations, keep these factors in mind:

  • Data Privacy — Evaluate whether your data can be sent to cloud-hosted LLMs or whether you need on-premises open-source models
  • Cost — Larger closed-source models offer more capability but at higher cost; open-source alternatives may be sufficient for specific use cases
  • Hallucination Risk — Always use RAG or similar grounding techniques when accuracy matters; never rely on a generic LLM for operational decisions without data retrieval

Summary

  • AI is a hierarchy: Artificial Intelligence contains Machine Learning, which contains Deep Learning, which contains Generative AI — each layer is a more specialized subset
  • LLMs understand and generate human language through tokenization, vectorization, and fine-tuning on massive datasets, and they offer advanced capabilities like tool calling
  • Hallucination is the core risk when using generic LLMs that lack access to real-time, domain-specific, or current data
  • RAG mitigates hallucination by retrieving real data from external sources (APIs, vector databases) and grounding the LLM response in that data
  • AI/ML is already in production networks through AI-enhanced RRM for wireless, AIOps for predictive operations, and LLM-powered assistants for SD-WAN troubleshooting

In the next lesson, we will explore how these AI/ML concepts are applied to specific network automation workflows, building on the RAG and LangChain foundations covered here.