Lesson 5 of 6

RAG for Network Knowledge

Objective

Build a Retrieval-Augmented Generation (RAG) system that answers operational network questions by retrieving and summarizing your existing network documentation and device configurations. We will ingest text sources (runbooks, configs, diagram notes), convert them into embeddings, store them in a vector index, and build a retrieval-plus-generation pipeline that returns context-grounded answers with source citations. In production, this reduces time-to-answer for NetOps, helps with troubleshooting, and lowers the risk of hallucination by grounding responses in your own documentation. Real-world scenario: a network engineer asks “How was BGP configured for the data-center edge?” and receives a concise answer quoting the exact config snippets and runbook pages.

Quick Recap

Refer to the topology and devices configured in Lesson 1 — this lesson does not add any new routers, switches, or IP addresses. We operate from an operations workstation that has access to the network documentation repository (local or mounted share). All actions in this lesson happen on that workstation or in a contained application environment. No additional network device configuration is required.

Key Concepts (Theory + Practical)

  • Embeddings and Vector Search

    • Theory: Embeddings map text to high-dimensional numeric vectors where semantic similarity corresponds to geometric closeness.
    • Practical: We create embeddings for each chunk of config/runbook; similar questions retrieve closely matching chunks. This is how RAG finds relevant context.
  • Chunking and Context Windows

    • Theory: Large documents must be split into chunks smaller than the model’s context window so retrieval can provide useful, focused context.
    • Practical: Use overlapping chunking to preserve context across boundaries; too-large chunks waste tokens, too-small chunks lose meaning.
  • Retrieval → Augmentation → Generation Pipeline

    • Theory: Retrieval gathers supporting context; the generator (LLM) uses that context to produce an answer. Proper prompt engineering tells the model to cite sources.
    • Practical: The retrieval step reduces hallucination by anchoring the model with actual text from documentation; the generation step summarizes and synthesizes.
  • Vector Store Persistence and Re-indexing

    • Theory: Vector stores (FAISS, etc.) hold embedding vectors for fast nearest-neighbor lookup; they must be persisted and refreshed when documents change.
    • Practical: In production, schedule re-ingestion after nightly config backups or trigger on config-change events.
  • Verification & Attribution

    • Theory: Confidence scores and document provenance matter — always return source references.
    • Practical: Build the RAG pipeline to return source file names and text snippets alongside answers.

Tip: Think of the vector store like a library index — embeddings are the index cards; retrieval gets the cards closest to your query, then the LLM reads those pages to form its answer.
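The “semantic similarity as geometric closeness” idea can be seen with a toy example. The vectors below are hand-made stand-ins, not real embeddings (a real pipeline uses a learned model such as sentence-transformers); the point is only the cosine-similarity arithmetic that retrieval relies on:

```python
# Toy illustration of "semantic similarity = geometric closeness".
# The vectors are invented for this example; real embeddings come
# from a trained model and have hundreds of dimensions.
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embeddings for three documentation chunks (made-up values)
chunks = {
    "bgp neighbor config":  [0.9, 0.1, 0.0],
    "ospf area design":     [0.1, 0.9, 0.1],
    "switch port security": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "how is BGP peering set up?"

best = max(chunks, key=lambda name: cosine(query, chunks[name]))
print("Closest chunk:", best)  # -> Closest chunk: bgp neighbor config
```

Retrieval in the pipeline below is exactly this operation, scaled up: compare the query vector against every chunk vector and keep the closest ones.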


Step-by-step configuration

Step 1: Prepare the document corpus and chunk the text

What we are doing: Collect your network docs and device configs into a single directory, then split large files into smaller, overlapping chunks suitable for embedding. Chunking preserves meaning and prevents context-window overflow during generation.

# Create workspace and copy documents (assumes docs are on a mounted share)
mkdir -p ~/nhprep_rag/corpus
cp /mnt/docs/*.txt ~/nhprep_rag/corpus/
cp /mnt/configs/*.cfg ~/nhprep_rag/corpus/

# Install required Python packages (one-time)
python3 -m pip install --upgrade pip
python3 -m pip install langchain faiss-cpu sentence-transformers

# Create a Python script to chunk documents
cat > ~/nhprep_rag/chunk_documents.py << 'PY'
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os

input_dir = os.path.expanduser('~/nhprep_rag/corpus')
output_dir = os.path.expanduser('~/nhprep_rag/chunks')
os.makedirs(output_dir, exist_ok=True)

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)

for fname in sorted(os.listdir(input_dir)):
    if not fname.endswith(('.txt', '.cfg')):
        continue
    path = os.path.join(input_dir, fname)
    with open(path, 'r', encoding='utf-8', errors='ignore') as f:
        text = f.read()
    chunks = splitter.split_text(text)
    for i, c in enumerate(chunks):
        outname = f"{fname}__chunk_{i+1}.txt"
        with open(os.path.join(output_dir, outname), 'w', encoding='utf-8') as out:
            out.write(c)
print(f"Created chunks in: {output_dir}")
PY

What just happened:

  • We created a working directory and copied documentation into it so ingestion is centralized.
  • Installed Python libraries: langchain for text splitting and orchestration; faiss-cpu for vector indexing; sentence-transformers for the local embedding model used in the next step.
  • The script uses a recursive splitter to produce 800-character chunks with 100-character overlap — this preserves sentence continuity and yields chunk sizes that fit most model context windows.

Real-world note: In production, chunking parameters are tuned for the target LLM’s context window and your document styles (configs vs. runbooks).
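To see how chunk_size and chunk_overlap interact, the fixed-size chunker below sketches the underlying arithmetic. It is a simplification: the real RecursiveCharacterTextSplitter also prefers to break on paragraph and sentence boundaries rather than mid-word.

```python
# Minimal fixed-size chunker with overlap -- a simplified sketch of what
# RecursiveCharacterTextSplitter does (the real splitter also tries to
# break on paragraph/sentence boundaries instead of arbitrary offsets).
def chunk_text(text, chunk_size=800, chunk_overlap=100):
    step = chunk_size - chunk_overlap  # each chunk starts this far after the previous one
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

doc = "x" * 2000  # stand-in for a 2000-character runbook
chunks = chunk_text(doc, chunk_size=800, chunk_overlap=100)
print(len(chunks), [len(c) for c in chunks])  # -> 3 [800, 800, 600]
```

Note that the last 100 characters of each chunk repeat at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.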

Verify:

# Run the chunking script and list chunk files
python3 ~/nhprep_rag/chunk_documents.py
ls -l ~/nhprep_rag/chunks | head -n 20

Expected output:

Created chunks in: /home/youruser/nhprep_rag/chunks
total 96
-rw-r--r-- 1 youruser youruser  812 Apr  1 10:01 bgp_runbook.txt__chunk_1.txt
-rw-r--r-- 1 youruser youruser  760 Apr  1 10:01 bgp_runbook.txt__chunk_2.txt
-rw-r--r-- 1 youruser youruser  795 Apr  1 10:01 ospf_config.cfg__chunk_1.txt
-rw-r--r-- 1 youruser youruser  790 Apr  1 10:01 ospf_config.cfg__chunk_2.txt
# etc...

Step 2: Create embeddings for each chunk and build a FAISS index

What we are doing: Compute vector embeddings for every chunk and store them in a persisted FAISS vector index. This enables fast nearest-neighbor retrieval for queries.

# Create the embedding + index script
cat > ~/nhprep_rag/build_faiss_index.py << 'PY'
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.schema import Document
from langchain.vectorstores import FAISS
import os

chunks_dir = os.path.expanduser('~/nhprep_rag/chunks')
docs = []
for fname in sorted(os.listdir(chunks_dir)):
    path = os.path.join(chunks_dir, fname)
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()
    metadata = {"source": fname, "domain": "lab.nhprep.com", "org": "NHPREP"}
    docs.append(Document(page_content=text, metadata=metadata))

# Use a local HuggingFace sentence-transformer model for embeddings
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

index = FAISS.from_documents(docs, emb)
index.save_local(os.path.expanduser('~/nhprep_rag/faiss_index'))
print("FAISS index built and saved.")
PY

# Run the script (takes some time for embeddings)
python3 ~/nhprep_rag/build_faiss_index.py

What just happened:

  • We loaded each chunk as a Document with metadata (source filename, domain lab.nhprep.com, org NHPREP). Metadata aids attribution.
  • We used a local sentence-transformer embedding model (a common, open model family) to compute embeddings. These vectors populate a FAISS index for efficient similarity search.
  • The index is saved locally so it can be reused without recomputing embeddings.

Real-world note: Using local embedding models keeps sensitive docs on-prem; cloud APIs are an alternative when you need higher-quality embeddings or managed services.
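Conceptually, what FAISS provides is fast nearest-neighbor search over the embedding matrix. The brute-force NumPy sketch below, using random stand-in vectors rather than real embeddings, shows the operation a flat index performs; FAISS adds optimized internals and approximate index types on top of this.

```python
# Brute-force nearest-neighbor search over a small vector matrix -- this is
# conceptually what a flat FAISS index (e.g. IndexFlatL2) computes; the
# vectors here are random stand-ins, not real chunk embeddings.
import numpy as np

rng = np.random.default_rng(0)
vectors = np.asarray(rng.normal(size=(48, 384)), dtype="float32")  # 48 chunk "embeddings"
# Query vector: chunk 7 plus a little noise, so chunk 7 should come back first
query = vectors[7] + np.asarray(rng.normal(scale=0.01, size=384), dtype="float32")

dists = np.linalg.norm(vectors - query, axis=1)  # L2 distance to every chunk
k = 4
top_k = np.argsort(dists)[:k]                    # indices of the k closest chunks
print("nearest chunk index:", int(top_k[0]))     # chunk 7, by construction
```

At 48 vectors the brute-force scan is instant; FAISS matters when the corpus grows to millions of chunks and you need sub-millisecond lookups.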

Verify:

# List persisted FAISS index files and show a quick count using a small probe script
ls -l ~/nhprep_rag/faiss_index
python3 - << 'PY'
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
import os

# Load the index with the same embedding model used to build it
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
index = FAISS.load_local(os.path.expanduser('~/nhprep_rag/faiss_index'), embeddings=emb)
print("Loaded FAISS index. Approximate number of vectors:", index.index.ntotal)
PY

Expected output:

total 24
-rw-r--r-- 1 youruser youruser  4096 Apr  1 10:30 index.faiss
-rw-r--r-- 1 youruser youruser 16384 Apr  1 10:30 index.pkl
Loaded FAISS index. Approximate number of vectors: 48

Step 3: Build the Retriever + RAG answering script

What we are doing: Create a simple retrieval-augmented generator that: accepts a user query, retrieves the top-k relevant chunks from FAISS, and calls a text-generation model to produce a concise answer that cites sources.

# Create the RAG answering script
cat > ~/nhprep_rag/rag_answer.py << 'PY'
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import OpenAI  # Placeholder wrapper; configure per your LLM
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import os

# Load index and embeddings
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
index = FAISS.load_local(os.path.expanduser('~/nhprep_rag/faiss_index'), embeddings=emb)
retriever = index.as_retriever(search_type="similarity", search_kwargs={"k": 4})

# Prompt template instructs the LLM to answer using context and cite sources
template = """You are a network documentation assistant for the organization NHPREP.
Use ONLY the provided context to answer the question. For any factual claim, include source filenames in square brackets.
If the answer is not in the context, say "Information not present in the documentation."

Context:
{context}

Question:
{question}

Answer with citations:
"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# Replace OpenAI with the LLM wrapper you have available; configure API keys or local models as needed.
llm = OpenAI(temperature=0.0)  # deterministic answers

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
query = "How do I configure OSPF on the distribution switches?"
result = qa({"query": query})

print("Answer:")
print(result['result'])
print("\nSource documents and snippets:")
for doc in result['source_documents']:
    print("-", doc.metadata.get('source', 'unknown'), "->", doc.page_content[:200].replace("\n", " "))
PY

# Run a sample query
python3 ~/nhprep_rag/rag_answer.py

What just happened:

  • We loaded the FAISS index and created a retriever that returns the top 4 semantically similar chunks for any query.
  • The prompt template forces the generator to only use provided context and to attach sources. This reduces hallucination and provides actionable provenance.
  • The script runs a sample query and prints the answer plus the source filenames and snippets.

Real-world note: In production, replace the LLM wrapper with your sanctioned model and manage API keys or local models via secure vaults. Keep temperature low for operational answers.
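For intuition, chain_type="stuff" essentially concatenates the retrieved chunks into the prompt's {context} slot before calling the LLM. The sketch below mimics that assembly with made-up retrieval results; the template text and chunk contents are illustrative only.

```python
# Sketch of what chain_type="stuff" does: concatenate the retrieved chunks
# into the prompt's {context} slot, tagged with their source filenames so
# the model can cite them. The retrieval results below are made up.
TEMPLATE = """Use ONLY the provided context to answer. Cite source filenames in square brackets.

Context:
{context}

Question:
{question}

Answer with citations:
"""

retrieved = [  # (source filename, chunk text) -- hypothetical top-k results
    ("ospf_config.cfg__chunk_1.txt",
     "router ospf 1\n network 10.1.0.0 0.0.255.255 area 0"),
    ("ospf_runbook.txt__chunk_3.txt",
     "Verify adjacency with 'show ip ospf neighbor'."),
]

context = "\n\n".join(f"[source: {src}]\n{text}" for src, text in retrieved)
prompt = TEMPLATE.format(context=context, question="How do I configure OSPF?")
print(prompt)
```

Because every chunk is prefixed with its source filename, the instruction “include source filenames in square brackets” gives the model something concrete to cite.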

Verify:

python3 ~/nhprep_rag/rag_answer.py

Expected output:

Answer:
Configure OSPF area 0 on distribution interfaces by enabling the OSPF process, assigning area 0 to the interfaces, and ensuring correct network statements. Example steps: 1) router ospf 1 2) network 10.1.0.0 0.0.255.255 area 0 3) verify neighbor adjacency and LSDB [ospf_config.cfg__chunk_1.txt] [ospf_runbook.txt__chunk_3.txt]

Source documents and snippets:
- ospf_config.cfg__chunk_1.txt -> router ospf 1
 network 10.1.0.0 0.0.255.255 area 0
 interface GigabitEthernet1/0
  ip ospf 1 area 0
- ospf_runbook.txt__chunk_3.txt -> Verify OSPF adjacency using 'show ip ospf neighbor' and check LSDB with 'show ip ospf database'

Step 4: Add provenance, confidence, and periodic re-indexing

What we are doing: Enhance the pipeline to return provenance and a rough confidence score; schedule re-indexing so the vector store reflects nightly backups or on-demand changes.

# Simple reindexing script (rebuild index); intended for cron or CI trigger
cat > ~/nhprep_rag/reindex.sh << 'SH'
#!/bin/bash
# Rebuild embeddings and FAISS index (safe to run overnight)
python3 ~/nhprep_rag/chunk_documents.py
python3 ~/nhprep_rag/build_faiss_index.py
echo "Reindex completed at $(date)"
SH
chmod +x ~/nhprep_rag/reindex.sh

# Example cron line to run nightly at 02:00 (edit with crontab -e)
# 0 2 * * * /home/youruser/nhprep_rag/reindex.sh >> /home/youruser/nhprep_rag/reindex.log 2>&1

What just happened:

  • We created a safe reindexing script that re-runs chunking and index building. Scheduling this via cron ensures your RAG system uses up-to-date configs and runbooks.
  • In the RAG answer script, you can compute similarity scores returned by the retriever to present a confidence metric (higher similarity → higher provenance confidence).

Real-world note: For high-change environments, trigger re-indexing via your configuration management system (when configs push to Git) rather than time-based cron for minimal staleness.
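One simple way to surface a confidence metric: langchain's FAISS wrapper exposes similarity_search_with_score, which returns (document, L2 distance) pairs. The sketch below converts such distances into a rough 0-1 score; the squashing function is an arbitrary heuristic, not a calibrated probability, and should be tuned against your own corpus.

```python
# Heuristic: turn the L2 distances returned by FAISS's
# similarity_search_with_score into a rough 0-1 "confidence".
# The exponential squashing below is an arbitrary choice, not a
# calibrated probability -- tune the scale against your own data.
import math

def distance_to_confidence(l2_distance, scale=1.0):
    # distance 0 -> confidence 1.0; larger distances decay toward 0
    return math.exp(-l2_distance / scale)

# Hypothetical distances for the top-4 retrieved chunks
for dist in [0.2, 0.5, 0.9, 1.4]:
    print(f"distance={dist:.1f} -> confidence={distance_to_confidence(dist):.2f}")
```

Presenting this score next to each cited source lets engineers judge at a glance whether the answer rests on a close match or a marginal one.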

Verify:

# Trigger reindex manually
~/nhprep_rag/reindex.sh
tail -n 20 ~/nhprep_rag/reindex.log

Expected output:

# Manual run prints to stdout:
Created chunks in: /home/youruser/nhprep_rag/chunks
FAISS index built and saved.
Reindex completed at Wed Apr  1 10:45:02 UTC 2026
# reindex.log shows the last scheduled (cron) run:
Created chunks in: /home/youruser/nhprep_rag/chunks
FAISS index built and saved.
Reindex completed at Wed Apr  1 02:00:05 UTC 2026

Verification Checklist

  • Check 1: Corpus chunking completed — verify ls ~/nhprep_rag/chunks shows chunk files.
    • How to verify: ls -l ~/nhprep_rag/chunks (expected non-empty list of chunk files).
  • Check 2: FAISS index built and persisted — verify ~/nhprep_rag/faiss_index contains index files and printed vector count.
    • How to verify: re-run the probe script from the Step 2 “Verify” section, which loads the index with the embedding model and prints index.index.ntotal (expected integer > 0).
  • Check 3: RAG answering returns cited sources — run python3 ~/nhprep_rag/rag_answer.py and confirm output includes filenames in brackets and context snippets.
    • How to verify: output should show an "Answer:" block followed by source filenames and snippets.

Common Mistakes

Symptom                                   | Cause                                                            | Fix
Very short or irrelevant search results   | Chunk size too small or no overlap; embeddings miss context      | Increase chunk_size and chunk_overlap in chunk_documents.py; reindex
Answers that hallucinate or make stuff up | Generation prompt lacks instruction to only use provided context | Update prompt template to explicitly require using only the context and to cite sources
Index build fails or is empty             | Embedding library not installed or wrong model name              | Ensure required packages are installed (pip install langchain faiss-cpu sentence-transformers); use a valid model name
Stale answers after config changes        | Index not re-built after document updates                        | Run the reindex.sh script after updates or trigger via automation (CI/CM tool)

Key Takeaways

  • RAG grounds generative responses by combining semantic retrieval (embeddings + vector search) with an LLM — this dramatically reduces hallucination when answering operational network questions.
  • Proper chunking, metadata (source, domain lab.nhprep.com, org NHPREP), and prompt engineering (force citation and “use only provided context”) are essential for trustworthy answers.
  • Persist and automate re-indexing (cron or event-triggered) so your RAG system reflects the latest configs and runbooks.
  • In production, prefer local or controlled embedding models for sensitive network documents; store API keys and secrets securely (never hard-code them). Always present source filenames and snippets so engineers can verify the answer against the original documentation.

Warning: Never expose unredacted sensitive credentials in your documentation corpus. If test data includes passwords for labs, mark them clearly or scrub them. For example, use the prescribed lab password placeholder Lab@123 only in test docs and not in production configs.
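A minimal scrubbing pass before ingestion might look like the sketch below. The regex patterns are illustrative examples for IOS-style lines, not an exhaustive list; extend them for your platforms (NX-OS, Junos, etc.) and review the scrubbed output manually before indexing.

```python
# Illustrative credential scrubber to run over configs before ingestion.
# The patterns below cover a few common IOS-style secret lines only --
# treat this as a starting point, not a complete redaction solution.
import re

SECRET_PATTERNS = [
    (re.compile(r"(enable secret \d?\s*)\S+"), r"\1<REDACTED>"),
    (re.compile(r"(username \S+ (?:password|secret) \d?\s*)\S+"), r"\1<REDACTED>"),
    (re.compile(r"(snmp-server community )\S+"), r"\1<REDACTED>"),
]

def scrub(text):
    # Apply each pattern in turn, keeping the command prefix and
    # replacing only the secret token itself
    for pattern, repl in SECRET_PATTERNS:
        text = pattern.sub(repl, text)
    return text

config = (
    "enable secret 5 $1$abcd$XYZ\n"
    "username ops password 7 0822455D0A16\n"
    "snmp-server community s3cr3t RO"
)
scrubbed = scrub(config)
print(scrubbed)
```

Run a pass like this in the ingestion step (before chunk_documents.py) so secrets never reach the chunks directory or the vector index.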


This lesson showed how to transform existing network documentation into a RAG system that provides accurate, cited answers for NetOps staff. In the next lesson we will integrate RAG responses into an incident management workflow and explore conversational RAG with historical context.