Build a GPT-4 RAG Chatbot with n8n and Qdrant: A Complete Step-by-Step Guide
This guide goes beyond the basics: you'll get the architecture decisions, working code snippets, prompt engineering strategies, and production-ready insights to deploy a chatbot that actually performs.
A RAG (Retrieval-Augmented Generation) chatbot combines a vector database for precise semantic retrieval with a powerful LLM like GPT-4 to deliver grounded, factual answers — and n8n lets you wire this entire pipeline together visually, without writing complex application code.
Why RAG Beats Plain LLMs
Standard LLM chatbots are limited to their training cutoff and prone to hallucinations when asked domain-specific questions. RAG solves this by injecting real, retrieved context into every prompt. The LLM doesn't need to know your product catalog — it just needs to read the relevant chunk you hand it.
The three-layer architecture is simple:
- Ingestion layer — your data is chunked, embedded, and stored in a vector DB
- Retrieval layer — user queries are embedded and matched semantically to stored chunks
- Generation layer — the LLM synthesizes an answer from the retrieved context
This separation makes each component swappable.
Changing your LLM from GPT-4 to Claude or Mistral doesn't break your vector index. Adding a new data source only requires re-running the ingestion workflow.
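To make the three layers concrete, here is a minimal Python sketch of the retrieval and generation layers outside of n8n. It assumes the openai and qdrant-client packages and a collection whose payload carries an enrichedContent field, as built during ingestion later in this guide:

from openai import OpenAI
from qdrant_client import QdrantClient

client = OpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(url="https://your-cluster.qdrant.io", api_key="your-qdrant-api-key")

def answer(question: str, collection: str = "ecommerce_products") -> str:
    # Retrieval layer: embed the query and fetch the closest stored chunks
    query_vector = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    hits = qdrant.search(collection_name=collection, query_vector=query_vector, limit=8)

    # Generation layer: let the LLM answer from the retrieved context only
    context = "\n\n".join(hit.payload["enrichedContent"] for hit in hits)
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content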
The Stack
| Component | Role | Recommended Option |
|---|---|---|
| n8n | Workflow orchestration | Cloud or self-hosted (Docker) |
| OpenAI GPT-4o | Response generation | gpt-4o or gpt-4o-mini |
| Qdrant | Vector database | Qdrant Cloud (free tier available) |
| OpenAI Embeddings | Text-to-vector conversion | text-embedding-3-small (1536-dim) |
| n8n Window Buffer Memory | Conversation history | Built-in LangChain node |
Dimension planning matters: if you later switch embedding models (e.g., from text-embedding-3-small at 1536 dimensions to Voyage AI's voyage-3 at 1024 dimensions), you must destroy and recreate your Qdrant collection entirely. Choose your embedding model before you start indexing.
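A quick sanity check before bulk indexing is to compare the model's output dimension with the collection's configured size. Here is a sketch using the OpenAI SDK and Qdrant's REST API, assuming the ecommerce_products collection created later in this guide:

import httpx
from openai import OpenAI

QDRANT_URL = "https://your-cluster.qdrant.io"
QDRANT_API_KEY = "your-qdrant-api-key"

# Dimension produced by the embedding model you plan to index with
probe = OpenAI().embeddings.create(model="text-embedding-3-small", input="probe")
model_dim = len(probe.data[0].embedding)

# Dimension the Qdrant collection was created with
info = httpx.get(
    f"{QDRANT_URL}/collections/ecommerce_products",
    headers={"api-key": QDRANT_API_KEY},
).json()
collection_dim = info["result"]["config"]["params"]["vectors"]["size"]

assert model_dim == collection_dim, f"{model_dim} != {collection_dim}: recreate the collection"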
Phase 1 — Data Ingestion Pipeline
This is the foundation. Before the chatbot can answer anything, your data must be chunked, embedded, and stored.
Step 1: Prepare and Enrich Your Data
Raw text without context produces mediocre retrieval. The secret is metadata enrichment: prepend structured metadata to each chunk before embedding, so the vector captures both semantic content and context.
For an e-commerce product catalog, a pre-processing function in an n8n Code node would look like this:
// n8n Code Node — Enrichment before embedding
const items = $input.all();

return items.map(item => {
  const product = item.json;

  const enrichedContent =
    `Product: ${product.name}\n` +
    `Category: ${product.category}\n` +
    `Sizes Available: ${product.sizes.join(', ')}\n` +
    `Brand: ${product.brand}\n` +
    `Material: ${product.material}\n` +
    `Price: €${product.price}\n` +
    `Colors: ${product.colors.join(', ')}\n\n` +
    product.description;

  return {
    json: {
      ...product,
      enrichedContent,
      metadata: {
        product_id: product.id,
        category: product.category,
        brand: product.brand,
        sizes: product.sizes,
        price: product.price,
        in_stock: product.in_stock
      }
    }
  };
});
Without this enrichment, a chunk that says "available in cotton blend" carries no information about which product, brand, or price range it refers to. With it, the embedding model captures the full semantic context.
Step 2: The n8n Ingestion Workflow Nodes
Connect these nodes in sequence:
- Manual Trigger (or HTTP Request to your product API)
- Code Node — enrichment function above
- Qdrant Vector Store (Operation: Insert Documents)
  - Connect Embeddings OpenAI → text-embedding-3-small
  - Connect Default Data Loader
  - Connect Recursive Character Text Splitter → Chunk size: 1500, Overlap: 300 (a rough Python equivalent of this splitter follows below)
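If you're curious what the splitter does under the hood, it behaves much like LangChain's RecursiveCharacterTextSplitter. A minimal Python sketch with the same settings (assuming the langchain-text-splitters package; the sample text reuses the enriched format from Step 1):

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Same settings as the n8n node: 1500-character chunks with 300 characters of overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=300)

enriched_content = (
    "Product: Essential Cotton Tee\n"
    "Category: T-Shirts\n"
    "Price: €29\n\n"
    "A classic crew-neck tee in 100% organic cotton."
)
chunks = splitter.split_text(enriched_content)
print(len(chunks), "chunk(s)")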
Step 3: Create the Qdrant Collection
Before running the workflow, create the collection via the Qdrant REST API or dashboard. Here's the API call to do it programmatically:
import httpx

QDRANT_URL = "https://your-cluster.qdrant.io"
QDRANT_API_KEY = "your-qdrant-api-key"

# text-embedding-3-small produces 1536-dimensional vectors
payload = {
    "vectors": {
        "size": 1536,
        "distance": "Cosine"
    }
}

response = httpx.put(
    f"{QDRANT_URL}/collections/ecommerce_products",
    headers={"api-key": QDRANT_API_KEY},
    json=payload
)
print(response.json())  # {"result": true, "status": "ok"}
You can also verify that the collection exists from Python before triggering the n8n workflow:
import httpx

def check_collection(collection_name: str) -> dict:
    r = httpx.get(
        f"{QDRANT_URL}/collections/{collection_name}",
        headers={"api-key": QDRANT_API_KEY}
    )
    return r.json()

info = check_collection("ecommerce_products")
print(f"Vectors count: {info['result']['vectors_count']}")
print(f"Indexed vectors: {info['result']['indexed_vectors_count']}")
⚠️ Common Qdrant + n8n error: 400 Bad Request: Wrong input: Not existing vector name. This means the collection was created with a named vector (e.g. "default") while n8n expects a single unnamed vector. Always define the vector parameters directly under the "vectors" key (size and distance at the top level, as shown above) rather than as a map of named vectors.
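To see the difference, compare the two collection-creation payloads below (a sketch; the second, named-vector form is what typically triggers this error with the n8n node):

# Unnamed vector: what the n8n Qdrant node expects (as created above)
compatible_payload = {
    "vectors": {"size": 1536, "distance": "Cosine"}
}

# Named vector: valid Qdrant config, but n8n fails with "Not existing vector name"
named_vector_payload = {
    "vectors": {
        "default": {"size": 1536, "distance": "Cosine"}
    }
}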
Phase 2 — The Chat Workflow
With the data indexed, you can build the retrieval and generation pipeline. The entire conversational RAG system fits in a handful of n8n nodes:
[Chat Trigger]
      ↓
  [AI Agent] ←→ [OpenAI Chat Model: gpt-4o]
   ↕       ↕
[Window    [Vector Store Tool]
 Buffer           ↓
 Memory]   [Qdrant Vector Store: Retrieve]
                  ↓
           [Embeddings OpenAI]
Step 4: Configure the AI Agent Node
In the AI Agent node, set the Agent Type to Tools Agent and define a system prompt that instructs the model to stay grounded in retrieved data:
You are an expert e-commerce assistant for [Brand Name].

RULES:
1. ALWAYS use the product_search tool before answering any product question.
2. Only recommend products that exist in the retrieved results. Never invent products.
3. If a product is not found in the vector store, say: "I couldn't find that in our current catalog."
4. When recommending products, always include: name, available sizes, price, and a direct link if available.
5. Keep responses concise and helpful. Use bullet points for product lists.

Your tone is friendly, knowledgeable, and focused on helping users find the right fit.
Step 5: Configure the Vector Store Tool
Connect a Vector Store Tool node to the AI Agent. This is what transforms a standard chatbot into a RAG chatbot.
Configure:
- Tool Name: product_search
- Description: Use this tool to search the product catalog. Input a query describing what the user is looking for, including size, category, or style preferences.
- Limit results: 8 (retrieve the top 8 most relevant chunks)
- Operation Mode: Retrieve Documents (For Agent/Chain)
Connect the Qdrant Vector Store and Embeddings OpenAI nodes to this tool.
Step 6: Add Conversation Memory
Connect a Window Buffer Memory node to the AI Agent. Set the buffer size to 20 messages. This enables multi-turn conversations: the user can say "what about size L?" and the agent understands it refers to the previous query without needing full context re-injection each time.
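n8n's memory node handles this for you, but conceptually a window buffer is just a fixed-length queue of recent messages that gets prepended to each model call. A rough Python illustration of the idea (not n8n's actual implementation):

from collections import deque

class WindowBufferMemory:
    def __init__(self, window_size: int = 20):
        # Oldest messages fall off automatically once the window is full
        self.messages = deque(maxlen=window_size)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def as_prompt_messages(self) -> list[dict]:
        return list(self.messages)

memory = WindowBufferMemory(window_size=20)
memory.add("user", "I wear size M. What t-shirts do you have in stock?")
memory.add("assistant", "Here are the t-shirts available in size M: ...")
memory.add("user", "What about size L?")  # resolved against the prior turns above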
Phase 3 — Practical E-commerce Example
Here is the full flow for the use case: "I wear size M — which items are in stock?"
- User message arrives at the Chat Trigger: "I wear size M. What t-shirts do you have in stock?"
- The AI Agent decides to call the product_search tool with the query "size M t-shirts in stock"
- The Embeddings OpenAI node converts that query to a 1536-dim vector
- Qdrant performs cosine similarity search, returning 8 product chunks
- The AI Agent feeds those 8 chunks plus the original question to GPT-4o
- GPT-4o generates a structured, accurate response like:

Here are the t-shirts available in size M:
• **Essential Cotton Tee** – White, Black, Navy | €29 | 100% organic cotton
• **Premium Slim Fit** – Olive, Burgundy | €45 | Merino wool blend
• **Graphic Print Series** – 3 designs available | €35 | Recycled polyester

All items are currently in stock. Would you like more details on any of these?
The chatbot only surfaces products that actually exist in the vector index — no hallucinations.
Phase 4 — Hybrid Search and Prompt Engineering
Hybrid Search for Better Precision
Pure vector search can miss exact keyword matches (e.g., a specific SKU code or brand name).
Hybrid search combines semantic similarity with structured filtering on payload fields. In Qdrant, you can apply these payload filters at retrieval time, using the n8n HTTP Request node to call Qdrant's API directly:
import httpx

def hybrid_search(query_vector: list, size_filter: str, top_k: int = 8) -> list:
    """Combine semantic search with metadata filtering."""
    payload = {
        "vector": query_vector,
        "limit": top_k,
        "with_payload": True,
        "filter": {
            "must": [
                {
                    "key": "metadata.sizes",
                    "match": {"any": [size_filter]}
                },
                {
                    "key": "metadata.in_stock",
                    "match": {"value": True}
                }
            ]
        }
    }
    response = httpx.post(
        f"{QDRANT_URL}/collections/ecommerce_products/points/search",
        headers={"api-key": QDRANT_API_KEY},
        json=payload
    )
    return response.json()["result"]

# Example: find in-stock size-M items that match the query semantically
# (embed() is a placeholder for your embedding call; a minimal version is shown below)
results = hybrid_search(
    query_vector=embed("cotton t-shirt casual"),
    size_filter="M",
    top_k=8
)
This approach ensures retrieved products actually come in the user's size, not just semantically related items that happen to mention size differently.
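The embed() call above is a placeholder. A minimal version using the official OpenAI Python SDK, assuming the same model used at indexing time, could look like this:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    # Must match the model used during ingestion, otherwise dimensions won't line up
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding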
Advanced Prompt Engineering
Beyond the system prompt, structure the context injection explicitly so GPT-4 treats retrieved data as authoritative:
# Python equivalent of what the AI Agent constructs internally
def build_rag_prompt(user_query: str, retrieved_chunks: list[dict]) -> str:
    context_block = "\n\n---\n\n".join([
        f"[Product {i+1}]\n{chunk['payload']['enrichedContent']}"
        for i, chunk in enumerate(retrieved_chunks)
    ])
    return f"""RETRIEVED PRODUCT DATA (use ONLY this data for recommendations):
{context_block}
---
USER QUESTION: {user_query}
IMPORTANT: Base your answer exclusively on the product data above.
If no product matches the query, say so clearly."""
This pattern — called context grounding — dramatically reduces hallucinations by making the boundary between retrieved knowledge and LLM inference explicit.
Production Considerations
Moving from a demo to a live chatbot requires addressing several factors:
- Rate limits: OpenAI's text-embedding-3-small has TPM limits. For bulk indexing, batch your embedding calls (8–16 chunks per request) and implement exponential backoff (see the sketch after this list). At Voyage AI's free tier (3 RPM), 2,500 chunks can take over an hour without batching.
- Re-indexing strategy: Add a webhook trigger to the ingestion workflow so it fires automatically whenever your product catalog is updated, with no manual re-runs needed.
- Collection recreation: If you ever change embedding models, you must drop and recreate the Qdrant collection with the new vector dimensions. There is no migration path.
- Retrieval evaluation: Test your pipeline by asking known questions and checking whether the correct chunks appear in the top-5 results. If precision is poor, experiment with smaller chunk sizes (e.g., 800 characters) or a higher overlap ratio (e.g., 200 characters of overlap on 800-character chunks).
- Temperature settings: Use temperature 0.2–0.3 for factual product queries. Higher temperatures increase creativity but reduce grounding accuracy.
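As a reference for the rate-limit point above, here is one way to batch embedding calls with exponential backoff using the OpenAI Python SDK (a sketch; the batch size and retry count are illustrative):

import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def embed_in_batches(texts: list[str], batch_size: int = 16, max_retries: int = 5) -> list[list[float]]:
    """Embed texts in batches, backing off exponentially on rate-limit errors."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                response = client.embeddings.create(model="text-embedding-3-small", input=batch)
                vectors.extend(item.embedding for item in response.data)
                break
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...
    return vectors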
Enabling Auto-Reindexing in n8n
Here is an n8n Code node, placed right after a Webhook trigger, that decides whether a catalog update event warrants re-indexing:
// n8n Code Node — triggered by webhook from your e-commerce platform
// Payload example from Shopify/WooCommerce product update event
const event = $input.first().json;

// Only reindex if product data changed
const relevantFields = ['title', 'description', 'variants', 'tags', 'status'];
const hasChanges = relevantFields.some(field => event.changed_fields?.includes(field));

if (!hasChanges) {
  return [{ json: { action: 'skip', reason: 'No relevant fields changed' } }];
}

return [{
  json: {
    action: 'reindex',
    product_id: event.id,
    product_data: {
      name: event.title,
      description: event.body_html?.replace(/<[^>]*>/g, ''), // strip HTML
      category: event.product_type,
      sizes: event.variants.map(v => v.option1).filter(Boolean),
      price: parseFloat(event.variants[0]?.price || 0),
      brand: event.vendor,
      in_stock: event.variants.some(v => v.inventory_quantity > 0),
      colors: event.options.find(o => o.name === 'Color')?.values || []
    }
  }
}];
Connect this Code node output directly into the Qdrant Vector Store node with operation mode set to Upsert Documents to handle both new and updated products.
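If you ever need to bypass the node and write to Qdrant directly (for example from a maintenance script), the equivalent operation is a single PUT to the points endpoint. A sketch, reusing the QDRANT_URL and QDRANT_API_KEY constants from Phase 1:

import httpx

def upsert_product(point_id: int, vector: list[float], payload: dict) -> dict:
    """Insert or overwrite a single point; an existing ID is updated in place."""
    response = httpx.put(
        f"{QDRANT_URL}/collections/ecommerce_products/points",
        headers={"api-key": QDRANT_API_KEY},
        params={"wait": "true"},
        json={"points": [{"id": point_id, "vector": vector, "payload": payload}]},
    )
    return response.json()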
Key Architectural Takeaways
A production-grade RAG chatbot on n8n, GPT-4, and Qdrant delivers a system that is modular and component-swappable, grounded in real data with minimal hallucination risk, and scalable to thousands of indexed documents without architecture changes.
The most impactful improvements, in order of ROI, are:
- Metadata enrichment before embedding
- Hybrid search with payload filtering
- A well-structured system prompt with explicit grounding instructions