Build a GPT-4 RAG Chatbot with n8n and Qdrant: A Complete Step-by-Step Guide
This guide goes beyond the basics: you'll get the architecture decisions, working code snippets, prompt engineering strategies, and production-ready insights to deploy a chatbot that actually performs.
A RAG (Retrieval-Augmented Generation) chatbot combines a vector database for precise semantic retrieval with a powerful LLM like GPT-4 to deliver grounded, factual answers — and n8n lets you wire this entire pipeline together visually, without writing complex application code.
Why RAG Beats Plain LLMs
Standard LLM chatbots are limited to their training cutoff and prone to hallucinations when asked domain-specific questions. RAG solves this by injecting real, retrieved context into every prompt. The LLM doesn't need to know your product catalog — it just needs to read the relevant chunk you hand it.
The three-layer architecture is simple:
- Ingestion layer — your data is chunked, embedded, and stored in a vector DB
- Retrieval layer — user queries are embedded and matched semantically to stored chunks
- Generation layer — the LLM synthesizes an answer from the retrieved context
This separation makes each component swappable.
Changing your LLM from GPT-4 to Claude or Mistral doesn't break your vector index. Adding a new data source only requires re-running the ingestion workflow.
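To make the three layers concrete, here is a minimal Python sketch of the retrieval and generation layers outside of n8n. It assumes the openai and qdrant-client packages and a collection whose payload carries an enrichedContent field, as built during ingestion later in this guide:

from openai import OpenAI
from qdrant_client import QdrantClient

client = OpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(url="https://your-cluster.qdrant.io", api_key="your-qdrant-api-key")

def answer(question: str, collection: str = "ecommerce_products") -> str:
    # Retrieval layer: embed the query and fetch the closest stored chunks
    query_vector = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    hits = qdrant.search(collection_name=collection, query_vector=query_vector, limit=8)

    # Generation layer: let the LLM answer from the retrieved context only
    context = "\n\n".join(hit.payload["enrichedContent"] for hit in hits)
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content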
The Stack
| Component | Role | Recommended Option |
|---|---|---|
| n8n | Workflow orchestration | Cloud or self-hosted (Docker) |
| OpenAI GPT-4o | Response generation | gpt-4o or gpt-4o-mini |
| Qdrant | Vector database | Qdrant Cloud (free tier available) |
| OpenAI Embeddings | Text-to-vector conversion | text-embedding-3-small (1536-dim) |
| n8n Window Buffer Memory | Conversation history | Built-in LangChain node |
Dimension planning matters: if you later switch embedding models (e.g., from text-embedding-3-small at 1536 dimensions to Voyage AI's voyage-3 at 1024 dimensions), you must destroy and recreate your Qdrant collection entirely. Choose your embedding model before you start indexing.
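A quick sanity check before bulk indexing is to compare the model's output dimension with the collection's configured size. Here is a sketch using the OpenAI SDK and Qdrant's REST API, assuming the ecommerce_products collection created later in this guide:

import httpx
from openai import OpenAI

QDRANT_URL = "https://your-cluster.qdrant.io"
QDRANT_API_KEY = "your-qdrant-api-key"

# Dimension produced by the embedding model you plan to index with
probe = OpenAI().embeddings.create(model="text-embedding-3-small", input="probe")
model_dim = len(probe.data[0].embedding)

# Dimension the Qdrant collection was created with
info = httpx.get(
    f"{QDRANT_URL}/collections/ecommerce_products",
    headers={"api-key": QDRANT_API_KEY},
).json()
collection_dim = info["result"]["config"]["params"]["vectors"]["size"]

assert model_dim == collection_dim, f"{model_dim} != {collection_dim}: recreate the collection"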
Phase 1 — Data Ingestion Pipeline
This is the foundation. Before the chatbot can answer anything, your data must be chunked, embedded, and stored.
Step 1: Prepare and Enrich Your Data
Raw text without context produces mediocre retrieval. The secret is metadata enrichment: prepend structured metadata to each chunk before embedding, so the vector captures both semantic content and context.
For an e-commerce product catalog, a pre-processing function in an n8n Code node would look like this:
// n8n Code Node — Enrichment before embedding
const items = $input.all();

return items.map(item => {
  const product = item.json;

  const enrichedContent =
    `Product: ${product.name}\n` +
    `Category: ${product.category}\n` +
    `Sizes Available: ${product.sizes.join(', ')}\n` +
    `Brand: ${product.brand}\n` +
    `Material: ${product.material}\n` +
    `Price: €${product.price}\n` +
    `Colors: ${product.colors.join(', ')}\n\n` +
    product.description;

  return {
    json: {
      ...product,
      enrichedContent,
      metadata: {
        product_id: product.id,
        category: product.category,
        brand: product.brand,
        sizes: product.sizes,
        price: product.price,
        in_stock: product.in_stock
      }
    }
  };
});
Without this enrichment, a chunk that says "available in cotton blend" carries no information about which product, brand, or price range it refers to. With it, the embedding model captures the full semantic context.
Step 2: The n8n Ingestion Workflow Nodes
Connect these nodes in sequence:
- Manual Trigger (or HTTP Request to your product API)
- Code Node — enrichment function above
- Qdrant Vector Store (Operation: Insert Documents)
  - Connect Embeddings OpenAI → text-embedding-3-small
  - Connect Default Data Loader
  - Connect Recursive Character Text Splitter → Chunk size: 1500, Overlap: 300 (a rough Python equivalent of this splitter follows below)
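If you're curious what the splitter does under the hood, it behaves much like LangChain's RecursiveCharacterTextSplitter. A minimal Python sketch with the same settings (assuming the langchain-text-splitters package; the sample text reuses the enriched format from Step 1):

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Same settings as the n8n node: 1500-character chunks with 300 characters of overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=300)

enriched_content = (
    "Product: Essential Cotton Tee\n"
    "Category: T-Shirts\n"
    "Price: €29\n\n"
    "A classic crew-neck tee in 100% organic cotton."
)
chunks = splitter.split_text(enriched_content)
print(len(chunks), "chunk(s)")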
Step 3: Create the Qdrant Collection
Before running the workflow, create the collection via the Qdrant REST API or dashboard. Here's the API call to do it programmatically:
import httpx

QDRANT_URL = "https://your-cluster.qdrant.io"
QDRANT_API_KEY = "your-qdrant-api-key"

# text-embedding-3-small produces 1536-dimensional vectors
payload = {
    "vectors": {
        "size": 1536,
        "distance": "Cosine"
    }
}

response = httpx.put(
    f"{QDRANT_URL}/collections/ecommerce_products",
    headers={"api-key": QDRANT_API_KEY},
    json=payload
)
print(response.json())  # {"result": true, "status": "ok"}
You can also verify that the collection exists from Python before triggering the n8n workflow:
import httpx

def check_collection(collection_name: str) -> dict:
    r = httpx.get(
        f"{QDRANT_URL}/collections/{collection_name}",
        headers={"api-key": QDRANT_API_KEY}
    )
    return r.json()

info = check_collection("ecommerce_products")
print(f"Vectors count: {info['result']['vectors_count']}")
print(f"Indexed vectors: {info['result']['indexed_vectors_count']}")
⚠️ Common Qdrant + n8n error: 400 Bad Request: Wrong input: Not existing vector name. This means the collection was created with a named vector (e.g. "default") while n8n expects a single unnamed vector. Always define the vector parameters directly under the "vectors" key (size and distance at the top level, as shown above) rather than as a map of named vectors.
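To see the difference, compare the two collection-creation payloads below (a sketch; the second, named-vector form is what typically triggers this error with the n8n node):

# Unnamed vector: what the n8n Qdrant node expects (as created above)
compatible_payload = {
    "vectors": {"size": 1536, "distance": "Cosine"}
}

# Named vector: valid Qdrant config, but n8n fails with "Not existing vector name"
named_vector_payload = {
    "vectors": {
        "default": {"size": 1536, "distance": "Cosine"}
    }
}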
Phase 2 — The Chat Workflow
With the data indexed, you can build the retrieval and generation pipeline. The entire conversational RAG system fits in a handful of n8n nodes:
[Chat Trigger]
      ↓
  [AI Agent] ←→ [OpenAI Chat Model: gpt-4o]
   ↕       ↕
[Window    [Vector Store Tool]
 Buffer           ↓
 Memory]   [Qdrant Vector Store: Retrieve]
                  ↓
           [Embeddings OpenAI]
Step 4: Configure the AI Agent Node
In the AI Agent node, set the Agent Type to Tools Agent and define a system prompt that instructs the model to stay grounded in retrieved data:
You are an expert e-commerce assistant for [Brand Name].

RULES:
1. ALWAYS use the product_search tool before answering any product question.
2. Only recommend products that exist in the retrieved results. Never invent products.
3. If a product is not found in the vector store, say: "I couldn't find that in our current catalog."
4. When recommending products, always include: name, available sizes, price, and a direct link if available.
5. Keep responses concise and helpful. Use bullet points for product lists.

Your tone is friendly, knowledgeable, and focused on helping users find the right fit.
Step 5: Configure the Vector Store Tool
Connect a Vector Store Tool node to the AI Agent. This is what transforms a standard chatbot into a RAG chatbot.
Configure:
- Tool Name: product_search
- Description: Use this tool to search the product catalog. Input a query describing what the user is looking for, including size, category, or style preferences.
- Limit results: 8 (retrieve the top 8 most relevant chunks)
- Operation Mode: Retrieve Documents (For Agent/Chain)
Connect the Qdrant Vector Store and Embeddings OpenAI nodes to this tool.
Step 6: Add Conversation Memory
Connect a Window Buffer Memory node to the AI Agent. Set the buffer size to 20 messages. This enables multi-turn conversations: the user can say "what about size L?" and the agent understands it refers to the previous query without needing full context re-injection each time.
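n8n's memory node handles this for you, but conceptually a window buffer is just a fixed-length queue of recent messages that gets prepended to each model call. A rough Python illustration of the idea (not n8n's actual implementation):

from collections import deque

class WindowBufferMemory:
    def __init__(self, window_size: int = 20):
        # Oldest messages fall off automatically once the window is full
        self.messages = deque(maxlen=window_size)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def as_prompt_messages(self) -> list[dict]:
        return list(self.messages)

memory = WindowBufferMemory(window_size=20)
memory.add("user", "I wear size M. What t-shirts do you have in stock?")
memory.add("assistant", "Here are the t-shirts available in size M: ...")
memory.add("user", "What about size L?")  # resolved against the prior turns above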
Phase 3 — Practical E-commerce Example
Here is the full flow for the use case: "I wear size M — which items are in stock?"
- User message arrives at the Chat Trigger: "I wear size M. What t-shirts do you have in stock?"
- The AI Agent decides to call the product_search tool with the query "size M t-shirts in stock"
- The Embeddings OpenAI node converts that query to a 1536-dim vector
- Qdrant performs cosine similarity search, returning 8 product chunks
- The AI Agent feeds those 8 chunks plus the original question to GPT-4o
- GPT-4o generates a structured, accurate response like:

Here are the t-shirts available in size M:
• **Essential Cotton Tee** – White, Black, Navy | €29 | 100% organic cotton
• **Premium Slim Fit** – Olive, Burgundy | €45 | Merino wool blend
• **Graphic Print Series** – 3 designs available | €35 | Recycled polyester

All items are currently in stock. Would you like more details on any of these?
The chatbot only surfaces products that actually exist in the vector index — no hallucinations.
Phase 4 — Hybrid Search and Prompt Engineering
Hybrid Search for Better Precision
Pure vector search can miss exact keyword matches (e.g., a specific SKU code or brand name).
Hybrid search combines semantic similarity with structured filtering on payload fields. In Qdrant, you can apply these payload filters at retrieval time, using the n8n HTTP Request node to call Qdrant's API directly:
import httpx

def hybrid_search(query_vector: list, size_filter: str, top_k: int = 8) -> list:
    """Combine semantic search with metadata filtering."""
    payload = {
        "vector": query_vector,
        "limit": top_k,
        "with_payload": True,
        "filter": {
            "must": [
                {
                    "key": "metadata.sizes",
                    "match": {"any": [size_filter]}
                },
                {
                    "key": "metadata.in_stock",
                    "match": {"value": True}
                }
            ]
        }
    }
    response = httpx.post(
        f"{QDRANT_URL}/collections/ecommerce_products/points/search",
        headers={"api-key": QDRANT_API_KEY},
        json=payload
    )
    return response.json()["result"]

# Example: find in-stock size-M items that match the query semantically
# (embed() is a placeholder for your embedding call; a minimal version is shown below)
results = hybrid_search(
    query_vector=embed("cotton t-shirt casual"),
    size_filter="M",
    top_k=8
)
This approach ensures retrieved products actually come in the user's size, not just semantically related items that happen to mention size differently.
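The embed() call above is a placeholder. A minimal version using the official OpenAI Python SDK, assuming the same model used at indexing time, could look like this:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    # Must match the model used during ingestion, otherwise dimensions won't line up
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding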
Advanced Prompt Engineering
Beyond the system prompt, structure the context injection explicitly so GPT-4 treats retrieved data as authoritative:
# Python equivalent of what the AI Agent constructs internally
def build_rag_prompt(user_query: str, retrieved_chunks: list[dict]) -> str:
    context_block = "\n\n---\n\n".join([
        f"[Product {i+1}]\n{chunk['payload']['enrichedContent']}"
        for i, chunk in enumerate(retrieved_chunks)
    ])
    return f"""RETRIEVED PRODUCT DATA (use ONLY this data for recommendations):
{context_block}
---
USER QUESTION: {user_query}
IMPORTANT: Base your answer exclusively on the product data above.
If no product matches the query, say so clearly."""
This pattern — called context grounding — dramatically reduces hallucinations by making the boundary between retrieved knowledge and LLM inference explicit.
Production Considerations
Moving from a demo to a live chatbot requires addressing several factors:
- Rate limits: OpenAI's text-embedding-3-small has TPM limits. For bulk indexing, batch your embedding calls (8–16 chunks per request) and implement exponential backoff (see the sketch after this list). At Voyage AI's free tier (3 RPM), 2,500 chunks can take over an hour without batching.
- Re-indexing strategy: Add a webhook trigger to the ingestion workflow so it fires automatically whenever your product catalog is updated, with no manual re-runs needed.
- Collection recreation: If you ever change embedding models, you must drop and recreate the Qdrant collection with the new vector dimensions. There is no migration path.
- Retrieval evaluation: Test your pipeline by asking known questions and checking whether the correct chunks appear in the top-5 results. If precision is poor, experiment with smaller chunk sizes (e.g., 800 characters) or a higher overlap ratio (e.g., 200 characters of overlap on 800-character chunks).
- Temperature settings: Use temperature 0.2–0.3 for factual product queries. Higher temperatures increase creativity but reduce grounding accuracy.
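As a reference for the rate-limit point above, here is one way to batch embedding calls with exponential backoff using the OpenAI Python SDK (a sketch; the batch size and retry count are illustrative):

import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def embed_in_batches(texts: list[str], batch_size: int = 16, max_retries: int = 5) -> list[list[float]]:
    """Embed texts in batches, backing off exponentially on rate-limit errors."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                response = client.embeddings.create(model="text-embedding-3-small", input=batch)
                vectors.extend(item.embedding for item in response.data)
                break
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...
    return vectors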
Enabling Auto-Reindexing in n8n
Here is an n8n Code node, placed right after a Webhook trigger, that decides whether a catalog update event warrants re-indexing:
// n8n Code Node — triggered by webhook from your e-commerce platform
// Payload example from Shopify/WooCommerce product update event
const event = $input.first().json;

// Only reindex if product data changed
const relevantFields = ['title', 'description', 'variants', 'tags', 'status'];
const hasChanges = relevantFields.some(field => event.changed_fields?.includes(field));

if (!hasChanges) {
  return [{ json: { action: 'skip', reason: 'No relevant fields changed' } }];
}

return [{
  json: {
    action: 'reindex',
    product_id: event.id,
    product_data: {
      name: event.title,
      description: event.body_html?.replace(/<[^>]*>/g, ''), // strip HTML
      category: event.product_type,
      sizes: event.variants.map(v => v.option1).filter(Boolean),
      price: parseFloat(event.variants[0]?.price || 0),
      brand: event.vendor,
      in_stock: event.variants.some(v => v.inventory_quantity > 0),
      colors: event.options.find(o => o.name === 'Color')?.values || []
    }
  }
}];
Connect this Code node output directly into the Qdrant Vector Store node with operation mode set to Upsert Documents to handle both new and updated products.
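If you ever need to bypass the node and write to Qdrant directly (for example from a maintenance script), the equivalent operation is a single PUT to the points endpoint. A sketch, reusing the QDRANT_URL and QDRANT_API_KEY constants from Phase 1:

import httpx

def upsert_product(point_id: int, vector: list[float], payload: dict) -> dict:
    """Insert or overwrite a single point; an existing ID is updated in place."""
    response = httpx.put(
        f"{QDRANT_URL}/collections/ecommerce_products/points",
        headers={"api-key": QDRANT_API_KEY},
        params={"wait": "true"},
        json={"points": [{"id": point_id, "vector": vector, "payload": payload}]},
    )
    return response.json()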
Key Architectural Takeaways
A production-grade RAG chatbot on n8n, GPT-4, and Qdrant delivers a system that is modular and component-swappable, grounded in real data with minimal hallucination risk, and scalable to thousands of indexed documents without architecture changes.
The most impactful improvements, in order of ROI, are:
- Metadata enrichment before embedding
- Hybrid search with payload filtering
- A well-structured system prompt with explicit grounding instructions