Published on: 2025-08-11
This guide demonstrates how to integrate Ollama with FastAPI and a self-hosted n8n workflow, enabling AI-powered question answering based on data scraped from Shopify documentation.
Overview
The workflow leverages n8n for orchestrating tasks, FastAPI for processing requests, and Ollama for local AI model inference.
In this setup, FastAPI handles a single request that performs both embedding generation and chat-based answering, using search results gathered via the Serper API.
Workflow Steps
1 Trigger via Webhook
A webhook initiates the process. In this example workflow, the trigger URL is:
http://localhost:5678/webhook-test/shopify-dev-scrape?question=implement+hydrogen+and+oxygen+for+theme
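For reference, the trigger can also be exercised from a short Python script; this is a minimal sketch, and the /webhook-test/ URL only responds while the workflow is listening for a test event in the n8n editor:

import requests

# Trigger the n8n test webhook with a question (n8n assumed to be
# self-hosted locally on port 5678).
resp = requests.get(
    "http://localhost:5678/webhook-test/shopify-dev-scrape",
    params={"question": "implement hydrogen and oxygen for theme"},
    timeout=300,
)
print(resp.status_code)
print(resp.text)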
2 Format Search Query
JavaScript code formats the query to target searches exclusively on the shopify.dev website.
3 Search Shopify.dev
Executes the search using the Serper API (requires a free API key from serper.dev).
4 Get Top Result
Selects only the top search result and returns it as json.url.
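Steps 2-4 boil down to a site-restricted search that keeps only the best hit. As a rough Python sketch of what those nodes do (the real workflow uses an n8n Code node plus HTTP Request nodes; SERPER_API_KEY is a placeholder):

import requests

SERPER_API_KEY = "YOUR_SERPER_API_KEY"  # free key from serper.dev
question = "implement hydrogen and oxygen for theme"

# Step 2: restrict the query to shopify.dev
payload = {"q": f"site:shopify.dev {question}"}
headers = {"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"}

# Step 3: run the search through the Serper API
results = requests.post("https://google.serper.dev/search", json=payload, headers=headers).json()

# Step 4: keep only the top organic result's URL
top_url = results["organic"][0]["link"]
print(top_url)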
5 Fetch Shopify Page
Retrieves the HTML content from the top search result.
6 HTML Extraction
Extracts specific content from the HTML body, saving it as datahtml.
7 HTML File Conversion
Converts datahtml into a file, which will be attached as a document for the "Ask Ollama" step.
8 Ask Ollama
Sends the processed HTML document to the FastAPI endpoint for answering.
Requires both FastAPI and Ollama to be running behind ngrok tunnels, with the forwarding URLs updated in the workflow and in aiollama.py.
FastAPI Setup
Python Environment Setup (Using uv):
uv venv
.venv\Scripts\activate
uv pip install -r requirements.txt
requirements.txt
fastapi
uvicorn
torch
openai
ollama
lxml
python-multipart
numpy
aiollama.py
# aiollama.py
import torch
import ollama
import re
from fastapi import FastAPI, UploadFile, File, Form
from fastapi.responses import JSONResponse
from openai import OpenAI, OpenAIError
app = FastAPI()
# OpenAI-compatible client; base_url points at Ollama exposed through ngrok
# (Ollama ignores the API key, so the model name is used as a placeholder)
client = OpenAI(
    base_url='https://YOUR_FREE_PUBLIC_URL.ngrok-free.app/v1',
    api_key='dolphin-llama3'
)
# Get relevant context via embedding similarity
def get_relevant_context(query, vault_embeddings, vault_content, top_k=3):
    print("Generating embedding for query...")
    input_embedding = ollama.embeddings(model='mxbai-embed-large', prompt=query)["embedding"]
    print("Query embedding done.")
    input_tensor = torch.tensor(input_embedding).unsqueeze(0)
    # Cosine similarity between the query and every document chunk
    cos_scores = torch.cosine_similarity(input_tensor, vault_embeddings)
    top_k = min(top_k, len(cos_scores))
    top_indices = torch.topk(cos_scores, k=top_k)[1].tolist()
    return [vault_content[i].strip() for i in top_indices]
# Main chat wrapper
def ollama_chat(
    question,
    system_message,
    vault_embeddings_tensor,
    vault_chunks,
    conversation_history,
    model_name="dolphin-llama3"
):
    print("Getting relevant context...")
    relevant_context = get_relevant_context(question, vault_embeddings_tensor, vault_chunks)
    print(f"Got {len(relevant_context)} relevant chunks.")
    context_str = "\n".join(relevant_context)
    user_input_with_context = (context_str + "\n\n" + question) if relevant_context else question
    conversation_history.append({
        "role": "user",
        "content": user_input_with_context
    })
    messages = [{"role": "system", "content": system_message}] + conversation_history
    print("Sending request to Ollama model...")
    try:
        response = client.chat.completions.create(
            model=model_name,
            messages=messages
        )
    except OpenAIError as e:
        print("OpenAIError:", e)
        raise
    except Exception as e:
        print("Unexpected Error:", e)
        raise
    print("Received response from Ollama.")
    # Append assistant's response to history
    assistant_reply = response.choices[0].message.content
    conversation_history.append({
        "role": "assistant",
        "content": assistant_reply
    })
    return {
        "relevant_context": relevant_context,
        "response": assistant_reply
    }
# FastAPI endpoint
@app.post("/ask")
async def ask_from_uploaded_html(file: UploadFile = File(...), question: str = Form(...)):
    print("Reading uploaded file...")
    contents = await file.read()
    print("File read.")
    text = re.sub(r'\s+', ' ', contents.decode("utf-8")).strip()
    print("Splitting into chunks...")
    # Split on sentence boundaries, then pack sentences into ~1000-character chunks
    sentences = re.split(r'(?<=[.!?]) +', text)
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        if len(current_chunk) + len(sentence) + 1 < 1000:
            current_chunk += sentence + " "
        else:
            chunks.append(current_chunk.strip())
            current_chunk = sentence + " "
    if current_chunk:
        chunks.append(current_chunk.strip())
    print(f"Created {len(chunks)} chunks.")
    print("Generating embeddings for chunks...")
    vault_embeddings = []
    for i, chunk in enumerate(chunks):
        print(f"  Embedding chunk {i+1}/{len(chunks)}")
        response = ollama.embeddings(model='mxbai-embed-large', prompt=chunk)
        print(f"  Vector: {response['embedding'][:10]}...")  # print first 10 values only
        vault_embeddings.append(response["embedding"])
    vault_embeddings_tensor = torch.tensor(vault_embeddings)
    print("All embeddings generated.")
    print("Calling ollama_chat()...")
    conversation_history = []
    result = ollama_chat(
        question=question,
        system_message="You are a helpful assistant that extracts the most useful info from uploaded documents.",
        vault_embeddings_tensor=vault_embeddings_tensor,
        vault_chunks=chunks,
        conversation_history=conversation_history,
        model_name="dolphin-llama3"
    )
    print("ollama_chat() complete.")
    print("Final Response:", result["response"])
    return {
        "question": question,
        "relevant_context": result["relevant_context"],
        "response": result["response"]
    }
Run the API:
uvicorn aiollama:app --reload
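Once FastAPI and Ollama are both up, the endpoint can be smoke-tested directly. A minimal sketch (saved_page.html is a placeholder standing in for the HTML file the n8n workflow would attach):

import requests

# Send an HTML document plus a question to /ask, mirroring what the
# "Ask Ollama" step in n8n does.
with open("saved_page.html", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8000/ask",
        files={"file": ("page.html", f, "text/html")},
        data={"question": "implement hydrogen and oxygen for theme"},
        timeout=600,
    )
print(resp.json()["response"])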
Ollama Setup
Pull required models:
ollama pull mxbai-embed-large
ollama pull dolphin-llama3
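A quick sanity check with the same ollama Python client used in aiollama.py confirms both models respond locally (a minimal sketch):

import ollama

# Embedding model used for chunk and query vectors
emb = ollama.embeddings(model="mxbai-embed-large", prompt="hello world")
print(len(emb["embedding"]), "dimensions")

# Chat model used for answering
reply = ollama.chat(model="dolphin-llama3", messages=[{"role": "user", "content": "Say hi"}])
print(reply["message"]["content"])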
NGROK Setup
Ollama only accepts requests whose Host header looks local, so the tunnel needs a traffic policy that rewrites that header; ngrok is used to tunnel requests securely during development.
Configure ngrok traffic policy (docs):
on_http_request:
  - actions:
      - type: add-headers
        config:
          headers:
            host: localhost
9 Respond to User
The webhook returns the AI-generated answer based on the question.
Behind the scenes, the running Python process logs both phases:
Embedding: [screenshot]
Chat API: [screenshot]
Downloadable n8n Workflow
The n8n JSON file can be downloaded and customized. Update the configuration to match your environment. (N8N Config file)
TL;DR
Integrating Ollama with FastAPI and n8n enables private, local AI workflows ideal for tasks like:
- Internal or confidential documentation search and Q&A
- Automating technical knowledge retrieval
- Building low-code AI integrations with n8n for domain-specific assistants
This approach combines local AI privacy, low-code orchestration, and flexible FastAPI endpoints for rapid AI-powered tool development.