🛠️ Debugging Story
The Route Exists. So Why Is It Returning 404?
A FastAPI + Ollama Deep Dive 🔍
I finally got my mini AI-RAG and chatbot service talking to my app. Still a prototype — text only, nothing fancy to look at. But when it all clicked together, that feeling made every late night worth it.
⚡ FastAPI · 🐳 Docker · 🤖 Ollama · ☁️ Cloud Run
🏗️ Service Architecture
📱 App→☁️ Cloud Run→🔒 Cloudflare→🏠 Home Router→🖥️ Prod Server→💻 Dev Server (RTX 3060)
😰 As the layers stacked up, strange things started happening
iptables + NordVPN + Docker were interfering with each other — packets were vanishing, and tcpdump was the only thing giving me any direction. My MacBook’s SSH tunnel kept dropping and reconnecting, and at some point the dev server started silently swallowing requests. Nothing in the logs, server still running. Rebooted the MacBook. Fixed immediately.
Once I got through all that, a 404 was waiting for me.
🚨 The route exists. So why 404?
POST /internal/llm/chat kept returning 404 inside Docker while working perfectly on the dev server. I printed the route table directly — the endpoint was clearly registered. It wasn’t a proxy issue either, since requests were showing up in the FastAPI logs.
💡 From inside the container (127.0.0.1): 401 Unauthorized. From outside via nginx: 404 Not Found. Same container, same process, different results.
🔎 The key clue: response size
The response sizes in the nginx access log looked wrong.
⚠️ FastAPI default 404 {"detail":"Not Found"} = 22 bytes
Actual response sizes = 64–89 bytes
Something else was generating that 404.
Working backwards, it was an Ollama error message:
{"detail":"model 'llama3' not found, try pulling it first"}
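You can sanity-check the size argument yourself — a quick sketch comparing FastAPI's stock 404 body against an Ollama-style error payload (the exact bytes vary with the model name, hence the 64–89 range in the logs):

```python
import json

# FastAPI's default 404 body, exactly as it goes over the wire.
fastapi_404 = '{"detail":"Not Found"}'

# An Ollama-style "model not found" error, serialized compactly.
ollama_404 = json.dumps(
    {"detail": "model 'llama3' not found, try pulling it first"},
    separators=(",", ":"),
)

print(len(fastapi_404.encode()))  # 22 bytes — the router's own 404
print(len(ollama_404.encode()))   # noticeably larger — the impostor 404
```

Two responses with the same status code, but only one of them can be 22 bytes long.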
🕵️ Root cause: upstream error propagation
A client service was reading OLLAMA_MODEL=llama3 from GCP Secret Manager and passing it in the request body. The model actually installed was llama3.1:8b. Ollama’s 404 was propagating straight through FastAPI to the client — making it look like the route didn’t exist. Not a routing problem. An upstream error propagation problem.
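A check like this would have caught the mismatch up front. Ollama exposes its installed models via `GET /api/tags`; the helper names below are my own, not from the service in the post:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def parse_models(tags_payload: dict) -> list[str]:
    """Extract installed model tags from Ollama's GET /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]

def installed_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask a running Ollama instance which models are actually pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return parse_models(json.load(resp))

def require_model(requested: str, models: list[str]) -> None:
    """Fail fast at startup instead of letting Ollama 404 at request time."""
    if requested not in models:
        raise RuntimeError(
            f"model {requested!r} not installed; available: {models}"
        )
```

Running `require_model` once at startup, against the value pulled from Secret Manager, turns a confusing runtime 404 into an immediate, self-explanatory crash.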
🤖 Debugging with AI
I had been relying on AI to analyze the flood of logs. At some point, it started nudging me toward tearing down the architecture itself. I stepped back, worked through it with Claude from scratch, and eventually tracked down the real cause.
AI compressed what could have been weeks of debugging into days. But holding the design together was still a human job. The more network layers you add, the easier it is for AI to lose the thread too.
📝 What I took away
1️⃣
Check the response size
22 bytes vs 64 bytes. Two 404s can mean completely different things. That single number was the decisive clue.
2️⃣
404 doesn’t always mean routing failure
Upstream errors propagate silently. If you don’t handle them explicitly, they’re very hard to trace.
3️⃣
Wrap upstream errors as 502/503
Returning internal errors and routing errors with the same status code multiplies your debugging time significantly.
4️⃣
When nothing makes sense — reboot. Seriously.
Sometimes the most powerful debugging tool is turning it off and back on. Your mental health will thank you.
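Takeaway 3 in code — a minimal sketch of wrapping an upstream failure as a 502, with function and field names of my own choosing:

```python
def wrap_upstream_error(upstream_status: int, upstream_body: str) -> tuple[int, dict]:
    """Translate an upstream (e.g. Ollama) response into what this service returns.

    Successes pass through untouched. Any upstream failure becomes a 502
    that carries the original status and body in the detail, so a backend
    "model not found" 404 can never masquerade as a routing 404.
    """
    if 200 <= upstream_status < 300:
        return upstream_status, {}
    return 502, {
        "detail": "upstream LLM error",
        "upstream_status": upstream_status,
        "upstream_body": upstream_body[:500],  # truncate oversized bodies
    }
```

With this in place, the nginx log would have shown a 502 with the Ollama message attached — and this whole hunt would have taken minutes instead of days.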
When the logs go quiet, widen your view. 🔭
And when nothing makes sense — reboot first.