🛠️ Debugging Story
The Route Exists. So Why Is It Returning 404?
A FastAPI + Ollama Deep Dive 🔍
I finally got my mini AI-RAG and chatbot service talking to my app. Still a prototype: text only, nothing fancy to look at. But when it all clicked together, that feeling made every late night worth it.
⚡ FastAPI · 🐳 Docker · 🤖 Ollama · ☁️ Cloud Run
🏗️ Service Architecture
📱 App → ☁️ Cloud Run → 🌐 Cloudflare → 🏠 Home Router → 🖥️ Prod Server → 💻 Dev Server (RTX 3060)
😰 As the layers stacked up, strange things started happening
iptables + NordVPN + Docker were interfering with each other; packets were vanishing, and tcpdump was the only thing giving me any direction. My MacBook’s SSH tunnel kept dropping and reconnecting, and at some point the dev server started silently swallowing requests. Nothing in the logs, server still running. Rebooted the MacBook. Fixed immediately.
Once I got through all that, a 404 was waiting for me.
🚨 The route exists. So why 404?
POST /internal/llm/chat kept returning 404 inside Docker while working perfectly on the dev server. I printed the route table directly; the endpoint was clearly registered. It wasn’t a proxy issue either, since requests were showing up in the FastAPI logs.
💡 From inside the container (127.0.0.1): 401 Unauthorized. From outside via nginx: 404 Not Found. Same container, same process, different results.
🔑 The key clue: response size
The response sizes in the nginx access log looked wrong.
⚠️ FastAPI's default 404 body {"detail":"Not Found"} = 22 bytes
Actual response sizes = 64–89 bytes
Something else was generating that 404.
Working backwards, it was an Ollama error message:
{"detail":"model 'llama3' not found, try pulling it first"}
🕵️ Root cause: upstream error propagation
A client service was reading OLLAMA_MODEL=llama3 from GCP Secret Manager and passing it in the request body. The model actually installed was llama3.1:8b. Ollama’s 404 was propagating straight through FastAPI to the client, making it look like the route didn’t exist. Not a routing problem. An upstream error propagation problem.
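A startup check against Ollama's model list would have surfaced the mismatch immediately; a sketch assuming Ollama's GET /api/tags endpoint (the helper names are mine):

```python
import json
import urllib.request

def installed_models(base_url: str = "http://127.0.0.1:11434") -> set[str]:
    """Ask Ollama which models are actually pulled (GET /api/tags)."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return {m["name"] for m in json.load(resp).get("models", [])}

def check_model(name: str, available: set[str]) -> None:
    """Fail fast at startup instead of letting Ollama's 404 leak to clients."""
    if name not in available:
        raise RuntimeError(
            f"model {name!r} not installed; available: {sorted(available)}"
        )

# At startup: check_model(os.environ["OLLAMA_MODEL"], installed_models())
# With the config from this story, that raises immediately:
#   model 'llama3' not installed; available: ['llama3.1:8b']
```

One RuntimeError at boot is a much cheaper signal than a misleading 404 in production.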
🤖 Debugging with AI
I had been relying on AI to analyze the flood of logs. At some point, it started nudging me toward tearing down the architecture itself. I stepped back, worked through it with Claude from scratch, and eventually tracked down the real cause.
AI compressed what could have been weeks of debugging into days. But holding the design together was still a human job. The more network layers you add, the easier it is for AI to lose the thread too.
📝 What I took away
1️⃣ Check the response size
22 bytes vs 64 bytes. Two 404s can mean completely different things. That single number was the decisive clue.
2️⃣ 404 doesn’t always mean routing failure
Upstream errors propagate silently. If you don’t handle them explicitly, they’re very hard to trace.
3️⃣ Wrap upstream errors as 502/503
Returning internal errors and routing errors with the same status code multiplies your debugging time significantly.
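The translation can be a tiny function at the proxy boundary; a minimal sketch of the mapping this takeaway suggests (the function name is mine):

```python
def map_upstream_status(upstream_status: int) -> int:
    """Decide what this service returns when the LLM backend answers.

    An upstream 4xx (e.g. Ollama's 404 for a missing model) means *our*
    request to the backend was bad, not that the client hit a missing
    route -- so surface it as 502 Bad Gateway. Upstream 5xx becomes 503.
    """
    if 200 <= upstream_status < 300:
        return upstream_status  # success passes through unchanged
    if 400 <= upstream_status < 500:
        return 502              # upstream rejected our request
    return 503                  # upstream is down or erroring

print(map_upstream_status(404))  # 502 -- no longer looks like a routing miss
```

With this in place, a client-visible 404 can only mean one thing: the route really doesn’t exist.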
4️⃣ When nothing makes sense, reboot. Seriously.
Sometimes the most powerful debugging tool is turning it off and back on. Your mental health will thank you.
When the logs go quiet, widen your view. 🔭
And when nothing makes sense, reboot first.