What I learned
A RAG pipeline is mostly ingestion engineering and operational discipline; the retrieve-and-generate step, the most-blogged-about part, is the smallest piece. Most of the real work is in respecting robots.txt, rate-limiting politely, partitioning vector namespaces so one tenant’s data doesn’t bleed into another’s, and logging enough to debug a bad embedding two weeks later.
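To make the "polite scraping" part concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code: `robots_checker`, `RateLimiter`, `crawl`, and `log_event` are hypothetical names. It also rests on a few assumptions. Pages come from an injected `fetch(url) -> str` function, so it can run offline. "Recursive" means a breadth-first crawl bounded by depth and page count and restricted to the start host. The robots.txt text has already been downloaded.

```python
import json
import logging
import time
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser

log = logging.getLogger("ingest")


def log_event(event: str, **fields) -> None:
    # Structured logging: one JSON object per line, easy to grep weeks later.
    log.info(json.dumps({"event": event, **fields}, sort_keys=True))


def robots_checker(robots_txt: str, user_agent: str) -> Callable[[str], bool]:
    # Parse an already-downloaded robots.txt into an allow/deny predicate.
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return lambda url: rp.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_request: dict[str, float] = {}

    def wait(self, url: str) -> None:
        host = urlparse(url).netloc
        now = self.clock()
        prev = self.last_request.get(host)
        if prev is not None:
            remaining = self.min_interval - (now - prev)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_request[host] = now


class _LinkParser(HTMLParser):
    # Collect href values from <a> tags.
    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list[str]:
    parser = _LinkParser()
    parser.feed(html)
    return parser.links


def crawl(start_url: str, fetch: Callable[[str], str], allowed: Callable[[str], bool],
          limiter: RateLimiter, max_depth: int = 2, max_pages: int = 100) -> dict[str, str]:
    """Breadth-first crawl of a single host, bounded by depth and page count."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        if not allowed(url):
            log_event("skip_robots", url=url)
            continue
        limiter.wait(url)  # only allowed URLs consume the per-host rate budget
        html = fetch(url)
        pages[url] = html
        log_event("fetched", url=url, depth=depth, chars=len(html))
        if depth >= max_depth:
            continue
        for href in extract_links(html):
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

In a real deployment you would download robots.txt once per host and honor any `Crawl-delay` line, which `RobotFileParser.crawl_delay(useragent)` exposes. You would also configure logging, for example with `logging.basicConfig(level=logging.INFO)`, so the JSON lines actually reach a handler. Injecting the clock and `sleep` keeps the limiter testable without real waiting.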
Containerizing the whole thing (local dev = prod) is the difference between “demo on my laptop” and “I can hand this to someone else and they can ship it.”
What I did
- Built a recursive web scraper that respects robots.txt and rate limits, with structured logging at every step.
- Designed namespace-based vector partitioning in Pinecone so multiple ingestion targets coexist without one target’s data leaking into another’s results.
- Embedded and indexed content with OpenAI embeddings via LangChain; served retrieval + Q&A through a Python API.
- Containerized with Docker / Docker Compose for reproducible local dev and production parity.
- Deployed to Fly.io with a live API endpoint and zero-downtime updates.
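Pinecone namespaces partition a single index, and the Pinecone Python client's `upsert` and `query` calls both accept a `namespace` keyword. The sketch below does not call Pinecone. It uses a hypothetical in-memory `InMemoryIndex` whose keyword arguments loosely mirror those calls, so the isolation idea can run offline. The `NamespacedIndex` wrapper and the `target-` prefix are illustrative assumptions. The idea is to bind the namespace once per ingestion target, so no call site can forget it and silently fall back to the default namespace.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Hypothetical stand-in for a vector index with per-namespace storage."""

    def __init__(self) -> None:
        # namespace -> vector id -> (values, metadata)
        self._spaces: dict[str, dict[str, tuple[list[float], dict]]] = {}

    def upsert(self, *, vectors, namespace: str) -> None:
        space = self._spaces.setdefault(namespace, {})
        for vec_id, values, metadata in vectors:
            space[vec_id] = (list(values), dict(metadata))  # same id overwrites

    def query(self, *, vector, top_k: int, namespace: str):
        # Only vectors stored in the requested namespace are ever scored.
        space = self._spaces.get(namespace, {})
        scored = [(cosine(vector, values), vec_id, metadata)
                  for vec_id, (values, metadata) in space.items()]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


class NamespacedIndex:
    """Binds every upsert/query to one ingestion target's namespace."""

    def __init__(self, index: InMemoryIndex, target: str) -> None:
        if not target:
            raise ValueError("an ingestion target name is required")
        self.index = index
        self.namespace = f"target-{target}"

    def upsert(self, vectors) -> None:
        self.index.upsert(vectors=vectors, namespace=self.namespace)

    def query(self, vector, top_k: int = 3):
        return self.index.query(vector=vector, top_k=top_k, namespace=self.namespace)
```

Swapping in a real Pinecone index is left as an assumption. The real client's `query` returns a response object with a `matches` list rather than plain tuples, so the result handling would need adapting.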
What I shipped
A live RAG service: scrape → chunk → embed → retrieve → answer, exposed as a Q&A API endpoint on Fly.io. The whole pipeline is reproducible end-to-end via Docker, with namespace isolation, polite scraping, and structured observability built in.
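As an illustration of the "chunk" stage, here is a minimal sketch of fixed-size character chunking with overlap, plus deterministic chunk IDs derived from the source URL and chunk position. The default sizes, the `chunk_records` helper, and the ID scheme are assumptions for illustration. A LangChain pipeline would more likely use a splitter such as `RecursiveCharacterTextSplitter`, which also takes `chunk_size` and `chunk_overlap` but prefers to break on paragraph and sentence boundaries.

```python
import hashlib


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` chars, each sharing `overlap` chars with the previous."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end
            break
    return chunks


def chunk_records(url: str, text: str, size: int = 800, overlap: int = 100) -> list[dict]:
    """Attach a stable ID and provenance metadata to every chunk."""
    records = []
    for i, chunk in enumerate(chunk_text(text, size, overlap)):
        # Same URL + position -> same ID, so re-ingesting a page overwrites on upsert.
        chunk_id = hashlib.sha256(f"{url}#{i}".encode("utf-8")).hexdigest()[:16]
        records.append({"id": chunk_id, "text": chunk,
                        "metadata": {"source": url, "chunk": i}})
    return records
```

Stable IDs mean re-running ingestion replaces a page's vectors instead of duplicating them. The `source` and `chunk` metadata let you trace a bad retrieval back to the exact page and offset. One caveat of this scheme is that if a page shrinks, its old higher-numbered chunks linger until they are deleted explicitly.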