RAG-as-a-Service — Riley Sklar

What I learned

A RAG pipeline is mostly ingestion engineering and operational discipline; the retrieval-augmented generation step is the smallest, most-blogged-about part. Most of the real work is in how you respect robots.txt, how you rate-limit politely, how you partition vector namespaces so one tenant’s data doesn’t bleed into another’s, and how you log enough to debug a bad embedding two weeks later.
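To make the namespace point concrete, here is a minimal sketch of the idea using a toy in-memory index. It is not the Pinecone client and not the project's actual code: the class name NamespacedIndex, the tenant names, and the two-dimensional vectors are all assumptions for illustration. The only thing it is meant to show is that a query scans a single namespace and nothing else.

```python
from collections import defaultdict
from math import sqrt


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class NamespacedIndex:
    """Toy in-memory vector index (hypothetical stand-in for a hosted store)."""

    def __init__(self):
        # namespace -> list of (item_id, vector, text)
        self._store = defaultdict(list)

    def upsert(self, namespace, item_id, vector, text):
        self._store[namespace].append((item_id, vector, text))

    def query(self, namespace, vector, top_k=3):
        # Only the requested namespace is scanned, so other tenants'
        # vectors can never appear in the results.
        candidates = self._store.get(namespace, [])
        scored = [(_cosine(vector, v), item_id, text)
                  for item_id, v, text in candidates]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]
```

With one vector in "tenant-a" and an identical one in "tenant-b", a query against "tenant-a" returns only the first, however similar the second is.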

Containerizing the whole thing (local dev = prod) is the difference between “demo on my laptop” and “I can hand this to someone else and they can ship it.”

What I did

Built recursive web scraping with robots.txt + rate-limit awareness and structured logging at every step.
Designed namespace-based vector partitioning in Pinecone so multiple ingestion targets coexist without leaking.
Embedded and indexed content with OpenAI embeddings via LangChain; served retrieval + Q&A through a Python API.
Containerized with Docker / Docker Compose for reproducible local dev and production parity.
Deployed to Fly.io with a live API endpoint and zero-downtime updates.
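The robots.txt and rate-limit step could look roughly like the sketch below. It uses only the standard library's urllib.robotparser; the user agent string, the PoliteGate class, and the default delay are hypothetical names and values, not taken from the project. The clock and sleep functions are injectable so the gate can be exercised without real waiting.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-rag-bot"  # hypothetical user agent for illustration


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteGate:
    """Checks robots.txt rules and spaces out requests to one host."""

    def __init__(self, robots, default_delay=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.robots = robots
        # Prefer the site's Crawl-delay if it declares one.
        delay = robots.crawl_delay(USER_AGENT)
        self.delay = float(delay) if delay is not None else default_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def allowed(self, url: str) -> bool:
        return self.robots.can_fetch(USER_AGENT, url)

    def wait(self) -> None:
        """Block until at least `delay` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A crawler built on this would call allowed(url) before queueing a link and wait() immediately before each request, logging both decisions.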

What I shipped

A live RAG service: scrape → chunk → embed → retrieve → answer, exposed as a Q&A API endpoint on Fly.io. The whole pipeline is reproducible end-to-end via Docker, with namespace isolation, polite scraping, and structured observability built in.
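As a rough illustration of the chunk stage and the structured logging around it, the sketch below splits text into fixed-size overlapping character windows and emits one JSON log line per event. The chunk size, overlap, logger name, and field names are assumptions; the real service may well chunk by tokens or sentences instead.

```python
import json
import logging

logger = logging.getLogger("rag.ingest")  # hypothetical logger name


def chunk_text(text, size=200, overlap=40):
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def log_event(stage, **fields):
    """Emit one JSON object per pipeline event so logs stay machine-searchable."""
    record = json.dumps({"stage": stage, **fields}, sort_keys=True)
    logger.info(record)
    return record
```

Logging the source URL, namespace, and chunk count at each stage is what makes it possible to trace a bad embedding back to the page it came from.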
