Riley Sklar

Production RAG Pipeline

RAG-as-a-Service

Built and deployed a production RAG pipeline that scrapes target websites, chunks and embeds content, and serves Q&A over the indexed knowledge through a live API on Fly.io.

RAG Architecture · LangChain + Pinecone · FastAPI · Docker / Fly.io

What I learned

A RAG pipeline is mostly ingestion engineering and operational discipline; the retrieval-augmented generation step itself is the smallest, most-blogged-about part. Most of the real work is in respecting robots.txt, rate-limiting politely, partitioning vector namespaces so one tenant’s data doesn’t bleed into another’s, and logging enough to debug a bad embedding two weeks later.
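To make the "debug a bad embedding two weeks later" point concrete, here is a minimal sketch of structured logging at the embedding step, using only the Python standard library. The names `JsonFormatter`, `chunk_fingerprint`, `embed_event`, and the field names are hypothetical illustrations, not the project's actual logging schema.

```python
import hashlib
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "event": record.getMessage()}
        # Anything passed as extra={"fields": {...}} becomes record.fields.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def chunk_fingerprint(text: str) -> str:
    """Short, stable hash of the chunk text, so a vector can be traced to its source."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def embed_event(namespace: str, source_url: str, chunk_index: int, text: str) -> dict:
    """Fields worth recording every time a chunk is embedded (assumed schema)."""
    return {
        "namespace": namespace,
        "source_url": source_url,
        "chunk_index": chunk_index,
        "chunk_sha256": chunk_fingerprint(text),
        "chars": len(text),
    }


logger = logging.getLogger("ingest")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Example: log one embedded chunk as a JSON line on stderr.
logger.info(
    "chunk_embedded",
    extra={"fields": embed_event("docs-example-com", "https://example.com/a", 0, "hello")},
)
```

Logging a content hash rather than the chunk text keeps log lines small while still letting you check whether the page content changed between two ingestion runs.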

Containerizing the whole thing (local dev = prod) is the difference between “demo on my laptop” and “I can hand this to someone else and they can ship it.”

What I did

  • Built recursive web scraping with robots.txt + rate-limit awareness and structured logging at every step.
  • Designed namespace-based vector partitioning in Pinecone so multiple ingestion targets coexist without leaking.
  • Embedded and indexed content with OpenAI embeddings via LangChain; served retrieval + Q&A through a Python API.
  • Containerized with Docker / Docker Compose for reproducible local dev and production parity.
  • Deployed to Fly.io with a live API endpoint and zero-downtime updates.
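As an illustration of the robots.txt and rate-limit awareness mentioned in the first bullet, the sketch below gates fetches with the standard library's `urllib.robotparser`. `PoliteGate`, the `example-bot` user agent, and the one-second default delay are assumptions made for this example; the real scraper may be structured differently. The robots.txt body is passed in as a string so the sketch runs without network access.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteGate:
    """Decide whether a URL may be fetched and how long to wait before fetching."""

    def __init__(self, robots_txt: str, user_agent: str, min_delay: float = 1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._parser = RobotFileParser()
        self._parser.parse(robots_txt.splitlines())
        self._agent = user_agent
        # Honor a declared Crawl-delay, but never go below our own minimum.
        declared = self._parser.crawl_delay(user_agent)
        self._delay = max(min_delay, float(declared or 0))
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous fetch, if any

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self._parser.can_fetch(self._agent, url)

    def wait_turn(self) -> float:
        """Sleep until the delay since the last fetch has elapsed; return seconds slept."""
        waited = 0.0
        if self._last is not None:
            remaining = self._delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited


# Example robots.txt content (hypothetical site).
ROBOTS = "User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n"
gate = PoliteGate(ROBOTS, "example-bot")
print(gate.allowed("https://example.com/docs/"))      # permitted path
print(gate.allowed("https://example.com/private/x"))  # disallowed path
```

A crawler loop would call `allowed(url)` before queuing a link and `wait_turn()` immediately before each request. Injecting `clock` and `sleep` keeps the gate testable without real delays.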

What I shipped

A live RAG service: scrape → chunk → embed → retrieve → answer, exposed as a Q&A API endpoint on Fly.io. The whole pipeline is reproducible end-to-end via Docker, with namespace isolation, polite scraping, and structured observability built in.
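The chunking and namespace-isolation steps of that flow can be sketched in a few lines of plain Python. This is a simplified stand-in under stated assumptions: the service itself uses LangChain and Pinecone, whereas `namespace_for`, `chunk_text`, `build_records`, the fixed-size character splitter, and the size and overlap defaults here are hypothetical and chosen only to show the idea of tagging every chunk with a per-target namespace.

```python
import re
from urllib.parse import urlsplit


def namespace_for(url: str) -> str:
    """Derive one namespace per ingestion target from the URL's hostname."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        raise ValueError(f"no hostname in {url!r}")
    return re.sub(r"[^a-z0-9]+", "-", host).strip("-")


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def build_records(url: str, text: str, size: int = 800, overlap: int = 100) -> list[dict]:
    """Pair every chunk with its namespace and a deterministic id."""
    ns = namespace_for(url)
    return [
        {"id": f"{ns}:{i}", "namespace": ns, "source_url": url, "text": chunk}
        for i, chunk in enumerate(chunk_text(text, size, overlap))
    ]


records = build_records("https://docs.example.com/guide", "abcdefghij", size=4, overlap=1)
for record in records:
    print(record["id"], record["text"])
```

The same namespace string would accompany both writes and queries, so retrieval for one target never sees another target's vectors. One caveat of this naming scheme: hostnames such as `a.b.com` and `a-b.com` collapse to the same namespace, so a real deployment would likely want an explicit tenant identifier instead.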
