Nov 23, 2025

This comment left me bitter and confused:

Well, it took me 2 full-time weeks to properly implement a RAG-based system so that it found actually relevant data and did not hallucinate. Had to:

  • write an evaluation pipeline to automate quality testing

  • add a query rewriting step to explore more options during search

  • add hybrid BM25+vector search with proper rank fusion

  • tune all the hyperparameters for best results (like the weight bias for BM25 vs. vector, how many documents to retrieve for analysis, how to chunk documents based on semantics)

  • parallelize the search pipeline to decrease wait times

  • add moderation

  • add a reranker to find best candidates

  • add background embedding calculation of user documents

  • iron out lots of failure cases so that the prompt worked for most cases
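
To make the "rank fusion" and "weight bias" items concrete, here is a minimal sketch of one common way to do it: weighted reciprocal rank fusion (RRF). The comment doesn't say which fusion method was actually used, so treat this as an assumption on my part. The function name `weighted_rrf`, the `bm25_weight` parameter, the document ids, and `k=60` (a conventional default for RRF) are all illustrative, not taken from the comment.

```python
from collections import defaultdict


def weighted_rrf(bm25_ranked, vector_ranked, bm25_weight=0.5, k=60, top_n=5):
    """Fuse two ranked lists of document ids with weighted reciprocal rank fusion.

    Each list is ordered best-first. A document earns weight / (k + rank)
    from every list it appears in; the contributions are summed.
    """
    # Hypothetical tunable: how much to trust BM25 relative to vector search.
    weights = (bm25_weight, 1.0 - bm25_weight)
    scores = defaultdict(float)
    for weight, ranked in zip(weights, (bm25_ranked, vector_ranked)):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused[:top_n]


# Made-up results from the two retrievers, best match first.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

print(weighted_rrf(bm25_hits, vector_hits))
# prints ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Even in this toy version there are already three knobs from the list above: the BM25 vs. vector weight, the smoothing constant `k`, and how many documents to keep (`top_n`). Picking good values for them is exactly where the evaluation pipeline comes in.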

Each of those steps would take me more than two weeks, and the result would still have gaps. But when you’re working alone, and you’re the only judge of your work, it’s easy to overestimate its quality.