Nov 23, 2025¶
this comment left me bitter and confused:
Well it took me 2 full-time weeks to properly implement a RAG-based system so that it found actually relevant data and did not hallucinate. Had to:
write an evaluation pipeline to automate quality testing
add a query rewriting step to explore more options during search
add hybrid BM-25+vector search with proper rank fusion
tune all the hyperparameters for best results (like weight bias for bm25 vs. vector, how many documents to retrieve for analysis, how to chunk documents based on semantics)
parallelize the search pipeline to decrease wait times
add moderation
add a reranker to find best candidates
add background embedding calculation of user documents
lots of failure cases to iron out so that the prompt worked for most cases
Each step would take me more than two weeks, and still have gaps. But when you’re working alone, and you’re the only judge of your work, it’s easy to overestimate its quality.