ENGINEERING | 6 min | May 20, 2026

RAG is a hiring problem before it is an infra problem

Most retrieval failures trace back to who built the pipeline, not which vector database they chose. A field guide.

By Devlyn

Teams spend weeks comparing vector databases and minutes deciding who builds the retrieval pipeline. That ratio is backwards.

The failure is almost never the database

When a RAG system answers confidently and wrong, the cause is rarely the store. It is usually upstream:

Chunking that splits a table in half or merges two unrelated sections.
Embeddings chosen by popularity, not by how they perform on your data.
No reranking, so the top-k is noisy and the model grounds on the wrong passage.
Context assembly that stuffs the window instead of selecting what matters.
No relevance metric, so nobody can tell whether a change helped or hurt.
Permission gaps where the model sees documents the user should not see, or misses documents the user needs.

Swapping pgvector for a managed service fixes none of these.

Direct answer: RAG is a hiring problem because retrieval quality depends on engineering judgment, measurement and data handling. A vector database stores embeddings. It does not decide how to chunk messy documents, handle permissions, rerank results, cite sources, refuse weak answers or prove relevance improved.

What a context engineer actually owns

A strong RAG & Context Engineer treats retrieval quality as a measurable engineering problem. They build a relevance eval set early, instrument grounding and citations from day one, and know that better context engineering - not a bigger model - fixes most accuracy gaps.

Their work usually spans five layers:

Layer	What can break	What good looks like
Ingestion	Stale, duplicated or malformed documents	Clean pipeline with freshness checks
Chunking	Wrong split size or lost tables	Chunks that preserve meaning and structure
Retrieval	Similar but irrelevant passages	Hybrid search, metadata filters and reranking
Context assembly	Too much or wrong context	Source selection tied to the user question
Answer behavior	No citations or weak refusal	Grounded answers with traceable evidence

That is why the title matters. If a general backend engineer builds the pipeline as a storage task, the first demo may look fine and the production workflow may still fail. The context engineer starts with the questions users actually ask, the sources that should answer them, and the metric that decides whether retrieval got better.

The hiring tell

Ask a candidate how they’d measure whether retrieval improved. If the answer is “we’d look at the outputs,” keep interviewing. If they reach for a labelled relevance set, hybrid search, and a reranking step before they touch the database question, you’ve found the person who can make the model answer from your data without making things up.

During a trial, ask them to bring three artifacts: a relevance eval set, a failure taxonomy and a source-grounding plan. The eval set shows whether they can measure. The failure taxonomy shows whether they understand how RAG breaks. The grounding plan shows whether the user can trust the answer after the demo is over.

The strongest signal is how they handle uncertainty. A good RAG engineer does not force an answer when the evidence is weak. They design refusal, clarification and citation behavior because trust is the product. Better retrieval is not just better search. It is the difference between an assistant your team can defend and an assistant your sales team is afraid to show customers.

Hire the AI engineer your roadmap actually needs.

Book a 30-minute role scope