Knowledge Retrieval: Routing Questions to the Right Search Method

TL;DR

A knowledge retrieval system is a router: four question types (lookup, meaning, connect-the-dots, whole-collection) each demand a different method (keyword, hybrid + rerank, multi-step, summarize-all). Pick the cheapest method that answers the question correctly, and escalate only when the question earns it. The common failure is treating every question as a meaning question, which returns fluent answers from the wrong source.

When people talk about how AI systems "look things up," the conversation almost always turns into an argument about tools. Vector databases versus keyword search. This product versus that one. Which one is fastest, which one is cheapest, which one is winning.

That argument misses the point, and it is the reason so many AI search systems end up fast, cheap, and confidently wrong.

The real decision is which kind of question you are actually asking. Get that right and the tool mostly picks itself. Get it wrong and no tool will save you.

The whole idea, in one sentence

A knowledge retrieval system, the retrieval step underneath every RAG (retrieval-augmented generation) system, is a router. Its job is to take a question, figure out what kind of question it is, and send it down the path most likely to return a correct, useful answer.

You already do this without thinking. You look up a phone number differently than you research a vacation. You skim a contract differently than you total up a year of receipts. The method you reach for depends on the shape of the question. Good AI retrieval works the same way. Bad AI retrieval fails because it treats every question as if it were the same shape.

The four kinds of questions

Almost every question you can ask of a body of knowledge falls into one of four buckets. The bucket determines the method.

1. The lookup. You want one specific thing and you basically know it exists. A tracking number. The Wi-Fi password. A particular policy by its ID. There is exactly one right answer and everything else is wrong. You already know what you want and just need to grab it off the shelf.

2. The meaning question. You describe what you want in your own words, and the answer is written in different words. You ask "how do I get my money back" and the helpful document is titled "Returns and Exchanges." Nothing matches word-for-word. The system has to understand that you mean the same thing.

3. The connect-the-dots question. No single document holds the answer. You have to gather a few pieces and combine them. "Was the restaurant I went to last week cheaper than the one near my office?" needs two facts and a comparison. "Which of my subscriptions renews next, and for how much?" needs you to check several and pick one. The answer is assembled from parts.

4. The whole-collection question. The answer is a property of the whole collection. Any single piece tells you almost nothing on its own. "What are the main complaints across these 500 reviews?" or "What position do all our contracts take on cancellations?" You cannot answer this by grabbing the top few results, because the answer only exists once you have looked at all of them. This is the bucket people forget most often, and it is the one that quietly breaks systems built only for the first three.

The skill is recognizing which bucket a question lands in before you go looking. That recognition step is the most important part of the whole system, and it is the part that gets skipped.

The methods, in plain terms

Now the tools. Each one is good at a specific kind of question and bad at others. None of them is "the best."

Keyword search matches the actual words in your question against the actual words in the documents. It is exact and literal. It is excellent for lookups, because if you search for an error code or a product number, you want the exact match. Its weakness is that it does not understand meaning. Search for "remote work" and it will miss the document that says "telecommuting."
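A minimal sketch of the literal matching described above, assuming a tiny in-memory collection. The function names and the sample documents are made up for illustration; a production system would more likely use an inverted index with a ranking function.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word-like tokens (hyphens kept)."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def keyword_search(query, docs):
    """Return the ids of documents that contain every token in the query."""
    wanted = tokenize(query)
    return [doc_id for doc_id, text in docs.items() if wanted <= tokenize(text)]

# Hypothetical sample collection.
DOCS = {
    "a": "Remote work policy for all staff",
    "b": "Telecommuting guidelines and equipment",
    "c": "Stock levels for SKU-4471",
    "d": "Stock levels for SKU-4472",
}

print(keyword_search("SKU-4471", DOCS))     # exact code: only document "c"
print(keyword_search("remote work", DOCS))  # misses "b", which says "telecommuting"
```

The same literalness that keeps SKU-4472 out of the first result is what hides the telecommuting document from the second.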

Meaning search (you will hear this called "vector search" or "embeddings") does the opposite. It converts text into a representation of its meaning, so "remote work" and "telecommuting" land close together even though they share no words. This is what makes modern AI search feel smart. Its weakness is the mirror image of keyword search: it is bad at exact terms. Ask it for product code SKU-4471 and it can happily return SKU-4472, which looks almost right and is completely wrong. That "almost right" is dangerous, because nothing flags it as a miss.
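To make "land close together" concrete, here is a toy sketch using cosine similarity, one common way to compare embeddings. The three-dimensional vectors are invented by hand for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Invented toy vectors; a real system would get these from an embedding model.
DOC_VECTORS = {
    "remote work policy": (0.9, 0.1, 0.0),
    "telecommuting guidelines": (0.8, 0.2, 0.1),
    "office parking rules": (0.1, 0.9, 0.2),
}
QUERY_VECTOR = (0.85, 0.15, 0.05)  # stands in for the query "remote work"

def nearest(query_vector, doc_vectors, top_k=2):
    """Return the top_k document names, most similar first."""
    ranked = sorted(doc_vectors, key=lambda d: -cosine(query_vector, doc_vectors[d]))
    return ranked[:top_k]

print(nearest(QUERY_VECTOR, DOC_VECTORS))
```

Note that `nearest` always returns `top_k` results, however poor the best match is. Unless you add a similarity threshold or a separate check, a near-miss comes back looking exactly like a hit.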

Hybrid search does both and blends the results. It catches the meaning and respects the exact words. For most everyday meaning questions, this is the sensible default, precisely because real questions often contain both a concept and a specific term.
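One common way to blend the two result lists is reciprocal rank fusion (RRF). The article does not say which blending method it has in mind, so treat this as one option among several. The document ids are hypothetical, and `k=60` is a conventional default, not a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results for a query that mentions SKU-4471.
keyword_ranking = ["sku-4471-spec", "returns-policy"]
meaning_ranking = ["sku-4472-spec", "sku-4471-spec", "returns-policy"]

fused = reciprocal_rank_fusion([keyword_ranking, meaning_ranking])
print(fused)
```

In this toy case the near-miss that meaning search ranked first drops to the bottom, because keyword search never returned it at all.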

Reranking is a quality-control pass. After the first search pulls back a handful of candidates, a second, more careful model re-sorts them so the best answer rises to the top. Think of it as a quick search followed by a careful read.

Multi-step (or "agentic") retrieval is for the connect-the-dots questions. The system breaks a hard question into smaller ones, answers each, and combines them. It is powerful and it is the closest thing to reasoning. It is also the slowest and most expensive option by a wide margin, which matters more than it sounds like it should.
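A sketch of the "break it down, answer each, combine" pattern, using the subscription question from earlier. The data store and the helper names are hypothetical. In a real agentic system a model would plan the steps, and each lookup would be a separate retrieval call, which is where the cost comes from.

```python
from datetime import date

# Hypothetical records; each field lookup stands in for one retrieval call.
SUBSCRIPTIONS = {
    "music": {"renews": date(2025, 3, 14), "price": 10.99},
    "news": {"renews": date(2025, 3, 2), "price": 4.00},
    "storage": {"renews": date(2025, 4, 1), "price": 1.99},
}

def list_subscriptions():
    """Sub-question 1: which subscriptions exist?"""
    return sorted(SUBSCRIPTIONS)

def lookup(name, field):
    """Sub-questions 2..n: one fact about one subscription."""
    return SUBSCRIPTIONS[name][field]

def next_renewal(today):
    """Answer 'which subscription renews next, and for how much?'"""
    facts = []
    for name in list_subscriptions():
        renews = lookup(name, "renews")
        if renews >= today:
            facts.append((renews, name, lookup(name, "price")))
    if not facts:
        return None
    renews, name, price = min(facts)  # combine step: earliest date wins
    return name, renews, price

print(next_renewal(date(2025, 3, 1)))
```

Even this three-item example makes one listing call plus up to two lookups per subscription, which is why the method is reserved for questions that need it.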

Graph search follows the relationships between things. It traverses connections like "who reported to whom" or "what depends on what." When your knowledge is a web of connections, this is the tool that walks the web.
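As an illustration of walking the web, here is a breadth-first traversal over a small "what depends on what" graph. The service names are invented, and a real deployment would likely query a graph database instead of a Python dictionary.

```python
from collections import deque

# Hypothetical dependency graph: each key depends on the services listed.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "payments": ["ledger"],
    "cart": ["catalog"],
    "ledger": [],
    "catalog": [],
}

def all_dependencies(graph, start):
    """Return everything reachable from `start`, nearest dependencies first."""
    visited = {start}
    found = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in visited:  # guards against cycles
                visited.add(neighbor)
                found.append(neighbor)
                queue.append(neighbor)
    return found

print(all_dependencies(DEPENDS_ON, "checkout"))
```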

Whole-collection methods answer the fourth kind of question by summarizing across everything, often in layers. For data in a spreadsheet or database this is a total or an average. For a pile of documents it is a heavier summarization process. The defining feature is coverage: the method touches every item in the collection.
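A small sketch of summarizing in layers: each batch is reduced to partial counts, then the partials are merged. It assumes each review has already been tagged with complaint labels, which is itself a hypothetical extraction step, and the reviews are invented. The point is the coverage: every item is visited.

```python
from collections import Counter

def summarize_batch(batch):
    """Layer 1: count each complaint once per review within one batch."""
    counts = Counter()
    for review in batch:
        counts.update(set(review["complaints"]))
    return counts

def summarize_collection(reviews, batch_size=2):
    """Layer 2: merge the per-batch summaries into one overall summary."""
    total = Counter()
    for start in range(0, len(reviews), batch_size):
        total.update(summarize_batch(reviews[start:start + batch_size]))
    return total

# Hypothetical, pre-tagged reviews.
REVIEWS = [
    {"complaints": ["shipping", "price"]},
    {"complaints": ["shipping"]},
    {"complaints": ["battery", "shipping"]},
    {"complaints": ["price"]},
    {"complaints": []},
]

print(summarize_collection(REVIEWS).most_common(2))
```

A top-few retrieval over the same reviews would see only a sample, so its counts would depend on which reviews happened to rank highest.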

[Figure: the retrieval system as a router. Classify the question first, then send it down the path that fits its shape.]

Matching the method to the question

Put the two halves together and the whole framework collapses into something you can hold in your head:

The question is a...	Reach for...
Lookup (one known thing)	Keyword search, or a direct database query
Meaning question	Hybrid search, with reranking for quality
Connect-the-dots	Multi-step retrieval, graph traversal, or a query that does the combining
Whole-collection	Summarize across everything; do not grab a top few

That is the entire decision. Everything else (which specific database, which model, how it is tuned) is detail living inside a box. Choose the box first, then worry about its contents.
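The routing table above can be sketched as a lookup keyed by question type. The `classify` function here is a deliberately crude stand-in built on hypothetical cue phrases; in practice the classification step would more likely be a trained model or a prompted language model, and it deserves its own testing.

```python
import re
from enum import Enum

class QuestionType(Enum):
    LOOKUP = "lookup"
    MEANING = "meaning"
    CONNECT = "connect-the-dots"
    WHOLE = "whole-collection"

# Routing table mirroring the mapping above.
ROUTES = {
    QuestionType.LOOKUP: "keyword search or direct database query",
    QuestionType.MEANING: "hybrid search with reranking",
    QuestionType.CONNECT: "multi-step retrieval or graph traversal",
    QuestionType.WHOLE: "summarize across everything",
}

# Hypothetical cues, chosen only to make the example run.
ID_PATTERN = re.compile(r"\b[A-Z]{2,}-\d+\b")  # e.g. SKU-4471
WHOLE_CUES = ("across these", "across all", "all our", "main complaints")
CONNECT_CUES = ("cheaper than", "compared to", "which of my")

def classify(question):
    """Guess the question type from surface cues (placeholder logic)."""
    lowered = question.lower()
    if ID_PATTERN.search(question):
        return QuestionType.LOOKUP
    if any(cue in lowered for cue in WHOLE_CUES):
        return QuestionType.WHOLE
    if any(cue in lowered for cue in CONNECT_CUES):
        return QuestionType.CONNECT
    return QuestionType.MEANING  # fallback when no cue matches

def route(question):
    """Classify first, then pick the method for that type."""
    question_type = classify(question)
    return question_type, ROUTES[question_type]

print(route("How do I get my money back?"))
```

The fallback to the meaning route is a design choice of this sketch, and it is exactly the default that causes trouble when the classifier is weak, so a real router should log how often it falls through.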

Where the buzzwords fit

You will hear a parade of branded names: hybrid RAG, graph RAG, agentic RAG, corrective RAG, multimodal RAG. Most of them are the methods you just met, wearing a label.

You will hear...	It is really...
Hybrid RAG	Hybrid search, for a meaning question
Graph RAG	Graph search, for a connect-the-dots question over related entities
Agentic RAG	Multi-step retrieval, for a connect-the-dots question
Corrective RAG	Reranking, plus a check that the retrieved sources actually support the answer

Four of those five are boxes you have already chosen, renamed. The framework tells you when each one earns its place, which is the part the label leaves out.

[Figure: a matrix mapping five RAG variants to where they operate. Columns are the four question types (lookup, meaning, connect-the-dots, whole-collection), plus a separate modality column past a dotted divider. Hybrid RAG and Corrective RAG mark the meaning column, Graph RAG and Agentic RAG mark the connect-the-dots column, and Multimodal RAG marks only the modality column. Four variants land inside the question-type axis you already know; Multimodal RAG marks a different axis entirely.]

The fifth, multimodal RAG, is the exception, and it sits on a different axis. It is about the kind of material you are searching: images, audio, tables, video. The question type still works the same way. You classify the question as before, then run the matching method over a different medium.

The predictable failures

Knowing the framework is less useful than knowing how it fails, because the failures are predictable.

Treating every question as a meaning question. Meaning search is the shiny one, so it becomes the default for everything. Then lookups start returning near-misses, connect-the-dots questions get a single document that answers neither half, and nobody notices because the answers look fine.

Trusting a fluent answer. This is the big one. A retrieval system can be fast, cheap, and return the wrong source, and the AI on top will write a smooth, confident answer based on it. Fluency is not correctness. The only protection is to check whether the answer is actually supported by real sources, every time, and to measure that on purpose rather than assuming it.
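One rough way to measure support on purpose is to score how much of each answer sentence can be found in the retrieved sources. The word-overlap test below is only a crude proxy, and the function names and the 0.6 threshold are assumptions; a serious check would use a model that judges whether the source actually entails the claim.

```python
import re

def words(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def supported_fraction(answer_sentences, sources, threshold=0.6):
    """Share of answer sentences whose words mostly appear in some source."""
    if not answer_sentences:
        return 0.0
    source_words = [words(source) for source in sources]
    supported = 0
    for sentence in answer_sentences:
        sentence_words = words(sentence)
        if not sentence_words:
            continue  # an empty sentence counts as unsupported
        best = max(
            (len(sentence_words & sw) / len(sentence_words) for sw in source_words),
            default=0.0,
        )
        if best >= threshold:
            supported += 1
    return supported / len(answer_sentences)

# Hypothetical source and answer.
SOURCES = ["Refunds are issued within 14 days of a return."]
ANSWER = ["Refunds are issued within 14 days.", "Shipping is always free."]

print(supported_fraction(ANSWER, SOURCES))  # the second sentence has no support
```

Tracking this number over time, even in a crude form, turns "the answers look fine" into something you can watch for regressions.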

Defaulting to the fanciest method. Multi-step retrieval feels like the smart choice, so people apply it everywhere. It is the slow, costly option, and many questions are simple lookups that never needed it. The better instinct runs the other way: use the cheapest method that answers the question correctly, and escalate to an expensive method only when the question genuinely earns it.
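The cheapest-first instinct can be sketched as a loop that tries methods in ascending cost and stops at the first result that passes a check. The method names, cost units, and the `is_supported` check are all hypothetical placeholders.

```python
def answer_with_escalation(question, methods, is_supported):
    """Try methods from cheapest to most expensive.

    `methods` is a list of (name, cost, search_function) tuples.
    Returns (method_name, result, total_cost_spent).
    """
    spent = 0
    for name, cost, search in sorted(methods, key=lambda m: m[1]):
        spent += cost
        result = search(question)
        if result is not None and is_supported(question, result):
            return name, result, spent
    return None, None, spent  # nothing passed: better to say so than to guess

# Hypothetical toy methods.
INDEX = {"SKU-4471": "SKU-4471: 12 units in stock"}

def keyword(question):
    return next((text for key, text in INDEX.items() if key in question), None)

def multi_step(question):
    return "answer assembled from several lookups"

METHODS = [("multi-step", 50, multi_step), ("keyword", 1, keyword)]

def non_empty(question, result):
    return bool(result)

print(answer_with_escalation("How many SKU-4471 are in stock?", METHODS, non_empty))
```

The lookup is answered at a cost of 1. A question the keyword method cannot answer pays for both attempts, which is the price of escalating only when needed.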

Forgetting to ask whether you need retrieval at all. Two cases. If your whole collection is small enough to hand to the AI in one go, just hand it over; a retrieval system is pure overhead. And if your "question" is really a recommendation or a ranking ("what should I watch next"), that is a different problem wearing a search costume, and a retrieval framework will point you at the wrong tools entirely.
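For the first case, a back-of-the-envelope check is often enough. The sketch below assumes roughly four characters per token, which is only a rule of thumb for English text, and reserves part of the budget for the question and the answer. The function name and default values are made up; use your model's real tokenizer and limits when the answer is close.

```python
def fits_in_context(documents, context_budget_tokens,
                    chars_per_token=4.0, reserve_fraction=0.25):
    """Estimate whether the whole collection can be handed over in one go."""
    total_chars = sum(len(doc) for doc in documents)
    estimated_tokens = total_chars / chars_per_token  # rough heuristic
    usable_tokens = context_budget_tokens * (1 - reserve_fraction)
    return estimated_tokens <= usable_tokens

# Hypothetical collection: ten documents of 4,000 characters each.
SMALL_COLLECTION = ["x" * 4000] * 10

print(fits_in_context(SMALL_COLLECTION, context_budget_tokens=100_000))
print(fits_in_context(SMALL_COLLECTION, context_budget_tokens=8_000))
```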

A four-question checklist

If you are building or buying one of these, you do not need to memorize the methods. You need to ask four things, in order:

1. What kind of question is this, really? Which of the four buckets, honestly, not hopefully.
2. Where does the answer live? In one document, scattered across several, in a database, or in the whole collection at once.
3. What is the cheapest method that gets it right? Start there. Escalate only when you have to.
4. How will I know when it is wrong? Decide this before you launch, not after a customer finds the broken answer for you.
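The last question is easier to answer with a small labeled test set, scored per bucket so that a weak bucket cannot hide behind a strong average. This is a minimal sketch: the case format, the source ids, and the `fake_retrieve` function are hypothetical stand-ins for your own data and retrieval system.

```python
from collections import defaultdict

def hit_rate_by_bucket(cases, retrieve):
    """For each bucket, the share of cases where the expected source came back."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for case in cases:
        totals[case["bucket"]] += 1
        if case["expected_source"] in retrieve(case["question"]):
            hits[case["bucket"]] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}

# Hypothetical labeled cases.
CASES = [
    {"bucket": "lookup", "question": "SKU-4471 stock",
     "expected_source": "sku-4471-spec"},
    {"bucket": "lookup", "question": "SKU-4472 stock",
     "expected_source": "sku-4472-spec"},
    {"bucket": "meaning", "question": "how do I get my money back",
     "expected_source": "returns-and-exchanges"},
]

def fake_retrieve(question):
    """Stand-in retriever that returns the same sources for every question."""
    return ["sku-4471-spec", "returns-and-exchanges"]

print(hit_rate_by_bucket(CASES, fake_retrieve))
```

Here the stand-in retriever looks fine on the meaning question and misses half the lookups, the kind of gap an overall score would blur.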

The one idea to keep

There is no single best search method. The winning move in knowledge retrieval is routing each question to the method that fits it, and being honest about which questions you are actually asking.

The system that does this rarely looks like the most sophisticated one in the room. It quietly gives the right answer, at the right cost, and tells you when it is not sure. That is the whole game.
