How to choose between prompting, RAG, and fine-tuning
Three ways to make an AI feature work, three very different price tags. Here is the decision framework we use to pick the right one, written for the person signing off on the budget.

When a client comes to us wanting to add AI to a product, the conversation almost always arrives at the same fork in the road. They have heard the terms. Someone has told them they need to fine-tune a model. Someone else has told them they need RAG. And a third person has said the whole thing is just a matter of writing better prompts. All three people sound confident, and all three are sometimes right.
The problem is that these are not three flavors of the same thing. They solve different problems, they cost wildly different amounts, and they fail in different ways. Picking the wrong one is one of the more expensive mistakes we see businesses make with AI, because the cost of the mistake is rarely visible until months in.
So here is how we think about it when we scope this work, written for the person signing off on the budget rather than the person writing the code.
The three options, in plain terms
A sharper prompt means giving an off-the-shelf model clearer instructions. You include examples of what good output looks like, you specify the format you want, you tell it what role to play, and you constrain it. Nothing about the model changes. You are just asking the question better.
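To make that concrete, here is a minimal sketch of what "asking the question better" can look like in code. The function name build_prompt, the support ticket scenario, and the JSON keys are all hypothetical, chosen only for illustration. The sketch assembles the text and stops there. Sending it to a model is left out, because that step depends on the provider you use.

```python
def build_prompt(task: str, examples: list[tuple[str, str]],
                 output_format: str, role: str) -> str:
    """Combine a role, a format constraint, worked examples, and the task."""
    lines = [
        f"You are {role}.",
        f"Respond only in this format: {output_format}",
        "",
    ]
    # Each example shows the model one input and the output we consider good.
    for sample_input, sample_output in examples:
        lines += [f"Input: {sample_input}", f"Output: {sample_output}", ""]
    # The real task goes last, with the output left open for the model.
    lines += [f"Input: {task}", "Output:"]
    return "\n".join(lines)


prompt = build_prompt(
    task="The checkout page times out on mobile.",
    examples=[("I was charged twice.", '{"category": "billing", "urgent": true}')],
    output_format='JSON with keys "category" and "urgent"',
    role="a support ticket classifier",
)
print(prompt)
```

Every piece of this is plain text that a non-engineer can read and review, which is a large part of why prompting is cheap to change.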
RAG, which stands for retrieval-augmented generation, means connecting the model to a searchable library of your own documents. When a question comes in, the system finds the most relevant pieces of your material and hands them to the model along with the question. The model answers using that supplied context. Your knowledge lives in the library, not in the model.
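The retrieve-then-answer flow can be sketched in a few lines. This is a toy version under loud assumptions: it scores documents by shared words instead of using a real embedding model, the library is a Python list instead of a vector database, and the names vectorize, retrieve, and build_rag_prompt are hypothetical. The shape of the flow is what matters here.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Word-count vector. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, library: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = vectorize(question)
    ranked = sorted(library, key=lambda doc: cosine(q, vectorize(doc)), reverse=True)
    return ranked[:k]


def build_rag_prompt(question: str, library: list[str], k: int = 2) -> str:
    """Hand the retrieved documents to the model along with the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, library, k))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


library = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "Shipping to Canada takes 5 to 7 business days.",
]
rag_prompt = build_rag_prompt("How long do refunds take?", library, k=1)
print(rag_prompt)
```

Notice that the model itself never appears in this sketch. Everything specific to your business sits in the library, which is why updating a document is enough to update the answers.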
Fine-tuning means taking a base model and continuing to train it on your own examples until the new behavior is baked into the model itself. The model’s internal weights actually change. After fine-tuning, you have a custom model that behaves differently from the one you started with.
The one question that decides almost everything
Before anyone talks about cost, there is a single diagnostic question that settles most of the decision: is the thing you want to fix about knowledge, or about behavior?
Knowledge means facts. Your product catalogue, your internal policies, last quarter’s numbers, the contents of a contract. If the model is getting things wrong because it does not have access to the right information, that is a knowledge problem.
Behavior means how the model responds. The tone, the format, the structure, whether it reliably produces clean JSON, whether it sounds like your brand. If the model has the information but presents it inconsistently, that is a behavior problem.
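One way to tell whether you have a behavior problem is to measure it. The sketch below assumes you have saved a sample of raw model outputs as strings, and it counts how many of them are valid JSON with the keys you expect. The function name format_pass_rate and the sample outputs are hypothetical.

```python
import json


def format_pass_rate(outputs: list[str], required_keys: set[str]) -> float:
    """Share of outputs that parse as a JSON object containing every required key."""
    if not outputs:
        return 0.0
    passed = 0
    for raw in outputs:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON at all, for example JSON wrapped in chatty text
        if isinstance(parsed, dict) and required_keys.issubset(parsed):
            passed += 1
    return passed / len(outputs)


# Hypothetical outputs collected from the same prompt.
sample_outputs = [
    '{"category": "billing", "urgent": true}',          # clean
    'Sure! Here is the JSON: {"category": "billing"}',  # chatty wrapper, fails to parse
    '{"category": "shipping"}',                         # valid JSON, missing a key
]
rate = format_pass_rate(sample_outputs, {"category", "urgent"})
print(f"{rate:.0%} of outputs matched the expected format")
```

If the facts in the outputs are right but this number is low, the gap is behavioral.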
This distinction matters because fine-tuning is bad at solving knowledge problems, and a lot of businesses do not know that. A study from Google Research and the Technion, “Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?”, presented at EMNLP 2024, tested this directly. The finding was blunt: models acquire most of their facts during their original pre-training, and fine-tuning mainly teaches them to use what they already have rather than to absorb new facts. Worse, as a model is fine-tuned on factual information it did not already know, its tendency to hallucinate goes up. You are not reliably teaching it the facts. You are teaching it to confidently make things up.
So the first cut is simple. Knowledge problem, you are looking at RAG. Behavior problem, you are looking at prompt engineering first and possibly fine-tuning later. Getting this wrong means spending six figures fine-tuning a model to know things it will never reliably know.
Start with the prompt, and do not feel bad about it
There is a quiet bias in this industry toward the most complicated solution, because the complicated solution sounds like more serious engineering. We push back on that. A sharper prompt is the first thing we try, and it is the first thing we recommend you try, because it is faster than anything else and it often just works.
The reason it works more often than people expect is that modern models are already very capable. Most of the time, when a model produces disappointing output, it is not because the model is incapable. It is because the instruction was vague. Tightening the instruction, adding a few examples, and specifying the output format can produce a large jump in quality in an afternoon.
Prompting also has properties that the other two approaches do not. It costs nothing in infrastructure. It stays readable, so it is obvious exactly what the system is being told. And it survives model upgrades. When a better base model is released, a good prompt usually keeps working. A fine-tuned model does not get that for free.
The honest limitations are these. A prompt cannot give a model information it never had, so it does nothing for a knowledge problem. It is bounded by the context window, the amount of text the model can consider at once. And while the headline numbers on context windows are now enormous, the practical reality is messier. Chroma Research’s 2025 “Context Rot” study tested eighteen current models and found that all of them get less reliable as you stuff more text into the context. A model advertised with a very large window starts to degrade well before you reach the limit. So you cannot solve a knowledge problem by simply pasting all your documents into the prompt. Accuracy slips long before the window is full.
The other limitation is organizational. Prompts are easy to write and easy to lose track of. We have watched a business’s prompt count climb into the hundreds within a few months, with no versioning and no record of which prompt feeds which feature. When one gets edited, something downstream breaks and nobody knows why. That is a manageable problem, but it is a real one, and it is why we treat prompts as versioned assets rather than scraps of text.
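Treating prompts as versioned assets does not require heavy tooling. Here is a minimal in-memory sketch, assuming nothing beyond the Python standard library. The class name PromptRegistry and its methods are hypothetical, and a real setup would more likely keep prompts in version control or a database. It records each change to a prompt and which features depend on it.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Tracks every version of each prompt and the features that use it."""
    _versions: dict[str, list[str]] = field(default_factory=dict)
    _used_by: dict[str, set[str]] = field(default_factory=dict)

    def register(self, name: str, text: str, feature: str) -> str:
        """Store the prompt (only if it changed) and note which feature uses it."""
        history = self._versions.setdefault(name, [])
        if not history or history[-1] != text:
            history.append(text)
        self._used_by.setdefault(name, set()).add(feature)
        return self.fingerprint(name)

    def fingerprint(self, name: str) -> str:
        """Readable identifier: name, version number, short content hash."""
        history = self._versions[name]
        digest = hashlib.sha256(history[-1].encode("utf-8")).hexdigest()[:8]
        return f"{name}@v{len(history)}-{digest}"

    def features_using(self, name: str) -> list[str]:
        """Which features would be affected by editing this prompt."""
        return sorted(self._used_by.get(name, set()))


registry = PromptRegistry()
first = registry.register("ticket_classifier", "Classify the ticket.", "helpdesk")
second = registry.register("ticket_classifier", "Classify the ticket as JSON.", "email_triage")
print(first, second, registry.features_using("ticket_classifier"))
```

Logging the fingerprint alongside each model response is what lets you answer "which prompt produced this?" when something downstream breaks.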
When the answer is RAG
If the diagnostic landed on knowledge, RAG is almost certainly where you are headed. It is the right tool when the information the model needs lives in your own documents, when that information changes, or when you need the system to cite where its answers came from.
That last point is worth dwelling on. Because a RAG system answers from documents it just retrieved, it can show you the source. For anyone in a regulated industry, or anyone who simply needs to defend an answer, that traceability is not a nice-to-have. It is the whole reason to choose RAG over the alternatives.
RAG also keeps knowledge current without retraining anything. Update the document, re-index it, and the system uses the new version on the next question. For a business whose information shifts week to week, that is the difference between a system that stays correct and one that quietly goes stale.
What RAG costs is a real infrastructure bill. You need a vector database, which is the searchable library that holds your documents. You pay to convert your documents into a form the system can search, though that particular cost has become very cheap. You pay for the database hosting, and the major providers have recently introduced monthly minimums in the range of twenty-five to fifty dollars a month for their entry plans. And every single question costs slightly more to answer than a bare prompt would, because retrieved text gets added to every request, and you pay for those extra words every time.
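The per-question overhead is easy to estimate with simple arithmetic. In the sketch below every number is a hypothetical placeholder, not a quote from any provider: the token counts, the price per million input tokens, and the question volume are assumptions you should replace with your own. Output tokens are ignored to keep the comparison focused on the extra retrieved text.

```python
def monthly_input_cost(questions_per_day: int, prompt_tokens: int,
                       retrieved_tokens: int, price_per_million_tokens: float,
                       days: int = 30) -> float:
    """Monthly spend on input tokens only, in dollars."""
    tokens_per_question = prompt_tokens + retrieved_tokens
    total_tokens = questions_per_day * days * tokens_per_question
    return total_tokens * price_per_million_tokens / 1_000_000


# Hypothetical workload: 2,000 questions a day at $3 per million input tokens.
bare_prompt = monthly_input_cost(2_000, prompt_tokens=300, retrieved_tokens=0,
                                 price_per_million_tokens=3.0)
with_retrieval = monthly_input_cost(2_000, prompt_tokens=300, retrieved_tokens=1_500,
                                    price_per_million_tokens=3.0)
hosting_minimum = 50.0  # top of the entry-plan range mentioned above

print(f"bare prompt:  ${bare_prompt:,.2f} per month")
print(f"with RAG:     ${with_retrieval + hosting_minimum:,.2f} per month")
```

Under these assumptions the retrieved text, not the database, accounts for most of the difference, which is why how much you retrieve per question is worth tuning.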
For a small production system, the all-in monthly cost tends to land somewhere in the low hundreds to around a thousand dollars. For a serious mid-market deployment, once you account for the engineer time needed to keep it healthy, the fully loaded annual cost runs well into six figures. The infrastructure is not the expensive part. The ongoing attention is.
RAG also has failure modes worth knowing about before you commit. The system can simply fail to retrieve the right document, in which case the model answers from nothing useful. The retrieved documents can contradict each other. And the quality of the whole system depends heavily on how the documents were broken up and indexed in the first place, which is real engineering work rather than a setting you switch on. RAG reduces hallucination considerably. It does not eliminate it.
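To show why breaking up documents is engineering work and not a switch, here is one common and very simple approach: fixed-size chunks with some overlap, so a sentence that straddles a boundary still appears whole in at least one chunk. This is a sketch of one option among many, not a recommendation. The function name chunk_words and the sizes are hypothetical, and real systems often split on headings or paragraphs instead.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of chunk_size words, each sharing overlap words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

Both parameters change what the retrieval step can find. Chunks that are too small lose context, and chunks that are too large bury the relevant sentence, so these values are usually tuned against real questions.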
When fine-tuning is actually worth it
Fine-tuning has a narrow but real set of jobs it does well. It is the right tool when you need a behavior to be reliably consistent, like producing output in an exact format every time, or classifying things the same way without drift, or holding a steady brand voice. It can also let you shorten your prompts, because behavior baked into the model no longer needs to be explained in every request. And at high volume, it can let a smaller, cheaper model do the work of a larger one, which changes the per-question economics.
That last point is the real commercial case for fine-tuning, and it only works above a certain scale. If you are answering a few hundred questions a day, fine-tuning will almost never repay the cost of building it. If you are answering hundreds of thousands, the savings on each question can add up to something that justifies the project. The volume is part of the decision, not an afterthought.
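The volume argument is a break-even calculation. The sketch below uses made-up numbers throughout: the build cost and both per-question costs are hypothetical placeholders, and the function name breakeven_days is ours. It asks how many days of savings it takes to repay the cost of building the fine-tuned model.

```python
from typing import Optional


def breakeven_days(build_cost: float, questions_per_day: int,
                   cost_per_question_large: float,
                   cost_per_question_small: float) -> Optional[float]:
    """Days until per-question savings repay the build cost, or None if there are no savings."""
    daily_saving = (cost_per_question_large - cost_per_question_small) * questions_per_day
    if daily_saving <= 0:
        return None
    return build_cost / daily_saving


# Hypothetical project: $60,000 to build, saving $0.003 per question.
low_volume = breakeven_days(60_000, 300, 0.004, 0.001)
high_volume = breakeven_days(60_000, 300_000, 0.004, 0.001)

print(f"300 questions a day:     {low_volume / 365:,.0f} years to break even")
print(f"300,000 questions a day: {high_volume:,.0f} days to break even")
```

With these placeholder figures the same project takes generations to pay back at a few hundred questions a day and roughly two months at a few hundred thousand. Your numbers will differ, but the shape of the curve will not.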
Here is the part that catches businesses off guard. The expensive part of fine-tuning is not the computing. The training run itself can cost a few hundred dollars. The expensive part is the data. Fine-tuning needs a curated set of high-quality examples, and across the projects and practitioner reports we have seen, preparing that data routinely consumes thirty to fifty percent of the total project budget. What looked like a few-hundred-dollar exercise becomes a project costing tens or hundreds of thousands of dollars once the data work is accounted for honestly. And the quality ceiling is set by the data: if the examples are inconsistent, the model learns the inconsistency.
Fine-tuning also carries two ongoing burdens that prompting and RAG do not. A fine-tuned model is locked to the specific base model it was trained on. When that base model is retired, your fine-tuned version retires with it, and you redo the work. And it needs retraining whenever you want it to keep pace with newer base models, which means the project is never quite finished.
One more thing worth knowing if you are considering fine-tuning specifically on OpenAI’s platform. OpenAI began winding down its self-serve fine-tuning service in 2026, on the stated reasoning that newer base models are now capable enough that most fine-tuning is unnecessary. New fine-tuning jobs are being progressively cut off through 2026 and into early 2027. Fine-tuning has not gone away as a technique, and other providers still support it fully, but the company that popularized the easy version of it has decided most customers no longer need it. That is a signal worth weighing.
The questions we ask before committing to a path
When we scope this work with a client, the decision comes down to a short list of questions. They are worth asking yourself before you commit a budget.
Is the gap about knowledge or behavior? This is the first cut, and it does most of the work.
Does the underlying information change often? If it changes weekly, retraining is too slow and too brittle to keep up. That points to RAG.
Do you need to show where answers came from? If yes, RAG, because traceability is built into how it works.
Can a clearer prompt get you most of the way there? If it can, stop. Ship it. You can always revisit.
What is the query volume? Below roughly ten thousand questions a day, fine-tuning rarely pays back its build cost. Above that, the economics start to shift.
Do you have enough clean, consistent example data, or the budget to create it? If not, fine-tuning will disappoint you regardless of how much you spend on everything else.
Do you have the technical capacity to maintain it? RAG and fine-tuning both need ongoing engineering attention. Prompting is the only one of the three a small team can run without that.
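The questions above can be written down as a small decision function. Treat this as a sketch of the reasoning and not a substitute for judgment. The names Project and recommend are hypothetical, the ordering of the checks is our simplification, and the ten-thousand threshold is the rough figure from the volume question.

```python
from dataclasses import dataclass

FINE_TUNE_MIN_QUESTIONS_PER_DAY = 10_000  # rough threshold, not a hard rule


@dataclass
class Project:
    gap: str                           # "knowledge" or "behavior"
    changes_often: bool                # does the underlying information change often?
    needs_citations: bool              # must answers show their sources?
    prompt_gets_most_of_the_way: bool  # can a clearer prompt nearly solve it?
    questions_per_day: int
    has_clean_examples: bool           # or the budget to create them
    can_maintain: bool                 # ongoing engineering capacity


def recommend(p: Project) -> tuple[str, list[str]]:
    """Return a suggested path plus any caveats worth raising before committing."""
    if p.prompt_gets_most_of_the_way:
        return "prompting", []
    if p.gap == "knowledge" or p.changes_often or p.needs_citations:
        path = "RAG"
    elif p.questions_per_day >= FINE_TUNE_MIN_QUESTIONS_PER_DAY and p.has_clean_examples:
        path = "fine-tuning"
    else:
        return "prompting", ["fine-tuning is unlikely to pay back at this volume or data quality"]
    caveats = [] if p.can_maintain else ["needs ongoing engineering capacity"]
    return path, caveats


policy_bot = Project("knowledge", True, True, False, 500, False, True)
print(recommend(policy_bot))
```

The order of the checks carries the argument: the prompt question comes first, and fine-tuning is only reachable once volume and data both clear the bar.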
How we approach it
In practice, we work through these in order. We start every engagement with prompt engineering, because it is fast, cheap, and frequently sufficient, and because it gives us a clear read on what the model cannot do. If the gap is knowledge, we add RAG, and we measure how well the retrieval is working separately from how well the model is answering, because most RAG problems are retrieval problems wearing a disguise. We reach for fine-tuning last, only when prompting and RAG have hit a real ceiling, the remaining problem is behavioral, and the volume justifies it.
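Measuring retrieval on its own can be as simple as a hit rate: for a set of test questions where you already know which document holds the answer, how often does that document show up in the top few results? The sketch below assumes a retrieval function that returns document identifiers. The names retrieval_hit_rate and fake_retrieve and the canned results are hypothetical stand-ins for a real search step.

```python
from typing import Callable


def retrieval_hit_rate(test_cases: list[tuple[str, str]],
                       retrieve: Callable[[str], list[str]],
                       k: int = 3) -> float:
    """Share of questions whose expected document appears in the top k results."""
    if not test_cases:
        return 0.0
    hits = sum(
        1 for question, expected_id in test_cases
        if expected_id in retrieve(question)[:k]
    )
    return hits / len(test_cases)


# Hypothetical retriever: a canned lookup standing in for a real search step.
canned_results = {
    "How long do refunds take?": ["refund-policy", "returns-faq"],
    "Do you ship to Canada?": ["holiday-hours", "returns-faq"],
}


def fake_retrieve(question: str) -> list[str]:
    return canned_results.get(question, [])


test_cases = [
    ("How long do refunds take?", "refund-policy"),
    ("Do you ship to Canada?", "shipping-canada"),
]
rate = retrieval_hit_rate(test_cases, fake_retrieve, k=2)
print(f"retrieval hit rate: {rate:.0%}")
```

No model is called anywhere in this check. If the hit rate is low, improving the prompt or swapping the model will not help, because the right document never reached it.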
The three approaches are not mutually exclusive, either. The most robust systems we build usually combine RAG for knowledge with careful prompt engineering for behavior, and occasionally a light fine-tune on top for tone and format. The three-way choice is a useful way to think the problem through. It is not a rule that you may only pick one.
The thing we want a business owner to take away is this: the most expensive AI mistakes are not technical failures. They are well-built solutions aimed at the wrong problem. A fine-tuned model that was supposed to know things. A RAG system built when a sharper prompt would have done the job. Working through the questions above before you commit a budget is how you avoid that. And if you want a second opinion on which path a project needs, that is the kind of conversation we would rather have before any code gets written.


