The Battle Between Prompting and Finetuning

Lamini

TL;DR

  • Prompt engineering is an easy way to start improving your LLM.
  • Complex use cases demand proper finetuning, so your LLM can learn directly from your data. Ideally, use both :)
  • Our CEO Sharon Zhou is teaching a finetuning course with Dr. Andrew Ng.
  • Sign up to optimize your LLMs with Lamini, or contact us to finetune at scale.

Since ChatGPT started the LLM explosion, you might have seen words like "finetuning" and "prompt engineering" all over your Twitter feed or YouTube recommended — the latter has even become a job title!

You might have already tried ChatGPT for your business or personal tasks. It’s powerful, but sometimes it acts like a naughty kid: it doesn’t output exactly what you want. You need to make it suitable for your use case, and you know these mysterious words have something to do with improving your LLM. But what, exactly? We’re here to demystify things.

Prompting: just tell your LLM what to do 🫡

Prompting is what you do whenever you type a request into ChatGPT or any LLM. Prompt engineering is the art of modifying the structure and content of your LLM prompts for better results.

The obvious benefit of prompt engineering is how easy it is: just tell your model what you want it to do. You don’t need data or programming knowledge. But it also helps us learn more about how our base model works and its limitations, offering a great starting point when you’re figuring out the best way to improve your LLM.

Let’s say you ask ChatGPT to generate a Taylor Swift song. When you prompt directly, it does give you lyrics, but they’re… not very Swiftian. 🫠

It would be insulting to ever associate these lyrics with Taylor Swift. All the essential elements of her songwriting are missing: the emotion, the humor, the vivid storytelling. To start “engineering” our basic prompt, we can just be more specific about what we want:
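
Under the hood, iterating on a prompt is just swapping the string you send to the model. Here’s a minimal sketch using the OpenAI Python client (pre-1.0 style, current as of this post); the model choice and prompt wording are illustrative, not our exact prompts:

```python
# Prompt iteration sketch: the only thing that changes is the input string.
# Assumes OPENAI_API_KEY is set in your environment.
import openai

def generate(prompt: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# First attempt: a bare-bones prompt.
print(generate("Write a Taylor Swift song."))

# "Engineered" prompt: more specific about tone and subject.
print(generate("Write a witty Taylor Swift-style song about an acrimonious breakup."))
```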

It’s still pretty generic, and ChatGPT didn’t quite understand what we asked. When we mentioned that it should be witty and talk about an ‘acrimonious breakup,’ it used those words directly in the lyrics instead!

Even though this is a “failed” output, we’ve also learned a fundamental limitation of our base LLM: it struggles to distinguish between writing about a topic and using a specific word. It’s a small step, but one that enables us to interact more intelligently with our LLM going forward.

While what we’ve done so far is minimal, it’s clear that prompting our way into good Taylor Swift songs would be a Herculean task. For simpler tasks, prompt engineering might suffice. But with a difficult task like creative writing, the subjectivity of prompt engineering becomes a real problem.

It’s not clear how we should change our prompt to produce better results — going through all the trial and error takes significant time and ingenuity. If you have a complex use case, you’ll probably need something a little more definitive, and that’s where finetuning comes in.

Finetuning: making your LLM learn more 📖

Finetuning builds on the extensive pre-training process that LLMs go through to become general-purpose machines: it actually makes your base model learn more!

When an LLM like GPT-4 or LLaMA 2 is pre-trained, it’s exposed to colossal amounts of language data, which slowly alters its parameters: the actual numbers inside the model that allow it to predict text successfully. (You’ll sometimes hear the number of parameters in a model referred to as the model’s size, often listed in the millions or billions.)
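
If you’re curious how big a model is, its parameter count is easy to inspect. A quick sketch with the Hugging Face transformers library (gpt2 is just a conveniently small example model):

```python
# Checking a model's "size" (parameter count); gpt2 is ~124 million parameters.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
print(f"{model.num_parameters():,}")  # roughly 124,000,000 for gpt2
```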

Finetuning is essentially the continuation of this training process, but on data of your choosing — the model continues to learn from a smaller dataset, which changes its parameters.

The essential idea behind finetuning is that training your LLM on high-quality data most relevant to your use case will make it noticeably better at your given task. Even finetuning on a small dataset of 100 strong examples can seriously improve your model’s performance.
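
To make that concrete, here’s a minimal finetuning sketch using the open-source Hugging Face transformers Trainer rather than Lamini’s own API; the model name and toy dataset are placeholders you’d swap for your own:

```python
# Minimal finetuning sketch with Hugging Face transformers (NOT Lamini's API).
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # assumption: any small causal LM works for a demo
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Your ~100 strong, task-relevant examples would go here.
examples = [{"text": "Verse 1: ..."}] * 100  # placeholder data

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_dataset = Dataset.from_list(examples).map(
    tokenize, batched=True, remove_columns=["text"]
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned",
        num_train_epochs=3,
        per_device_train_batch_size=4,
    ),
    train_dataset=train_dataset,
    # mlm=False -> standard next-token (causal) objective, i.e. a
    # continuation of the model's original pre-training
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the small dataset nudges the model's parameters
trainer.save_model("finetuned")
tokenizer.save_pretrained("finetuned")
```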

Finetuning does require more expertise, resources, and time: you have to go through an API (we have an extensive tutorial and Colab notebook on finetuning with Lamini), it takes a nontrivial amount of computing power, and you need high-quality data for it to produce good results.

If your data isn’t the best or your goals aren’t specific enough, finetuning can sometimes fail to produce the desired results. But this is uncommon, and if it does happen, you can carefully evaluate the results and continue finetuning your model.

The Best of Both Worlds 🔮

In an ideal workflow, finetuning and prompt engineering are both part of improving your LLM until it meets your needs. Using both means changing your LLM from the ‘inside’ (with finetuning) and the ‘outside’ (with prompts).

Each is an art unto itself and contains many advanced techniques (which we’ll explore in future posts!). With both of them in hand, there’s little you can’t do! ChatGPT itself is a finetuned version of GPT-3.5, which speaks to the power of these techniques.
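
As a rough sketch of what using both looks like in practice, you might finetune first and then iterate on prompts against the finetuned model (this reuses the hypothetical "finetuned" output directory from the sketch above):

```python
# Illustrative sketch of combining both: prompt-engineer against the model
# you finetuned earlier ("finetuned" is that sketch's output directory).
from transformers import pipeline

generator = pipeline("text-generation", model="finetuned")

# Finetuning shaped the model from the "inside"; the engineered prompt
# steers each generation from the "outside".
prompt = "Write a witty song about an acrimonious breakup:\n"
print(generator(prompt, max_new_tokens=120)[0]["generated_text"])
```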

Interested in getting into the battlefield of prompting and finetuning? Sign up and try Lamini's fast and furious finetuning, for free! Want to fully utilize the power of finetuning for your business use case? Chat with our team!

Want to learn more about finetuning? Our CEO, Sharon Zhou, is teaching an LLM finetuning course with Dr. Andrew Ng in a couple of weeks. Stay tuned! 🚀 💪 🥳

P.S. We didn’t dare try to imitate the genius of Taylor Swift herself, but you can check out our SwiftieBot: the ultimate Taylor Swift fan, created using finetuning with Lamini.

--

August 7, 2023