
Hyperparameter Tuning for finetuning Large Language Models

Finetuning is a powerful technique for harnessing the full power of large language models. However, it requires careful selection and tuning of hyperparameters to achieve optimal performance.

Our no-code LLMOps platform for finetuning LLMs, EasyFinetuner, has built-in support for hyperparameter tuning and lets you finetune LLMs from multiple providers, including OpenAI, Cohere, and AI21 Studio.

In this article, we'll explore hyperparameter tuning, its common methods (random search, grid search, and Bayesian optimization), and how it can be used when finetuning large language models.

What are Hyperparameters?

Imagine you're a cool DJ, spinning records at a party and making everyone dance to your tunes. Your mixing console has a bunch of adjustable equalizer buttons, each controlling various aspects of the music like bass, treble, and volume. Just like a DJ tweaks these equalizer buttons for the perfect sound that gets the crowd excited, in the world of machine learning, we have these magical buttons called "hyperparameters"!

DJ Controller

Hyperparameters are the super cool buttons that control the groove of our machine learning models during training, affecting how well they can dance with the data. They're set before the party (training) starts, and by adjusting them, you can strike that perfect balance so your model can be the life of the party (perform well).

Some examples of these DJ equalizer buttons (hyperparameters) in finetuning LLMs are the learning rate, batch size, and number of epochs.
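
To make that concrete, here is how a single finetuning run's "equalizer settings" might be written down. This is only an illustrative sketch; the specific values are placeholders, not recommendations:

```python
# Illustrative hyperparameter settings for one finetuning run.
# The values are placeholders chosen for demonstration only.
hyperparameters = {
    "learning_rate": 1e-5,  # how aggressively the model updates its weights
    "batch_size": 16,       # how many training examples are used per update
    "n_epochs": 4,          # how many full passes over the training data
}
```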

Hyperparameter tuning

Just like how a DJ carefully adjusts the equalizer to find the best combination for a rocking party, in machine learning, we too experiment with different settings for these hyperparameters to find the best combination for our model. This process is called hyperparameter tuning.

We create multiple models, each with a unique combination of hyperparameters, and train them on our data. We then compare their performance using an evaluation measure such as an accuracy score. The model with the highest score is selected as the best one for our task.

Hyperparameter tuning involves a lot of trial and error. It can be done manually, but that is time-consuming and rarely optimal; a more effective and systematic approach is to use automated hyperparameter tuning methods.
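
The "train, score, and keep the winner" loop described above can be sketched in a few lines of Python. Note that `finetune_and_evaluate` here is a hypothetical stand-in for your own finetuning call and evaluation metric, not a real library function:

```python
import random

def finetune_and_evaluate(config):
    # Hypothetical stand-in: in practice, finetune a model with `config`
    # and return a real evaluation score (e.g. accuracy on a held-out set).
    return random.random()

# Candidate hyperparameter combinations to compare; values are illustrative.
candidate_configs = [
    {"learning_rate": 1e-5, "batch_size": 8,  "n_epochs": 2},
    {"learning_rate": 2e-5, "batch_size": 16, "n_epochs": 4},
    {"learning_rate": 5e-5, "batch_size": 32, "n_epochs": 4},
]

best_config, best_score = None, float("-inf")
for config in candidate_configs:
    score = finetune_and_evaluate(config)  # train with this config and score it
    if score > best_score:
        best_config, best_score = config, score

print("Best hyperparameters:", best_config, "with score", best_score)
```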

Hyperparameter Tuning Methods

Hyperparameter tuning is the process of selecting the best hyperparameters for a given task. Three methods are commonly used: random search, grid search, and Bayesian optimization.

  • Random Search: This method randomly samples hyperparameter combinations from a given range of values. It is simple and can quickly explore a large search space, but it offers no guarantee of finding the optimal hyperparameters, and covering a large space thoroughly can still require many trials.

  • Grid Search: This method exhaustively tries every possible combination of hyperparameters from a given set of values. It is systematic and guarantees finding the best combination within the grid. However, it can be computationally expensive and does not scale well to large search spaces.

  • Bayesian Optimization: This method builds a probabilistic model that predicts the performance of different hyperparameter settings and uses it to choose the most promising ones to try next. It handles large search spaces efficiently and typically needs far fewer trials than grid search, but it requires more expertise to set up and may not always find the optimal hyperparameters. (A rough sketch of the first two methods follows this list.)
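
As a rough illustration of random and grid search, the sketch below enumerates a full grid of hyperparameter combinations and also draws a handful of random combinations from the same space. The search space and the `evaluate` function are placeholders standing in for a real finetuning and evaluation pipeline:

```python
import itertools
import random

# Illustrative search space; the values are placeholders, not recommendations.
search_space = {
    "learning_rate": [1e-5, 2e-5, 5e-5],
    "batch_size": [8, 16, 32],
    "n_epochs": [2, 4],
}

def evaluate(config):
    # Hypothetical stand-in for "finetune with this config and score the result".
    return random.random()

# Grid search: try every combination in the space (3 * 3 * 2 = 18 trials here).
grid_trials = [dict(zip(search_space, values))
               for values in itertools.product(*search_space.values())]
best_grid = max(grid_trials, key=evaluate)

# Random search: sample a fixed budget of combinations from the same space.
random_trials = [{name: random.choice(values) for name, values in search_space.items()}
                 for _ in range(5)]
best_random = max(random_trials, key=evaluate)

print("Grid search best:", best_grid)
print("Random search best:", best_random)
```

Bayesian optimization, by contrast, is usually done with a dedicated library such as Optuna or scikit-optimize rather than a hand-rolled loop.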

Hyperparameters for finetuning Large Language Models

When finetuning large language models, several hyperparameters need to be tuned to achieve optimal performance. Let's take a look at some of the hyperparameters and their configurations:

  • Base LLM model: This is the list of base models available for finetuning. You can choose multiple models, and each chosen model will be used for finetuning. In OpenAI, the GPT-3 base models are ada, babbage, curie, and davinci. They come in different sizes and have different capabilities: a small base model can perform on par with a large one on simple tasks while costing less to train and run, whereas a large base model will perform better on complex tasks but costs more to train and is slower to run. In some cases, a small base model may be sufficient for your task, while in others you may need a large one.

  • Batch Size Configuration: The batch size configuration is the list of batch sizes used for finetuning. The batch size is the number of training examples processed in each model update; larger batch sizes tend to work better for larger datasets.

  • Epoch Configuration: The epoch configuration is the list of epoch values used for finetuning. An epoch refers to one full cycle through the training dataset. Choosing a higher number of epochs can lead to better performance but will take longer to train. Sometimes, a lower number of epochs may be sufficient for your task.

  • Learning Rate Configuration: The learning rate configuration is the list of learning rates used for finetuning. The learning rate controls how much the model changes its parameters in response to the estimated error each time it updates them. A higher learning rate can speed up training but may cause it to become unstable or overshoot good solutions. A lower learning rate trains more slowly but may help the model converge and generalize better.

  • Prompt Loss Weight Configuration: The prompt loss weight configuration is the list of prompt loss weights used for finetuning. This controls how much the model tries to learn to generate the prompt and can add a stabilizing effect to training when completions are short. (A configuration-grid sketch follows this list.)
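
Putting these together, one way to think about the search space is as a small grid over the configurations above. The sketch below only builds and prints the candidate configurations; the parameter names mirror the legacy OpenAI finetuning options (n_epochs, batch_size, learning_rate_multiplier, prompt_loss_weight), the values are placeholders, and actually submitting each job to your provider's finetuning API is left out:

```python
import itertools

# Illustrative search space over the hyperparameters discussed above.
# Parameter names mirror the legacy OpenAI finetuning options; values are placeholders.
search_space = {
    "model": ["ada", "curie"],                # base LLM to finetune
    "batch_size": [8, 16],                    # examples per weight update
    "n_epochs": [2, 4],                       # full passes over the training set
    "learning_rate_multiplier": [0.05, 0.1],  # scales the base learning rate
    "prompt_loss_weight": [0.01, 0.1],        # weight given to learning prompt tokens
}

# Every combination in the grid (2 * 2 * 2 * 2 * 2 = 32 candidate runs).
configs = [dict(zip(search_space, values))
           for values in itertools.product(*search_space.values())]

for config in configs:
    # In practice, each config would be submitted to your provider's finetuning
    # endpoint, and the resulting models compared on a held-out evaluation set.
    print(config)
```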

Conclusion

Hyperparameter tuning is a crucial step in finetuning large language models for natural language processing tasks. Random search, grid search, and Bayesian optimization are the common methods. When finetuning large language models, several hyperparameters need to be tuned, including the base model, batch size, number of epochs, learning rate, and prompt loss weight. By carefully selecting and tuning these hyperparameters, we can achieve optimal performance from finetuned large language models.
