Hyperparameter Tuning Configuration
The Hyperparameter Tuning configuration lets users set options for hyperparameter optimization, creating multiple finetunes to find the best performing model. The available configuration options vary by subscription plan.
Hyperparameter Optimization
Hyperparameter optimization is a technique for finding the best performing model by tuning the model's hyperparameters. Hyperparameters are parameters that are not learned during finetuning; they are set before finetuning begins, and they can have a significant impact on the model's performance. Hyperparameter optimization tries different combinations of hyperparameters to find the one that produces the best performing model.
If you are new to hyperparameter tuning, please read our blog post on Hyperparameter Tuning for more information on how to use it to find the best performing model.
Starter Plan Users
Starter plan users have the following configuration options in the UI:
Continuous Finetuning or New Finetune
For every version created after the first (version > 1), users can choose between continuous finetuning and a new finetune.
- Choosing continuous finetuning takes the previously finetuned model and finetunes it again on the new data.
- Choosing new finetune lets you pick from the list of GPT-3 base models; the data from the current version and all previous versions is combined to finetune the chosen model.
Use the radio buttons to select either continuous finetuning or new finetune.
Things to consider before deciding which option to choose:
- Continuous finetuning requires only the data from the current version, whereas a new finetune may incur a higher cost due to the larger dataset (data from the current version plus data from all previous versions).
- A new finetune lets you choose among the GPT-3 base models, which helps you compare the performance of different base models finetuned on the same dataset.
Model List
The model list is the list of GPT-3 base models (ada, babbage, curie, and davinci) that you can finetune. You can select multiple models, and each selected model will be finetuned.
To choose the models, use the multi-select checkbox UI element. Choosing multiple base models lets you compare the performance of different base models finetuned on the same dataset.
Things to consider before choosing the models:
- Bigger models like davinci incur higher costs, both for finetuning and for running inference on the resulting finetune.
- If you have a small dataset, opting for a bigger model like davinci may give the best performance, but the inference cost remains higher.
- Choose the model size based on the reasoning capabilities the task requires. Tasks like sentiment analysis or toxicity detection may not need a model as big as davinci, while tasks like financial document classification may benefit from davinci's stronger reasoning.
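Conceptually, selecting multiple base models is equivalent to launching one finetuning job per model on the same dataset. The UI handles this for you; a minimal sketch of the same idea using the legacy OpenAI fine-tunes API (openai-python < 1.0; the API key and training file ID below are placeholders):

```python
import openai  # legacy openai-python (< 1.0), which exposed the FineTune API

openai.api_key = "sk-..."       # placeholder API key
TRAINING_FILE = "file-abc123"   # placeholder ID of an uploaded JSONL training file

# One finetuning job per selected base model, all on the same dataset,
# so the resulting finetunes can be compared fairly.
for base_model in ["ada", "babbage", "curie", "davinci"]:
    job = openai.FineTune.create(training_file=TRAINING_FILE, model=base_model)
    print(base_model, job["id"])
```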
Pro and Enterprise Plan Users
Pro and Enterprise plan users have additional configuration options in the UI:
Run Count
The run count is the maximum number of finetuning runs allowed with the given configuration. Hyperparameter optimization uses this cap to decide how many runs to launch.
To set the run count, use the text input UI element. Be careful with this number: it is the total number of finetuned models that will be created, and each finetuning run incurs a cost based on the base model and dataset size.
Sweep Method
The sweep method is the method used for hyperparameter optimization. The possible options are bayes, random, and grid.
- Grid search iterates over all possible combinations of parameter values.
- Random search chooses a random set of values on each iteration.
- Our Bayesian hyperparameter search method uses a Gaussian Process to model the relationship between the parameters and the model metric and chooses parameters to optimize the probability of improvement. This strategy requires the metric key to be specified.
To set the sweep method, you can use the radio button UI element.
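For intuition, here is a minimal Python sketch of how the sweep methods differ, using a hypothetical search space (the parameter names mirror the configuration sections below; the platform's actual sweep implementation is internal):

```python
import itertools
import random

# Hypothetical search space, for illustration only.
space = {
    "n_epochs": [2, 4, 8],
    "batch_size": [8, 16],
    "learning_rate_multiplier": [0.05, 0.1, 0.2],
}

# Grid: every combination of parameter values (3 * 2 * 3 = 18 runs).
grid_runs = [dict(zip(space, combo)) for combo in itertools.product(*space.values())]

# Random: an independent random value per parameter on each iteration,
# capped here at 5 runs (compare the Run Count option above).
random_runs = [{name: random.choice(values) for name, values in space.items()}
               for _ in range(5)]

# Bayes additionally fits a Gaussian Process to the observed metric values
# and picks the next combination to maximize the probability of improvement,
# which is why it requires a metric key.
```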
Dataset Size Percentages
The dataset size percentages are the list of percentages of the dataset used for finetuning. For example, providing the values 25, 50, and 75 will create finetuned models using datasets containing 25%, 50%, and 75% of the data from the original dataset.
To set the dataset size percentages, use the text box with a + button to add values to the list. Each added value has an X button to remove it.
This configuration option is used to check model performance across different dataset sizes. It can show whether adding more training examples improves the model.
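The subset sizes work out as simple percentages of the original dataset; a quick sketch with an illustrative dataset size:

```python
dataset_size = 2000  # assumed size of the original dataset, for illustration only
for pct in [25, 50, 75]:
    n_examples = dataset_size * pct // 100
    print(f"{pct}% run -> {n_examples} training examples")
# 25% run -> 500 training examples
# 50% run -> 1000 training examples
# 75% run -> 1500 training examples
```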
Batch Size Configuration
The batch size configuration is the list of batch sizes used for finetuning. The batch size is the number of training examples used in a single forward and backward pass. In general, larger batch sizes tend to work better for larger datasets.
To set the batch size configuration, you can either add a list of values or provide a range. With the list option, use the text box with a + button to add values; each added value has an X button to remove it. With the range option, use the two text boxes to provide the starting and ending values of the range.
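A sketch of the two input modes; note that how the sweep draws values from a range is an assumption here (uniform integer sampling), not documented behavior:

```python
import random

# List mode: explicit candidate values, added one by one with the + button.
batch_size_candidates = [8, 16, 32]

# Range mode: only the start and end are given; we assume the sweep samples
# integer batch sizes uniformly from [start, end].
start, end = 4, 32
sampled_batch_size = random.randint(start, end)
```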
Epoch Configuration
The epoch configuration is the list of epoch values used for finetuning. An epoch refers to one full cycle through the training dataset.
To set the epoch configuration, you can either add a list of values or provide a range. With the list option, use the text box with a + button to add values; each added value has an X button to remove it. With the range option, use the two text boxes to provide the starting and ending values of the range.
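Since an epoch is one full pass over the training data, the number of optimization steps follows directly from the dataset size and batch size (illustrative numbers):

```python
import math

n_examples, batch_size, n_epochs = 1000, 8, 4
steps_per_epoch = math.ceil(n_examples / batch_size)  # 125 forward/backward passes
total_steps = steps_per_epoch * n_epochs              # 500 steps over 4 epochs
```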
Learning Rate Configuration
The learning rate configuration is the list of learning rate multipliers used for finetuning. The finetuning learning rate is the original learning rate used for pretraining multiplied by this value. We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best results. Empirically, we've found that larger learning rates often perform better with larger batch sizes.
To set the learning rate configuration, you can either add a list of values or provide a range. With the list option, use the text box with a + button to add values; each added value has an X button to remove it. With the range option, use the two text boxes to provide the starting and ending values of the range.
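The multiplier scales the pretraining learning rate. The base value below is purely illustrative, since the actual pretraining learning rate depends on the base model:

```python
pretraining_lr = 1e-4  # assumed base learning rate, for illustration only

for multiplier in [0.02, 0.05, 0.1, 0.2]:  # within the recommended 0.02-0.2 range
    print(f"multiplier {multiplier} -> finetuning lr {pretraining_lr * multiplier:.1e}")
```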
Prompt Loss Weight Configuration
The prompt loss weight configuration is the list of prompt loss weights used for finetuning. The prompt loss weight is the weight applied to the loss on prompt tokens. It controls how much the model tries to learn to generate the prompt (as compared to the completion, which always has a weight of 1.0) and can add a stabilizing effect to training when completions are short.
To set the prompt loss weight configuration, you can either add a list of values or provide a range. With the list option, use the text box with a + button to add values; each added value has an X button to remove it. With the range option, use the two text boxes to provide the starting and ending values of the range.
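A minimal sketch of how the prompt loss weight could enter the training objective, assuming per-token cross-entropy losses and averaging over all tokens (the exact normalization is an assumption):

```python
def weighted_finetuning_loss(prompt_token_losses, completion_token_losses,
                             prompt_loss_weight=0.1):
    """Down-weights the loss on prompt tokens; completion tokens keep weight 1.0."""
    total = (prompt_loss_weight * sum(prompt_token_losses)
             + 1.0 * sum(completion_token_losses))
    return total / (len(prompt_token_losses) + len(completion_token_losses))
```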
Conclusion
Hyperparameter optimization is a powerful technique for finding the best performing model. The Hyperparameter Tuning configuration options allow users to experiment with different hyperparameters and find the best performing model for their specific use case. By understanding the available configuration options and how to use them, users can create high-performing models that meet their needs.