Hyperparameter Tuning Configuration
The Hyperparameter Tuning configuration lets users set options for hyperparameter optimization, creating multiple finetunes to find the best performing model. The available configuration options vary by subscription plan.
Hyperparameter Optimization
Hyperparameter optimization is a technique for finding the best performing model by tuning the model's hyperparameters. Hyperparameters are parameters that are not learned during finetuning; they are set before finetuning begins, and they can have a significant impact on the model's performance. Hyperparameter optimization tries different combinations of hyperparameters to find the one that produces the best performing model.
If you are new to hyperparameter tuning, please read our blog post on Hyperparameter Tuning for more information on how to use it to find the best performing model.
Starter Plan Users
Starter plan users have the following configuration options in the UI:
Continuous Finetuning or New Finetune
For every version created after the first (version > 1), users can choose between continuous finetuning and a new finetune.
- Choosing continuous finetuning takes the previously finetuned model and finetunes it again on the new data.
- Choosing new finetune lets you pick from the list of GPT-3 base models; the data from the current version and all previous versions is combined to finetune the chosen model.
Use the radio buttons to select either continuous finetuning or new finetune.
Things to consider before deciding which option to choose:
- Continuous finetuning requires only the data from the current version, whereas a new finetune may incur a higher cost due to the larger dataset (data from the current version plus data from all previous versions).
- A new finetune lets you choose among the GPT-3 base models, which helps you compare the performance of different base models finetuned on the same dataset.
Model List
The model list is the list of GPT-3 base models (ada, babbage, curie, and davinci) that you can finetune. You can select multiple models, and each selected model will be finetuned.
To choose the models, use the multi-select checkbox UI element. Choosing multiple base models lets you compare the performance of different base models finetuned on the same dataset.
Things to consider before choosing the models:
- Bigger models like davinci incur higher costs, both for finetuning and for running inference on the resulting finetune.
- If you have a small dataset, opting for a bigger model like davinci may give the best performance, but the inference cost remains higher.
- Choose the model size based on the reasoning capabilities the task requires. Tasks like sentiment analysis or toxicity detection may not need a model as big as davinci, while tasks like financial document classification may benefit from davinci's stronger reasoning.
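Conceptually, selecting multiple base models is equivalent to launching one finetuning job per model on the same dataset. The UI handles this for you; a minimal sketch of the same idea using the legacy OpenAI fine-tunes API (openai-python < 1.0; the API key and training file ID below are placeholders):

```python
import openai  # legacy openai-python (< 1.0), which exposed the FineTune API

openai.api_key = "sk-..."       # placeholder API key
TRAINING_FILE = "file-abc123"   # placeholder ID of an uploaded JSONL training file

# One finetuning job per selected base model, all on the same dataset,
# so the resulting finetunes can be compared fairly.
for base_model in ["ada", "babbage", "curie", "davinci"]:
    job = openai.FineTune.create(training_file=TRAINING_FILE, model=base_model)
    print(base_model, job["id"])
```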
Pro and Enterprise Plan Users
Pro and Enterprise plan users have additional configuration options in the UI:
Run Count
The run count is the maximum number of finetuning runs allowed with the given configuration. Hyperparameter optimization uses this cap to decide how many runs to launch.
To set the run count, use the text input UI element. Be careful with this number: it is the total number of finetuned models that will be created, and each finetuning run incurs a cost based on the base model and dataset size.
Sweep Method
The sweep method is the method used for hyperparameter optimization. The possible options are bayes, random, and grid.
- Grid search iterates over all possible combinations of parameter values.
- Random search chooses a random set of values on each iteration.
- Our Bayesian hyperparameter search method uses a Gaussian Process to model the relationship between the parameters and the model metric and chooses parameters to optimize the probability of improvement. This strategy requires the metric key to be specified.
To set the sweep method, you can use the radio button UI element.
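For intuition, here is a minimal Python sketch of how the sweep methods differ, using a hypothetical search space (the parameter names mirror the configuration sections below; the platform's actual sweep implementation is internal):

```python
import itertools
import random

# Hypothetical search space, for illustration only.
space = {
    "n_epochs": [2, 4, 8],
    "batch_size": [8, 16],
    "learning_rate_multiplier": [0.05, 0.1, 0.2],
}

# Grid: every combination of parameter values (3 * 2 * 3 = 18 runs).
grid_runs = [dict(zip(space, combo)) for combo in itertools.product(*space.values())]

# Random: an independent random value per parameter on each iteration,
# capped here at 5 runs (compare the Run Count option above).
random_runs = [{name: random.choice(values) for name, values in space.items()}
               for _ in range(5)]

# Bayes additionally fits a Gaussian Process to the observed metric values
# and picks the next combination to maximize the probability of improvement,
# which is why it requires a metric key.
```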
Dataset Size Percentages
The dataset size percentages are the list of percentages of the dataset used for finetuning. For example, providing the values 25, 50, and 75 will create finetuned models using datasets containing 25%, 50%, and 75% of the data from the original dataset.
To set the dataset size percentages, use the text box with a + button to add values to the list. Each added value has an X button to remove it.
This configuration option is used to check model performance across different dataset sizes. It can show whether adding more training examples improves the model.
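The subset sizes work out as simple percentages of the original dataset; a quick sketch with an illustrative dataset size:

```python
dataset_size = 2000  # assumed size of the original dataset, for illustration only
for pct in [25, 50, 75]:
    n_examples = dataset_size * pct // 100
    print(f"{pct}% run -> {n_examples} training examples")
# 25% run -> 500 training examples
# 50% run -> 1000 training examples
# 75% run -> 1500 training examples
```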
Batch Size Configuration
The batch size configuration is the list of batch sizes used for finetuning. The batch size is the number of training examples used in a single forward and backward pass. In general, larger batch sizes tend to work better for larger datasets.
To set the batch size configuration, you can either add a list of values or provide a range. With the list option, use the text box with a + button to add values; each added value has an X button to remove it. With the range option, use the two text boxes to provide the starting and ending values of the range.
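A sketch of the two input modes; note that how the sweep draws values from a range is an assumption here (uniform integer sampling), not documented behavior:

```python
import random

# List mode: explicit candidate values, added one by one with the + button.
batch_size_candidates = [8, 16, 32]

# Range mode: only the start and end are given; we assume the sweep samples
# integer batch sizes uniformly from [start, end].
start, end = 4, 32
sampled_batch_size = random.randint(start, end)
```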
Epoch Configuration
The epoch configuration is the list of epoch values used for finetuning. An epoch refers to one full cycle through the training dataset.
To set the epoch configuration, you can either add a list of values or provide a range. With the list option, use the text box with a + button to add values; each added value has an X button to remove it. With the range option, use the two text boxes to provide the starting and ending values of the range.
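Since an epoch is one full pass over the training data, the number of optimization steps follows directly from the dataset size and batch size (illustrative numbers):

```python
import math

n_examples, batch_size, n_epochs = 1000, 8, 4
steps_per_epoch = math.ceil(n_examples / batch_size)  # 125 forward/backward passes
total_steps = steps_per_epoch * n_epochs              # 500 steps over 4 epochs
```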
Learning Rate Configuration
The learning rate configuration is the list of learning rate multipliers used for finetuning. The finetuning learning rate is the original learning rate used for pretraining multiplied by this value. We recommend experimenting with values in the range 0.02 to 0.2 to see what produces the best results. Empirically, we've found that larger learning rates often perform better with larger batch sizes.
To set the learning rate configuration, you can either add a list of values or provide a range. With the list option, use the text box with a + button to add values; each added value has an X button to remove it. With the range option, use the two text boxes to provide the starting and ending values of the range.
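The multiplier scales the pretraining learning rate. The base value below is purely illustrative, since the actual pretraining learning rate depends on the base model:

```python
pretraining_lr = 1e-4  # assumed base learning rate, for illustration only

for multiplier in [0.02, 0.05, 0.1, 0.2]:  # within the recommended 0.02-0.2 range
    print(f"multiplier {multiplier} -> finetuning lr {pretraining_lr * multiplier:.1e}")
```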
Prompt Loss Weight Configuration
The prompt loss weight configuration is the list of prompt loss weights used for finetuning. The prompt loss weight is the weight applied to the loss on prompt tokens. It controls how much the model tries to learn to generate the prompt (as compared to the completion, which always has a weight of 1.0) and can add a stabilizing effect to training when completions are short.
To set the prompt loss weight configuration, you can either add a list of values or provide a range. With the list option, use the text box with a + button to add values; each added value has an X button to remove it. With the range option, use the two text boxes to provide the starting and ending values of the range.
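A minimal sketch of how the prompt loss weight could enter the training objective, assuming per-token cross-entropy losses and averaging over all tokens (the exact normalization is an assumption):

```python
def weighted_finetuning_loss(prompt_token_losses, completion_token_losses,
                             prompt_loss_weight=0.1):
    """Down-weights the loss on prompt tokens; completion tokens keep weight 1.0."""
    total = (prompt_loss_weight * sum(prompt_token_losses)
             + 1.0 * sum(completion_token_losses))
    return total / (len(prompt_token_losses) + len(completion_token_losses))
```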
Conclusion
Hyperparameter optimization is a powerful technique for finding the best performing model. The Hyperparameter Tuning configuration options allow users to experiment with different hyperparameters and find the best performing model for their specific use case. By understanding the available configuration options and how to use them, users can create high-performing models that meet their needs.