Model hyperparameters are free parameters that control different aspects of a model's learning process. Hyperparameter search is the process of finding the hyperparameter values that result in the most performant model.
Spell lets you automate hyperparameter searches with the spell hyper command.
For an interactive, runnable tutorial on hyperparameter search refer to our blog post: "An introduction to hyperparameter search with CIFAR10".
Anatomy of a hyperparameter search
The spell hyper command is very similar to the spell run command and takes all of the same command line options, with the addition of hyperparameter specifications. For more info on the spell run command, see What Is a Run.
Let's take a look at the example command below.
$ spell hyper grid -t T4 \
    --param rate=0.001,0.01,0.1,1 \
    --param layers=2,3,5,10 -- \
    "python train.py --learning_rate :rate: --num_layers :layers:"
The first part should be familiar. We request a grid search running on a T4 machine (-t T4).
Next are two --param options, which list the values that we want our hyperparameter search to test for each specified parameter. Here we specify two parameters, rate and layers, and the values we want for each. The way to specify the values to search differs between the types of hyperparameter search.
Finally, we have our Python command: python train.py --learning_rate :rate: --num_layers :layers:.
The parameters in colon bracket form, :rate: and :layers:, are replaced in individual runs with specific values of the respective parameter.
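To make the substitution concrete, here is a minimal Python sketch of replacing colon-bracketed placeholders with per-run values. It is an illustration only, not Spell's implementation; the render_command helper and its behavior for unknown placeholders are assumptions made for this example.

```python
import re


def render_command(template: str, values: dict) -> str:
    """Replace each :name: placeholder with the value chosen for one run.

    Hypothetical helper for illustration; not part of the Spell CLI.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            # Assumption: leave unknown placeholders untouched.
            return match.group(0)
        return str(values[name])

    # A placeholder is an identifier wrapped in colons, e.g. :rate:
    return re.sub(r":([A-Za-z_][A-Za-z0-9_]*):", substitute, template)


template = "python train.py --learning_rate :rate: --num_layers :layers:"
print(render_command(template, {"rate": 0.01, "layers": 3}))
# python train.py --learning_rate 0.01 --num_layers 3
```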
Performing grid search
In grid search, a set of discrete values is provided for each hyperparameter and a run is created for every possible combination of hyperparameter values. For example:
$ spell hyper grid \
    --param rate=0.001,0.01,0.1,1 \
    --param layers=2,3,5,10 -- \
    python train.py --learning_rate :rate: --num_layers :layers:
Everything up-to-date
💫 Casting hyperparameter search #59…
rate     layers    Run ID
0.001    2         362
0.001    3         363
0.001    5         364
0.001    10        365
0.01     2         366
0.01     3         367
0.01     5         368
0.01     10        369
0.1      2         370
0.1      3         371
0.1      5         372
0.1      10        373
1        2         374
1        3         375
1        5         376
1        10        377
Hyperparameters are specified with the --param NAME=VALUE[,VALUE,VALUE...] flag. NAME corresponds to the name of the hyperparameter. One or more comma-separated VALUEs can be provided after the =, corresponding to the values for the hyperparameter. The values can be strings, integers, or floating point numbers.
The NAME provided must appear in the run command surrounded by colons. This tells Spell where to substitute specific values for the hyperparameter in the run command when making the individual runs for the hyperparameter search.
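The way a grid expands into runs can be sketched in a few lines of Python. This is a rough illustration under the assumption that the grid is the Cartesian product of the value lists, which matches the 16 runs shown in the output above; the expand_grid helper is hypothetical and not part of Spell.

```python
from itertools import product


def expand_grid(params: dict) -> list:
    """Return one dict of hyperparameter values per grid point.

    Hypothetical helper: takes {name: [values...]} and yields every
    combination, varying the last parameter fastest.
    """
    names = list(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


grid = expand_grid({
    "rate": [0.001, 0.01, 0.1, 1],
    "layers": [2, 3, 5, 10],
})
print(len(grid))   # 16
print(grid[0])     # {'rate': 0.001, 'layers': 2}
print(grid[-1])    # {'rate': 1, 'layers': 10}
```

Because the number of runs is the product of the list lengths, adding a third parameter with four values would raise the total from 16 to 64 runs.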
Performing random search
In random search, each hyperparameter is randomly sampled to determine specific values for each run. Additionally, the --num-runs option must be specified to indicate the total number of runs to create. Hyperparameters are specified with the --param option flag. The specification can consist of either:
- A set of discrete values, specified with --param NAME=VALUE[,VALUE,VALUE...], similar to grid search. In this case one of the discrete values is randomly selected for the hyperparameter value when constituting a run.
- A range specification, specified with --param NAME=MIN:MAX[:SCALING[:TYPE]]. In this case the hyperparameter value is randomly selected from the specified range when constituting a run.
  - MIN and MAX are required and correspond to the minimum and maximum value of the range of this hyperparameter.
  - SCALING is optional and can take one of 3 values (linear is the default if not specified):
    - linear: the hyperparameter range (i.e., MIN to MAX) is sampled uniformly at random to determine a hyperparameter value.
    - log: the hyperparameter range is scaled logarithmically during the sampling (i.e., the range log(MIN) to log(MAX) is sampled uniformly at random and then exponentiated to yield the hyperparameter value). This results in a higher probability density for the sampling towards the lower end of the range.
    - reverse_log: this is the opposite of the scaling described in log, resulting in a higher probability density for the sampling at the higher end of the range.
  - TYPE is optional and can take one of 2 values (float is the default if not specified):
    - float: the resultant hyperparameter value is a floating point number.
    - int: the resultant hyperparameter value is an integer. If this option is specified, the value after randomly sampling is rounded to the nearest integer to yield the final hyperparameter value.
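The scaling and type options can be sketched in Python as follows. This is a sketch under stated assumptions, not Spell's sampler: the sample_range function is hypothetical, and the reverse_log branch in particular assumes that "opposite scaling" means mirroring a log-scaled sample within the range, which the text above does not spell out.

```python
import math
import random


def sample_range(lo, hi, scaling="linear", kind="float", rng=random):
    """Draw one value from [lo, hi] (hypothetical helper for illustration)."""
    if scaling == "linear":
        value = rng.uniform(lo, hi)
    elif scaling == "log":
        # Sample uniformly in log space, then exponentiate.
        value = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    elif scaling == "reverse_log":
        # Assumption: mirror a log-scaled sample so density piles up near hi.
        value = lo + hi - math.exp(rng.uniform(math.log(lo), math.log(hi)))
    else:
        raise ValueError(f"unknown scaling: {scaling!r}")

    # Guard against floating point error pushing the value out of range.
    value = min(max(value, lo), hi)

    if kind == "int":
        return int(round(value))
    if kind == "float":
        return value
    raise ValueError(f"unknown type: {kind!r}")


rng = random.Random(0)  # seeded only to make the sketch repeatable
print(sample_range(0.001, 1.0, "log", rng=rng))
print(sample_range(2, 100, "linear", "int", rng=rng))
```

With the range 0.001 to 1.0, about two thirds of log-scaled samples fall below 0.1, compared with roughly one tenth of linearly scaled samples, which is why log scaling is a common choice for learning rates.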
An example random hyperparameter search is as follows:
$ spell hyper random \
    --num-runs 10 \
    --param rate=.001:1.0:log \
    --param layers=2:100:linear:int \
    --param cell=gru,lstm,rnn -- \
    python train.py --learning_rate :rate: --num_layers :layers: --cell_type :cell:
Everything up-to-date
💫 Casting hyperparameter search #60…
rate          layers    cell    Run ID
0.535637      68        lstm    378
0.192321      21        gru     379
0.501205      34        lstm    380
0.00103308    40        gru     381
0.0976437     49        gru     382
0.0131644     36        rnn     383
0.00139867    27        lstm    384
0.0274699     3         lstm    385
0.350886      9         rnn     386
0.23146       66        lstm    387
Performing Bayesian search
Bayesian search uses the results of prior runs to try to pick new parameters to test intelligently. It will often either note that a large part of the parameter space is unexplored and pick something in that region or it will observe a prior success and pick something near that. This can help you save on the total number of iterations needed to find good parameters.
Similar to random search, you must specify one or more parameters via the --param flag. These take the form --param NAME=MIN:MAX[:TYPE], where MIN is the lowest value that parameter is allowed to take, MAX is the highest, and the optional TYPE is either int or float. As with random search, you will also need to specify --num-runs, the total number of runs that will be executed.
You will also need to provide the --metric and --metric-agg options. --metric should be the name of the metric that the search algorithm will attempt to optimize (to learn more about metrics refer to the Metrics page in our docs). --metric-agg specifies how Spell should interpret the observed values of this metric: this can be set to an aggregation type such as last or avg.
For example, if you select the Keras metric keras/val_accuracy and aggregation type last, Spell will use the last validation accuracy recorded in a given run and treat that as the success of your model for those parameters.
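As a rough illustration of what an aggregation type does, the sketch below reduces the values a metric took during one run to the single score the search would optimize. The aggregate function is hypothetical, and it covers only the two aggregation names that appear on this page (last and avg).

```python
def aggregate(values, agg):
    """Collapse a run's recorded metric values into one score.

    Hypothetical helper for illustration; not Spell's implementation.
    """
    if not values:
        raise ValueError("the run recorded no values for this metric")
    if agg == "last":
        return values[-1]
    if agg == "avg":
        return sum(values) / len(values)
    raise ValueError(f"aggregation not covered by this sketch: {agg!r}")


# Made-up validation accuracies logged once per epoch.
val_accuracy = [0.5, 0.75, 0.875, 0.75]
print(aggregate(val_accuracy, "last"))  # 0.75
print(aggregate(val_accuracy, "avg"))   # 0.71875
```

The choice matters when a metric is noisy or peaks early: last rewards where training ended, while avg rewards runs that performed well throughout.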
The final parameter you will need is --parallel-runs, the number of runs that may be in progress at the same time. This reflects a tradeoff: if you choose a lower number, the search will proceed incrementally and will take longer to complete. If you choose a higher number, many runs will be in progress when a new run is launched and the new run's parameters will be selected without the benefit of knowing how well the in-progress trials do.
If you are unsure what to set this value to, we recommend setting it to 3.
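The tradeoff can be made concrete with a toy model. The sketch below assumes, purely for illustration, that every run takes the same amount of time, so runs launch in waves of --parallel-runs; it then counts how many finished results are available when each run's parameters are chosen. The function names are hypothetical and this is not a model of Spell's actual scheduler.

```python
import math


def results_known_at_launch(num_runs, parallel_runs):
    """For each run, count the completed runs that can inform its parameters.

    Toy model: equal-length runs launched in waves of `parallel_runs`.
    """
    return [(i // parallel_runs) * parallel_runs for i in range(num_runs)]


def waves_needed(num_runs, parallel_runs):
    """Number of sequential waves, a proxy for wall-clock time."""
    return math.ceil(num_runs / parallel_runs)


print(results_known_at_launch(12, 3))
# [0, 0, 0, 3, 3, 3, 6, 6, 6, 9, 9, 9]
print(waves_needed(12, 3))    # 4
print(waves_needed(12, 12))   # 1
print(sum(results_known_at_launch(12, 12)))  # 0
```

With 12 parallel runs the search finishes in one wave, but no run benefits from any earlier result, which makes it equivalent to an uninformed search in this model.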
An example Bayesian hyperparameter search is as follows:
$ spell hyper bayesian \
    --num-runs 12 \
    --parallel-runs 3 \
    --metric keras/val_acc \
    --metric-agg avg \
    --param rate=.001:1.0 \
    --param layers=2:100:int -- \
    python train.py --learning_rate :rate: --num_layers :layers:
Everything up-to-date
💫 Casting hyperparameter search #61…
rate        layers    Run ID
0.343882    23        388
0.294112    72        389
0.587557    64        390
Interrupting a hyperparameter search
Hyperparameter searches can be stopped or killed in much the same way that runs can.
spell hyper stop $HYPER_SEARCH_ID
spell hyper kill $HYPER_SEARCH_ID
spell hyper stop will send a graceful shutdown signal to all currently executing runs and dequeue all scheduled ones.
spell hyper kill will send a hard reset signal instead.
The choice between stopping and killing a hyperparameter search has the same tradeoffs as stopping or killing a run. For more information, see the corresponding section of the What Is a Run page.
Viewing searches in the web console
You can view the results of a hyperparameter search in the web console.
The web console provides a high-level overview of the hyperparameter search process. You can also use it to stop or kill an in-progress search.
However, the most useful features of the hyperparameter search page are the three built-in model performance visualizations.
A line chart allows you to visualize the model performance over time or epoch:
You can switch to a different metric, click-and-drag to visualize a slice of time, and filter training runs to just the ones you are interested in.
A table provides a high-level overview of final model performance:
And finally, facet charts allow you to visualize model training results on a variable-by-variable basis: