What is an experiment
Experiments group a specific set of runs within a project for detailed comparison of metadata, hyperparameters, and metrics through both tables and charts.
You can compare run metrics, plot a single metric against a time axis, or plot the relationship between metrics and hyperparameters. Aggregated metric values and other run metadata can be compared in the metrics table. Once you have a clearer picture, you can share an experiment directly on the web or download the results for further reporting.
Creating an experiment
To create an experiment, open the project tracking the runs relevant to your exploration. In the runs list, select the subset of runs you want to compare, click the "Experiments" button, and choose "Create New Experiment" from the dropdown.
You can view all current experiments on the project page under the "Experiments" tab. To archive an experiment once it's no longer useful, click the "..." button in the experiment list.
Adding runs to an experiment
Once an experiment is created, you can add more runs to it by selecting runs on the project page, clicking "Experiments", and then "Add to experiment".
You can remove a run from an experiment by selecting it in the runs list and choosing "Remove" under the "Actions" dropdown.
Using the metrics table
The metrics table lets you build a customized tabular view of the runs in an experiment and their key metrics of interest. Select "Add Column" in the top right above the table to add a column, or click the 'x' that appears next to a column header on hover to remove one. Columns can include run metadata (e.g., CLI command, start time, duration), model hyperparameters (tracked through --param), and runtime metrics (e.g., user-defined metrics logged through the Python API, hardware usage).
Tracking hyperparameters in the metrics table
The metrics table automatically populates hyperparameters specified on the CLI with the --param flag, making it convenient to compare, chart, and sort runs by their hyperparameter differences. In the following examples, the learning rate lr is tracked and will appear as a column in the metrics table:
$ spell run -t T4 \
    --param lr=0.01 \
    "python train.py --learning_rate :lr:"
$ spell hyper grid -t T4 \
    --param lr=0.001,0.01,0.1,1 \
    "python train.py --learning_rate :lr:"
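In both commands, the :lr: token in the quoted command string stands in for the value supplied via --param. As a rough local illustration of that substitution behavior (a sketch only, not Spell's actual implementation; the substitute_params helper is hypothetical):

```python
import re

def substitute_params(command, params):
    """Replace each :name: token in command with the matching param value."""
    return re.sub(r":(\w+):", lambda m: str(params[m.group(1)]), command)

# With --param lr=0.01, the run command becomes:
cmd = substitute_params("python train.py --learning_rate :lr:", {"lr": 0.01})
print(cmd)  # python train.py --learning_rate 0.01
```

For spell hyper grid, each value in the comma-separated list (0.001, 0.01, 0.1, 1) produces its own run with its own substituted command, and each run's lr appears as a column in the metrics table.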
The Heatmap toggle shades the cells of numeric columns to help you visualize their relative values.
Click the star icon to the left of a run ID to shade the other rows relative to the starred run's value, providing a direct comparison against that one run.
Comparing runs with charts
You can compare metric and hyperparameter values across up to ten runs using charts. Two chart types are offered, each with linear or logarithmic axes:
- Line charts displaying a single metric value over time (either relative to the start of the run, or using the index/epoch of when the metric was logged)
- Scatter plots comparing the aggregated values of two distinct metrics, or a hyperparameter value against an aggregated metric
You can control which runs are displayed in charts by toggling the charting icon to the right of a run ID in the metrics table.
Viewing run diffs
Coming soon! Look out for the Run Diff feature in experiments, which will make it possible to do a deep comparison of any two runs, including a code diff.