What is an experiment

Experiments group a specific set of runs within a project for detailed comparison of metadata, hyperparameters, and metrics through both tables and charts.

You can compare run metrics, plot a single metric against a time axis, or plot the relationship between metrics and hyperparameters. Aggregated metric values and other run metadata can be compared in the metrics table. Once you have a clearer picture, you can share an experiment directly through the web or download results for further reporting.

Image of the Experiment Page.

Creating an experiment

To create an experiment, open the project tracking the runs that are relevant to your exploration. In the runs list, select a subset of runs to compare, then select "Create New Experiment" from the "Experiments" dropdown.

Image of the Experiment Selection Dropdown.

Image of the Experiment Creation modal.

You can view all current experiments on the Project page under the experiments tab. To archive an experiment once it's no longer useful, click the "..." button in the experiment list.

Adding runs to an experiment

Once an experiment is created, you can add more runs to it by selecting runs on the Project page, then clicking "Experiments" and "Add to experiment".

Image of the Add to Experiment Selection.

You can remove a run within an experiment by selecting it in the runs list and selecting "Remove" under the "Actions" dropdown.

Using the metrics table

The metrics table allows you to customize a tabular view of runs and their key metrics of interest within an experiment. You can add columns to the runs table by selecting "Add Column" in the top right above the table, and remove columns by clicking the 'x' that appears next to a header on hover. Columns can include run metadata (e.g., CLI command, start time, duration), model hyperparameters (tracked through --param), and runtime metrics (e.g., user-defined metrics logged through the Python API, hardware usage).
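As a mental model for the aggregated metric columns, here is a minimal sketch of how a logged metric series might be summarized into table-friendly values. The function name and the sample series are hypothetical; this is not Spell's actual storage format or aggregation code.

```python
def aggregate_metric(values):
    """Summarize a run's logged metric series into the kinds of
    aggregates a metrics-table column might display (illustrative only)."""
    return {
        "last": values[-1],
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }

# Hypothetical validation-loss series, one value per epoch.
val_loss = [0.92, 0.61, 0.48, 0.45]
print(aggregate_metric(val_loss))
```

Sorting or charting on such aggregates is what makes a tabular view of many runs useful at a glance.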

Tracking hyperparameters in the metrics table

The metrics table will automatically populate user-specified hyperparameters from CLI commands using the --param flag, allowing for convenient comparison, charting, and sorting on hyperparameter differences among runs. In the following examples, the learning rate lr is specified to be tracked, and will appear as a column in the metrics table:

$ spell run -t T4  \
    --param lr=0.01 \
    "python train.py --learning_rate :lr:"
$ spell hyper grid -t T4  \
    --param lr=0.001,0.01,0.1,1 \
    "python train.py --learning_rate :lr:"
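The pattern above works by substituting each :name: placeholder in the command string with the corresponding --param value. The following sketch illustrates that substitution idea only; it is not Spell's actual implementation, and the function name is hypothetical.

```python
def substitute_params(command, params):
    """Replace each :name: placeholder in the command string with its
    parameter value (illustrative sketch of the --param mechanism)."""
    for name, value in params.items():
        command = command.replace(f":{name}:", str(value))
    return command

cmd = substitute_params("python train.py --learning_rate :lr:", {"lr": 0.01})
print(cmd)  # python train.py --learning_rate 0.01
```

In the grid-search case, each value in the comma-separated list yields one run, with the placeholder substituted accordingly for that run.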

Heatmap toggles

The heatmap toggle applies cell shading to help you visualize the relative values within numeric columns.

Image of a Heatmap.

Clicking the star icon to the left of a run ID shades the other rows relative to the starred run's values, providing a direct comparison against that one run.
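Conceptually, the starred heatmap re-centers a column around one run's value. A minimal sketch of that idea, with a hypothetical function name and sample data (not Spell's actual shading code):

```python
def relative_to_starred(values, starred_value):
    """Express each run's value as a ratio to the starred run's value,
    the kind of relative comparison a starred heatmap visualizes."""
    return [v / starred_value for v in values]

# Hypothetical column of per-run metric values; the starred run logged 4.
print(relative_to_starred([2, 4, 8], starred_value=4))
```

Values above 1.0 would shade one way and values below 1.0 the other, making it easy to see which runs beat the starred baseline.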

Image of a Starred Heatmap.

Using charts

You can compare metrics and hyperparameter values using charts for up to ten runs. We offer two types of charts, in either linear or logarithmic axes:
- Line charts displaying a single metric value over time (either relative to the start of the run, or using the index/epoch at which the metric was logged)
- Scatter plots comparing the aggregated values of two distinct metrics, or a hyperparameter value against an aggregated metric
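To make the two line-chart time axes concrete, here is a small sketch of how one logged metric series could be laid out on either axis. The timestamps and values are hypothetical sample data, not output from Spell:

```python
# Hypothetical logged metric: (wall-clock timestamp in seconds, value) pairs.
points = [(1700000000.0, 0.9), (1700000030.0, 0.6), (1700000090.0, 0.5)]

# Relative-time axis: seconds elapsed since the run's first logged point.
start = points[0][0]
relative = [(t - start, v) for t, v in points]

# Index axis: the position (e.g., epoch) at which each value was logged.
indexed = list(enumerate(v for _, v in points))

print(relative)  # x-values are 0.0, 30.0, 90.0 seconds
print(indexed)   # x-values are indices 0, 1, 2
```

The relative axis highlights wall-clock training speed, while the index axis lines runs up step-for-step regardless of how long each step took.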

Image of a Line Chart Modal.

Image of a Scatter Plot Modal.

You can control which runs are displayed in charts by toggling the charting icon to the right of a run ID in the Metrics Table below, for up to ten runs.

Viewing run diffs

Coming soon! Look out for the Run Diff feature in experiments, which will make it possible to do a deep comparison of any two runs, including a code diff.