Using TensorBoard


TensorBoard is a popular model visualization tool. Its features include tracking metrics, visualizing model graphs, projecting embeddings into a lower-dimensional space, and displaying image, text, and audio data. Both TensorFlow and PyTorch support TensorBoard.

With Spell, you can easily use TensorBoard to visually examine your model training jobs.

Tensorboard example

Enabling TensorBoard in a run

TensorBoard operates by reading event files that your code generates over the course of a run. Pass the directory that the TensorFlow FileWriter writes to with the --tensorboard-dir flag, and Spell will connect those event files to a remote TensorBoard instance that you can view during the run.
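As a rough illustration of how event files get written, here is a minimal sketch that uses PyTorch's SummaryWriter rather than the TensorFlow FileWriter mentioned above. The function name log_training_metrics, the tag "train/loss", and the directory name tb_logs are hypothetical, chosen only for this example. It assumes the torch and tensorboard packages are installed, and that the directory the writer uses is the same one you pass to --tensorboard-dir.

```python
import os

from torch.utils.tensorboard import SummaryWriter


def log_training_metrics(log_dir, losses):
    """Write one scalar per step into TensorBoard event files under log_dir.

    log_dir is assumed to be the directory given to --tensorboard-dir.
    Returns the sorted file names found in log_dir after writing.
    """
    writer = SummaryWriter(log_dir=log_dir)
    for step, loss in enumerate(losses):
        # Each call appends one scalar event to the event file.
        writer.add_scalar("train/loss", loss, global_step=step)
    # close() flushes pending events to disk.
    writer.close()
    return sorted(os.listdir(log_dir))


if __name__ == "__main__":
    # "tb_logs" is a placeholder directory name for this sketch.
    print(log_training_metrics("tb_logs", [0.9, 0.5, 0.3]))
```

In a real training script the add_scalar call would sit inside the training loop, so that the event file grows as the run progresses and TensorBoard can display the metrics while the run is still in progress.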

Using TensorBoard from the web console

Navigate to the web console and select a run. Then click "Open" (for a run in progress) or "Resume" (for a completed run) to open TensorBoard for that run in a new tab.

Open tensorboard run

If you view TensorBoard for a completed run, you will be asked to specify an instance type; CPU is advised. It may take a minute or two to spin up the machine, and the tab will refresh once it is ready.

To stop a TensorBoard run, click the "Stop Tensorboard" button at the top right of the screen.

Comparing multiple runs with TensorBoard

On the Runs list page, select multiple completed runs using the checkboxes on the left. If the selected runs can be displayed in TensorBoard, a 'T' icon appears among the batch operations at the top of the list.

TensorBoard on the Runs Page

After you select a machine type for TensorBoard, it may take some time to acquire a machine and load the data for the runs. The page refreshes automatically once the data is loaded.

Multiple Tensorboard Runs