TensorBoard is a popular model visualization tool. It provides a variety of features, including tracking metrics, visualizing model graphs, projecting embeddings to a lower-dimensional space, and displaying image, text, and audio data. TensorBoard is supported by both TensorFlow and PyTorch.
With Spell, you can easily use TensorBoard to visually examine your model training jobs.
Enabling TensorBoard in a run
TensorBoard operates by reading event files that your code generates over the course of a run. When you pass the directory that the TensorFlow FileWriter writes to with the --tensorboard-dir flag, Spell connects these event files to a remote TensorBoard instance that you can view during the run.
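As a rough illustration of how the two pieces fit together, the sketch below writes a few scalar values to an event directory and builds a matching launch command. It makes several assumptions that are not in the text above: it uses PyTorch's SummaryWriter rather than the TensorFlow FileWriter, the directory ./tb_logs, the script name train.py, and the tag demo/loss are hypothetical, and it assumes --tensorboard-dir is passed to spell run ahead of the command to execute.

```python
import shlex

# Hypothetical log directory; any path works as long as the training
# script and the --tensorboard-dir flag agree on it.
LOG_DIR = "./tb_logs"


def write_demo_events(log_dir=LOG_DIR, steps=5):
    """Write a few scalar points to a TensorBoard event file in log_dir."""
    # Imported here so the command helper below works without PyTorch installed.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir=log_dir)
    for step in range(steps):
        # Placeholder metric standing in for a real training loss.
        writer.add_scalar("demo/loss", 1.0 / (step + 1), global_step=step)
    writer.close()  # flushes pending events to disk


def spell_run_command(script="python train.py", log_dir=LOG_DIR):
    """Build an argv list for launching the run (assumed invocation form)."""
    return ["spell", "run", "--tensorboard-dir", log_dir, script]


if __name__ == "__main__":
    # Show the command that would point Spell at the event directory.
    print(shlex.join(spell_run_command()))
```

In this sketch, train.py would call something like write_demo_events during training. The key point is that the directory given to the writer and the one given to --tensorboard-dir are the same.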
Using TensorBoard from the web console
Navigate to the web console and select a run. Then click "Open" (for a run in progress) or "Resume" (for a completed run) to open TensorBoard for that run in a new tab.
When you open TensorBoard for a completed run, you are asked to specify an instance type; a CPU instance is advised. It may take a minute or two to spin up the machine, and the tab refreshes once it is ready.
To stop a TensorBoard run, click the "Stop Tensorboard" button at the top right of the screen.
Comparing multiple runs with TensorBoard
On the Runs list page, select multiple completed runs using the checkboxes on the left. If the runs can be displayed in TensorBoard, a 'T' icon appears among the batch operations at the top of the list.
After you select a machine type for TensorBoard, it may take some time to acquire a machine and load the data for the runs. The page refreshes automatically once the data is loaded.