This page is a high-level overview of Spell's most important features. For more detailed discussions of specific features, refer to the corresponding sections of the User Guide.
Runs
A run is a single instance of a computational job executed on Spell.
Runs take an environment definition, a set of resources (datasets), and code as input, execute it on the cloud, and provide the files created over the course of the run as output. Runs are extensively configurable. The most important options are:
- Compute instance: options range from basic CPU to powerful GPU servers.
- Frameworks: we provide base environments for TensorFlow, PyTorch, Fast.ai, and others. Or, roll your own.
- Packages: you can install any code packages you need using pip, conda, and apt.
- Resources: datasets to be mounted onto the container.
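As a sketch, the options above can be combined in a single CLI invocation. The machine type, dataset path, and `train.py` entrypoint here are hypothetical placeholders:

```shell
# A hypothetical run: GPU machine, PyTorch base environment,
# an extra pip package, and a dataset mounted into the container.
spell run \
  --machine-type V100 \
  --framework pytorch \
  --pip scikit-learn \
  --mount uploads/my-dataset:/data \
  "python train.py --data-dir /data"
```

The quoted command at the end is what executes inside the container; everything before it configures the environment the command runs in.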
Runs are the atomic unit of work in Spell. We do a lot of work behind the scenes to make runs easy to use and ergonomic within your workflow. To learn more, refer to the Run Overview.
Workspaces
A workspace is an instance of a Jupyter Notebook or JupyterLab environment running on the cloud. Under the hood, workspaces are still just runs, and can be configured in all the same ways.
Workspaces provide a flexible work environment on your choice of CPU and GPU hardware that you can easily spin up and spin down as needed. We manage the data storage and compute environment for you so that you can focus on the code.
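Because workspaces are runs under the hood, spinning one up looks much like launching a run. A minimal sketch, assuming the `spell jupyter` CLI command; the machine type, dataset, and workspace name are hypothetical:

```shell
# Start a JupyterLab workspace on a GPU machine with a dataset mounted.
spell jupyter \
  --lab \
  --machine-type T4 \
  --mount uploads/my-dataset:/data \
  my-workspace
```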
To learn more about workspaces, refer to the Workspace Overview.
Resources
Resources is the generic name for the datasets, models, or any other files made available to a run. Spell keeps these organized for you in a remote filesystem called SpellFS.
The resources associated with your account are split between:
- public getting-started assets that we provide (e.g. example data),
- uploads that you push to SpellFS, and
- run outputs that you create during your run executions or workspace sessions.
You can also mount public or private cloud storage buckets, making it easy to import big data into SpellFS.
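As a sketch, pushing data into SpellFS and browsing what is there might look like this (the local `./my-dataset` directory is a hypothetical example):

```shell
# Push a local dataset into SpellFS.
spell upload ./my-dataset

# Browse SpellFS: your uploads and your run outputs.
spell ls
spell ls uploads
spell ls runs
```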
To learn more about resources, see What is a Resource.
Metrics
Spell automatically collects all of the hardware metrics and some of the model metrics generated as part of a run, surfacing them on the run summary page. You can extend this system to log your own custom model metrics on Spell.
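A minimal sketch of logging a custom metric from a training script, assuming the `spell` Python package is installed in the run's environment; `train_one_epoch` is a hypothetical stand-in for your training step:

```python
# Inside a script executed as a Spell run, custom metrics can be
# sent to the run summary page via spell.metrics.
import spell.metrics as metrics

for epoch in range(10):
    loss = train_one_epoch()  # hypothetical training step returning a float
    # Logs a (name, value) pair; Spell plots it over time on the run page.
    metrics.send_metric("loss", loss)
```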
To learn more about how metrics work, refer to the guide on Metrics.
Projects and Experiments
Projects allow you to group your runs into meaningful categories and to create a summary view highlighting key metrics over time.
Once you have added some runs to a project, you can further subdivide them as Experiments. Experiments allow you to generate reports on specific aspects of your project.
These features work together to make it easy to share and report project state with your collaborators and project stakeholders.
Hyperparameter Search
Model hyperparameters control different aspects of the learning process of a machine learning model. Hyperparameter search is the process of finding the hyperparameter values that produce the most performant model.
You can launch a hyperparameter search on Spell directly from the command line. We spin up and manage a pool of worker machines, and handle partitioning your search across a set of runs for you.
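As a sketch, a grid search over two hyperparameters might be launched like this, assuming the `spell hyper grid` command and its `:name:` placeholder substitution; the parameter names, values, and `train.py` entrypoint are hypothetical:

```shell
# Grid search over learning rate and batch size; Spell partitions the
# combinations across a pool of worker runs.
spell hyper grid \
  --param lr=0.001,0.01,0.1 \
  --param batch_size=32,64 \
  "python train.py --lr :lr: --batch-size :batch_size:"
```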
To learn more about hyperparameter searches, see the Hyperparameter Searches guide.
Hyperparameter searches are currently only available on Spell for Teams.
Model Servers
Model servers allow you to serve machine learning models on a Kubernetes cluster managed for you by Spell. Model servers are designed to make it trivially easy to productionize models trained on Spell, allowing you to use one tool for both your model training and model serving.
Model servers are currently only available on Spell for Teams.
To learn more about model servers, see the Model Servers guide.
That concludes our high-level tour of Spell! This list is not exhaustive; we support many other features, including:
- Distributed runs
- TensorBoard support
- Integration with Weights & Biases
- Cluster management
- Private machine types
- Early stopping
- And more...