Quickstart
Spell is a platform for training and deploying machine learning models quickly and easily. This quickstart will walk you through training your first machine learning model using the Spell CLI. You can follow along with either the written or the video tutorial below; both use our sample CIFAR10 training code. Note that the video covers features in Spell for Teams which may not be applicable to Community users.
Logging in
Before you start, make sure that you have Python and Git installed. If you haven't registered yet, create a free account now on our registration page. Then install the Spell Python package using pip:
$ pip install spell
Once you have the Spell CLI installed, verify that everything is working as expected by running `spell --help`. This should output a helpful list of subcommands.
Before you can do anything useful, you first need to log in:
$ spell login
This will open the Spell Web Console in a web browser. You will be asked if you want to authorize the local application to have access to your Spell account. If your login is successful you will see the following greeting:
Hello <username>!
Alternatively, if your account uses Spell authentication (and not Sign In With Google authentication) you may log in using your Spell username and password instead:
$ spell login --identity <username> --password <password>
If you receive an error message and are sure you are using the correct password and username, please contact us at support@spell.ml.
You can check your current login status at any time using:
$ spell whoami
Your first run
Runs are one of the foundations of Spell. Creating a run is how you execute your code on Spell's computing infrastructure, and `spell run` is likely the command you'll use most.
Each run in Spell is an instance of a single computational job executed on our infrastructure. Runs are typically executed from inside a Git repository. Executing a run will:
- Sync the contents of the repository with Spell.
- Spin up a machine (or set of machines!) on the cloud, and execute your job on those machine(s).
- Save any file outputs from those jobs to our filesystem, SpellFS, for later access.
To execute a run, use `spell run`. The simplest command you can run on your computer is `echo "hello world"`, which will print hello world to the screen. To run this on Spell:
$ spell run "echo hello world"
Which outputs:
✨ Casting spell #1…
✨ Stop viewing logs with ^C
...
✨ Run is running
hello world
The run workflow
To dig a bit deeper into runs, let's try training a convolutional neural network (CNN) on the CIFAR10 dataset using Spell. We will use a simple training script from our spellml/cnn-cifar10
repository on GitHub (this example uses PyTorch, but TensorFlow 2 works just as well).
Clone the repository and `cd` into it in your terminal, then run the following command:
$ spell run --machine-type t4 \
python models/train_basic.py
This will launch a model training job on an NVIDIA T4 instance on the cloud (if you do not have access to a T4, try using `--machine-type cpu` instead). You can track the progress of this run in the CLI, or by navigating to the run details page for the run you just launched in the web console:
The run details page includes all of the information you need to reproduce the run: the command that was run, the code repository that was used, the exact commit hash that repository was at, any `pip`/`apt`/`conda-file` dependencies you installed, and so on. All of the run logs are saved here too.
One of the value-adds that Spell provides to your runs is metrics. Spell automatically saves and displays hardware metrics such as CPU and GPU utilization for you. This demo script logs an additional metric, `train_loss`, using the `send_metric` function in Spell's Python API; this too is saved to and displayed on the run details page:
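As a sketch, logging such a metric from a training loop might look like the following. The `send_metric` import path and `(name, value)` signature are assumptions about Spell's Python API; outside a Spell run the import fails, so the sketch falls back to printing:

```python
# Hedged sketch: send_metric is assumed to live in spell.metrics and to
# take a metric name plus a numeric value. Outside a Spell run the
# import fails, so we fall back to a plain print.
try:
    from spell.metrics import send_metric
except ImportError:
    def send_metric(name, value):
        print(f"{name}: {value:.4f}")

losses = []
for epoch in range(3):
    train_loss = 1.0 / (epoch + 1)  # placeholder for a real training loss
    losses.append(train_loss)
    send_metric("train_loss", train_loss)
```

Inside a real run, each `send_metric` call would show up as a point on the `train_loss` chart on the run details page.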
Any files the run writes to its current working directory will appear here too, as will run logs:
Congratulations, you've now trained your first machine learning model on Spell!
Using workspaces
The next major feature we will cover is workspaces. Workspaces are JupyterLab instances running on the cloud, designed to replicate your local machine learning development environment. Because they run on the cloud, workspaces are more easily replicable, scalable, and shareable.
You can launch a workspace from the web console. You'll be asked for a name and (optionally) a git
repository to initialize the workspace files from. For the purposes of this demo, let's reuse the spellml/cnn-cifar10
repo:
Next, set your environment variables, machine type, and any additional `apt`, `pip`, and/or `conda-file` dependencies, and choose between JupyterLab and Jupyter Notebook. After that you can optionally mount any resources you need.
Once you've confirmed your settings, the workspace will be created, the page will refresh, and you can start coding.
You can start, stop, restart, or clone a workspace at any time to pick up right where you left off.
Using model servers
Note
This feature is only available to users on Spell for Teams.
Once you've trained your model using Spell runs and/or workspaces, the next step is deploying them. For that, you can use Spell model servers.
Model servers on Spell are (Kubernetes-based) serving clusters that you spin up and we maintain for you. To demonstrate how they work, let's serve a simple serving script encapsulating the model we just finished training.
The first step is creating a model. A model is a group of resources, sourced from a run or an upload, that encapsulate all of the model weights and configuration files our serving script will need:
$ spell model create cnn-cifar10 runs/$RUN_ID \
--file checkpoints/model_final.pth
Replace $RUN_ID
with the ID number of the run we just finished executing. You can view a list of all of the models you've created by visiting the models page in the web console:
To serve the model, we combine this model artifact with a model serving script. A model serving script is a simple *.py
file with the following basic format:
from spell.serving import BasePredictor

class Predictor(BasePredictor):
    def __init__(self):
        # Load model weights and any other resources the server needs here.
        pass

    def predict(self, payload):
        # Run inference on the request payload and return a JSON-serializable response.
        pass
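To make the format concrete, here is a minimal runnable sketch of a predictor for this demo. The payload shape mirrors the request we send later in this guide, and the CIFAR10 class list is real, but the "prediction" is a placeholder; real weight loading and inference are omitted:

```python
import base64

try:
    from spell.serving import BasePredictor  # available in Spell serving environments
except ImportError:
    class BasePredictor:  # stand-in so the sketch runs locally
        pass

class Predictor(BasePredictor):
    def __init__(self):
        # A real server would load model weights here, e.g. the mounted
        # checkpoints/model_final.pth from the cnn-cifar10 model artifact.
        self.classes = ["airplane", "automobile", "bird", "cat", "deer",
                        "dog", "frog", "horse", "ship", "truck"]

    def predict(self, payload):
        # Decode the base64-encoded image sent by the client.
        img_bytes = base64.b64decode(payload["image"])
        # Placeholder: a real implementation would run the CNN on img_bytes
        # and return the predicted class.
        return {"class": self.classes[len(img_bytes) % len(self.classes)]}
```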
To serve our example script, run the following CLI command (make sure you are in the root of the spellml/cnn-cifar10
git
repository on your local machine first):
$ spell server serve \
--node-group default \
--pip pillow \
cnn-cifar10:v1 server/serve.py
Executing this command will automatically take you to the details page for this model server in the web console:
Once the model server is ready to serve traffic, you can test it out for yourself. Grab the URL on the model server summary page—this is the endpoint we need to hit—and a picture of one of the ten classes in the CIFAR10 dataset. I used this picture of a coworker's cat:
Then try running the following Python code:
from io import BytesIO
import base64
import requests
from PIL import Image

img = Image.open("cat.jpg")
img = img.convert("RGB")  # convert() returns a new image; reassign it
buf = BytesIO()
img.save(buf, format="JPEG")
img_str = base64.b64encode(buf.getvalue())

resp = requests.post(
    "https://$SPELL_ORG.spell.services/$SPELL_ORG/$SERVER_NAME/predict",
    headers={"Content-Type": "application/json"},
    json={
        "image": img_str.decode("utf8"),
        "format": "JPEG",
    },
)
print(resp.json())
This code packages the image bytes into a base64-encoded JSON string understood by this model server, sends the request, and prints the JSON response:
{'class': 'Cat'}
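For reference, the request body is just JSON pairing a base64 string with a format hint, so the server can recover the original bytes with `base64.b64decode`. A minimal round-trip sketch (the raw bytes below are placeholders, not a real JPEG):

```python
import base64
import json

# Client side: encode raw image bytes into the JSON payload shape above.
raw = b"\xff\xd8\xff\xe0 not a real JPEG, just placeholder bytes"
payload = json.dumps({
    "image": base64.b64encode(raw).decode("utf8"),
    "format": "JPEG",
})

# Server side: parse the JSON and decode the image field back to bytes.
body = json.loads(payload)
decoded = base64.b64decode(body["image"])
assert decoded == raw
```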
Next steps
That concludes this Quickstart!
This brief tour covers the three most important features in Spell: runs, workspaces, and model servers.
For a brief tour of the rest of Spell's core features, check out Core Concepts. Alternatively, check the user guide in the sidebar to learn more about specific features that may be valuable.
Though this quickstart is focused on the Spell CLI, everything that we did here can be done using the Spell Python client as well. To learn more about using Spell's Python API, check out the Python quickstart in our spellml/examples
repository on GitHub.