Using custom Docker images in Spell runs and workspaces

One of the most powerful features of Spell’s runs and workspaces is the ability to completely replace the environment with a custom Docker image. The --docker-image flag on the spell run and spell jupyter commands lets you inject your own execution environment into the run, allowing customizations that are not possible using pip, apt, and conda-file package installation alone.

In this blog post we will explore this feature and see what kinds of customization it makes possible!

How it works (the basics)

Before we dive into Docker image customization, let’s first talk about how environment customization on Spell works.

Most of the time, customizing your runtime environment means using a combination of the framework, pip, apt, and conda-env flags to (1) select a framework image and (2) install any additional packages you want on top.

Spell frameworks are Python environments we build and maintain for you that come with common machine learning packages preinstalled. Most users stick with the default framework, which comes with TensorFlow 1, PyTorch, Scikit-Learn, and XGBoost; however, if you want TensorFlow 2 or MXNet, you may swap to our tensorflow2 or mxnet frameworks instead. For the full list, see the Customizing Frameworks section of our docs.

Behind the scenes, our frameworks are just Docker images that we’ve built and uploaded to DockerHub. These images are public — you can browse them here — and replaceable: selecting the custom docker image dropdown in the Spell web console, or using the --docker-image flag in the Spell CLI, lets you pass your own framework image into Spell.

Users on Spell Community can use any image publicly available on a Docker registry (e.g. DockerHub). Users on the Spell for Teams plan can additionally use our AWS ECR and GCP GCR integrations (spell cluster add-docker-registry) to access private Docker images hosted on their cloud provider’s container registry service. To learn more, see the sections Using custom public Docker images and Using custom private Docker images in our docs.

At runtime, Spell will docker pull the image and use it to build an internal version (FROM your-container) with some small Spell-specific fixes and a custom ENTRYPOINT. In the Jupyter workspace case, the entrypoint will be jupyter lab or jupyter notebook. In the run case, the entrypoint will be your run instruction: e.g. if you executed spell run "echo Hello World!", the entrypoint will be echo Hello World!.

At the end of the day, in order for a custom Docker image to be Spell-compatible, the following must be true:

  • If the Docker image will be used for a Spell run, it must have Python 3 installed, and python3 and pip3 must be available on the command line.
  • If the Docker image will be used for a Spell workspace, it must have Jupyter installed and available on the command line and configured to launch on port 8888 (the default), and port 8888 must be free.
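One way to sanity-check an image against the requirements above is to look for the required commands from a shell inside the container. The following is a minimal sketch, not something Spell provides: check_commands is a hypothetical helper name, and it only confirms that the commands are on the PATH (it does not verify the Jupyter port configuration).

```shell
#!/bin/sh
# check_commands: hypothetical helper (not part of the Spell CLI).
# Prints each required command that is missing from PATH and returns
# non-zero if any of them are missing.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Commands required for a Spell run, per the list above:
#   check_commands python3 pip3
# Command required for a Spell workspace, per the list above:
#   check_commands jupyter
```

Run this from a shell inside the image you plan to use; if it prints nothing and returns zero, the commands Spell expects are available.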

It’s also important to note that when using the custom Docker image feature, all of your environment customization has to be done in Docker. The pip, apt, and conda-env flags are not available.
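In practice this means that anything you would have passed to the pip or apt flags becomes a RUN instruction in a Dockerfile. As a rough sketch of that translation, the hypothetical make_dockerfile helper below (not part of the Spell CLI) prints a Dockerfile for a given base image and package lists. It assumes a Debian or Ubuntu based image whose build runs as root, with pip3 on the PATH.

```shell
#!/bin/sh
# make_dockerfile: hypothetical helper (not part of the Spell CLI).
# Usage: make_dockerfile BASE_IMAGE "APT_PACKAGES" "PIP_PACKAGES"
# Prints a Dockerfile that installs the given packages on top of
# BASE_IMAGE. Pass an empty string to skip either package list.
make_dockerfile() {
    base=$1
    apt_pkgs=$2
    pip_pkgs=$3
    printf 'FROM %s\n' "$base"
    if [ -n "$apt_pkgs" ]; then
        # Assumes an apt-based image and a build that runs as root.
        printf 'RUN apt-get update && apt-get install -y %s && rm -rf /var/lib/apt/lists/*\n' "$apt_pkgs"
    fi
    if [ -n "$pip_pkgs" ]; then
        printf 'RUN pip3 install --no-cache-dir %s\n' "$pip_pkgs"
    fi
}
```

For example, make_dockerfile spellrun/default-cpu:latest "" "dask" > Dockerfile writes a two-line Dockerfile that you can then docker build and docker push as usual.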

Now that we understand a little bit about how Spell’s custom Docker image feature works, let’s explore some ideas of what you can do with it!

Option 1: use a community-built image

While Spell’s framework images are a great starting point, they are fairly bare-bones. If you just want to experiment with some ideas in a "full-featured" workspace that probably already has all of your favorite data science packages installed, you can use a community-built data science image (or build your own).

For example, the Jupyter project curates a wide range of prebuilt Jupyter environments under the aegis of the Jupyter Docker Stacks project. You can create a polyglot Python+R+Julia data science workspace with all of these languages’ most common data analytics packages onboard using the datascience-notebook container:

$ spell jupyter --lab \
    --machine-type cpu \
    --docker-image jupyter/datascience-notebook:latest \
    datascience-notebook

For model training on GPU, you can try out the gpu-jupyter container:

$ spell jupyter --lab \
    --machine-type T4 \
    --docker-image 'cschranz/gpu-jupyter:latest' \
    gpu-jupyter

Option 2: build a custom image that inherits from a framework

As I mentioned earlier, Spell’s framework images are publicly available on DockerHub. Our framework images are baked into our machine VMs, so runs and workspaces using Docker images that reuse our framework images will load significantly faster than brand new images with completely novel layers.

Let’s look at a concrete example. Suppose you want to try out the Dask JupyterLab integration, dask-labextension, from inside of a Spell workspace (https://github.com/dask/dask-labextension). One way to do that is to extend an existing Spell image with this new integration. Here’s an example Dockerfile:

FROM spellrun/default-cpu:latest
RUN pip install dask_labextension && \
    jupyter labextension install dask-labextension && \
    jupyter serverextension enable dask_labextension

We build this new image and push it up to DockerHub:

# replace "residentmario" with the name of your DockerHub account
$ docker pull spellrun/default-cpu
$ docker build . -t residentmario/workspace-with-dask-labextension
$ docker push residentmario/workspace-with-dask-labextension

If you launch a workspace on Spell using this container:

$ spell jupyter --lab \
    --machine-type cpu \
    --docker-image residentmario/workspace-with-dask-labextension:latest \
    workspace-with-dask-labextension

there will be a new item on the sidebar for loading Dask dashboard tools.

If you decide to go this route, I recommend pinning your Dockerfile to a specific digest of the spellrun image: FROM spellrun/foo@sha256:[...] rather than FROM spellrun/foo:latest, as used here. Our images are occasionally updated with new versions of core libraries; pinning to a specific digest will prevent your image from unexpectedly breaking on you.
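If you want to enforce digest pinning, a small check along the following lines can flag Dockerfiles that still use a tag. This is only a sketch: unpinned_from_lines is a hypothetical helper name, and it treats every FROM line without a sha256 digest as unpinned, which also flags FROM scratch and references to earlier build stages.

```shell
#!/bin/sh
# unpinned_from_lines: hypothetical helper (not part of Docker or Spell).
# Prints every FROM line in the given Dockerfile that is not pinned to a
# sha256 digest, and returns non-zero if it finds any.
#
# To look up the digest of an image you have already pulled:
#   docker inspect --format='{{index .RepoDigests 0}}' spellrun/default-cpu:latest
unpinned_from_lines() {
    # First grep selects FROM instructions; second keeps only unpinned ones.
    if grep -Ei '^[[:space:]]*FROM[[:space:]]' "$1" |
        grep -Ev '@sha256:[0-9a-f]{64}'; then
        return 1
    fi
    return 0
}
```

Calling unpinned_from_lines Dockerfile on the example above would print its FROM spellrun/default-cpu:latest line and return a non-zero status.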

Option 3: build a completely custom image

The third and final option is building a custom Docker image completely from scratch. Of the three different options, this is the most laborious approach. But it’s also the most powerful, as it enables you to have complete control over the contents of your environment. Everything in your runtime environment is there because you put it there.

For example, I used this approach to build a custom image with the NVIDIA RAPIDS data science framework installed, using the following Dockerfile:

FROM nvidia/cuda:10.1-base-ubuntu18.04
WORKDIR /spell

RUN apt-get update && \
    apt-get install -y wget && rm -rf /var/lib/apt/lists/*
ENV CONDA_HOME=/root/anaconda/
RUN wget \
    https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh \
    && mkdir /root/.conda \
    && bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -fbp $CONDA_HOME \
    && rm -f Miniconda3-py37_4.8.3-Linux-x86_64.sh
ENV PATH=/root/anaconda/bin:$PATH
# must be rapids=0.15 due to https://github.com/rapidsai/cudf/issues/5994
# must use numba=0.50.1 due to https://github.com/rapidsai/dask-cuda/pull/385
# -y answers the confirmation prompt, since docker build is non-interactive
RUN conda install -y \
    -c rapidsai-nightly -c nvidia -c conda-forge -c defaults \
    rapids=0.15 cudatoolkit=10.1 numba==0.50.1 jupyterlab

Notice that in this case, we had to install conda and add it to our PATH ourselves. You can try this image out yourself:

$ spell jupyter --lab \
    --machine-type t4 \
    --docker-image residentmario/rapids-jupyterlab:latest \
    rapids-workspace

Some tips and tricks if you decide to go this route:

  • For workspaces and runs that will execute on GPU, you will probably want to use nvidia/cuda as your base image. This saves you a lot of work installing and configuring NVIDIA's GPU toolkit yourself.
  • Spell runs are executed on a machine with an Ubuntu 18 host OS. If your container cares about OS (as GPU containers often do), make sure to take this into account.
  • Because Spell overwrites your container’s ENTRYPOINT with its own, it is not currently possible to activate a conda environment inside of a container. For conda package management, prefer to install your packages in the default base environment instead. Refer to this article for more details.

Conclusion

Custom Docker images are one of the most powerful features available in Spell, enabling complete control over the run environment. In this guide we learned a little about how the feature works, and then saw how we can apply it in three different ways to unlock progressively more powerful levels of customization for our runs and workspaces.

To learn more, refer to the custom public Docker images and custom private Docker images sections on the "Runs Overview" page in our docs.
