Spell is excited to announce a partnership with Graphcore that gives users access to state-of-the-art AI acceleration hardware. For more information, see the official announcement blog, and visit our signup page to request access. This tutorial is a quickstart guide for using IPUs in the Spell Developer community account after you have been approved for access.
The IPU-POD is designed to make training of very large and emerging machine-learning models faster, more efficient, and more scalable. An IPU-POD is constructed from a number of IPU-M2000s, each containing four IPUs (e.g., the IPU-POD16 has four IPU-M2000s for 16 IPUs). For more information, visit the Graphcore Developer Overview.
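The pod sizing above follows simple arithmetic, sketched here as a small illustration (the constants come from the description above; the function name is our own):

```python
# Each IPU-M2000 contains four IPUs (per the pod description above).
IPUS_PER_M2000 = 4

def total_ipus(num_m2000s: int) -> int:
    """Total IPUs in a pod built from the given number of IPU-M2000s."""
    return num_m2000s * IPUS_PER_M2000

# An IPU-POD16 is built from four IPU-M2000s; an IPU-POD64 from sixteen.
print(total_ipus(4))   # 16
print(total_ipus(16))  # 64
```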
Spell is hosting a community cluster of multiple IPU-POD16s and IPU-POD64s, time-shared among all users, similar to the current Developer account access to NVIDIA-based GPUs. Unlike with a dedicated IPU machine, Spell handles and abstracts the orchestration (spin-up, spin-down, resource allocation) of machines, which is typically done through the Virtual-IPU management software.
When users submit execution jobs (spell run) or spin up Jupyter Workspaces, the request is queued against the full IPU cluster; once a job completes, Spell spins down the active IPU host and moves on to the next enqueued user.
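The time-sharing behavior described above can be pictured as a simple first-in, first-out queue. The sketch below is purely illustrative (class and job names are hypothetical, and this is not Spell's actual scheduler):

```python
from collections import deque

class IPUCluster:
    """Hypothetical illustration of FIFO time-sharing on a shared cluster."""

    def __init__(self):
        self.queue = deque()

    def submit(self, job_name: str):
        """Enqueue a job (e.g. a `spell run` or a Jupyter Workspace)."""
        self.queue.append(job_name)

    def run_next(self):
        """Serve the next enqueued job; returns None if the queue is empty."""
        if not self.queue:
            return None
        job = self.queue.popleft()
        # ... spin up an IPU host, execute the job, spin the host down ...
        return job

cluster = IPUCluster()
cluster.submit("mnist_poptorch.py")
cluster.submit("resnet50_benchmark.py")
print(cluster.run_next())  # mnist_poptorch.py (first submitted, first served)
```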
IPU Software and SDK Overview
Using IPUs requires the Poplar software stack, the core of Graphcore's easy-to-use and flexible software development environment, which is fully integrated with TensorFlow and PyTorch. Developers can freely access public container images via Graphcore's Docker Hub:
- Poplar SDK - contains Poplar, PopART and tools to interact with IPU devices
- PyTorch for IPU - contains everything in the Poplar SDK repo with PyTorch pre-installed
- TensorFlow for IPU - contains everything in the Poplar SDK repo with TensorFlow 1 or 2 pre-installed
For every spell run, a user will need to include the --docker-image flag to specify the appropriate image for their code. We recommend using the latest tag for each, e.g. --docker-image graphcore/pytorch:latest.
Running IPUs with spell run remote execution
Remote execution is the core of the Spell platform experience, and we recommend users download the spell pip package:
$ pip install spell
$ spell login
The following commands run a simple MNIST script on PyTorch. First, clone the appropriate repo:
$ git clone https://github.com/graphcore/tutorials.git
$ cd tutorials/simple_applications/pytorch/mnist/
You can directly run the following command to begin execution:
$ spell run --machine-type IPUx16 \
    --docker-image "graphcore/pytorch:latest" \
    --pip-req requirements.txt \
    "python3 mnist_poptorch.py"
The initial build process takes 1-2 minutes after a machine is acquired, and soon after you should see MNIST training logs begin to flow.
For more information on Spell execution, including uploading and mounting datasets, please visit our run documentation.
Running IPUs on Spell Workspaces (Jupyter Notebooks)
Spell Workspaces support interactive development through Jupyter Notebooks, either in-browser or connected to a local IDE. Users can spin up notebook sessions either through the web console's "Workspaces" tab or directly via the CLI:
$ spell jupyter myipuworkspace \
    --machine-type IPUx16 \
    --pip-req requirements.txt \
    --docker-image "graphcore/pytorch:latest"
Note that you'll need to specify the appropriate Docker image and pip dependencies, just as with a spell run. The tutorials below can be run as notebooks; also see the Spell documentation on Jupyter Workspaces.
Available In-Depth Tutorials
We've enabled a suite of tutorials for instant out-of-the-box execution, ranging from toy models to commonly benchmarked large models. Please visit the following folders within our examples GitHub repository:
- Summary of available tutorials (check here first)
- MNIST on PyTorch
- MNIST on TensorFlow/Keras
- Programs and Variables in Poplar
- Infeed and Outfeed Queues in TensorFlow
- ResNet-50 Benchmark on TensorFlow, TensorFlow 2
- BERT Large Fine-tuning on SQuAD in TensorFlow, PyTorch, PopART
For additional information on IPUs and the related SDK, please visit Graphcore's Developer Portal and Graphcore's full tutorials GitHub repository. For live support, please visit our Community Slack at #graphcore-community.