Azure Setup

For users on our Spell for Teams plan, we deploy Spell in your cloud and provide the same cluster management tools backing our own internal infrastructure. This means you can keep your data in your own Azure blob containers, perform runs on your own machines, and deploy models within your own cloud infrastructure.

This guide gets you started using Spell in your Azure account.

Note

Spell on Azure does not yet support the following features:

  • Model Serving
  • Public Azure Storage Accounts
  • Automatic Disk Resizing
  • Tensorboard Support

If you are interested in using Azure as your cloud provider but need any of these missing features, please let us know and we can prioritize adding them.

Setting up Azure

  1. Make sure you have an Azure account. If you don’t, you can create one here.
  2. Make sure you have the az command-line tools installed. If not, follow the instructions from Microsoft's help docs here.
  3. Make sure you have sufficient permissions to create a Resource Group and Service Principal within your logged-in Azure account. Use az account show to display the Azure account you are currently logged in to, and refer to the Azure RBAC documentation here.
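
The first two checks above can be scripted. The following is a minimal sketch, not part of Spell's tooling: the function name check_azure_prereqs is hypothetical, and it only confirms that az is installed and that you are logged in. It does not verify that your account may create a Resource Group or Service Principal.

```shell
# Sketch: verify that the az CLI is installed and that you are logged in.
# check_azure_prereqs is a hypothetical helper name, not a Spell or Azure command.
check_azure_prereqs() {
  # The az CLI must be available on PATH.
  if ! command -v az >/dev/null 2>&1; then
    echo "az CLI not found; install it first" >&2
    return 1
  fi
  # `az account show` prints the active account and fails if you are not logged in.
  if ! az account show --output table; then
    echo "not logged in; run 'az login' and try again" >&2
    return 1
  fi
}

# Usage:
#   check_azure_prereqs && echo "Ready to set up Spell"
```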

Setting up Spell

  1. Install Spell using pip install --upgrade 'spell[cluster-azure]'. By specifying [cluster-azure], the installation will include dependencies required specifically for Azure cluster deployment.
  2. Log in to the Spell CLI by running spell login.

Setting up Azure resources

Next, we’ll need to set up your Azure resources. We’ve made this easy with the spell cluster init az command.

Run spell cluster init az (optionally adding -r to name your Resource Group something other than the default, rg-spell) and follow the prompts:

This command will help you
        - Select a region to create resources in and a subscription for billing
        - Create an App and Service Principal
        - Create a Resource group in the specified region to manage your resources
        - Assign a role to your Service Principal that allows Spell to spin up and
            down machines and access your Blob Storage
        - Create a uniquely-named storage account
        - Set up an Blob Container to store your run outputs in
        - Set up a VNet and Subnet which Spell will spin up workers in to run your jobs
        - Set up a Security Group providing Spell SSH and Docker access to workers
Enter a display name for this cluster within Spell:

Enter a name for your cluster.

Querying for Subscription ID...

If you have multiple subscriptions associated with the logged-in Azure account, Spell will prompt you to choose which one to use for resources created by Spell. If there is only one subscription, Spell will default to that one and skip the prompt.

Please choose a region for your cluster. This might affect machine availability

Choose a region from the provided list. Keep in mind that different Azure regions have different machine types available. This region cannot be changed once the cluster is created.
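
Because the region cannot be changed later, it may help to compare regions with the az CLI before answering this prompt. This is a sketch under assumptions: the helper names list_regions and list_vm_sizes are hypothetical, and the output reflects what Azure offers in a region, which is not necessarily the same as the list Spell presents.

```shell
# Sketch: inspect Azure regions and the VM sizes each one offers.
# list_regions and list_vm_sizes are hypothetical helper names.

# Print the short name of every region visible to the current subscription.
list_regions() {
  az account list-locations --query "[].name" --output tsv
}

# Print the VM sizes available in one region, for example: list_vm_sizes eastus
list_vm_sizes() {
  az vm list-sizes --location "$1" --output table
}
```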

Successfully created Service Principal `sp-spell-<org-name>`
Successfully created App client secret

Spell will automatically create and configure an AAD App and Service Principal that has the permissions Spell needs to manage VM infrastructure on your behalf.

Please enter a name for the Azure Storage Account Spell will create to store run outputs.
NOTE: Storage account names must be between 3 and 24 characters in length and may only contain numbers and lowercase letters.
Your storage account name must be UNIQUE within Azure. No two storage accounts can have the same name. [spell<yourcompanyname>storage]:

Enter a name for the Storage Account Spell will use to store the outputs of your runs. Keep in mind that this name must be globally unique across all of Azure, and may contain only lowercase letters and numbers.
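
The naming rules in the prompt can be checked locally before you type a name. The sketch below uses a hypothetical helper, is_valid_storage_name, and covers only the length and character rules. Global uniqueness has to be checked against Azure, for example with az storage account check-name.

```shell
# Sketch: validate a storage account name against the rules quoted in the prompt:
# 3 to 24 characters, lowercase letters and digits only.
# is_valid_storage_name is a hypothetical helper name.
is_valid_storage_name() {
  # LC_ALL=C keeps the a-z range from matching uppercase letters in some locales.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

# Usage (the name below is a made-up example):
#   name="spellexamplestorage"
#   is_valid_storage_name "$name" && az storage account check-name --name "$name"
```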

Almost done! Spell authorization required:
In order to fully authorize your new cluster, Spell must grant VM Image Read permissions to your App.
Please contact your Spell representative and have them grant these permissions now.
Once your Spell representative has confirmed that this is done, continue with 'y'. [y/N]:

The final step is to reach out to your Spell representative so that they can approve your App/Service Principal to read Spell's Shared Virtual Machine Images stored in Spell's Shared Image Gallery. Once your Spell representative has confirmed that this is done, continue with y.

Your cluster <cluster_name> is initialized!

Your cluster depends on one step that Spell performs on its end: granting your Service Principal "Read" permissions to the Shared Image Gallery that stores Spell's custom machine images. If you have not yet confirmed with your Spell representative that this step is complete, reach out to them directly.

Once this step is done, head over to the web console to create the machine types your runs will execute on.

Azure limits

To create machines in your new Azure Resource Group, your Azure limits must allow the machine types you want to use. If you've just set up your Azure account, some of your limits may be set to 0, in which case you'll need to request an increase from Azure before you can create those machine types in Spell.
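
You can inspect your current limits from the command line before creating machine types. This is a minimal sketch: show_vm_quota is a hypothetical helper name, and the region you pass should be the one chosen for your cluster.

```shell
# Sketch: show current compute usage and limits for one region.
# show_vm_quota is a hypothetical helper name.
show_vm_quota() {
  # Each row lists a resource family with its current usage and its limit.
  az vm list-usage --location "$1" --output table
}

# Usage (eastus is only an example region):
#   show_vm_quota eastus
```

A limit of 0 for a machine family means Azure will not let you create machines of that family until the limit is raised.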

Permissions

Spell uses a newly created Service Principal with "Contributor" access to the Resource Group created during cluster initialization. The only other permissions required are the Cross Tenant Image Sharing permissions, which give your Service Principal the "Reader" role on Spell's Shared Image Gallery. You can read Azure's documentation on this here.
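
To see these permissions for yourself, you can list the role assignments on the Resource Group. The sketch below assumes the default Resource Group name rg-spell unless you pass another. The helper name show_spell_role_assignments is hypothetical. The "Reader" assignment on Spell's Shared Image Gallery lives in Spell's tenant, so it will not appear in this output.

```shell
# Sketch: list role assignments scoped to the Spell Resource Group.
# show_spell_role_assignments is a hypothetical helper name.
show_spell_role_assignments() {
  # Default to rg-spell, the name used when -r is not given at cluster init.
  resource_group="${1:-rg-spell}"
  az role assignment list --resource-group "$resource_group" --output table
}

# Usage:
#   show_spell_role_assignments            # uses rg-spell
#   show_spell_role_assignments my-rg      # custom Resource Group name
```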