Make the most of your compute credits using Spell cluster migration

Because Spell is cloud-agnostic, it is relatively straightforward to switch cloud providers while using the platform. We support GCP, AWS, and Azure, all with minimal vendor lock-in. As a result, swapping a Spell organization between these clouds is a relatively painless process. This is a very useful feature for teams not yet committed to a single cloud provider, or for teams with compute credits on more than one cloud that they’d like to use.

In this blog post we’ll show you how this works. Note that this process currently requires a manual step that can only be performed by a Spell engineer. Current customers: ping us in a support channel before you run this operation.

First, some background on clusters

When you create an organizational account in Spell for Teams, one of the very first things you do is create and configure your cluster. A Spell cluster is an isolated VPC within your GCP project or AWS account that encapsulates Spell's userland resources. Clusters are easy to create: just run the spell cluster init gcp, spell cluster init aws, or spell cluster init az command and follow the instructions in the interactive prompt to complete cluster setup.

A Spell organization may have at most one cluster at a time (enterprise customers may use multiple organizations). The cluster currently configured for the organization is listed in the sidebar in the Spell web console.

Deleting a cluster is similarly easy (though more destructive): just run spell cluster delete. However, it’s not as destructive as you might think at first.

Spell has two tiers of infrastructural resources: "cluster level" resources, which live inside of the Spell cluster, and "organization level" resources, which live inside Spell’s private production environment. Because organization-level resources are not part of the cluster, they are not deleted when you delete the cluster. As a result, they will be carried over to any new cluster that you create — including one created on a completely different cloud provider!

Moving between clouds — an example

Most of Spell’s internal components are actually organization-level resources. They will persist across clusters, no problem. This makes moving from an AWS environment to a GCP environment, or vice versa, really easy.

Our starting point is an organization named aws-org, which has an already-initialized cluster named aws-spell2-cluster attached.

Let’s now swap to a GCP cluster (note that this requires having GCP authentication set up on your local machine):

$ spell cluster delete
Are you SURE you want to delete the spell cluster aws-org? [y/N]: y
# ...some interactive prompts...
# VERY IMPORTANT: make sure to answer NO when asked if you'd like
# to delete your Spell bucket!
$ spell cluster init gcp
# ...some more interactive prompts...
Your cluster gcp-org is initialized!
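
Because the swap above depends on several command-line tools being installed locally, a small pre-flight check can catch a missing one before you delete anything. The following is a minimal sketch, not part of the Spell CLI: the function name require_tools is hypothetical, and it only checks that each tool is on your PATH, not that you are authenticated with it.

```shell
# Hypothetical pre-flight helper (not part of Spell): report any tools
# from the argument list that cannot be found on the PATH.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  # Returns 0 when every tool was found, 1 otherwise.
  return "$missing"
}

# Example usage before an AWS-to-GCP migration (tool names from this post):
# require_tools spell gsutil || exit 1
```

Checking for authentication is left out on purpose, since how you verify credentials depends on your own cloud setup.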

The next step is moving the resources in the old AWS cluster’s S3 bucket into the new GCP cluster’s GCS bucket. Google’s gsutil CLI tool makes this trivially easy to do. In my case, I ran the following command:

$ gsutil cp -R \
    s3://aleksey-local-aws-bucket \
    gs://aleksey-local-gcp-bucket

In this example we are moving from AWS to GCP. To go the other way around, reverse the order of the s3 and gs buckets in the command. If you are migrating to or from an Azure cluster, use the azcopy shell utility instead. For example, to copy from a GCS bucket into Azure Blob Storage:

$ azcopy copy 'https://storage.cloud.google.com/BUCKET_NAME/' \
    'https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/' \
    --recursive=true

For further reference, the Azure docs have tutorial pages covering this process for both S3 and GCS.
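
Since the AWS and GCP directions differ only in the order of the two bucket URLs, the choice can be captured in a small helper. This is an illustrative sketch only: the function name bucket_copy_cmd and the direction labels are hypothetical, and the bucket names you pass in are your own. It prints the gsutil command rather than running it, so you can review it first.

```shell
# Hypothetical helper: print the gsutil copy command for either direction.
# Arguments: DIRECTION S3_BUCKET GCS_BUCKET (bucket names without a scheme).
bucket_copy_cmd() {
  direction="$1"
  s3_bucket="$2"
  gcs_bucket="$3"
  case "$direction" in
    aws-to-gcp)
      echo "gsutil cp -R s3://${s3_bucket} gs://${gcs_bucket}"
      ;;
    gcp-to-aws)
      echo "gsutil cp -R gs://${gcs_bucket} s3://${s3_bucket}"
      ;;
    *)
      echo "usage: bucket_copy_cmd aws-to-gcp|gcp-to-aws S3_BUCKET GCS_BUCKET" >&2
      return 1
      ;;
  esac
}

# Example usage with the bucket names from this post:
# bucket_copy_cmd aws-to-gcp aleksey-local-aws-bucket aleksey-local-gcp-bucket
```

Printing instead of executing keeps the helper safe to experiment with; once the output looks right, you can paste it into your terminal.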

The last step in this process is updating certain table references within Spell's internal database so that entries that previously pointed at the old cluster point to the new cluster instead. This is a straightforward process, but it does require a manual action by a Spell engineer with access to our production database. We are looking to automate this step in the future, but for now customers who want to migrate clouds and preserve old run data should ping us on a support channel before running the migration.

Returning to our example (moving from AWS to GCP), if we now check back in on the web console, our runs page looks exactly the same! The only difference is that where once there was aws-org, now there is gcp-org instead.

Visiting the cluster details page confirms that we are now rocking a GCP cluster instead of an AWS one.

And if we schedule a new run (spell run --machine-type cpu sleep 1000), it will pick up the next available run ID, just like we’d expect it to.

Finally, keep in mind that there are two things that don’t transfer over:

  • The old cluster's machine types (including private machine types) will be deleted and will need to be recreated.
  • The old cluster's Spell model servers will be terminated and will not carry over. You will need to reinitialize them yourself.

Happy training!
