Serving Clusters

Serving clusters are EKS (on AWS) or GKE (on GCP) Kubernetes clusters that Spell manages for you.

A serving cluster consists of one or more node groups. Each node group consists of a pool of machines of the same instance type—e.g. four g4dn.xlarge instances would constitute a node group on AWS.

Spell model servers are scheduled onto individual node groups within the serving cluster. Knowing how to manage your node groups is important for serving models in production.

The serving cluster is managed using the spell kube-cluster command group. Node groups within the cluster are managed using the spell kube-cluster node-group command group.
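Assuming the standard --help behavior common to CLI command groups, you can explore the available subcommands directly:

$ spell kube-cluster --help
$ spell kube-cluster node-group --help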

[Image: the Node Groups tab on the cluster management page]

Creating a serving cluster (AWS)

Before using Spell model servers, you will first need to create the Kubernetes cluster that the server will run on. Note that these instructions are for AWS; for GCP, see the next section.

Initializing Spell model serving on AWS uses EKS and requires that you first install the following third-party tools:

  • kubectl: CLI tool for managing Kubernetes clusters
  • eksctl: CLI tool for managing EKS clusters
  • aws-iam-authenticator: Utility that allows for authenticating kubectl with EKS clusters via IAM
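As one possible installation path, all three tools are available through Homebrew on macOS (a sketch; see each project's documentation for other platforms and installation methods):

$ brew install kubectl
$ brew install eksctl
$ brew install aws-iam-authenticator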

Then run:

$ pip install --upgrade 'spell[cluster-aws]'
$ spell kube-cluster create

Creating a serving cluster (GCP)

GCP requires the following third-party tools:

  • kubectl: CLI tool for managing Kubernetes clusters
  • gcloud: CLI tool and SDK to interact with GCP
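For example, on macOS, gcloud is available as a Homebrew cask and can in turn install kubectl as a component (one possible path; see the GCP documentation for alternatives):

$ brew install --cask google-cloud-sdk
$ gcloud components install kubectl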

Then run:

$ pip install --upgrade 'spell[cluster-gcp]'
$ spell kube-cluster create

Creating node groups

Creating the serving cluster automatically creates a default CPU node group.

You can create additional node groups using spell kube-cluster node-group add. In the simplest configuration, the command takes a name and an instance type:

$ spell kube-cluster node-group add \
    --name t4 \
    --instance-type g4dn.xlarge

Node groups can also be deployed on spot instances using --spot:

$ spell kube-cluster node-group add \
    --name t4-spot \
    --instance-type g4dn.xlarge \
    --spot

GCP separates compute from GPUs using the concept of accelerators: GPUs are attached to instances rather than being part of the instance type. To create a GPU node group on GCP, you will need to supplement --instance-type with --accelerator:

$ spell kube-cluster node-group add \
    --name t4-gcp \
    --instance-type n1-standard-1 \
    --accelerator nvidia-tesla-t4

Listing node groups

Node groups can be listed using spell kube-cluster node-group list:

$ spell kube-cluster node-group list
NAME     INSTANCE TYPE    DISK SIZE    MIN NODES    MAX NODES
default  m5.large         50           1            2
t4       g4dn.xlarge      40           0            0

However, it is usually more convenient to view your node groups using the Node Groups tab on the cluster management page (pictured above).

Scaling node groups

Node groups can be scaled using spell kube-cluster node-group scale. This command provides --min-nodes and --max-nodes parameters.

The scheduler will try to ensure that at least --min-nodes nodes are always present.

If model servers running on the node group request more resources than are available on the existing nodes at the time of the request, the node group will automatically add more machines. However, it will never scale to more than --max-nodes nodes in total.
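For example, to allow the t4 node group created above to scale between one and four nodes (a sketch; the exact argument form may differ, so consult spell kube-cluster node-group scale --help):

$ spell kube-cluster node-group scale t4 \
    --min-nodes 1 \
    --max-nodes 4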

Deleting node groups

Node groups can be deleted using spell kube-cluster node-group delete. Any model servers running on the node group will need to be stopped first (e.g. via spell server stop).
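For example, assuming a server named my-server (a hypothetical name) is the only one running on the t4 node group:

$ spell server stop my-server
$ spell kube-cluster node-group delete t4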

(Advanced) Creating custom node groups using eksctl

For advanced node group configuration, it is also possible to pass a config file containing an eksctl ClusterConfig that defines the node group. See the eksctl docs for more details.

$ spell kube-cluster node-group add \
    --name t4 \
    --config-file custom_nodegroup.yaml
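A minimal custom_nodegroup.yaml might look like the following sketch; the metadata values are placeholders and must match your actual EKS cluster:

# custom_nodegroup.yaml (sketch)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: spell-serving   # placeholder: your EKS cluster name
  region: us-east-1     # placeholder: your cluster's region
nodeGroups:
  - name: t4
    instanceType: g4dn.xlarge
    minSize: 0
    maxSize: 3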

(Advanced) Accessing the cluster directly using kubectl

You may query the underlying EKS or GKE Kubernetes cluster directly using spell cluster kubectl. This is intended only for advanced users who are familiar with Kubernetes.

Note

Using kubectl carelessly has the potential to break your model serving deployment. Avoid commands that alter Kubernetes state; read-only operations such as kubectl get and kubectl describe are safe.
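For example, a read-only inspection of the pods backing your model servers (assuming kubectl arguments pass through unchanged):

$ spell cluster kubectl get pods --all-namespaces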