Serving clusters are EKS (on AWS) or GKE (on GCP) Kubernetes clusters that Spell manages for you.
A serving cluster consists of one or more node groups. Each node group consists of a pool of machines of the same instance type—e.g. four
g4dn.xlarge instances would constitute a node group on AWS.
Spell model servers are scheduled on individual node groups on the serving cluster. Knowing how to manage your node groups is important to serving in production.
The serving cluster is managed using the
spell kube-cluster command group. Node groups within the cluster are managed using the
spell kube-cluster node-group command group.
Creating a serving cluster (AWS)
Before using Spell model servers, you will first need to create the Kubernetes cluster that the server will run on. Note that these instructions are for AWS; for GCP, see the next section.
Initializing Spell model serving on AWS uses EKS and requires you first install the following third-party tools:
- kubectl: CLI tool for managing Kubernetes clusters
- eksctl: CLI tool for managing EKS clusters
- aws-iam-authenticator: Utility that allows for authenticating kubectl with EKS clusters via IAM
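Once these are installed, you can sanity-check that each tool is on your PATH before proceeding (these are the standard version subcommands of the upstream tools, not Spell commands):

```
$ kubectl version --client
$ eksctl version
$ aws-iam-authenticator version
```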
$ pip install --upgrade 'spell[cluster-aws]'
$ spell kube-cluster create
Creating a serving cluster (GCP)
GCP requires the following third-party tools:
$ pip install --upgrade 'spell[cluster-gcp]'
$ spell kube-cluster create
Creating node groups
Creating the serving cluster also creates a default CPU node group.
You can create additional node groups using spell kube-cluster node-group add. In the simplest configuration, the command takes a name and an instance type:
$ spell kube-cluster node-group add \
    --name t4 \
    --instance-type g4dn.xlarge
Node groups can also be deployed on spot instances using the --spot flag:
$ spell kube-cluster node-group add \
    --name t4-spot \
    --instance-type g4dn.xlarge \
    --spot
GCP separates compute from GPU using the concept of accelerators. To create a GPU node group on GCP, you will need to supplement the instance type with the --accelerator flag:
$ spell kube-cluster node-group add \
    --name t4-gcp \
    --instance-type n1-standard-1 \
    --accelerator nvidia-tesla-t4
Listing node groups
Node groups can be listed using
spell kube-cluster node-group list:
$ spell kube-cluster node-group list
NAME     INSTANCE TYPE  DISK SIZE  MIN NODES  MAX NODES
default  m5.large       50         1          2
t4       g4dn.xlarge    40         0          0
However, it is usually more convenient to view your node groups using the
Node Groups tab on the cluster management page:
Scaling node groups
Node groups can be scaled using spell kube-cluster node-group scale. This command provides flags for setting the minimum and maximum number of nodes in the node group.
The scheduler will try to ensure that at least --min-nodes nodes are always present.
If model servers running on the node group request more resources than are available on the existing nodes at the time of the request, the cluster will automatically add more machines. However, it will never scale beyond the configured maximum number of nodes.
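As a sketch, resizing a node group might look like the following. The --min-nodes flag is the one referenced above; the --max-nodes and --name spellings are assumptions modeled on the other commands in this guide, so check spell kube-cluster node-group scale --help for the exact flags:

```
$ spell kube-cluster node-group scale \
    --name t4 \
    --min-nodes 1 \
    --max-nodes 3
```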
Deleting node groups
Node groups can be deleted using
spell kube-cluster node-group delete. Any model servers running on the node group will need to be stopped first (e.g. via
spell server stop).
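A typical teardown sequence might therefore look like this sketch, where my-server is an illustrative server name and the --name flag is assumed to mirror node-group add:

```
$ spell server stop my-server
$ spell kube-cluster node-group delete --name t4
```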
(Advanced) Creating custom node groups using eksctl
For advanced node group configuration, it is also possible to pass in an eksctl ClusterConfig file that defines a node group. See the eksctl docs for more details.
$ spell kube-cluster node-group add \
    --name t4 \
    --config-file custom_nodegroup.yaml
(Advanced) Accessing the cluster directly using kubectl
You may query the underlying EKS or GKE Kubernetes cluster directly using spell cluster kubectl. This is only intended for advanced users who are familiar with Kubernetes.
Using kubectl recklessly has the potential to break your model serving deployment. Avoid commands that alter Kubernetes state; read-only commands such as kubectl get and kubectl describe are safe.
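For instance, the following read-only queries inspect the cluster without changing its state (get and describe are standard kubectl subcommands; the node name placeholder is illustrative):

```
$ spell cluster kubectl get nodes
$ spell cluster kubectl get pods --all-namespaces
$ spell cluster kubectl describe node <node-name>
```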