Serving Clusters
Serving clusters are EKS (on AWS) or GKE (on GCP) Kubernetes clusters that Spell manages for you.
A serving cluster consists of one or more node groups, each of which is a pool of machines of the same instance type; four g4dn.xlarge instances, for example, would constitute a node group on AWS.
Spell model servers are scheduled on individual node groups on the serving cluster, so knowing how to manage your node groups is important for serving models in production.
The serving cluster is managed using the spell kube-cluster command group. Node groups within the cluster are managed using the spell kube-cluster node-group command group.
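For example, assuming the CLI follows the standard --help convention, you can list the available subcommands in each group with:
$ spell kube-cluster --help
$ spell kube-cluster node-group --help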
Creating a serving cluster (AWS)
Before using Spell model servers, you will first need to create the Kubernetes cluster that the server will run on. Note that these instructions are for AWS; for GCP, see the next section.
Initializing Spell model serving on AWS uses EKS and requires that you first install the following third-party tools:
- kubectl: CLI tool for managing Kubernetes clusters
- eksctl: CLI tool for managing EKS clusters
- aws-iam-authenticator: Utility that allows for authenticating kubectl with EKS clusters via IAM
Then install the AWS extras for the Spell CLI and create the cluster:
$ pip install --upgrade 'spell[cluster-aws]'
$ spell kube-cluster create
Creating a serving cluster (GCP)
Initializing Spell model serving on GCP uses GKE and requires that you first install the following third-party tools:
- kubectl: CLI tool for managing Kubernetes clusters
- gcloud: Google Cloud CLI, used to authenticate kubectl with GKE clusters
Then install the GCP extras for the Spell CLI and create the cluster:
$ pip install --upgrade 'spell[cluster-gcp]'
$ spell kube-cluster create
Creating node groups
Creating the serving cluster creates a default CPU node group.
You can create additional node groups using spell kube-cluster node-group add. In the simplest configuration, the command takes a name and an instance-type:
$ spell kube-cluster node-group add \
--name t4 \
--instance-type g4dn.xlarge
Node groups can also be deployed on spot instances using --spot:
$ spell kube-cluster node-group add \
--name t4-spot \
--instance-type g4dn.xlarge \
--spot
GCP separates compute from GPU using the concept of accelerators. To create a GPU node group on GCP, you will need to supplement --instance-type with --accelerator:
$ spell kube-cluster node-group add \
--name t4-gcp \
--instance-type n1-standard-1 \
--accelerator nvidia-tesla-t4
Listing node groups
Node groups can be listed using spell kube-cluster node-group list:
$ spell kube-cluster node-group list
NAME     INSTANCE TYPE    DISK SIZE    MIN NODES    MAX NODES
default  m5.large         50           1            2
t4       g4dn.xlarge      40           0            0
However, it is usually more convenient to view your node groups using the Node Groups tab on the cluster management page.
Scaling node groups
Node groups can be scaled using spell kube-cluster node-group scale. This command provides --min-nodes and --max-nodes parameters.
The scheduler will try to ensure that at least --min-nodes are always present. If model servers running on the node group request more resources than are available on the existing nodes at the time of the request, the cluster will automatically add more machines. However, it will never scale past --max-nodes total.
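As a sketch, allowing the t4 node group created above to scale between one and four nodes might look like the following (the node group name is assumed to be passed as a positional argument; check spell kube-cluster node-group scale --help for the exact invocation):
$ spell kube-cluster node-group scale t4 \
--min-nodes 1 \
--max-nodes 4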
Deleting node groups
Node groups can be deleted using spell kube-cluster node-group delete. Any model servers running on the node group will need to be stopped first (e.g. via spell server stop).
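For instance, tearing down the t4 node group might look like the following sketch (my-server is a hypothetical server name, and the positional arguments are assumptions; consult each command's --help output):
$ spell server stop my-server
$ spell kube-cluster node-group delete t4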
(Advanced) Creating custom node groups using eksctl
For advanced node group configuration, it is also possible to pass an eksctl ClusterConfig file that defines a node group. See the eksctl docs for more details.
$ spell kube-cluster node-group add \
--name t4 \
--config-file custom_nodegroup.yaml
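As a sketch, custom_nodegroup.yaml might look like the following, using eksctl's ClusterConfig schema (the metadata values are placeholders for your own cluster's name and region):
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: spell-serving-cluster  # placeholder: your EKS cluster name
  region: us-east-1            # placeholder: your cluster's region
nodeGroups:
  - name: t4
    instanceType: g4dn.xlarge
    volumeSize: 40
    minSize: 0
    maxSize: 3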
(Advanced) Accessing the cluster directly using kubectl
You may query the underlying EKS or GKE Kubernetes cluster directly using spell cluster kubectl. This is only intended for advanced users who are familiar with Kubernetes.
Note
Using kubectl recklessly has the potential to break your model serving deployment. Avoid using commands that alter Kubernetes state. kubectl get and kubectl describe are safe operations.
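For example, read-only inspection of the cluster might look like this (assuming spell cluster kubectl forwards its arguments to kubectl verbatim; <node-name> is a placeholder):
$ spell cluster kubectl get pods
$ spell cluster kubectl get nodes
$ spell cluster kubectl describe node <node-name>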