Model server authentication patterns on Spell

One of the first tasks you'll take on when preparing a model server deployment for production use is setting up authentication. You almost never want anyone on the Internet who happens to know your model server URL to be able to access your service.

This blog post discusses three different ways of authenticating client requests against your Spell model server, with some code samples along the way:

  1. Setting up authentication in the request body
  2. Setting up authentication in the request headers
  3. Using a private model server

Note that this post will not discuss the "how" of your authentication scheme. In particular, we'll just assume here that you have some way to create (and hopefully rotate) valid auth tokens and share them with user agents ahead of time. How you do this is up to you; we present one example, using credstash, in the blog post "Fine-grained access control in Spell using private pip packages".

Option 1—authenticate using the request body

This is the simplest authentication method of all. Assuming you are using the default format for model server requests, JSON, you can just have your server expect an additional key containing authentication information. The server can validate this information and, if it is not valid, return a 401 Unauthorized or similar.

Here's a trivial example of a model server script that does this:

from spell.serving import BasePredictor
# from spell.serving.metrics import send_metric

from starlette.exceptions import HTTPException

# not a real package or method, just an example!
from your_authentication_middleware import authenticate_token

class Predictor(BasePredictor):
    def predict(self, payload):
        if "auth" not in payload:
            # push the fact that a bad request was made to the server logs
            print("Received a payload without an auth token, returning 401.")
            raise HTTPException(401)
        if not authenticate_token(payload["auth"]):
            print("Received a payload with an invalid auth token, returning 401.")
            raise HTTPException(401)

        # if authentication succeeded, proceed with the request
        return do_work(payload["data"])

The print calls write to stdout, which is captured by Spell's logger and posted to the model server logs for potential review down the line. The client that makes a bad request just sees a 401 response:

>>> import requests
>>> # MODEL_SERVER_URL is a placeholder for your model server's endpoint URL
>>> resp = requests.post(MODEL_SERVER_URL, json={"data": [...], "auth": [...]})
>>> resp.status_code
401

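The server script above imports a placeholder authenticate_token. As a rough sketch only, here is one way such a function might look, assuming (hypothetically) that valid tokens are supplied through a comma-separated environment variable named MODEL_SERVER_TOKENS. Neither the variable name nor the loading scheme comes from Spell; substitute your own token store.

```python
import hmac
import os


def load_valid_tokens(env_var="MODEL_SERVER_TOKENS"):
    """Read valid tokens from a comma-separated environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    raw = os.environ.get(env_var, "")
    return [t.strip() for t in raw.split(",") if t.strip()]


def authenticate_token(token, valid_tokens=None):
    """Return True if token matches one of the valid tokens."""
    if not isinstance(token, str) or not token:
        return False
    if valid_tokens is None:
        valid_tokens = load_valid_tokens()
    candidate = token.encode("utf-8")
    matched = False
    for valid in valid_tokens:
        # compare_digest avoids leaking, through timing, where two
        # strings first differ, unlike a plain == comparison
        if hmac.compare_digest(candidate, valid.encode("utf-8")):
            matched = True
    return matched
```

The isinstance check matters here because the value comes straight from a JSON body, so a client could send a list or a number in the auth field instead of a string.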
Option 2—authenticate using the request header

Placing authentication information in the request body works in a pinch, but is typically discouraged. Packing authentication data into request headers is recommended instead.

MDN has good reference materials on HTTP authentication. Here's a quick demo of the simplest of these schemes, HTTP basic authentication, implemented in a Spell model server:

from spell.serving import BasePredictor

from starlette.requests import Request
from starlette.exceptions import HTTPException

import base64

# replace this logic with your own authentication logic
def validate_auth(auth):
    if not auth.startswith("Basic "):
        return False
    auth = auth[len("Basic "):]
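    try:
        token = base64.b64decode(auth).decode("ascii")
    except ValueError:
        # malformed base64 or non-ASCII bytes: treat as a failed check
        return False
    token = token.rstrip("\n ")  # strip the trailing newline that echo adds
    return token == "example-token"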

class PythonPredictor(BasePredictor):
    def __init__(self):
        pass  # nothing to initialize in this example

    async def predict(self, payload: Request):
        # the headers are available on the Starlette Request object
        if "Authorization" not in payload.headers:
            print("Received a payload without an auth token, returning 401.")
            raise HTTPException(401)
        auth = payload.headers["Authorization"]
        if not validate_auth(auth):
            print("Received a payload with an invalid auth token, returning 401.")
            raise HTTPException(401)

        # auth check passed, proceed to predict body
        data = await payload.json()
        # ...
        return {"status": "ok"}

This code annotates the payload parameter of predict as a Starlette Request object. This instructs Spell to pass the entire request (including, importantly, the request headers) to the method, instead of the default parsed JSON payload. For more information, refer to "Handling non-JSON requests" in the Spell docs.

The Starlette Request object has a headers field containing the HTTP headers. HTTP basic authentication uses an Authorization header with the value Basic FOO, where FOO is a base64-encoded version of an authorization token. validate_auth decodes the token and checks its authenticity; in this trivial example, it just expects the token value example-token.

You can test this code out yourself using curl as follows:

$ spell server serve somemodel:v1
# ...wait for the server to spin up...
$ AUTH_PAYLOAD=$(echo "example-token" | base64)
$ # set MODEL_SERVER_URL to your model server's endpoint URL first
$ curl -d '{"a": "b"}' \
    -X POST \
    -H "Authorization: Basic $AUTH_PAYLOAD" \
    "$MODEL_SERVER_URL"
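If you would rather issue the same request from Python, the following is a minimal sketch using only the standard library. The URL is a made-up placeholder, and build_predict_request is a helper name invented for this example, not part of the Spell API. Unlike the curl command, it sets an explicit JSON content type.

```python
import base64
import json
import urllib.request


def build_predict_request(url, token, payload):
    """Build a POST request carrying the token in a Basic Authorization header."""
    encoded = base64.b64encode(token.encode("ascii")).decode("ascii")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + encoded,
            "Content-Type": "application/json",
        },
    )


# Hypothetical URL: substitute your model server's endpoint.
req = build_predict_request(
    "https://example.com/predict", "example-token", {"a": "b"}
)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the header before anything goes over the wire.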

HTTP basic authentication is a secure way of controlling access to a model server because Spell model servers use HTTPS, so the header is encrypted in transit. However, base64 is a reversible encoding, not encryption: anyone who can read the header can recover the token. Basic authentication therefore doesn't protect you against certain types of man-in-the-middle attacks, e.g. DNS spoofing.
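To make the reversibility point concrete, here is a small sketch of the encode and decode round trip for the Basic FOO format. The helper names build_basic_header and extract_token are invented for this illustration. Note that it follows this post's simplified scheme of encoding a bare token, whereas standard HTTP basic authentication encodes a user:password pair.

```python
import base64


def build_basic_header(token):
    """Client side: wrap a token in a Basic Authorization header value."""
    encoded = base64.b64encode(token.encode("ascii")).decode("ascii")
    return "Basic " + encoded


def extract_token(header_value):
    """Server side (or an eavesdropper): recover the token, or None if malformed."""
    prefix = "Basic "
    if not header_value.startswith(prefix):
        return None
    try:
        decoded = base64.b64decode(header_value[len(prefix):], validate=True)
        # tolerate the trailing newline added by `echo "..." | base64`
        return decoded.decode("ascii").rstrip("\n ")
    except ValueError:
        # covers both invalid base64 and non-ASCII bytes
        return None
```

No secret is needed to run extract_token, which is why the header is only as safe as the transport that carries it.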

Option 3—use a private cluster

When you first create a model serving cluster on Spell you are asked whether you want to deploy the cluster in public mode or private mode.

Model servers on a public serving cluster have their endpoint URL exposed to the public Internet. In other words, in this configuration, anyone with knowledge of the endpoint URL can make a request to it. A load balancer in front of the cluster manages incoming connections from the public Internet and routes them as needed.

Model servers on a private serving cluster are a different story. In this configuration, only other services in the same VPC (virtual private cloud) can reach the model server URL. The URL is not exposed to the public Internet: connectivity is terminated at the VPC boundary, and requests to this URL from the public Internet will receive a 404 Not Found.

Keeping your model servers private may be an attractive option because it adds an additional layer of access control. Assuming, as is typical, that it's OK to give all of the services running in your account access to your model server endpoints, you don't need to implement any authentication of your own at all; AWS or GCP basically does it for you as a matter of course.

However, if you do decide to go this route, there are a couple of things to keep in mind:

  1. Spell is typically deployed into its own VPC. If you need to access your model servers from other services on your account, you will either need to deploy Spell into an existing VPC (which can be fraught), or set up VPC peering to enable access from peer VPCs in the account.
  2. Model server visibility is a cluster-level setting, not a server-level setting. It is not currently possible to deploy private model servers on public serving clusters, or vice versa. This is due to the limitations of EKS/GKE. If you absolutely need access to both types of servers, you'll need to create a second serving cluster in a separate Spell organization.
  3. Model servers deployed in private mode cannot be accessed from your local development environment. You can continue to access the server from inside of Spell runs and workspaces, as the EC2/GCE machines backing these are placed in the same VPC as the serving cluster. But CLI commands like curl or Python API methods like spell.servers.ModelServer.predict will not work on your local machine.
  4. Model servers deployed in private mode use HTTP by default. Private clusters do not allocate public IP addresses to their machines, and a public IP address is a prerequisite for HTTPS encryption. This is still fairly secure, since private cluster machines are only accessible from inside the VPC. Nevertheless, customers on Spell's enterprise plan can configure custom certificates for private HTTPS.

Because private clusters require additional configuration, we typically recommend that new Spell users start with a public serving cluster, and only move to a private one if their production security requirements are sufficiently stringent.
