Product Release Changelog
Here at Spell, we’re releasing new features, fixing bugs, and updating documentation daily. This changelog is an account of user-facing changes updated on a monthly cadence. For more information please reach out at support@spell.ml.
February 2022
Spell x Graphcore Partnership
Spell has launched a partnership with Graphcore IPUs, allowing users to sign up for a shared community IPU cluster, our first foray into on-demand AI accelerated hardware. Please see the full announcement blog and developer guide. If you're interested in joining our beta preview, sign up here.
Other Feature Improvements
- [Run orchestration] Enable support for private pip packages in requirements.txt files
- [Run orchestration] Added machine types T4 and T4-big to Azure; T4-bigger and T4x8 to AWS; and T4-big to GCP
- [Workspaces] Added ability to edit pip and requirements.txt dependencies for active Workspaces
- [Model Servers] Enable autoupgrade of nodes for GCP model serving clusters
- [Azure] Moved ram-big and ram-huge machine types from the Ev3-series to Edsv4-series VMs
- [Azure] Deprecated support for K80 machine types due to compute instance age and limited customer use
Bug Fixes
- [UX] Fixed bug where users invited to orgs would receive follow-up emails even after accepting invitation
- [UX] Fixed bug preventing users from logging in on Chrome 98
- [Run orchestration] Fixed bug allowing users to use both conda and pip to specify dependencies, leading to possible undefined behavior
January 2022
Other Feature Improvements
- [Model Servers] Added support for conda environments in model servers
- [UX] Enabled requirements.txt, pip, apt, env, and conda field options in the web console when using a custom Docker image during a re-run
- [Workspaces] Enabled requirements.txt, pip, apt, env, and conda field options in the web console when using a custom Docker image for a Workspace (previously possible only through CLI flags)
Bug Fixes
- [Run orchestration] Fixed bug causing nvidia-smi command failure
- [Tensorboard] Fixed regression caused by TensorFlow 2.6 failing to fully support S3. See issue.
- [Workspaces] Fixed bug caused by duplicating notebooks with legacy environments
- [Model servers] Fixed bug incorrectly marking active model servers as failed
December 2021
Update on recent Log4j vulnerabilities
On December 9th, 2021 Alibaba Cloud publicly disclosed the Log4Shell (CVE-2021-44228) zero-day vulnerability in Log4j, involving arbitrary code execution, affecting hundreds of millions of devices. Spell proprietary code is entirely unaffected, and our team has conducted a thorough investigation of other OSS dependencies; Spell today uses Kafka and Elasticsearch, both written in Java. Our Kafka version is unaffected by the vulnerability, and we’ve upgraded Elasticsearch to the latest secure version as of December 15th, 2021.
Other Feature Improvements
- [Workspaces] Enable JupyterLab Git extension to streamline git credential input and git command usage from Workspaces
- [Workspaces] Remove need to authenticate with Spell (e.g. spell login) when running Spell commands in Jupyter Workspaces
- [UX] Autofill example fields in Create Run button from empty runs page to PyTorch MNIST example
Bug Fixes
- [Run orchestration] Fixed bug causing private pip packages to fail installation
- [Workspaces] Fixed bug in workspaces where git init was failing to execute, leading to build failures
- Additional documentation for connecting Jupyter workspaces to VSCode Remote
November 2021
Updates to default framework and dockerfile
We've updated the following frameworks and packages in the default image. Please note these changes may require migration work for compatibility, and if your team needs an older version contact us at support@spell.ml:
- CUDA 10.1 to 11.3.1
- PyTorch 1.8.1 to 1.10.0
- TensorFlow 2.3.4 to 2.6.2
- Horovod 0.21.3 to 0.23
Support full --pip specifications in workspace notebooks and model servers
Notebooks and model servers can now support a wide variety of version specifications, like requests>=0.2,<4.1,!=3.1.2, and full git specifications for pip, such as git+http://repo/my_project.git#egg=SomeProject.
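To make the version specification above concrete, here is a minimal sketch of how a compound specifier such as requests>=0.2,<4.1,!=3.1.2 narrows the set of acceptable versions. The helper names (parse_version, parse_requirement, satisfies) are hypothetical and written only for this illustration; the sketch handles plain dotted integer versions and is not a full implementation of pip's version rules.

```python
import re

# Comparison operators handled by this sketch (a subset of what pip accepts).
OPERATORS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def parse_version(text):
    """Turn '3.1.2' into (3, 1, 2). Only dotted integers are supported here."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(a, b):
    """Pad the shorter tuple with zeros so that 4.1 compares equal to 4.1.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def parse_requirement(spec):
    """Split 'name>=1,<2' into the package name and a list of (op, version)."""
    match = re.match(r"([A-Za-z0-9._-]+)(.*)", spec.strip())
    name, rest = match.group(1), match.group(2)
    clauses = []
    for clause in rest.split(","):
        clause = clause.strip()
        if not clause:
            continue
        op = re.match(r">=|<=|==|!=|>|<", clause).group(0)
        clauses.append((op, parse_version(clause[len(op):])))
    return name, clauses


def satisfies(version, clauses):
    """A version is acceptable only if every clause holds."""
    candidate = parse_version(version)
    return all(OPERATORS[op](*pad(candidate, bound)) for op, bound in clauses)


name, clauses = parse_requirement("requests>=0.2,<4.1,!=3.1.2")
for candidate in ["0.1", "2.25.1", "3.1.2", "4.1"]:
    print(name, candidate, satisfies(candidate, clauses))
```

The comma acts as a logical AND, so a version must pass all three clauses to be installed.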
Other Feature Improvements
- [Run orchestration] Environment variables that start with SECRET are now anonymized in the web console and appear as <REDACTED>
- [Run orchestration] --framework parameter is now deprecated from CLI and Python API. In order to use a specific ML framework, use the --pip or --conda-env flag or use a custom Docker image
- [Model servers] --instance-type is no longer a required option for node-group add, and will default to m5.large (AWS) or n1-standard-2 (GCP)
- [Model servers] --name option in node-group add flagged for deprecation. This is now an argument to the command.
- [Workspaces] Specified idle timeout now applies to idle Workspace terminal instances
- [Docs] Documentation updates on secret environment variables, pip specifications
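As a rough illustration of the secret environment variable behavior listed above, the sketch below masks any variable whose name starts with SECRET before it is displayed. This is not Spell's implementation: the function name redact_for_display is hypothetical, and the case-sensitive prefix match is an assumption made for the example.

```python
REDACTED = "<REDACTED>"


def redact_for_display(env):
    """Return a copy of env with SECRET-prefixed values masked.

    Assumption: the prefix match is case-sensitive. The original values are
    left untouched so the run itself can still read them.
    """
    return {
        name: (REDACTED if name.startswith("SECRET") else value)
        for name, value in env.items()
    }


run_env = {"SECRET_API_KEY": "abc123", "BATCH_SIZE": "32"}
print(redact_for_display(run_env))
```

Only the displayed copy is masked, which is why naming a variable with the SECRET prefix is enough to keep its value out of the web console.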
Bug Fixes
- [Run orchestration] Added warning message for GCP machine name validation rules
- [Model servers] Fixed bug for unintended behavior when kube-cluster create fails
- [Model servers] Fix runtime exceptions for node-group delete
- [Model servers] Fix eks-cluster-not-found exception when AWS profile does not have same region as kube-cluster
- [Experiments] Fixed bug preventing bulk adding runs to experiments
October 2021
Model server update and QoL changes
Beginning this month, all newly created EKS clusters will be Kubernetes Version 1.18. Furthermore, we've made QoL improvements to kube-cluster subcommands: kube-cluster create no longer re-prompts for configuration values if --use-existing is used; kube-cluster create and kube-cluster update no longer prompt for a kubectl context if more than one kubectl context exists; kube-cluster create and kube-cluster update no longer rely on the user's current kubectl context remaining static.
Other Feature Improvements
- [Run orchestration] Log throttling values have been updated from 450 logs every 15 seconds per run to 1000 loglines every 10 seconds, to reduce instances of dropped logs on user runs
- [Run orchestration] Changed query logic for label filters, to speed up instances where organizations have 100+ labels
- [Model Servers] Removed deprecated commands in spell cluster which affect model servers. From now on all model server related commands are part of spell kube-cluster.
- [Model Servers] Speed up model server start up times by moving static pip installs to the default image instead of the templates
- [Model Servers] Added --docs flag to spell server mounts and spell server models commands
- [Model Servers] Added model file information in spell model describe modelName:modelVersion command
- [Model Servers] Provide more human-readable errors when user attempts to initialize model servers before kube-cluster create
- [Experiments] Add support for multiple aggregation columns for the same metric
- [UX] Implemented mechanism to require users to initialize cluster during Spell for Team setup
- [UX] Added tooltip to web console warning users of log throttling behavior under high volume
- [Workspaces] Switched default terminal from sh to bash
- [Website] Revamped and updated FAQs page
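The log throttling change in the first item of this list raises the sustained allowance from 450 / 15 = 30 lines per second to 1000 / 10 = 100 lines per second per run. The sketch below shows one simple way such a limit could behave, using a fixed-window counter. The changelog does not say which algorithm Spell uses, so the class FixedWindowThrottle and its windowing strategy are assumptions for illustration only.

```python
class FixedWindowThrottle:
    """Allow at most `limit` log lines per `window_seconds` window.

    Hypothetical sketch: lines beyond the limit are counted as dropped
    until the next window begins.
    """

    def __init__(self, limit=1000, window_seconds=10.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = None
        self.count = 0
        self.dropped = 0

    def allow(self, now):
        """Return True if a line arriving at time `now` (seconds) is kept."""
        # Start a new window on the first line or once the current one expires.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        self.dropped += 1
        return False


throttle = FixedWindowThrottle()
burst = [throttle.allow(now=0.0) for _ in range(1500)]
print(sum(burst), "kept,", throttle.dropped, "dropped")
print("next window accepts lines again:", throttle.allow(now=10.0))
```

Under this model, a run that emits a burst larger than the window limit loses the excess lines, which is the situation the web console tooltip warns about.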
Bug Fixes
- [Run orchestration] Fixed bug causing file mount timeouts in Azure
- [Model Servers] Fixed display bugs in model filepath when using multiple model servers
- [Model Servers] Fixed bug preventing users from updating parts of their model belonging to a private Github repo
- [Model Servers] Fixed bug in listing model servers
- [Model Servers] Fixed bug preventing mounting the base of a bucket in a model server
- [Model Servers] Fixed bug causing unintended side effects from kube-cluster delete
- [Private Machines] Fixed bug occasionally causing private machines to hang
- [Hyper search] Fixed bug where labels were failing to display during hyperparameter searches
- [Hyper search] Fixed bug causing misbehavior of spell hyper list
- [Experiments] Fix graphical bugs in experiment chart rendering
- [Workspaces] Fixed bug causing Jupyter workspace timeout after 5ms
September 2021
Google SSO
We’ve implemented Google SSO and OAuth for all users! With this update, users can conveniently and securely sign up and log in to Spell via their existing Google accounts. Additionally, users belonging to an org can take advantage of Google authentication features such as enabling 2-factor authentication for their org members. Existing users can choose to link their accounts to Google SSO or use our existing Spell authentication system.
Multiple models in model servers
We've added direct support for multi-model model servers as well as 0-model model servers. Some relevant changes include updating spell server serve [model] [entrypoint] to spell server serve [model1, model2 ...] [entrypoint], as well as the spell server models add and spell server models rm commands.
JupyterLab 3.0 Extension Upgrade
Spell has upgraded from major version 2 to JupyterLab 3.0, released in January 2021. Among the many improvements, users can now directly pip install $JUPYTERLAB_EXTENSION_PACKAGE_NAME and the extension will work the next time they launch JupyterLab.
Other improvements and fixes
- [Runs, Workspaces, Model Servers] Added full support for requirements.txt specifications handling, including comments, multi-line requirements, and options such as --extra-index-url, --index-url, and --find-links
- [Runs] Upgraded default AMI used for orchestration to include latest changes in Conda and Jupyter
- [Runs] Improved error message when improper file is mounted
- [UX] Various style updates to web console sidebar for improved readability
- [UX] Fixed display bugs for displaying empty state model servers pages when user has no cluster set up
- [UX] Consolidated user and billing information tabs for more efficient navigation
- [UX] Graphical updates to web console tabs
- [UX] On account creation, directly link to web login when user email validates account
- [UX] Redesigned billing module for developer accounts to fix inconsistent styling and display bugs
- [UX] Display raw logs button in run page even when logs display module has not fully loaded
- [UX] Fixed bug in accounts page that constantly displayed “missing token”
- [Model servers] Add warnings for newly available kube cluster versions when running spell kube-cluster command
- [Model servers] Rearranged model serving cURL params for easier editing
- [Model servers] Changed model server web console headers to breadcrumb-style for improved readability
- [Docs] Updated installation instructions on self-serve trial signup
- [Docs] Refreshes and updates to documentation for Workspaces, Resources, Tensorboard / WandB integrations pages
- [Docs] Small fixes to section headers and links for Quickstart and Workflows documentation
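The requirements.txt handling described in the first item of this list (comments, multi-line requirements, and index options) can be illustrated with a small parser. This is a simplified sketch, not Spell's or pip's actual code: the function parse_requirements is hypothetical, and the index URL in the sample is a placeholder.

```python
import re


def parse_requirements(text):
    """Split requirements.txt-style text into (options, requirements)."""
    # Join any line ending in a backslash with the line that follows it.
    logical_lines = []
    pending = ""
    for raw in text.splitlines():
        stripped = raw.rstrip()
        if stripped.endswith("\\"):
            pending += stripped[:-1]
            continue
        logical_lines.append(pending + raw)
        pending = ""
    if pending:
        logical_lines.append(pending)

    options, requirements = [], []
    for line in logical_lines:
        # A comment starts with '#' at the beginning of a line or after
        # whitespace, so a URL fragment such as '#egg=SomeProject' is kept.
        line = re.sub(r"(^|\s+)#.*$", "", line).strip()
        if not line:
            continue
        if line.startswith("-"):
            options.append(line)  # e.g. --extra-index-url, --find-links
        else:
            requirements.append(" ".join(line.split()))
    return options, requirements


sample = "\n".join([
    "# training dependencies",
    "--extra-index-url https://example.invalid/simple",  # placeholder URL
    "requests>=0.2,<4.1,!=3.1.2  # pinned range",
    "numpy \\",
    "    >=1.19",
    "git+http://repo/my_project.git#egg=SomeProject",
])
options, requirements = parse_requirements(sample)
print(options)
print(requirements)
```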