Distributed Training with Horovod
The distributed run feature is particularly useful for doing distributed deep learning. Distributed training runs your code across multiple machines in parallel to get you results even faster.
If you write your distributed training code with the Horovod framework, which works with TensorFlow, Keras, PyTorch, and MXNet, Spell can run it across however many machines you want. All you need to do is add the --distributed parameter to your spell run command.
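To give a sense of what Horovod-style data parallelism does under the hood: each worker computes gradients on its own shard of the data, then an allreduce averages those gradients so every worker applies an identical update. Below is a minimal sketch of that idea in plain Python, with the workers simulated in-process; real Horovod does the averaging over MPI or NCCL, and the model and data here are made up for illustration.

```python
# Sketch of data-parallel training in the Horovod style: each "worker"
# computes a gradient on its own data shard, and an allreduce averages
# the gradients so every worker makes the identical update. Workers are
# simulated in-process; this is a conceptual sketch, not the Horovod API.

def local_gradient(w, shard):
    # Gradient of mean squared error for a 1-D linear model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    # Stand-in for an allreduce: average one value per worker.
    return sum(values) / len(values)

def distributed_step(w, shards, lr=0.01):
    grads = [local_gradient(w, s) for s in shards]  # one per "machine"
    g = allreduce_mean(grads)                       # synchronize gradients
    return w - lr * g                               # same update everywhere

# Two workers, each holding half of the data for the line y = 3x.
shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)]]
w = 0.0
for _ in range(200):
    w = distributed_step(w, shards, lr=0.02)
print(round(w, 2))  # converges toward 3.0
```

Because every worker sees the same averaged gradient, all replicas stay in lockstep, which is what lets you scale out without changing the training logic.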
Simplified Teams Pricing
Spell is committed to building products that make your machine learning workflow faster and more efficient. To make cost more transparent and easier to manage, we’re updating pricing for Teams.
Our Teams subscription plan gives you access to all of Spell’s features and unlimited users. There’s a monthly fee of $399, plus charges based on your overall machine usage: $0.40/hr for experiments and $0.10/hr for model serving with Kubernetes.
If you’re a start-up with <$5M in funding, we’ll waive the monthly fee.
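To make the arithmetic concrete, here is a small sketch of how a monthly bill would add up under these rates. The usage hours below are hypothetical numbers chosen purely for illustration:

```python
# Illustrative Teams bill calculator; the usage hours are hypothetical.
MONTHLY_FEE = 399.00
EXPERIMENT_RATE = 0.40  # $/hr for experiments
SERVING_RATE = 0.10     # $/hr for model serving with Kubernetes

def monthly_bill(experiment_hours, serving_hours, fee_waived=False):
    # Start-ups with <$5M in funding have the monthly fee waived.
    fee = 0.0 if fee_waived else MONTHLY_FEE
    return fee + experiment_hours * EXPERIMENT_RATE + serving_hours * SERVING_RATE

# e.g. 500 experiment hours plus one model served around the clock (~730 hrs):
print(monthly_bill(500, 730))              # 399 + 200 + 73 = 672.0
print(monthly_bill(500, 730, fee_waived=True))  # fee waived: 273.0
```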
💡 Tip of the month
If you’re running a command that has its own flags, use -- to separate the Spell command flags from your command. For example,
spell run -t k80 "python main.py --myflag hi"
is the same as
spell run -t k80 -- python main.py --myflag hi