Runs

See here for additional information regarding runs.

class SpellClient.runs
class RunsService(client)

An object for managing Spell runs.

new(**kwargs)

Create a run.

Parameters
  • command (str) – the command to run

  • machine_type (str, optional) – the machine type for the run (default: CPU)

  • project (str, optional) – the name of the project to associate this run with (default: None)

  • description (str, optional) – a description for the run (default: None)

  • github_url (str, optional) – a GitHub URL to a repository for code to include in the run. Not applicable when workspace_id or commit_label is specified.

  • github_ref (str, optional) – a reference to a commit, branch, or tag in the repository corresponding to github_url for code to include in the run (default: master)

  • pip_packages (list of str, optional) – pip dependencies (default: None). For example: ["moviepy", "scikit-image"]

  • requirements_file (str, optional) – a path to a requirements.txt file

  • apt_packages (list of str, optional) – apt dependencies (default: None). For example: ["python-tk", "ffmpeg"]

  • conda_file (str, optional) – the path to a conda environment specification file (default: None)

  • envvars (dict of str -> str, optional) – name to value mapping of environment variables to be set within the run (default: None). For example: {"VARIABLE" : "VALUE", "LANG" : "C.UTF-8"}

  • framework (str, optional) – the framework to use for the run (default: None). Options are default or tensorflow1.

  • docker_image (str, optional) – the name of docker image to use as base (default: None)

  • attached_resources (dict of str -> str, optional) – resource name to mountpoint mapping of attached resources for the run (default: None). For example: {"runs/42" : "/mnt/data"}

  • labels (list of str, optional) – a list of labels to assign to the run

  • cwd (str, optional) – the working directory within the repository in which to execute the command. If github_url is not set, the run will default to the root directory of the run, /spell/. If github_url is set, the run will default to the root of the repository.

  • tensorboard_directory (str, optional) – the path where tensorboard files will be read from. The Tensorboard integration will not be activated if this parameter is not set.

  • distributed (int, optional) – executes this run in distributed mode on N machines of the specified machine_type

  • idempotent (bool, optional) – use an existing identical run if available in lieu of re-running (default: false)

  • params (dict, optional) – key-value pairs to be injected into the run command. Each key in the input will be matched to a corresponding :KEY: in the run command, and value will be substituted. For example, params={"foo": "bar"} and echo :FOO: will map to echo bar. If this run is assigned to a project, this parameter will show up as a column (and be filterable) on its project details page.

  • timeout (int, optional) – run timeout in minutes. If this parameter is set, Spell will stop the run after this many minutes have elapsed. If this parameter is not set the run will never be timed out.

  • auto_resume (bool, optional) – spot instance machine types only. Enable or disable auto-resume. If left unspecified the default value for the machine type will be used.

  • commit_label (str, optional) – workflow runs only. A commit label for code to include in the run. The value must correspond to one of the commit labels set at workflow creation time (spell workflow create) using the --repo or --github-repo options.

  • workflow_id (int, optional) – workflow runs only. The ID of the workflow with which this run will be associated (default: None). This argument takes precedence over active_workflow, the value set by the client.

Returns

A Run object.

Raises

ClientException – an error occurred.
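
The params substitution described for new() can be illustrated with a small standalone sketch. It does not call Spell: substitute_params is a hypothetical helper, and matching keys case-insensitively is an assumption drawn from the foo / :FOO: example in the parameter description.

```python
import re


def substitute_params(command, params):
    """Replace each :KEY: placeholder in command with the matching params value.

    Hypothetical helper for illustration only; Spell performs the real
    substitution when the run is created.
    """
    # Assumption: keys are matched case-insensitively, as in foo -> :FOO:.
    by_upper_key = {str(key).upper(): str(value) for key, value in params.items()}

    def replace(match):
        name = match.group(1).upper()
        # Leave placeholders with no matching key untouched.
        return by_upper_key.get(name, match.group(0))

    return re.sub(r":([A-Za-z_][A-Za-z0-9_]*):", replace, command)


print(substitute_params("echo :FOO:", {"foo": "bar"}))  # echo bar
```

With the real client, the same dictionary would be passed as the params argument of new() alongside a command containing the :KEY: placeholders.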

get(run_id)

Fetch an existing run by ID.

Parameters

run_id (int) – the ID of the run to fetch

list(number=50, project=None, show_uncategorized=False, labels=[])

Fetch a list of runs.

Parameters
  • number (int, optional) – the maximum number of runs to fetch (default: 50).

  • project (str, optional) – the project to fetch runs from (default: None).

  • show_uncategorized (bool, optional) – if True, return only uncategorized runs (runs not assigned to a project). If False, return only categorized runs (default: False).

  • labels (list, optional) – return only runs that have one or more of these labels set.

Returns

A list of Run objects.

Raises

ClientException – an error occurred.
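
A short sketch of post-processing the list returned by list(). FakeRun is a hypothetical stand-in that carries only the documented id, status, and labels attributes, so the snippet runs without a Spell account. The label filter mirrors the "one or more of these labels" rule described above, applied locally.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for Run, exposing only documented attributes.
FakeRun = namedtuple("FakeRun", ["id", "status", "labels"])


def count_by_status(runs):
    """Count how many runs are in each status."""
    return Counter(run.status for run in runs)


def with_any_label(runs, labels):
    """Keep the runs that carry at least one of the given labels."""
    wanted = set(labels)
    return [run for run in runs if wanted & set(run.labels)]


runs = [
    FakeRun(1, "complete", ["baseline"]),
    FakeRun(2, "failed", []),
    FakeRun(3, "complete", ["tuned", "gpu"]),
]
print(count_by_status(runs)["complete"])                           # 2
print([run.id for run in with_any_label(runs, ["gpu", "baseline"])])  # [1, 3]
```

With a real client, runs would come from a call such as client.runs.list(number=100, project="my-project"), where the project name is a placeholder.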

BUILDING = 'building'

a constant for the “building” state

Type

str

RUNNING = 'running'

a constant for the “running” state

Type

str

SAVING = 'saving'

a constant for the “saving” state

Type

str

PUSHING = 'pushing'

a constant for the “pushing” state

Type

str

COMPLETE = 'complete'

a constant for the “complete” state

Type

str

FAILED = 'failed'

a constant for the “failed” state

Type

str

STOPPED = 'stopped'

a constant for the “stopped” state

Type

str

KILLED = 'killed'

a constant for the “killed” state

Type

str

INTERRUPTED = 'interrupted'

a constant for the “interrupted” state

Type

str

BUILD_FAILED = 'build_failed'

a constant for the “build_failed” state

Type

str

MOUNT_FAILED = 'mount_failed'

a constant for the “mount_failed” state

Type

str

FINAL = ('complete', 'failed', 'build_failed', 'mount_failed', 'interrupted', 'killed', 'stopped')

a tuple of the constants for the final states (i.e., COMPLETE, FAILED, BUILD_FAILED, MOUNT_FAILED, INTERRUPTED, KILLED and STOPPED)

Type

tuple of str
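
A minimal sketch of how the FINAL tuple is typically used: testing whether a status string means the run will not change state again. The tuple is copied from the value documented above, and is_final is a hypothetical helper name.

```python
# Copied from the documented value of RunsService.FINAL.
FINAL = (
    "complete",
    "failed",
    "build_failed",
    "mount_failed",
    "interrupted",
    "killed",
    "stopped",
)


def is_final(status):
    """Return True if the status is one of the final states."""
    return status in FINAL


print(is_final("running"))       # False
print(is_final("build_failed"))  # True
```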

EQUALS = 'eq'

a constant for the equals condition (i.e. metric == b)

Type

str

GREATER_THAN = 'gt'

a constant for the greater than condition (i.e. metric > b)

Type

str

GREATER_THAN_EQUALS = 'gte'

a constant for the greater than or equals condition (i.e. metric >= b)

Type

str

LESS_THAN = 'lt'

a constant for the less than condition (i.e. metric < b)

Type

str

LESS_THAN_EQUALS = 'lte'

a constant for the less than or equals condition (i.e., metric <= b)

Type

str
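
The five condition constants can be read as names for ordinary comparisons. The sketch below maps each documented string value to the matching function from the standard operator module. condition_met is a hypothetical helper for illustration, not part of the Spell client.

```python
import operator

# Documented constant values mapped to standard comparison functions.
CONDITIONS = {
    "eq": operator.eq,   # EQUALS: metric == b
    "gt": operator.gt,   # GREATER_THAN: metric > b
    "gte": operator.ge,  # GREATER_THAN_EQUALS: metric >= b
    "lt": operator.lt,   # LESS_THAN: metric < b
    "lte": operator.le,  # LESS_THAN_EQUALS: metric <= b
}


def condition_met(metric_value, condition, b):
    """Evaluate 'metric_value <condition> b' for one of the condition strings."""
    if condition not in CONDITIONS:
        raise ValueError("unknown condition: %r" % (condition,))
    return CONDITIONS[condition](metric_value, b)


print(condition_met(0.93, "gte", 0.9))  # True
print(condition_met(0.25, "lt", 0.1))   # False
```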

Run

class Run

An object representing a single Spell run.

id

the run id

Type

int

status

the run status

Type

str

user_exit_code

the exit code of the run command

Type

int

command

the run command

Type

str

gpu

the GPU the run executed on

Type

str

git_commit_hash

the commit hash of the workspace repository for the run

Type

str

github_url

the URL of the GitHub repo used in the run

Type

str

description

the run description

Type

str

docker_image

the run docker image

Type

str

framework

the Spell framework for the run

Type

str

created_at

the run creation time

Type

datetime.datetime

started_at

the run start time

Type

datetime.datetime

ended_at

the run end time

Type

datetime.datetime

workspace

the run workspace

Type

Workspace

pip_packages

pip dependencies

Type

list of str

apt_packages

apt dependencies

Type

list of str

attached_resources

resource name to mountpoint mapping of attached resources for the run

Type

dict of str -> str

environment_vars

name to value mapping of environment variables for the run

Type

dict of str -> str

already_existed

true if an existing identical run was used in lieu of re-running

Type

bool

labels

labels applied to this run

Type

list of str
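
As an illustration of working with the timestamp attributes, the sketch below computes how long a run executed from started_at and ended_at. The stand-in object is hypothetical, and treating a missing timestamp as None is an assumption, since the reference does not say what these attributes hold before a run starts or ends.

```python
from datetime import datetime, timedelta
from types import SimpleNamespace


def run_duration(run):
    """Return ended_at - started_at, or None if either timestamp is missing."""
    # Assumption: a timestamp that is not yet available is None.
    if run.started_at is None or run.ended_at is None:
        return None
    return run.ended_at - run.started_at


# Hypothetical stand-in carrying only the two documented attributes.
finished = SimpleNamespace(
    started_at=datetime(2020, 1, 1, 12, 0, 0),
    ended_at=datetime(2020, 1, 1, 12, 3, 30),
)
print(run_duration(finished))  # 0:03:30
```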

add_label(label_name)

Add a label to this run.

Parameters

label_name (str) – the label to add

Raises

ClientException – an error occurred.

cp(source_path='', destination_directory=None)

Copy a file or directory from the run to local disk.

Parameters
  • source_path (str, optional) – the path within the run to copy (default: empty string, i.e., copy everything from the run)

  • destination_directory (str, optional) – destination directory to copy the file or directory to (default: the current working directory)

Raises

ClientException – an error occurred.

Example

>>> client = spell.client.from_environment()
>>> run = client.runs.new(command="echo contents > file", machine_type="CPU")
>>> run.wait_status(client.runs.COMPLETE)
>>> run.cp("file")
>>> with open("file") as f:
...     print(f.read())
...
contents

kill()

Kill the run.

Raises

ClientException – an error occurred.

logs(follow=False, offset=0)

Get the logs for the run.

A generator of log entries (LogEntry objects). Each log entry corresponds to either an informational message from Spell regarding run status or any line from standard out or standard error that resulted from executing the run command.

Parameters
  • follow (bool, optional) – follow the log lines until the run reaches a final status (default: False)

  • offset (int, optional) – which log line to start from. Negative values represent offsets relative to the latest log line (default: 0)

Yields

A LogEntry object for each log line.

Raises

ClientException – an error occurred.

Example

>>> client = spell.client.from_environment()
>>> run = client.runs.new(command="echo 'HELLO!!!!'", machine_type="CPU")
>>> for line in run.logs():
...     print(line)
...
Run created -- waiting for a CPU machine.
Run is building
Machine acquired -- commencing run
Run is running
Retrieving cached environment...
HELLO!!!!
Run is saving
Retrieving modified or new files from the run
No modified or new files found

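
The negative offset behaviour of logs() can be pictured with a standalone sketch. FakeRun is a hypothetical stand-in whose logs() generator treats offset like the start of a Python slice. That is an assumption about the semantics, based only on the statement that negative values are relative to the latest log line.

```python
class FakeRun:
    """Hypothetical stand-in for Run; real entries are LogEntry objects."""

    def __init__(self, lines):
        self._lines = list(lines)

    def logs(self, follow=False, offset=0):
        # Assumed semantics: offset acts like a slice start, so -2 means
        # "begin two lines before the end". follow is ignored in this stub.
        for line in self._lines[offset:]:
            yield line


def last_lines(run, n):
    """Return the final n log lines of a run as strings."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [str(entry) for entry in run.logs(offset=-n)]


run = FakeRun(["Run is building", "Run is running", "HELLO!!!!", "Run is saving"])
print(last_lines(run, 2))  # ['HELLO!!!!', 'Run is saving']
```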
metrics(metric_name, follow=False, start=None)

Get metrics for the run.

Parameters
  • metric_name (str) – the name of the user metric

  • start (datetime.datetime, optional) – the timestamp to start from (default: None). A value of None starts from the oldest metric. The offset is exclusive: only metrics with a timestamp greater than start are returned.

  • follow (bool, optional) – follow the metrics until the run reaches a final status (default: False)

Yields

A 3-tuple of (timestamp, index, value) for each metric. timestamp is a datetime.datetime object, index is an int, and value is one of int, float, or str.

Raises

ClientException – an error occurred.
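
The exclusive start offset and the (timestamp, index, value) tuple shape can be sketched without contacting Spell. metrics_after is a hypothetical helper that filters locally built records the way the start parameter is described above. The sample values are made up for illustration.

```python
from datetime import datetime, timedelta


def metrics_after(records, start=None):
    """Yield (timestamp, index, value) tuples strictly later than start."""
    for timestamp, index, value in records:
        # Exclusive offset: a record stamped exactly at start is skipped.
        if start is None or timestamp > start:
            yield timestamp, index, value


t0 = datetime(2020, 1, 1, 12, 0, 0)
# Illustrative records in the documented shape, one every ten seconds.
records = [(t0 + timedelta(seconds=10 * i), i, 1.0 / (i + 1)) for i in range(4)]

later = list(metrics_after(records, start=records[1][0]))
print([index for _, index, _ in later])  # [2, 3]
```

Passing the timestamp of the last tuple already seen as start is one way to resume reading a metric without receiving duplicates.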

refresh()

Refresh the run state.

Refresh all of the run attributes with the latest information for the run from Spell.

Raises

ClientException – an error occurred.

Example

>>> r.status
'machine_requested'
>>> r.refresh()
>>> r.status
'running'

remove_label(label_name)

Remove a label from this run.

Parameters

label_name (str) – the label to remove

Raises

ClientException – an error occurred.

stop()

Stop the run.

Raises

ClientException – an error occurred.

wait_metric(metric_name, condition, value)

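Wait until the run metric reaches the given condition and then return.

Parameters
  • metric_name (str) – the name of the user metric

  • condition (str) – the condition to test, one of EQUALS, GREATER_THAN, GREATER_THAN_EQUALS, LESS_THAN, or LESS_THAN_EQUALS

  • value – the value to compare the metric against (the b in the condition constants)

Raises

ClientException – an error occurred.

wait_status(*statuses)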

Wait until the run achieves one of the given statuses and then return.

Parameters

*statuses (required) – variable length list of statuses to wait for. Allowed values include BUILDING, RUNNING, SAVING, PUSHING, COMPLETE, FAILED, STOPPED, and KILLED.

Raises

ClientException – an error occurred.

Example

>>> client = spell.client.from_environment()
>>> r = client.runs.new(command="sleep 20", machine_type="CPU")
>>> r.wait_status(client.runs.BUILDING)
>>> r.wait_status(*client.runs.FINAL)
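
For readers who want a bounded wait, the sketch below shows one way to poll a run using only the documented status attribute and refresh() method. It is not how wait_status is implemented. poll_until, its interval and max_polls parameters, and the FakeRun stand-in are all hypothetical.

```python
import time


class FakeRun:
    """Hypothetical stand-in that steps through a scripted status sequence."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.status = next(self._statuses)

    def refresh(self):
        # Advance to the next scripted status; stay put once exhausted.
        self.status = next(self._statuses, self.status)


def poll_until(run, statuses, interval=1.0, max_polls=100):
    """Refresh run until its status is in statuses, checking at most max_polls times."""
    for _ in range(max_polls):
        if run.status in statuses:
            return run.status
        time.sleep(interval)
        run.refresh()
    raise TimeoutError("run did not reach any of %r" % (tuple(statuses),))


run = FakeRun(["building", "running", "saving", "complete"])
print(poll_until(run, ("complete", "failed"), interval=0))  # complete
```

With a real Run object, statuses could be the FINAL tuple from the runs service, and interval would be a few seconds rather than zero.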