Metrics
A metric is a statistic summarizing some aspect of how well a machine learning model fits the data. For example, accuracy and mean squared error are two common metrics for classification and regression, respectively.
Runs on Spell can record metrics over the course of their execution. Spell will display these metrics on the Run page in the Spell web console.
We log three different kinds of metrics: hardware metrics, framework metrics, and custom user metrics.
Hardware metrics
All Spell runs monitor and track CPU, memory, and network utilization. If the machine is equipped with a GPU, GPU utilization will be tracked as well.
Framework metrics
If your run uses Keras, some common metrics (accuracy, loss, val_accuracy, val_loss) will be logged automatically.
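For instance, an ordinary Keras training script like the following sketch (the MNIST dataset and model architecture here are purely illustrative, not part of Spell's API) produces all four of these metrics with no Spell-specific code:
import tensorflow as tf

# A minimal classifier on MNIST. When run on Spell, the accuracy, loss,
# val_accuracy, and val_loss values reported by model.fit() are picked
# up automatically -- no Spell-specific code is required.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Passing validation_data is what produces val_accuracy and val_loss
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=5)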
If your run uses TensorFlow 1, any metrics logged to an event file using the tf.summary ops and the tf.summary.FileWriter class (docs) will be logged automatically.
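As a minimal sketch, assuming TensorFlow 1.x (the synthetic loss value here is a stand-in for a real training loss), summary logging that Spell can pick up might look like this:
import tensorflow as tf  # assumes TensorFlow 1.x

# Build a scalar summary op fed by a placeholder
loss_ph = tf.placeholder(tf.float32, shape=())
summary_op = tf.summary.scalar("loss", loss_ph)

# FileWriter writes the event file that Spell reads for metrics
writer = tf.summary.FileWriter("./logs")
with tf.Session() as sess:
    for step in range(100):
        loss = 1.0 / (step + 1)  # stand-in for a real training loss
        writer.add_summary(sess.run(summary_op, feed_dict={loss_ph: loss}), step)
writer.close()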
User metrics
You can log custom user metrics from inside a Spell run using the Python API. Here's a trivial example:
import spell.metrics as metrics
import time
import argparse

# Runs for --steps seconds and sends --steps Spell metrics with the key 'value'
# and a numeric value starting at --start and incrementing by --stepsize
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--start", type=float, help="Value to start at")
    parser.add_argument("--steps", type=int, help="Number of metrics to send")
    parser.add_argument("--stepsize", type=float, help="Size of step to take")
    args = parser.parse_args()

    value = args.start
    for i in range(args.steps):
        print("Sending metric {}".format(value))
        metrics.send_metric("value", value)
        value += args.stepsize
        time.sleep(1)
This script uses spell.metrics.send_metric to log an ascending sequence of values to a Spell metric, then exits. Try it yourself:
$ spell run \
--github-url https://github.com/spellml/examples.git \
"python metrics/basic.py"
Alternatively, to try a more complex PyTorch example, run:
$ spell run \
--github-url https://github.com/spellml/examples.git \
"python metrics/pytorch.py"
Upon visiting this run in the web console, you should see the metric plotted on the Run page.
Note
There is a limit of 50 unique metric names per run, and 1 value per second per metric name.
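If your training loop produces values faster than once per second, you may want to drop or downsample values before sending them. Here is a minimal sketch of a hypothetical send_throttled helper (not part of the Spell API) that enforces the one-value-per-second limit client-side:
import time
import spell.metrics as metrics

_last_sent = {}

def send_throttled(name, value):
    # Drop values arriving faster than once per second per metric name,
    # keeping the run under Spell's rate limit
    now = time.time()
    if now - _last_sent.get(name, 0) >= 1.0:
        metrics.send_metric(name, value)
        _last_sent[name] = now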
(Advanced) Using metrics with hyperparameter search
The hyperparameter search feature in Spell for Teams makes extensive use of custom metrics. To learn more about this feature, refer to the page "Hyperparameter searches" in the docs.
(Advanced) Getting metrics using the Python API
In addition to reading metrics on the Run Details page on the Spell website, you can also use the Python API to read and interact with them.
import pandas as pd
import spell.client

client = spell.client.from_environment()

# replace 123 with your actual run ID
RUN_ID = 123
run = client.runs.get(RUN_ID)

# run.metrics() returns the metrics data as a generator of
# (timestamp, index, value) tuples
metric = run.metrics("keras/val_accuracy")
df = pd.DataFrame(metric, columns=["timestamp", "index", "value"])
df.to_csv("metrics.csv")
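As a follow-on usage example, assuming the df DataFrame built above and matplotlib installed, you could plot the metric series directly:
import matplotlib.pyplot as plt

# Plot the metric series extracted above and save it alongside the CSV
df.plot(x="index", y="value", title="keras/val_accuracy")
plt.savefig("val_accuracy.png")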