Logging
Helios supports multiple logging backends. By default, Helios will use the RootLogger for all logging operations. You can optionally enable two other backends:
Tensorboard,
Weights & Biases.
For each optional backend, we’ll cover how to install, configure, and use the backend.
Tensorboard
Tensorboard is an optional dependency that can be installed alongside Helios with:
pip install -U helios-ml[tensorboard]
It can then be enabled by passing enable_tensorboard=True and log_root when constructing the Trainer:
import helios
import pathlib

trainer = helios.Trainer(
    log_root=pathlib.Path("logs"),
    enable_tensorboard=True,
)
Each run creates a folder called <run_name>_<date_time> under
<log_root>/tensorboard. When a checkpoint is saved, this path is stored alongside the
training state. When Helios resumes training, the saved path is restored and new data is
appended to the existing run rather than starting a new one.
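The directory layout and resume behaviour described above can be sketched as follows. This is only an illustration, not Helios's implementation: the helper resolve_run_dir, its saved_path argument, and the timestamp format are all assumptions made for the example.

```python
import datetime
import pathlib
from typing import Optional


def resolve_run_dir(
    log_root: pathlib.Path,
    run_name: str,
    saved_path: Optional[pathlib.Path] = None,
    now: Optional[datetime.datetime] = None,
) -> pathlib.Path:
    """Return the Tensorboard run directory (hypothetical helper)."""
    # On resume, reuse the path stored in the checkpoint so new data is
    # appended to the existing run instead of starting a new one.
    if saved_path is not None:
        return saved_path

    # Fresh run: <log_root>/tensorboard/<run_name>_<date_time>.
    # The timestamp format below is an assumption, not Helios's format.
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return log_root / "tensorboard" / f"{run_name}_{stamp}"


# A fresh run gets a new timestamped folder...
fresh = resolve_run_dir(
    pathlib.Path("logs"), "resnet", now=datetime.datetime(2024, 1, 2, 3, 4, 5)
)
# ...while a resumed run keeps the folder recorded in the checkpoint.
resumed = resolve_run_dir(pathlib.Path("logs"), "resnet", saved_path=fresh)
```

Returning the saved path unchanged is what keeps a resumed run in a single Tensorboard folder.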
Note
In distributed training, the Tensorboard writer is only initialised on rank 0. All other ranks skip Tensorboard entirely.
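If your own code logs from inside a distributed job, a guard along these lines may be useful. It is a sketch under stated assumptions: is_rank_zero and log_scalar are hypothetical helpers, the RANK environment variable is the one exported by torchrun, and writer is any object with an add_scalar(tag, value, step) method such as torch.utils.tensorboard.SummaryWriter. Whether the Helios writer is None on other ranks, or uses the same method name, is not confirmed here.

```python
import os


def is_rank_zero() -> bool:
    """Return True on the main process (hypothetical helper)."""
    # torchrun exports RANK for every worker; a missing variable is
    # treated as a single-process run.
    return int(os.environ.get("RANK", "0")) == 0


def log_scalar(writer, tag: str, value: float, step: int) -> bool:
    """Log a scalar only when a writer exists and this is rank 0.

    Returns True if the value was written, False if it was skipped.
    """
    if writer is None or not is_rank_zero():
        return False
    writer.add_scalar(tag, value, step)
    return True
```

Returning a flag instead of raising keeps the call site identical on every rank.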
The writer is accessible via get_tensorboard_writer().
It wraps torch.utils.tensorboard.SummaryWriter and exposes the following
functions:
Weights & Biases
W&B is an optional dependency that can be installed alongside Helios with:
pip install -U helios-ml[wandb]
It can then be enabled by passing a wandb_args dictionary to the
Trainer constructor:
import helios
import pathlib

trainer = helios.Trainer(
    log_root=pathlib.Path("logs"),
    wandb_args={
        "project": "my-project",
        "name": "run-1",
        "config": {"lr": 0.001, "batch_size": 32},
    },
)
The wandb_args dictionary is represented by WandbArgs and
contains the following fields:
| Key | Required | Description |
|---|---|---|
| | Yes | W&B project name. |
| | No | Display name for the run shown in the W&B UI. If not provided, defaults to the |
| | No | Hyper-parameter dictionary to associate with the run. |
| | No | Additional keyword arguments forwarded verbatim to |
When log_root is provided to the Trainer, W&B data is
written under log_root/wandb, otherwise W&B will use its own default directory.
As with the Tensorboard writer, Helios stores the W&B run id in the checkpoint. When
Helios resumes training, the saved id is restored and passed back to wandb.init() with
resume="allow" so new data is appended to the existing run.
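The directory and resume rules above can be sketched as a small function that assembles keyword arguments for wandb.init(). This illustrates the described behaviour and is not Helios's code: build_wandb_init_kwargs and its saved_id parameter are hypothetical, while dir, id, and resume are real wandb.init() parameters.

```python
import pathlib
from typing import Any, Dict, Optional


def build_wandb_init_kwargs(
    wandb_args: Dict[str, Any],
    log_root: Optional[pathlib.Path] = None,
    saved_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for wandb.init() (hypothetical helper)."""
    # Copy so the caller's dictionary is not modified.
    kwargs: Dict[str, Any] = dict(wandb_args)

    # Write W&B files under <log_root>/wandb when a log root is given;
    # otherwise leave `dir` unset so W&B picks its own default directory.
    if log_root is not None:
        kwargs["dir"] = str(log_root / "wandb")

    # On resume, hand the stored run id back with resume="allow" so the
    # existing run is continued rather than a new one created.
    if saved_id is not None:
        kwargs["id"] = saved_id
        kwargs["resume"] = "allow"

    return kwargs


kwargs = build_wandb_init_kwargs(
    {"project": "my-project", "name": "run-1", "config": {"lr": 0.001}},
    log_root=pathlib.Path("logs"),
    saved_id="abc123",  # placeholder id, as if read from a checkpoint
)
# With wandb installed, the result could be passed on as wandb.init(**kwargs).
```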
Note
In distributed training, the W&B writer is only initialised on rank 0. All other ranks skip W&B entirely.