Migration Guide¶
This guide covers the changes introduced in 2.0.0 that require updates to existing 1.x code. The changes are grouped by area and ordered by level of impact.
Checkpoint Migration¶
The checkpoint format has changed in 2.0.0, making Helios unable to load checkpoints
created by 1.x. If you wish to load checkpoints, you must migrate them first using the
chkpt_migrator tool:
python -m helios.chkpt_migrator <checkpoint_path>
Run python -m helios.chkpt_migrator --help for the full list of options.
Warning
Attempting to load a 1.x checkpoint in 2.0.0 without migrating it first will result in an error.
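When several 1.x checkpoints need migrating, the command above can be driven from a short script. The following is a minimal sketch and not part of Helios: the helper names and the "*.pth" file pattern are assumptions for illustration, and only the documented python -m helios.chkpt_migrator <checkpoint_path> invocation is used.

```python
import pathlib
import subprocess
import sys


def build_migration_commands(
    root: pathlib.Path, pattern: str = "*.pth"
) -> list[list[str]]:
    """Build one chkpt_migrator command per checkpoint found under root.

    The "*.pth" pattern is an assumption; adjust it to match your files.
    """
    return [
        [sys.executable, "-m", "helios.chkpt_migrator", str(path)]
        for path in sorted(root.rglob(pattern))
    ]


def migrate_all(root: pathlib.Path) -> None:
    """Run the migrator on every checkpoint, stopping at the first failure."""
    for cmd in build_migration_commands(root):
        # check=True raises CalledProcessError if the migrator exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling migrate_all(pathlib.Path("checkpoints")) would migrate every matching file below a (hypothetical) checkpoints directory. Back up the originals first if you are unsure whether the tool writes its output in place; --help lists the available options.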
Tensorboard is Now Optional¶
Tensorboard is now an optional dependency. If it is already installed in your environment,
no changes are necessary. Otherwise, to continue using it, install the tensorboard
extra:
pip install -U "helios-ml[tensorboard]"
If Tensorboard is not installed, calls to create a
TensorboardWriter will raise an import error at runtime.
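If the same script has to run in environments with and without the extra, the availability of Tensorboard can be checked before requesting it. This is a minimal sketch using only the standard library; is_installed and use_tensorboard are hypothetical names, and how the resulting flag reaches Helios (for example through an enable_tensorboard argument) depends on your setup code.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Decide up front whether Tensorboard logging can be requested.
use_tensorboard = is_installed("tensorboard")
```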
Logging Module Replaced¶
The helios.core.logging module has been removed and replaced by the
helios.core.loggers subpackage. The new module provides the same functionality but
with a cleaner API and support for additional logging backends (Weights & Biases).
The affected functions and their replacements are listed below.
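| 1.x | 2.0.0 |
|---|---|
| create_default_loggers() | create_loggers() |
| setup_default_loggers() | setup_loggers() |
| close_default_loggers() | close_loggers() |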
The following functions are unchanged and exist in the new module under the same names:
get_root_logger(),
get_tensorboard_writer(),
and is_root_logger_active().
Before:
import helios.core.logging as hllog
hllog.create_default_loggers(enable_tensorboard=True)
hllog.setup_default_loggers(run_name, log_root)
...
hllog.close_default_loggers()
After:
import helios.core.loggers as hllog
hllog.create_loggers(enable_tensorboard=True)
hllog.setup_loggers(run_name, log_root)
...
hllog.close_loggers()
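For larger code bases, the renames can be applied mechanically. The sketch below is a hypothetical helper, not a Helios tool, and covers only the names that appear in the Before/After example above. It is a plain text substitution, so it also rewrites matches inside comments and strings; review the resulting diff before committing.

```python
import re

# 1.x name -> 2.0.0 name, taken from the Before/After example above.
RENAMES = {
    "helios.core.logging": "helios.core.loggers",
    "create_default_loggers": "create_loggers",
    "setup_default_loggers": "setup_loggers",
    "close_default_loggers": "close_loggers",
}

# Match whole names only, so already-migrated code is left untouched.
_PATTERN = re.compile(
    "|".join(rf"\b{re.escape(old)}\b" for old in sorted(RENAMES, key=len, reverse=True))
)


def migrate_logging_calls(source: str) -> str:
    """Return source with the known 1.x logging names replaced."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(0)], source)
```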
CUDAPlugin Removed¶
The functionality of the CUDAPlugin has been absorbed by the
Model class via the
batch_to_device() function. The function is called
automatically by the Trainer before each training, validation,
and testing step. As a result of this, the plugin has been removed.
The default implementation of batch_to_device() handles
tensors, as well as lists, tuples, and dictionaries that contain tensors, recursively. If
your batches use a standard structure, no action is required beyond removing the
CUDAPlugin from your setup code.
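To illustrate what "recursively" means here, the following sketch walks a nested batch and moves every tensor-like leaf. It is not the Helios implementation: move_to_device is a hypothetical name, and any object with a .to() method is treated as a tensor so the example runs without PyTorch installed.

```python
from typing import Any


def move_to_device(batch: Any, device: Any) -> Any:
    """Recursively move every tensor-like leaf of a batch to device.

    Anything exposing a .to() method is treated as a tensor (duck typing),
    which keeps this sketch free of a hard PyTorch dependency.
    """
    if hasattr(batch, "to"):
        return batch.to(device)
    if isinstance(batch, dict):
        return {key: move_to_device(value, device) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        # Rebuild the same container type; named tuples would need extra care.
        return type(batch)(move_to_device(item, device) for item in batch)
    # Non-tensor leaves (ints, strings, None, ...) are returned unchanged.
    return batch
```

A batch such as {"image": tensor, "meta": [tensor, "label"]} would come back with the same shape and both tensors moved. Batches built from custom classes fall through to the last branch unchanged, which is the kind of case where an override is needed.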
If you previously overrode CUDAPlugin.process_training_batch() to perform custom
pre-processing, override
batch_to_device() in your
Model subclass instead:
Before:
import helios.plugins as hlp
class MyPlugin(hlp.CUDAPlugin):
    def process_training_batch(self, batch, ...):
        batch = super().process_training_batch(batch, ...)
        # custom processing
        return batch
plugin = MyPlugin()
plugin.configure_trainer(trainer)
After:
import helios
import helios.model as hlm
class MyModel(helios.Model):
    def batch_to_device(self, batch, phase: hlm.BatchPhase):
        batch = super().batch_to_device(batch, phase)
        # custom processing
        return batch
Plugin Registration Changed¶
The way plugins are registered with the trainer has been reworked. Previously, two manual steps were required:
- The plugin had to register itself by calling _register_in_trainer inside configure_trainer.
- The user had to call configure_trainer and configure_model manually.
This has been replaced by register_plugin(). In addition
to this, configure_trainer() and
configure_model() are now called automatically by
the trainer when fit() or
test() is called. They no longer need to be invoked
manually.
Before:
import helios.plugins as hlp
class MyPlugin(hlp.Plugin):
    def configure_trainer(self, trainer):
        self._register_in_trainer(trainer)
plugin = MyPlugin()
plugin.configure_trainer(trainer)
plugin.configure_model(model)
After:
plugin = MyPlugin()
trainer.register_plugin(plugin)
# configure_trainer and configure_model are called automatically by the trainer.
Model State Dictionary API Changed¶
In 1.x, users had to override state_dict() and load_state_dict() on the Model
directly to save their training state. In 2.0.0 these functions are reserved for
internal use: they manage the separation between user state and Helios internal
state (such as the AMP scaler).
Users must now override user_state_dict() and load_user_state_dict() instead.
These functions have the same responsibilities as their 1.x counterparts. The one
difference visible in the example below is that load_user_state_dict() takes
for_inference in place of fast_init, as described in the next section.
Before:
def state_dict(self) -> dict:
    return {
        "net": self._net.state_dict(),
        "optimizer": self._optimizer.state_dict(),
    }
def load_state_dict(self, state_dict: dict, fast_init: bool = False) -> None:
    self._net.load_state_dict(state_dict["net"])
    if not fast_init:
        self._optimizer.load_state_dict(state_dict["optimizer"])
After:
def user_state_dict(self) -> dict:
    return {
        "net": self._net.state_dict(),
        "optimizer": self._optimizer.state_dict(),
    }
def load_user_state_dict(self, state_dict: dict, for_inference: bool) -> None:
    self._net.load_state_dict(state_dict["net"])
    if not for_inference:
        self._optimizer.load_state_dict(state_dict["optimizer"])
fast_init Renamed to for_inference¶
The fast_init parameter on setup() and
load_user_state_dict() has been renamed to
for_inference. The name better reflects its purpose: when True, the model skips
loading training-only state (optimisers, schedulers, etc.) because it is being prepared
for inference rather than for continued training.
Update any overrides of setup() or load_state_dict() (now load_user_state_dict())
to use the new parameter name.
Before:
def setup(self, fast_init: bool = False) -> None:
    ...
After:
def setup(self, for_inference: bool = False) -> None:
    ...
Trainer Arguments Changed¶
The log_path and run_path constructor arguments on
Trainer have been replaced by the single log_root
argument. Previously, log_path controlled where the log file was written and
run_path controlled where the Tensorboard run data was written. Both are now derived
automatically from log_root.
Before:
trainer = Trainer(
    ...,
    log_path=pathlib.Path("logs"),
    run_path=pathlib.Path("runs"),
)
After:
trainer = Trainer(
    ...,
    log_root=pathlib.Path("logs"),
)
DataLoaderParams.debug_mode Removed¶
The debug_mode field on
DataLoaderParams has been removed. Its only effect
was to set num_workers to 0 when enabled. Set num_workers=0 directly instead.
Before:
params = DataLoaderParams(..., debug_mode=True)
After:
params = DataLoaderParams(..., num_workers=0)
DataLoaderParams.pin_memory Default Changed¶
The default value of pin_memory in
DataLoaderParams has changed from True to
False. This matches the PyTorch default and avoids unexpected behaviour on machines
with limited RAM or in CPU-only environments.
If you rely on pinned memory for GPU training performance, set pin_memory=True
explicitly in your DataLoaderParams.
get_default_numpy_rng() Return Type Changed¶
get_default_numpy_rng() previously returned a
DefaultNumpyRNG wrapper object. It now returns a
numpy.random.Generator directly. Update any code that was accessing the
generator through the wrapper:
Before:
rng = helios.core.rng.get_default_numpy_rng()
value = rng.generator.integers(0, 10)
After:
rng = helios.core.rng.get_default_numpy_rng()
value = rng.integers(0, 10)
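Code that must run against both 1.x and 2.0.0 during a transition can normalise the return value first. This is a minimal sketch with a hypothetical helper name, as_generator; it relies only on the fact stated above that the 1.x wrapper exposed the generator through a .generator attribute, which numpy.random.Generator itself does not have.

```python
from typing import Any


def as_generator(rng: Any) -> Any:
    """Return the underlying generator for either return type.

    The 1.x wrapper exposed the generator through a .generator attribute;
    the 2.0.0 return value has no such attribute and is used as-is.
    """
    return getattr(rng, "generator", rng)
```

With this in place, as_generator(helios.core.rng.get_default_numpy_rng()).integers(0, 10) works on either version. Once all environments are on 2.0.0 the helper can be dropped.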
New Features in 2.0.0¶
The following features are new in 2.0.0. They do not require changes to existing code but are worth reviewing as they may simplify your training setup.
Mixed Precision Training (AMP)¶
The Model class now includes built-in support for
Automatic Mixed Precision (AMP) training on both GPU (float16 and bfloat16) and
CPU (bfloat16 only). See
Automatic Mixed Precision in the Quick Reference for full details.
Gradient Clipping¶
A clip_gradients() function is now available directly
on the model, with correct integration with the AMP scaler when AMP is active. See
Gradient Clipping in the Quick Reference.
Weights & Biases Support¶
Weights & Biases logging is now supported natively. Pass a wandb_args dictionary to
the Trainer constructor to enable it. See the
Logging page for full details.
Multi-phase Training¶
The DataModule now supports multi-phase training,
allowing successive datasets to be registered and advanced automatically during training.
See Multi-Phase Training in the Quick Reference for details.
should_save_checkpoint() Control¶
Override should_save_checkpoint() to control whether
the trainer writes a checkpoint at a given point. This is useful for implementing
save-best logic.
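The bookkeeping behind save-best logic can be sketched independently of Helios. The exact signature of should_save_checkpoint() is not shown in this guide, so the example below keeps the decision in a standalone, hypothetical BestMetricTracker class; an override would call update() with the latest validation metric and return its result.

```python
import math


class BestMetricTracker:
    """Remember the best validation metric seen so far."""

    def __init__(self, higher_is_better: bool = True) -> None:
        self._higher_is_better = higher_is_better
        # Start from the worst possible value so the first update always wins.
        self._best = -math.inf if higher_is_better else math.inf

    def update(self, value: float) -> bool:
        """Record value and return True if it is a new best."""
        if self._higher_is_better:
            improved = value > self._best
        else:
            improved = value < self._best
        if improved:
            self._best = value
        return improved
```

Use higher_is_better=True for metrics such as accuracy and False for losses. Ties do not count as an improvement, which avoids rewriting an equivalent checkpoint.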
get_train_steps_per_epoch()¶
get_train_steps_per_epoch() is now available to obtain the number of training
steps per epoch; it delegates to an underlying function of the same name. This is
useful for initialising schedulers in setup().
Linear Warmup Scheduler¶
LinearWarmupScheduler wraps any existing scheduler and
applies a linear warmup phase before handing off to the wrapped scheduler.
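As an illustration of what a linear warmup computes, the sketch below returns the factor by which a base learning rate is scaled at a given step. It is not the LinearWarmupScheduler implementation: the function name, its parameters, and the start_factor default are assumptions chosen for the example.

```python
def linear_warmup_factor(
    step: int, warmup_steps: int, start_factor: float = 0.0
) -> float:
    """Scale factor applied to the base learning rate at a given step.

    Rises linearly from start_factor at step 0 to 1.0 at warmup_steps,
    then stays at 1.0 so the wrapped schedule takes over unchanged.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return 1.0
    return start_factor + (1.0 - start_factor) * step / warmup_steps
```

With a base learning rate of 1e-3 and warmup_steps=1000, step 250 would use 1e-3 * linear_warmup_factor(250, 1000), a quarter of the base rate. The number of warmup steps is often expressed in epochs, which is one place where a steps-per-epoch count is needed.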
Expanded Metrics¶
The helios.metrics module now includes additional metrics beyond accuracy:
precision, recall, F1 score, RMSE, SSIM, PSNR, mAP, and MAE.
Expanded ONNX Support¶
export_to_onnx() now supports multiple inputs, multiple outputs,
and dictionary outputs. It also automatically selects between the legacy and dynamo export
paths based on the installed PyTorch version.