Migration Guide =============== This guide covers the changes introduced in 2.0.0 that require updates to existing 1.x code. The changes are grouped by area and ordered by level of impact. Checkpoint Migration -------------------- The checkpoint format has changed in 2.0.0, making Helios unable to load checkpoints created by 1.x. If you wish to load checkpoints, you must migrate them first using the ``chkpt_migrator`` tool:: python -m helios.chkpt_migrator Run ``python -m helios.chkpt_migrator --help`` for the full list of options. .. warning:: Attempting to load a 1.x checkpoint in 2.0.0 without migrating it first will result in an error. Tensorboard is Now Optional --------------------------- Tensorboard has been turned into an optional dependency. If you already have it installed, no changes are necessary. If you wish to continue using it, install the ``tensorboard`` extra:: pip install -U helios-ml[tensorboard] If Tensorboard is not installed, calls to create a :py:class:`~helios.core.loggers.TensorboardWriter` will raise an import error at runtime. Logging Module Replaced ------------------------ The ``helios.core.logging`` module has been removed and replaced by the ``helios.core.loggers`` subpackage. The new module provides the same functionality but with a cleaner API and support for additional logging backends (Weights & Biases). The affected functions and their replacements are listed below. .. list-table:: :header-rows: 1 :widths: 50 50 * - 1.x - 2.0.0 * - ``helios.core.logging`` - ``helios.core.loggers`` * - ``create_default_loggers()`` - :py:func:`~helios.core.loggers.create_loggers` * - ``setup_default_loggers()`` - :py:func:`~helios.core.loggers.setup_loggers` * - ``restore_default_loggers()`` - :py:func:`~helios.core.loggers.restore_loggers` * - ``flush_default_loggers()`` - :py:func:`~helios.core.loggers.flush_loggers` * - ``close_default_loggers()`` - :py:func:`~helios.core.loggers.close_loggers` The following functions are unchanged and exist in the new module under the same names: :py:func:`~helios.core.loggers.get_root_logger`, :py:func:`~helios.core.loggers.get_tensorboard_writer`, and :py:func:`~helios.core.loggers.is_root_logger_active`. **Before:** .. code-block:: python import helios.core.logging as hllog hllog.create_default_loggers(enable_tensorboard=True) hllog.setup_default_loggers(run_name, log_root) ... hllog.close_default_loggers() **After:** .. code-block:: python import helios.core.loggers as hllog hllog.create_loggers(enable_tensorboard=True) hllog.setup_loggers(run_name, log_root) ... hllog.close_loggers() ``CUDAPlugin`` Removed ---------------------- The functionality of the ``CUDAPlugin`` has been absorbed by the :py:class:`~helios.model.model.Model` class via the :py:meth:`~helios.model.model.Model.batch_to_device` function. The function is called automatically by the :py:class:`~helios.trainer.Trainer` before each training, validation, and testing step. As a result of this, the plugin has been removed. The default implementation of :py:meth:`~helios.model.model.Model.batch_to_device` handles tensors, as well as lists, tuples, and dictionaries that contain tensors, recursively. If your batches use a standard structure, no action is required beyond removing the ``CUDAPlugin`` from your setup code. If you previously overrode ``CUDAPlugin.process_training_batch()`` to perform custom pre-processing, override :py:meth:`~helios.model.model.Model.batch_to_device` in your :py:class:`~helios.model.model.Model` subclass instead: **Before:** .. code-block:: python import helios.plugins as hlp class MyPlugin(hlp.CUDAPlugin): def process_training_batch(self, batch, ...): batch = super().process_training_batch(batch, ...) # custom processing return batch plugin = MyPlugin() plugin.configure_trainer(trainer) **After:** .. code-block:: python import helios import helios.model as hlm class MyModel(helios.Model): def batch_to_device(self, batch, phase: hlm.BatchPhase): batch = super().batch_to_device(batch, phase) # custom processing return batch Plugin Registration Changed ---------------------------- The way plug-ins are registered with the trainer has been reworked. Previously, plug-ins had to: * Register themselves by calling ``_register_in_trainer`` inside ``configure_trainer``, * Manually call ``configure_trainer`` and ``configure_model``. This has been replaced by :py:meth:`~helios.trainer.Trainer.register_plugin`. In addition to this, :py:meth:`~helios.plugins.plugin.Plugin.configure_trainer` and :py:meth:`~helios.plugins.plugin.Plugin.configure_model` are now called automatically by the trainer when :py:meth:`~helios.trainer.Trainer.fit` or :py:meth:`~helios.trainer.Trainer.test` is called. They no longer need to be invoked manually. **Before:** .. code-block:: python class MyPlugin(hlp.Plugin): def configure_trainer(self, trainer): self._register_in_trainer(trainer) plugin = MyPlugin() plugin.configure_trainer(trainer) plugin.configure_model(model) **After:** .. code-block:: python plugin = MyPlugin() trainer.register_plugin(plugin) # configure_trainer and configure_model are called automatically by the trainer. Model State Dictionary API Changed ------------------------------------ In 1.x, the :py:class:`~helios.model.model.Model` users had to override :py:meth:`~helios.model.model.Model.state_dict` and :py:meth:`~helios.model.model.Model.load_state_dict` directly to save their training state. In 2.0.0 these functions are reserved for internal use and manage the separation between user state and Helios internal state (such as the AMP scaler). Users must now override :py:meth:`~helios.model.model.Model.user_state_dict` and :py:meth:`~helios.model.model.Model.load_user_state_dict` instead. The signatures and responsibilities of these functions are identical to their 1.x counterparts. **Before:** .. code-block:: python def state_dict(self) -> dict: return { "net": self._net.state_dict(), "optimizer": self._optimizer.state_dict(), } def load_state_dict(self, state_dict: dict, fast_init: bool = False) -> None: self._net.load_state_dict(state_dict["net"]) if not fast_init: self._optimizer.load_state_dict(state_dict["optimizer"]) **After:** .. code-block:: python def user_state_dict(self) -> dict: return { "net": self._net.state_dict(), "optimizer": self._optimizer.state_dict(), } def load_user_state_dict(self, state_dict: dict, for_inference: bool) -> None: self._net.load_state_dict(state_dict["net"]) if not for_inference: self._optimizer.load_state_dict(state_dict["optimizer"]) ``fast_init`` Renamed to ``for_inference`` ------------------------------------------ The ``fast_init`` parameter on :py:meth:`~helios.model.model.Model.setup` and :py:meth:`~helios.model.model.Model.load_user_state_dict` has been renamed to ``for_inference``. The name better reflects its purpose: when ``True``, the model skips loading training-only state (optimisers, schedulers, etc.) because it is being prepared for inference rather than for continued training. Update any overrides of ``setup()`` or ``load_state_dict()`` (now ``load_user_state_dict()``) to use the new parameter name. **Before:** .. code-block:: python def setup(self, fast_init: bool = False) -> None: ... **After:** .. code-block:: python def setup(self, for_inference: bool = False) -> None: ... Trainer Arguments Changed -------------------------- The ``log_path`` and ``run_path`` constructor arguments on :py:class:`~helios.trainer.Trainer` have been replaced by the single ``log_root`` argument. Previously, ``log_path`` controlled where the log file was written and ``run_path`` controlled where the Tensorboard run data was written. Both are now derived automatically from ``log_root``. **Before:** .. code-block:: python trainer = Trainer( ..., log_path=pathlib.Path("logs"), run_path=pathlib.Path("runs"), ) **After:** .. code-block:: python trainer = Trainer( ..., log_root=pathlib.Path("logs"), ) ``DataLoaderParams.debug_mode`` Removed ---------------------------------------- The ``debug_mode`` field on :py:class:`~helios.data.datamodule.DataLoaderParams` has been removed. Its only effect was to set ``num_workers`` to ``0`` when enabled. Set ``num_workers=0`` directly instead. **Before:** .. code-block:: python params = DataLoaderParams(..., debug_mode=True) **After:** .. code-block:: python params = DataLoaderParams(..., num_workers=0) ``DataLoaderParams.pin_memory`` Default Changed ------------------------------------------------ The default value of ``pin_memory`` in :py:class:`~helios.data.datamodule.DataLoaderParams` has changed from ``True`` to ``False``. This matches the PyTorch default and avoids unexpected behaviour on machines with limited RAM or in CPU-only environments. If you rely on pinned memory for GPU training performance, set ``pin_memory=True`` explicitly in your :py:class:`~helios.data.datamodule.DataLoaderParams`. ``get_default_numpy_rng()`` Return Type Changed ------------------------------------------------- :py:func:`~helios.core.rng.get_default_numpy_rng` previously returned a ``DefaultNumpyRNG`` wrapper object. It now returns a :py:class:`numpy.random.Generator` directly. Update any code that was accessing the generator through the wrapper: **Before:** .. code-block:: python rng = helios.core.rng.get_default_numpy_rng() value = rng.generator.integers(0, 10) **After:** .. code-block:: python rng = helios.core.rng.get_default_numpy_rng() value = rng.integers(0, 10) New Features in 2.0.0 ----------------------- The following features are new in 2.0.0. They do not require changes to existing code but are worth reviewing as they may simplify your training setup. Mixed Precision Training (AMP) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The :py:class:`~helios.model.model.Model` class now includes built-in support for Automatic Mixed Precision (AMP) training on both GPU (``float16`` and ``bfloat16``) and CPU (``bfloat16`` only). See :ref:`amp` in the Quick Reference for full details. Gradient Clipping ~~~~~~~~~~~~~~~~~ A :py:meth:`~helios.model.model.Model.clip_gradients` function is now available directly on the model, with correct integration with the AMP scaler when AMP is active. See :ref:`grad-clipping` in the Quick Reference. Weights & Biases Support ~~~~~~~~~~~~~~~~~~~~~~~~~ Weights & Biases logging is now supported natively. Pass a ``wandb_args`` dictionary to the :py:class:`~helios.trainer.Trainer` constructor to enable it. See the :doc:`logging` page for full details. Multi-phase Training ~~~~~~~~~~~~~~~~~~~~ The :py:class:`~helios.data.datamodule.DataModule` now supports multi-phase training, allowing successive datasets to be registered and advanced automatically during training. See :ref:`multi-phase-training` in the Quick Reference for details. ``should_save_checkpoint()`` Control ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Override :py:meth:`~helios.model.model.Model.should_save_checkpoint` to control whether the trainer writes a checkpoint at a given point. This is useful for implementing save-best logic. ``get_train_steps_per_epoch()`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :py:meth:`~helios.model.model.Model.get_train_steps_per_epoch` (delegating to :py:meth:`~helios.data.datamodule.DataModule.get_train_steps_per_epoch`) is now available to obtain the number of training steps per epoch. This is useful for initialising schedulers in :py:meth:`~helios.model.model.Model.setup`. Linear Warmup Scheduler ~~~~~~~~~~~~~~~~~~~~~~~~ :py:class:`~helios.scheduler.LinearWarmupScheduler` wraps any existing scheduler and applies a linear warmup phase before handing off to the wrapped scheduler. Expanded Metrics ~~~~~~~~~~~~~~~~ The :py:mod:`helios.metrics` module now includes additional metrics beyond accuracy: precision, recall, F1 score, RMSE, SSIM, PSNR, mAP, and MAE. Expanded ONNX Support ~~~~~~~~~~~~~~~~~~~~~~ :py:func:`~helios.onnx.export_to_onnx` now supports multiple inputs, multiple outputs, and dictionary outputs. It also automatically selects between the legacy and dynamo export paths based on the installed PyTorch version.