helios.plugins.optuna
=====================

.. py:module:: helios.plugins.optuna


Classes
-------

.. autoapisummary::

   helios.plugins.optuna.OptunaPlugin


Functions
---------

.. autoapisummary::

   helios.plugins.optuna.resume_study
   helios.plugins.optuna.checkpoint_sampler
   helios.plugins.optuna.restore_sampler


Module Contents
---------------

.. py:function:: resume_study(study_args: dict[str, Any], failed_states: Sequence = (optuna.trial.TrialState.FAIL, ), backup_study: bool = True) -> optuna.Study

   Resume a study that stopped because of a failure.

   The goal of this function is to allow studies that failed due to an error (an
   exception, a system error, etc.) to continue by using the built-in checkpoint system
   from Helios. To accomplish this, the function will do the following:

   #. Grab all the trials from the study created by the given ``study_args``, splitting
      them into three groups: completed/pruned, failed, and failed but completed.
   #. Create a new study with the same name and storage. This new study will get all of
      the completed trials of the original, and will have the failed trials re-enqueued.

   .. warning::
       This function **requires** the following conditions to be true:

       #. It is called **before** the trials are started.
       #. The study uses ``RDBStorage`` as the storage argument for
          ``optuna.create_study``.
       #. ``load_if_exists`` is set to True in ``study_args``.
       #. ``TrialState.PRUNED`` is **not** in the list of ``failed_states``.

   The ``failed_states`` argument can be used to set additional trial states to be
   considered as "failures". This can be useful when dealing with special cases where
   trials were either completed or pruned but need to be re-run.

   By default, the original study (assuming there is one) will be backed up with the name
   ``<study-name>_backup-#`` where ``<study-name>`` is the name of the database of the
   original study, and ``#`` is an incremental number starting at 0. This behaviour can
   be disabled by setting ``backup_study`` to False.

   This function works in tandem with
   :py:meth:`~helios.plugins.optuna.OptunaPlugin.configure_model` to ensure that when
   the failed trial is re-run, the original save name is restored. Any saved
   checkpoints can then be re-used, letting the trial continue instead of starting from
   scratch.

   .. note::
       Only trials that fail but haven't been completed will be enqueued by this
       function. If a trial fails and is completed later on, it will be treated as if it
       had finished successfully.

   :param study_args: dictionary of arguments for ``optuna.create_study``.
   :param failed_states: the trial states that are considered to be failures and should
                         be re-enqueued.
   :param backup_study: if True, the original study is backed up so it can be re-used later
                        on.

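   The grouping in step 1 can be pictured with the following minimal sketch. It is not
   the Helios implementation: ``State``, ``Trial`` and ``partition_trials`` are
   hypothetical stand-ins, and matching a failed trial to its completed re-run by
   identical parameters is an assumption made purely for illustration. Rejecting
   ``PRUNED`` with a :py:exc:`ValueError` is likewise a choice of the sketch.

   .. code-block:: python

       from dataclasses import dataclass, field
       from enum import Enum, auto


       class State(Enum):
           # Stand-in for optuna.trial.TrialState.
           COMPLETE = auto()
           PRUNED = auto()
           FAIL = auto()


       @dataclass
       class Trial:
           # Stand-in for a stored trial: its number, state and sampled parameters.
           number: int
           state: State
           params: dict = field(default_factory=dict)


       def partition_trials(trials, failed_states=(State.FAIL,)):
           """Split trials into (finished, failed, failed_but_completed)."""
           if State.PRUNED in failed_states:
               raise ValueError("PRUNED cannot be treated as a failed state")

           finished = [t for t in trials if t.state not in failed_states]
           # Assumption: a failed trial counts as "completed later on" when some
           # finished trial has exactly the same parameters.
           finished_params = [t.params for t in finished]

           failed, failed_but_completed = [], []
           for t in trials:
               if t.state not in failed_states:
                   continue
               if t.params in finished_params:
                   failed_but_completed.append(t)
               else:
                   failed.append(t)
           return finished, failed, failed_but_completed


       trials = [
           Trial(0, State.COMPLETE, {"lr": 0.1}),
           Trial(1, State.FAIL, {"lr": 0.01}),
           Trial(2, State.FAIL, {"lr": 0.001}),
           Trial(3, State.COMPLETE, {"lr": 0.01}),
       ]
       finished, failed, redone = partition_trials(trials)
       # Only trial 2 would be re-enqueued; trial 1 was completed by trial 3.

   Under these assumptions, only the ``failed`` group would be handed back to the new
   study for another attempt.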

.. py:function:: checkpoint_sampler(trial: optuna.Trial, chkpt_root: pathlib.Path) -> None

   Create a checkpoint with the state of the sampler.

   This function can be used to ensure that if a study is restarted, the state of the
   sampler is recovered so trials can be reproducible. The function will automatically
   create a checkpoint using ``torch.save``.

   .. note::
       It is recommended that this function be called at the start of the objective
       function to ensure the checkpoint is made correctly, but it can be called at any
       time.

   :param trial: the current trial.
   :param chkpt_root: the root where the checkpoints will be saved.


.. py:function:: restore_sampler(chkpt_root: pathlib.Path) -> optuna.samplers.BaseSampler | None

   Restore the sampler from a previously saved checkpoint.

   This function can be used in tandem with
   :py:func:`~helios.plugins.optuna.checkpoint_sampler` to ensure that the last
   checkpoint is loaded and the correct state is restored for the sampler. This function
   **needs** to be called before ``optuna.create_study`` is called.

   :param chkpt_root: the root where the checkpoints are stored.

   :returns: The restored sampler, or ``None`` if no sampler was restored.

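   Why checkpointing the sampler makes trials reproducible can be seen in a small round
   trip. This is a sketch only: ``random.Random`` stands in for an Optuna sampler,
   ``pickle`` stands in for ``torch.save``, and the helper names and the ``sampler.pkl``
   file name are hypothetical rather than what Helios writes under ``chkpt_root``.
   Returning ``None`` when no checkpoint exists is also an assumption.

   .. code-block:: python

       import pathlib
       import pickle
       import random
       import tempfile


       def save_sampler(sampler, chkpt_root: pathlib.Path) -> None:
           # Serialise the whole sampler object, including its RNG state.
           chkpt_root.mkdir(parents=True, exist_ok=True)
           with open(chkpt_root / "sampler.pkl", "wb") as f:
               pickle.dump(sampler, f)


       def load_sampler(chkpt_root: pathlib.Path):
           # Return None when there is nothing to restore.
           path = chkpt_root / "sampler.pkl"
           if not path.exists():
               return None
           with open(path, "rb") as f:
               return pickle.load(f)


       with tempfile.TemporaryDirectory() as tmp:
           root = pathlib.Path(tmp) / "chkpt"
           missing = load_sampler(root)

           sampler = random.Random(42)
           sampler.random()  # Advance the state, as earlier trials would.
           save_sampler(sampler, root)
           expected = [sampler.random() for _ in range(3)]

           # The restored copy continues from the checkpointed state.
           restored = load_sampler(root)
           replayed = [restored.random() for _ in range(3)]

   The restored object produces the same draws as the original would have, which is the
   property the two Helios functions aim to provide for the study's sampler.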

.. py:class:: OptunaPlugin(trial: optuna.Trial, metric_name: str)

   Bases: :py:obj:`helios.plugins.Plugin`


   Plug-in to do hyper-parameter tuning with Optuna.

   This plug-in integrates `Optuna <https://optuna.readthedocs.io/en/stable/>`__ into the
   training system in order to provide hyper-parameter tuning. The plug-in provides the
   following functionality:

   #. Automatic handling of trial pruning.
   #. Automatic reporting of metrics.
   #. Exception registration for trial pruning.
   #. Easy integration with Helios' checkpoint system to continue stopped trials.

   .. warning::
       This plug-in **requires** Optuna to be installed before being used. If it isn't,
       then :py:exc:`ImportError` is raised.

   .. rubric:: Example

   .. code-block:: python

       import helios.plugins as hlp
       import optuna

       def objective(trial: optuna.Trial) -> float:
           datamodule = ...
           model = ...
           plugin = hlp.optuna.OptunaPlugin(trial, "accuracy")

           trainer = ...

           # Automatically registers the plug-in with the trainer.
           plugin.configure_trainer(trainer)

           # This can be skipped if you don't want the auto-resume functionality or
           # if you wish to manage it yourself.
           plugin.configure_model(model)

           trainer.fit(model, datamodule)
           plugin.check_pruned()
           return model.metrics["accuracy"]

       def main():
           # Note that the plug-in requires the storage to be persistent.
           study = optuna.create_study(storage="sqlite:///example.db", ...)
           study.optimize(objective, ...)

   :param trial: the Optuna trial.
   :param metric_name: the name of the metric to monitor. This assumes the name will be
                       present in the :py:attr:`~helios.model.model.Model.metrics` table.


   .. py:attribute:: plugin_id
      :value: 'optuna'


   .. py:property:: trial
      :type: optuna.Trial


      Return the trial.


   .. py:method:: configure_trainer(trainer: helios.trainer.Trainer) -> None

      Configure the trainer with the required settings.

      This will do two things:

      #. Register the plug-in itself with the trainer.
      #. Append the trial pruned exception to the trainer.

      :param trainer: the trainer instance.


   .. py:method:: configure_model(model: helios.model.Model) -> None

      Configure the model to set the trial number into the save name.

      This will alter the :py:attr:`~helios.model.model.Model.save_name` property of the
      model by appending :code:`_trial-<trial-number>`.

      :param model: the model instance.

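      The effect on the save name can be sketched as follows. ``trial_save_name`` is a
      hypothetical helper, not part of Helios; it only mirrors the suffix format
      described above.

      .. code-block:: python

          def trial_save_name(save_name: str, trial_number: int) -> str:
              # Append the documented "_trial-<trial-number>" suffix.
              return f"{save_name}_trial-{trial_number}"


          # Each trial number yields its own name, so checkpoints never collide.
          names = [trial_save_name("resnet", n) for n in range(3)]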

   .. py:method:: suggest(type_name: str, name: str, **kwargs: Any) -> Any

      Generically wrap the ``suggest_`` family of functions of the Optuna trial.

      This function can be used to easily invoke the corresponding ``suggest_`` function
      from the Optuna trial held by the plug-in without having to manually type each
      individual function. This lets you write generic code that can be controlled by an
      external source (such as command line arguments or a config table). The function
      wraps the following functions:

      .. list-table:: Suggestion Functions
          :header-rows: 1

          * - Function
            - Name
          * - ``optuna.Trial.suggest_categorical``
            - categorical
          * - ``optuna.Trial.suggest_int``
            - int
          * - ``optuna.Trial.suggest_float``
            - float

      .. warning::
          Functions that are marked as deprecated by Optuna are *not* included in this
          wrapper.

      .. note::
          You can find the exact arguments for each function `here
          <https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html>`__.

      .. rubric:: Example

      .. code-block:: python

          import helios.plugins as hlp
          import optuna

          def objective(trial: optuna.Trial) -> float:
              plugin = hlp.optuna.OptunaPlugin(trial, "accuracy")
              # ... configure model and trainer.

              val1 = plugin.suggest("categorical", "val1", choices=[1, 2, 3])
              val2 = plugin.suggest("int", "val2", low=0, high=10)
              val3 = plugin.suggest("float", "val3", low=0, high=1)

      :param type_name: the name of the type to suggest from.
      :param name: the name of the parameter.
      :param \*\*kwargs: keyword arguments to the corresponding suggest function.

      :raises KeyError: if the value passed in to ``type_name`` is not recognised.

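      One way to picture the dispatch is a lookup table from ``type_name`` to the
      matching ``suggest_`` method. The sketch below is an assumption about the shape
      of such a wrapper, not the plug-in's code; ``FakeTrial`` is a hypothetical stub
      that returns fixed values so the example runs without Optuna.

      .. code-block:: python

          from typing import Any


          class FakeTrial:
              # Stub exposing the three wrapped methods with fixed return values.
              def suggest_categorical(self, name: str, choices: list) -> Any:
                  return choices[0]

              def suggest_int(self, name: str, low: int, high: int) -> int:
                  return low

              def suggest_float(self, name: str, low: float, high: float) -> float:
                  return float(low)


          def suggest(trial: Any, type_name: str, name: str, **kwargs: Any) -> Any:
              wrappers = {
                  "categorical": trial.suggest_categorical,
                  "int": trial.suggest_int,
                  "float": trial.suggest_float,
              }
              # An unknown type_name raises KeyError, as documented for the plug-in.
              return wrappers[type_name](name, **kwargs)


          # Parameters described by a config table instead of hard-coded calls.
          config = [
              {"type_name": "categorical", "name": "opt", "choices": ["adam", "sgd"]},
              {"type_name": "int", "name": "layers", "low": 1, "high": 4},
              {"type_name": "float", "name": "lr", "low": 0.0, "high": 1.0},
          ]
          values = {entry["name"]: suggest(FakeTrial(), **entry) for entry in config}

      Driving the calls from a table like ``config`` is what lets the search space be
      changed from command line arguments or a config file without editing the
      objective function.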

   .. py:method:: setup() -> None

      Configure the plug-in.

      :raises ValueError: if the study wasn't created with persistent storage.


   .. py:method:: report_metrics(validation_cycle: int) -> None

      Report metrics to the trial.

      This function should be called from the model once the corresponding metrics have
      been saved into the :py:attr:`~helios.model.model.Model.metrics` table.

      .. rubric:: Example

      .. code-block:: python

          import helios.model as hlm
          import helios.plugins.optuna as hlpo

          class MyModel(hlm.Model):
              ...
              def on_validation_end(self, validation_cycle: int) -> None:
                  # Compute metrics
                  self.metrics["accuracy"] = 10

                  plugin = self.trainer.plugins[hlpo.OptunaPlugin.plugin_id]
                  assert isinstance(plugin, hlpo.OptunaPlugin)
                  plugin.report_metrics(validation_cycle)

      .. note::
          In distributed training, only rank 0 will report the metrics to the trial.

      :param validation_cycle: the current validation cycle.


   .. py:method:: should_training_stop() -> bool

      Handle trial pruning.

      :returns: True if the trial should be pruned, False otherwise.


   .. py:method:: on_training_end() -> None

      Clean-up on training end.

      If training is non-distributed and the trial was pruned, then this function will
      do the following:

      #. Call :py:meth:`~helios.model.model.Model.on_training_end` to ensure metrics are
         correctly logged (if used).
      #. Raise :py:exc:`optuna.TrialPruned` to signal the trial was pruned.

      If training is distributed, this function does nothing.

      :raises TrialPruned: if the trial was pruned.


   .. py:method:: check_pruned() -> None

      Ensure pruned distributed trials are correctly handled.

      Due to the way distributed training works, we can't raise an exception within the
      distributed processes, so we have to do it after we return to the main process.
      If the trial was pruned, this function will raise :py:exc:`optuna.TrialPruned`. If
      distributed training wasn't used, this function does nothing.

      .. warning::
          You *must* ensure this function is called after
          :py:meth:`~helios.trainer.Trainer.fit` to ensure pruning works correctly.

      :raises TrialPruned: if the trial was pruned.

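      The split between ``on_training_end`` and ``check_pruned`` can be summarised with
      a small model of the control flow. Everything here is a hypothetical stand-in
      written for illustration: ``TrialPruned`` replaces ``optuna.TrialPruned`` and
      ``PruneFlow`` is not the plug-in class.

      .. code-block:: python

          class TrialPruned(Exception):
              """Stand-in for optuna.TrialPruned."""


          class PruneFlow:
              def __init__(self, distributed: bool) -> None:
                  self.distributed = distributed
                  self.pruned = False

              def on_training_end(self) -> None:
                  # Non-distributed: raising here reaches the objective directly.
                  if self.pruned and not self.distributed:
                      raise TrialPruned()

              def check_pruned(self) -> None:
                  # Distributed: raise only once control is back in the main process.
                  if self.pruned and self.distributed:
                      raise TrialPruned()


          def run(distributed: bool) -> str:
              """Report which hook raises for a pruned trial."""
              flow = PruneFlow(distributed)
              flow.pruned = True
              try:
                  flow.on_training_end()
              except TrialPruned:
                  return "on_training_end"
              try:
                  flow.check_pruned()
              except TrialPruned:
                  return "check_pruned"
              return "not raised"

      In this model, exactly one of the two hooks raises for a pruned trial, which is
      why calling ``check_pruned`` after ``fit`` is safe in both training modes.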

   .. py:method:: state_dict() -> dict[str, Any]

      Get the state of the current trial.

      This will return the parameters to be optimised for the current trial.

      :returns: The parameters of the trial.