helios.data.samplers
====================

.. py:module:: helios.data.samplers


Attributes
----------

.. autoapisummary::

   helios.data.samplers.SAMPLER_REGISTRY
   helios.data.samplers.ResumableSamplerType


Classes
-------

.. autoapisummary::

   helios.data.samplers.ResumableSampler
   helios.data.samplers.ResumableRandomSampler
   helios.data.samplers.ResumableSequentialSampler
   helios.data.samplers.ResumableDistributedSampler


Functions
---------

.. autoapisummary::

   helios.data.samplers.create_sampler


Module Contents
---------------

.. py:data:: SAMPLER_REGISTRY

   Global instance of the registry for samplers.

   .. rubric:: Example

   .. code-block:: python

      import helios.data.samplers as hlds

      # This automatically registers your sampler.
      @hlds.SAMPLER_REGISTRY.register
      class MySampler:
          ...

      # Alternatively you can manually register a sampler like this:
      hlds.SAMPLER_REGISTRY.register(MySampler)

.. py:function:: create_sampler(type_name: str, *args: Any, **kwargs: Any) -> ResumableSamplerType

   Create a sampler of the given type.

   This uses the ``SAMPLER_REGISTRY`` to look up sampler types, so ensure your
   samplers have been registered before calling this function.

   :param type_name: the type of the sampler to create.
   :param args: positional arguments to pass into the sampler.
   :param kwargs: keyword arguments to pass into the sampler.
   :returns: The constructed sampler.
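   .. rubric:: Example

   A minimal sketch of looking a sampler up by name. It assumes that the
   registry keys samplers by their class name and that the built-in samplers
   are pre-registered; the dataset stand-in below is hypothetical.

   .. code-block:: python

      import helios.data.samplers as hlds

      # Hypothetical stand-in for a real dataset: any ``Sized`` container
      # works as the ``data_source`` argument.
      data = list(range(10))

      # Assumption: samplers are keyed by class name in SAMPLER_REGISTRY.
      sampler = hlds.create_sampler(
          "ResumableRandomSampler", data, seed=42, batch_size=2
      )

      # Seed the shuffle for the current epoch so that the batch order is
      # reproducible if training stops and resumes.
      sampler.set_epoch(0)

      for index in sampler:
          ...  # fetch ``data[index]`` and run the training step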
.. py:class:: ResumableSampler(batch_size: int)

   Bases: :py:obj:`torch.utils.data.Sampler`

   Base class for samplers that are resumable.

   Let :math:`b_i` be the :math:`i`-th batch of a given epoch :math:`e`, and
   let :math:`b_{i + 1}, b_{i + 2}, \ldots` be the batches that follow it.
   Suppose that on iteration :math:`i`, batch :math:`b_i` is loaded and
   training stops immediately afterwards. A sampler is defined to be
   *resumable* if and only if:

   #. Upon restarting training on epoch :math:`e`, the next batch the sampler
      loads is :math:`b_{i + 1}`.
   #. The order of the subsequent batches :math:`b_{i + 2}, \ldots` is
      *identical* to the order the sampler would have produced for epoch
      :math:`e` had training not stopped.

   :param batch_size: the number of samples per batch.

   .. py:property:: start_iter
      :type: int

      The starting iteration for the sampler.

   .. py:method:: set_epoch(epoch: int) -> None

      Set the current epoch for seeding.

.. py:class:: ResumableRandomSampler(data_source: Sized, seed: int = 0, batch_size: int = 1)

   Bases: :py:obj:`ResumableSampler`

   Random sampler with resumable state.

   This allows training to stop and resume while guaranteeing that the order
   in which the batches are returned stays consistent. It is effectively a
   replacement for PyTorch's default ``RandomSampler``.

   :param data_source: the dataset to sample from.
   :param seed: the seed used to set up the random generator.
   :param batch_size: the number of samples per batch.

   .. py:method:: __len__() -> int

      Return the length of the dataset.

   .. py:method:: __iter__() -> Iterator[int]

      Retrieve the index of the next sample.

.. py:class:: ResumableSequentialSampler(data_source: Sized, batch_size: int = 1)

   Bases: :py:obj:`ResumableSampler`

   Sequential sampler with resumable state.

   This allows training to stop and resume while guaranteeing that the order
   in which the batches are returned stays consistent. It is effectively a
   replacement for PyTorch's default ``SequentialSampler``.

   :param data_source: the dataset to sample from.
   :param batch_size: the number of samples per batch.

   .. py:method:: __len__() -> int

      Return the length of the dataset.

   .. py:method:: __iter__() -> Iterator[int]

      Retrieve the index of the next sample.

.. py:class:: ResumableDistributedSampler(dataset: torch.utils.data.Dataset, num_replicas: int | None = None, rank: int | None = None, shuffle: bool = True, seed: int = 0, drop_last: bool = False, batch_size: int = 1)

   Bases: :py:obj:`torch.utils.data.DistributedSampler`

   Distributed sampler with resumable state.

   This allows training to stop and resume while guaranteeing that the order
   in which the batches are returned stays consistent. It is effectively a
   replacement for PyTorch's default ``DistributedSampler``.

   :param dataset: the dataset to sample from.
   :param num_replicas: (optional) the number of processes for distributed
      training.
   :param rank: (optional) the rank of the current process.
   :param shuffle: if true, shuffle the indices. Defaults to true.
   :param seed: the random seed used to shuffle the sampler. Defaults to 0.
   :param drop_last: if true, drop the final samples to make the split even
      across replicas. Defaults to false.
   :param batch_size: the number of samples per batch. Defaults to 1.

   .. py:property:: start_iter
      :type: int

      The starting iteration for the sampler.

   .. py:method:: __iter__() -> Iterator[int]

      Retrieve the index of the next sample.

.. py:data:: ResumableSamplerType

   Defines the resumable sampler type.

   A resumable sampler **must** be derived from either
   :py:class:`~helios.data.samplers.ResumableSampler` or
   :py:class:`~helios.data.samplers.ResumableDistributedSampler`.
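   .. rubric:: Example

   A minimal sketch of using ``ResumableSamplerType`` as an annotation when
   wiring a sampler into a standard ``DataLoader``. ``ToyDataset`` and
   ``make_loader`` are hypothetical names, and it is assumed here (mirroring
   PyTorch's ``DistributedSampler``) that ``num_replicas`` and ``rank`` are
   inferred from the initialized process group when left as ``None``.

   .. code-block:: python

      import torch
      from torch.utils.data import DataLoader, Dataset

      import helios.data.samplers as hlds


      class ToyDataset(Dataset):
          """Hypothetical ten-sample dataset used purely for illustration."""

          def __len__(self) -> int:
              return 10

          def __getitem__(self, index: int) -> torch.Tensor:
              return torch.tensor(index)


      def make_loader(
          dataset: Dataset,
          sampler: hlds.ResumableSamplerType,
          batch_size: int,
      ) -> DataLoader:
          # Any sampler satisfying ResumableSamplerType (i.e. derived from
          # ResumableSampler or ResumableDistributedSampler) can be handed
          # to a regular DataLoader.
          return DataLoader(dataset, batch_size=batch_size, sampler=sampler)


      # Assumes torch.distributed has already been initialized (e.g. via
      # init_process_group), so num_replicas and rank can be inferred.
      dataset = ToyDataset()
      sampler = hlds.ResumableDistributedSampler(
          dataset, shuffle=True, seed=0, batch_size=4
      )
      sampler.set_epoch(0)  # keep the shuffle consistent across replicas
      loader = make_loader(dataset, sampler, batch_size=4)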