Environment Sampler
~~~~~~~~~~~~~~~~~~~
Environment samplers decide which environment specification is used for each data-collection or evaluation episode. They provide a flexible interface for drawing from one or more configured environment specifications, either stochastically or with a schedule that adapts to training metrics.
We currently provide a uniform random sampler implementation.
API Reference
"""""""""""""
.. py:class:: BaseEnvSampler(train_env_specs, eval_env_specs=None, rng_seed=489)
Abstract base class for all environment samplers.
:param train_env_specs: List of environment specifications used for training.
:type train_env_specs: List[unstable.utils._types.TrainEnvSpec]
:param eval_env_specs: Optional list of environment specifications used for evaluation.
:type eval_env_specs: Optional[List[unstable.utils._types.EvalEnvSpec]]
:param rng_seed: Optional integer seed for reproducible random sampling.
:type rng_seed: Optional[int]
**Methods**
.. py:method:: env_list() -> str
Return a comma-separated string listing the identifiers of all training environments.
**Expected Methods**
.. py:method:: sample(kind: str = "train") -> TrainEnvSpec | EvalEnvSpec
:noindex:
Sample one environment specification according to the implemented strategy.
:param kind: Whether to sample from ``"train"`` or ``"eval"`` environments.
:type kind: str
.. py:method:: update(avg_actor_reward: float, avg_opponent_reward: float | None) -> None
:noindex:
Update internal state based on observed training metrics.
:param avg_actor_reward: Average episodic reward achieved by the learning agent.
:type avg_actor_reward: float
:param avg_opponent_reward: Average reward of the opponent, or ``None`` when not applicable.
:type avg_opponent_reward: Optional[float]
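The interface above can be illustrated with a minimal, self-contained sketch of an adaptive sampler. Everything in it is an assumption made for illustration: ``ToySpec`` stands in for ``TrainEnvSpec`` / ``EvalEnvSpec`` (the ``env_id`` field is hypothetical), ``ToyEnvSampler`` only mirrors the documented constructor and method names rather than the library's ``BaseEnvSampler``, and the weighting rule is one possible strategy, not one the library provides.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for TrainEnvSpec / EvalEnvSpec."""
    env_id: str


class ToyEnvSampler:
    """Stand-in that mirrors the documented interface (not the library class)."""

    def __init__(self, train_env_specs: List[ToySpec],
                 eval_env_specs: Optional[List[ToySpec]] = None,
                 rng_seed: Optional[int] = 489):
        if not train_env_specs:
            raise ValueError("at least one training spec is required")
        self._train = list(train_env_specs)
        self._eval = list(eval_env_specs or [])
        self._rng = random.Random(rng_seed)  # private RNG for reproducibility

    def env_list(self) -> str:
        # Comma-separated identifiers of all training environments.
        return ",".join(spec.env_id for spec in self._train)

    def sample(self, kind: str = "train") -> ToySpec:
        raise NotImplementedError

    def update(self, avg_actor_reward: float,
               avg_opponent_reward: Optional[float]) -> None:
        raise NotImplementedError


class LastEnvWeightedSampler(ToyEnvSampler):
    """Assumed strategy: down-weight the last sampled training environment
    when the actor is ahead of the opponent (or ahead of 0.0 without one)."""

    def __init__(self, train_env_specs, eval_env_specs=None, rng_seed=489):
        super().__init__(train_env_specs, eval_env_specs, rng_seed)
        self._weights = [1.0] * len(self._train)
        self._last_idx: Optional[int] = None

    def sample(self, kind: str = "train") -> ToySpec:
        if kind == "train":
            indices = range(len(self._train))
            idx = self._rng.choices(indices, weights=self._weights, k=1)[0]
            self._last_idx = idx
            return self._train[idx]
        if kind == "eval":
            if not self._eval:
                raise ValueError("no evaluation specs configured")
            return self._rng.choice(self._eval)
        raise ValueError(f"unknown kind: {kind!r}")

    def update(self, avg_actor_reward: float,
               avg_opponent_reward: Optional[float]) -> None:
        if self._last_idx is None:
            return  # nothing has been sampled yet
        baseline = 0.0 if avg_opponent_reward is None else avg_opponent_reward
        margin = max(avg_actor_reward - baseline, 0.0)
        # The weight stays strictly positive, so every environment remains reachable.
        self._weights[self._last_idx] = 1.0 / (1.0 + margin)


specs = [ToySpec("env-a"), ToySpec("env-b")]
sampler = LastEnvWeightedSampler(specs, rng_seed=489)
picked = sampler.sample("train")
sampler.update(avg_actor_reward=1.0, avg_opponent_reward=0.0)
```

Because ``update`` receives only aggregate rewards, this sketch attributes each update to the most recently sampled training environment; a real strategy might track that association differently.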
UniformRandomEnvSampler
-----------------------
.. py:class:: UniformRandomEnvSampler(train_env_specs, eval_env_specs=None, rng_seed=489)
:module: unstable.collection.env_samplers
:noindex:
Samples environments uniformly at random.
:param train_env_specs: List of environment specifications used for training.
:type train_env_specs: List[unstable.utils._types.TrainEnvSpec]
:param eval_env_specs: Optional list of environment specifications used for evaluation.
:type eval_env_specs: Optional[List[unstable.utils._types.EvalEnvSpec]]
:param rng_seed: Random seed for reproducibility.
:type rng_seed: Optional[int]
**Methods**
.. py:method:: sample(kind: str = "train") -> TrainEnvSpec | EvalEnvSpec
Sample a single environment specification uniformly at random.
:param kind: Sampling mode. Either ``"train"`` or ``"eval"``.
:type kind: str
.. py:method:: update(avg_actor_reward: float, avg_opponent_reward: float | None) -> None
This method is a **no-op** for the uniform random strategy, since sampling probabilities are fixed.
:param avg_actor_reward: Average episodic reward achieved by the learning agent.
:type avg_actor_reward: float
:param avg_opponent_reward: Average reward of the opponent, or ``None`` when not applicable.
:type avg_opponent_reward: Optional[float]
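The behavior described above (uniform draws, a seed for reproducibility, and a no-op ``update``) can be sketched as follows. This is not the library's implementation: ``ToySpec``, its ``env_id`` field, and ``ToyUniformSampler`` are hypothetical stand-ins, and the use of Python's ``random.Random`` is an assumption about how seeding might work.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for TrainEnvSpec / EvalEnvSpec."""
    env_id: str


class ToyUniformSampler:
    """Illustrative uniform sampler mirroring the documented signatures."""

    def __init__(self, train_env_specs: List[ToySpec],
                 eval_env_specs: Optional[List[ToySpec]] = None,
                 rng_seed: Optional[int] = 489):
        self._train = list(train_env_specs)
        self._eval = list(eval_env_specs or [])
        self._rng = random.Random(rng_seed)  # seeded, independent of global state

    def sample(self, kind: str = "train") -> ToySpec:
        if kind == "train":
            pool = self._train
        elif kind == "eval":
            pool = self._eval
        else:
            raise ValueError(f"unknown kind: {kind!r}")
        if not pool:
            raise ValueError(f"no {kind} specs configured")
        # Every spec in the pool is equally likely.
        return self._rng.choice(pool)

    def update(self, avg_actor_reward: float,
               avg_opponent_reward: Optional[float]) -> None:
        # No-op: sampling probabilities are fixed for the uniform strategy.
        return None


specs = [ToySpec("env-a"), ToySpec("env-b"), ToySpec("env-c")]
first = ToyUniformSampler(specs, rng_seed=489)
second = ToyUniformSampler(specs, rng_seed=489)

draws_first = [first.sample().env_id for _ in range(5)]
second.update(1.0, None)  # has no effect on later draws
draws_second = [second.sample().env_id for _ in range(5)]
```

In this sketch, two samplers built with the same seed produce the same sequence of draws, and calling ``update`` in between does not change that sequence.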