Environment Sampler
~~~~~~~~~~~~~~~~~~~

Environment samplers define how training and evaluation environments are selected during data collection and evaluation. They provide a flexible interface for sampling from one or more configured environment specifications, with optional stochasticity or adaptive scheduling. We currently provide a uniform random sampler implementation.

API Reference
"""""""""""""

.. py:class:: BaseEnvSampler(train_env_specs, eval_env_specs=None, rng_seed=489)

   Abstract base class for all environment samplers.

   :param train_env_specs: List of environment specifications used for training.
   :type train_env_specs: List[unstable.utils._types.TrainEnvSpec]
   :param eval_env_specs: Optional list of environment specifications used for evaluation.
   :type eval_env_specs: Optional[List[unstable.utils._types.EvalEnvSpec]]
   :param rng_seed: Optional integer seed for reproducible random sampling.
   :type rng_seed: Optional[int]

   **Methods**

   .. py:method:: env_list() -> str

      Return a comma-separated string listing the identifiers of all training environments.

   **Expected Methods**

   .. py:method:: sample(kind: str = "train") -> TrainEnvSpec | EvalEnvSpec
      :noindex:

      Sample one environment specification according to the implemented strategy.

      :param kind: Whether to sample from ``"train"`` or ``"eval"`` environments.
      :type kind: str

   .. py:method:: update(avg_actor_reward: float, avg_opponent_reward: float | None)
      :noindex:

      Update internal state based on observed training metrics.

      :param avg_actor_reward: Average episodic reward achieved by the learning agent.
      :type avg_actor_reward: float
      :param avg_opponent_reward: Optional average reward of the opponent (if applicable).
      :type avg_opponent_reward: Optional[float]

UniformRandomEnvSampler
-----------------------

.. py:class:: UniformRandomEnvSampler(train_env_specs, eval_env_specs=None, rng_seed=489)
   :module: unstable.collection.env_samplers
   :noindex:

   Samples environments uniformly at random.

   :param train_env_specs: List of environment specifications used for training.
   :type train_env_specs: List[unstable.utils._types.TrainEnvSpec]
   :param eval_env_specs: Optional list of environment specifications used for evaluation.
   :type eval_env_specs: Optional[List[unstable.utils._types.EvalEnvSpec]]
   :param rng_seed: Random seed for reproducibility.
   :type rng_seed: Optional[int]

   **Methods**

   .. py:method:: sample(kind: str = "train") -> TrainEnvSpec | EvalEnvSpec

      Sample a single environment specification uniformly at random.

      :param kind: Sampling mode. Either ``"train"`` or ``"eval"``.
      :type kind: str

   .. py:method:: update(avg_actor_reward: float, avg_opponent_reward: float | None) -> None

      This method is a **no-op** for the uniform random strategy, since sampling probabilities are fixed.

      :param avg_actor_reward: Average episodic reward achieved by the learning agent.
      :type avg_actor_reward: float
      :param avg_opponent_reward: Optional average reward of the opponent.
      :type avg_opponent_reward: Optional[float]
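
The interface described above can be illustrated with a small, self-contained sketch. It does not import the library: ``DemoEnvSpec`` and ``SketchUniformSampler`` are hypothetical stand-ins, the ``env_id`` field and the environment names are assumptions made for illustration, and the ``","`` separator in ``env_list()`` is only one plausible reading of "comma-separated". The real ``TrainEnvSpec`` and ``EvalEnvSpec`` types may carry different fields.

.. code-block:: python

   import random
   from dataclasses import dataclass
   from typing import List, Optional


   @dataclass(frozen=True)
   class DemoEnvSpec:
       """Hypothetical stand-in for TrainEnvSpec / EvalEnvSpec."""
       env_id: str  # assumed identifier field, not taken from the library


   class SketchUniformSampler:
       """Illustrative sampler that mirrors the documented method names."""

       def __init__(
           self,
           train_env_specs: List[DemoEnvSpec],
           eval_env_specs: Optional[List[DemoEnvSpec]] = None,
           rng_seed: Optional[int] = 489,
       ) -> None:
           if not train_env_specs:
               raise ValueError("at least one training spec is required")
           self._train = list(train_env_specs)
           self._eval = list(eval_env_specs or [])
           # A private RNG keeps sampling reproducible for a given seed.
           self._rng = random.Random(rng_seed)

       def env_list(self) -> str:
           # Identifiers of all training environments, joined by commas.
           return ",".join(spec.env_id for spec in self._train)

       def sample(self, kind: str = "train") -> DemoEnvSpec:
           if kind not in ("train", "eval"):
               raise ValueError(f"unknown kind: {kind!r}")
           pool = self._train if kind == "train" else self._eval
           if not pool:
               raise ValueError(f"no {kind} specs configured")
           # Every spec in the pool is equally likely to be chosen.
           return self._rng.choice(pool)

       def update(
           self,
           avg_actor_reward: float,
           avg_opponent_reward: Optional[float] = None,
       ) -> None:
           # No-op: uniform sampling probabilities never change.
           return None


   train_specs = [DemoEnvSpec("EnvA-v0"), DemoEnvSpec("EnvB-v0")]
   sampler = SketchUniformSampler(train_specs, [DemoEnvSpec("EnvC-v0")])

   print(sampler.env_list())
   print(sampler.sample("train").env_id)
   sampler.update(avg_actor_reward=0.5, avg_opponent_reward=None)

An adaptive strategy would presumably keep the same ``sample`` signature and use ``update`` to adjust per-environment weights from the reported rewards; in this sketch the call is accepted and ignored.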