Environment Sampler

Environment samplers determine which environment is selected for training data collection and for evaluation. Each sampler draws from one or more configured environment specifications and may do so stochastically or adapt its schedule to the training metrics passed to update(). The only implementation currently provided is the uniform random sampler.

API Reference

class BaseEnvSampler(train_env_specs, eval_env_specs=None, rng_seed=489)

Abstract base class for all environment samplers.

Parameters:
  • train_env_specs (List[unstable.utils._types.TrainEnvSpec]) – List of environment specifications used for training.

  • eval_env_specs (Optional[List[unstable.utils._types.EvalEnvSpec]]) – Optional list of environment specifications used for evaluation.

  • rng_seed (Optional[int]) – Optional integer seed for reproducible random sampling.

Methods

env_list() → str

Return a comma-separated string listing the identifiers of all training environments.
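
As a rough illustration of the kind of string env_list() returns, the sketch below joins the identifiers of the training specifications with commas. The TrainEnvSpec stand-in and its env_id field are assumptions made for this example, not the library's actual definitions, and the real method may format the string differently (for example, with a space after each comma).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainEnvSpec:
    """Hypothetical stand-in for unstable.utils._types.TrainEnvSpec."""
    env_id: str  # assumed field name for the environment identifier


def env_list(train_env_specs: List[TrainEnvSpec]) -> str:
    # Join the identifiers of all training environments with commas.
    return ",".join(spec.env_id for spec in train_env_specs)


specs = [TrainEnvSpec("EnvA-v0"), TrainEnvSpec("EnvB-v0")]
print(env_list(specs))  # EnvA-v0,EnvB-v0
```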

Expected Methods

sample(kind: str = 'train') → TrainEnvSpec | EvalEnvSpec

Sample one environment specification according to the implemented strategy.

Parameters:

kind (str) – Whether to sample from "train" or "eval" environments.

update(avg_actor_reward: float, avg_opponent_reward: float | None) → None

Update internal state based on observed training metrics.

Parameters:
  • avg_actor_reward (float) – Average episodic reward achieved by the learning agent.

  • avg_opponent_reward (Optional[float]) – Optional average reward of the opponent (if applicable).
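
To make the sample()/update() contract concrete, here is a minimal sketch of how a custom adaptive sampler could be written against an interface of this shape. Everything in it is illustrative: SketchBaseEnvSampler, LowRewardFirstSampler, the spec stand-ins, and the env_id field are hypothetical names that only mirror the documented signatures, and the weighting rule is a toy choice rather than a strategy the library ships.

```python
import math
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class TrainEnvSpec:
    """Hypothetical stand-in for unstable.utils._types.TrainEnvSpec."""
    env_id: str  # assumed field name


@dataclass(frozen=True)
class EvalEnvSpec:
    """Hypothetical stand-in for unstable.utils._types.EvalEnvSpec."""
    env_id: str  # assumed field name


class SketchBaseEnvSampler(ABC):
    """Illustrative base class that mirrors the documented constructor."""

    def __init__(self, train_env_specs: List[TrainEnvSpec],
                 eval_env_specs: Optional[List[EvalEnvSpec]] = None,
                 rng_seed: Optional[int] = 489):
        if not train_env_specs:
            raise ValueError("at least one training environment is required")
        self._train = list(train_env_specs)
        self._eval = list(eval_env_specs or [])
        self._rng = random.Random(rng_seed)  # private RNG, seeded for reproducibility

    @abstractmethod
    def sample(self, kind: str = "train") -> Union[TrainEnvSpec, EvalEnvSpec]:
        """Return one spec from the "train" or "eval" pool."""

    @abstractmethod
    def update(self, avg_actor_reward: float,
               avg_opponent_reward: Optional[float]) -> None:
        """Fold observed training metrics into the sampler's state."""


class LowRewardFirstSampler(SketchBaseEnvSampler):
    """Toy adaptive strategy: favor training environments with low reward."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._avg: Dict[str, float] = {s.env_id: 0.0 for s in self._train}
        self._last: Optional[str] = None  # id of the most recent training draw

    def sample(self, kind: str = "train") -> Union[TrainEnvSpec, EvalEnvSpec]:
        if kind == "eval":
            if not self._eval:
                raise ValueError("no evaluation environments configured")
            return self._rng.choice(self._eval)
        if kind != "train":
            raise ValueError(f"unknown kind: {kind!r}")
        # Lower running reward -> larger weight (softmax over negated rewards).
        weights = [math.exp(-self._avg[s.env_id]) for s in self._train]
        spec = self._rng.choices(self._train, weights=weights, k=1)[0]
        self._last = spec.env_id
        return spec

    def update(self, avg_actor_reward: float,
               avg_opponent_reward: Optional[float]) -> None:
        if self._last is None:
            return  # nothing has been sampled yet
        # Exponential moving average of the actor reward for the last draw.
        old = self._avg[self._last]
        self._avg[self._last] = 0.9 * old + 0.1 * avg_actor_reward


sampler = LowRewardFirstSampler([TrainEnvSpec("EnvA-v0"), TrainEnvSpec("EnvB-v0")])
spec = sampler.sample("train")
sampler.update(avg_actor_reward=1.0, avg_opponent_reward=None)
```

Crediting the reward to the most recently sampled environment is a simplification for this sketch. Because update() receives only aggregate rewards, a real adaptive sampler would need its own way of deciding which environments those metrics belong to.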

UniformRandomEnvSampler

class unstable.collection.env_samplers.UniformRandomEnvSampler(train_env_specs, eval_env_specs=None, rng_seed=489)

Samples environments uniformly at random.

Parameters:
  • train_env_specs (List[unstable.utils._types.TrainEnvSpec]) – List of environment specifications used for training.

  • eval_env_specs (Optional[List[unstable.utils._types.EvalEnvSpec]]) – Optional list of environment specifications used for evaluation.

  • rng_seed (Optional[int]) – Random seed for reproducibility.

Methods

sample(kind: str = 'train') → TrainEnvSpec | EvalEnvSpec

Sample a single environment specification uniformly at random.

Parameters:

kind (str) – Sampling mode. Either "train" or "eval".

update(avg_actor_reward: float, avg_opponent_reward: float | None) → None

This method is a no-op for the uniform random strategy, since sampling probabilities are fixed.

Parameters:
  • avg_actor_reward (float) – Average episodic reward achieved by the learning agent.

  • avg_opponent_reward (Optional[float]) – Optional average reward of the opponent.
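
The behavior described above can be sketched in a few lines. This is a self-contained approximation, not the library's source: UniformSketchSampler, the spec stand-ins, and the env_id field are hypothetical, and the error handling for an empty evaluation pool or an unknown kind is an assumption about reasonable behavior. It shows that two samplers built with the same seed produce the same draws, and that calling update() does not change them.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class TrainEnvSpec:
    """Hypothetical stand-in for unstable.utils._types.TrainEnvSpec."""
    env_id: str  # assumed field name


@dataclass(frozen=True)
class EvalEnvSpec:
    """Hypothetical stand-in for unstable.utils._types.EvalEnvSpec."""
    env_id: str  # assumed field name


class UniformSketchSampler:
    """Illustrative uniform sampler with the documented method signatures."""

    def __init__(self, train_env_specs: List[TrainEnvSpec],
                 eval_env_specs: Optional[List[EvalEnvSpec]] = None,
                 rng_seed: Optional[int] = 489):
        self._train = list(train_env_specs)
        self._eval = list(eval_env_specs or [])
        self._rng = random.Random(rng_seed)  # private, seeded RNG

    def sample(self, kind: str = "train") -> Union[TrainEnvSpec, EvalEnvSpec]:
        if kind == "train":
            pool = self._train
        elif kind == "eval":
            pool = self._eval
        else:
            raise ValueError(f"unknown kind: {kind!r}")
        if not pool:
            raise ValueError(f"no {kind} environments configured")
        # Every spec in the pool is equally likely.
        return self._rng.choice(pool)

    def update(self, avg_actor_reward: float,
               avg_opponent_reward: Optional[float]) -> None:
        # No-op: sampling probabilities are fixed, so metrics are ignored.
        return None


train = [TrainEnvSpec("EnvA-v0"), TrainEnvSpec("EnvB-v0"), TrainEnvSpec("EnvC-v0")]
a = UniformSketchSampler(train, rng_seed=489)
b = UniformSketchSampler(train, rng_seed=489)

draws_a = [a.sample().env_id for _ in range(5)]
b.update(avg_actor_reward=1.0, avg_opponent_reward=None)  # has no effect
draws_b = [b.sample().env_id for _ in range(5)]

print(draws_a == draws_b)  # True
```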