Environment Sampler¶

Environment samplers define how training and evaluation environments are selected during data collection and evaluation. They provide a flexible interface for sampling from one or more configured environment specifications, with optional stochasticity or adaptive scheduling. We currently provide a uniform random sampler implementation.

API Reference¶

class BaseEnvSampler(train_env_specs, eval_env_specs=None, rng_seed=489)¶

Abstract base class for all environment samplers.

Parameters:

train_env_specs (List[unstable.utils._types.TrainEnvSpec]) – List of environment specifications used for training.
eval_env_specs (Optional[List[unstable.utils._types.EvalEnvSpec]]) – Optional list of environment specifications used for evaluation.
rng_seed (Optional[int]) – Optional integer seed for reproducible random sampling.

Methods

env_list() → str¶: Return a comma-separated string listing the identifiers of all training environments.

Expected Methods

sample(kind: str = 'train') → TrainEnvSpec | EvalEnvSpec

Sample one environment specification according to the implemented strategy.

Parameters:: kind (str) – Whether to sample from "train" or "eval" environments.

update(avg_actor_reward: float, avg_opponent_reward: float | None)

Update internal state based on observed training metrics.

Parameters:

avg_actor_reward (float) – Average episodic reward achieved by the learning agent.
avg_opponent_reward (Optional[float]) – Optional average reward of the opponent (if applicable).

UniformRandomEnvSampler¶

class unstable.collection.env_samplers.UniformRandomEnvSampler(train_env_specs, eval_env_specs=None, rng_seed=489)

Samples environments uniformly at random.

Parameters:

train_env_specs (List[unstable.utils._types.TrainEnvSpec]) – List of environment specifications used for training.
eval_env_specs (Optional[List[unstable.utils._types.EvalEnvSpec]]) – Optional list of environment specifications used for evaluation.
rng_seed (Optional[int]) – Random seed for reproducibility.

Methods

sample(kind: str = 'train') → TrainEnvSpec | EvalEnvSpec¶

Sample a single environment specification uniformly at random.

Parameters:: kind (str) – Sampling mode. Either "train" or "eval".

update(avg_actor_reward: float, avg_opponent_reward: float | None) → None¶

This method is a no-op for the uniform random strategy, since sampling probabilities are fixed.

Parameters:

avg_actor_reward (float) – Average episodic reward achieved by the learning agent.
avg_opponent_reward (Optional[float]) – Optional average reward of the opponent.

—