Model Sampler
This component chooses opponents for training and evaluation jobs. It reads from and writes to a model registry that stores metadata and TrueSkill ratings for all models (checkpoints and fixed baselines).
API Reference
- class BaseModelSampler(model_registry)
Abstract base for opponent sampling strategies. Samplers read from the registry to select opponents and push back results after games.
- Parameters:
model_registry (ray.actor.ActorHandle) – Actor that keeps metadata about each checkpoint and fixed opponent.
Methods
- get_current_ckpt() → tuple[str | None, str]
Fetch the current checkpoint UID and its corresponding path/name.
- update(game_info: GameInformation, job_info: Dict[str, Any]) → None
Push training/evaluation outcomes back to the registry.
- Parameters:
game_info (unstable.utils._types.GameInformation) – Holds per-episode results, including final_rewards per player ID.
job_info (Dict[str, Any]) – Dict with at least "models" (list of dicts with uid, pid) and "env_id".
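Example: a minimal sketch of how the two arguments might be turned into aligned UID and score lists before they are forwarded to the registry. The stand-in GameInformation dataclass, the helper extract_uids_and_scores, and the sample UIDs and environment name are hypothetical; only final_rewards, "models", uid, pid and "env_id" come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class GameInformation:
    """Hypothetical stand-in for unstable.utils._types.GameInformation."""
    final_rewards: Dict[int, float] = field(default_factory=dict)


def extract_uids_and_scores(
    game_info: GameInformation, job_info: Dict[str, Any]
) -> Tuple[List[str], List[float], str]:
    """Pair each model UID with the final reward of the player ID it controlled."""
    uids: List[str] = []
    scores: List[float] = []
    for model in job_info["models"]:
        uids.append(model["uid"])
        scores.append(game_info.final_rewards[model["pid"]])
    return uids, scores, job_info["env_id"]


# Illustrative inputs only; the UIDs and env_id are made up.
game_info = GameInformation(final_rewards={0: 1.0, 1: -1.0})
job_info = {
    "env_id": "example-env",
    "models": [{"uid": "ckpt-3", "pid": 0}, {"uid": "baseline-a", "pid": 1}],
}
uids, scores, env_id = extract_uids_and_scores(game_info, job_info)
```

The resulting lists have the shape that ModelRegistry.update_ratings expects for its uids, scores and env_id arguments.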
Expected Methods
- sample_opponent() → tuple[str, str, None, str]
Strategy-specific opponent sampling.
Fixed Opponent Sampler
- class FixedOpponentModelSampler(model_registry, include_current_ckpt: bool = False)
Samples an opponent uniformly at random from fixed baselines. Optionally, the current checkpoint may be included in the candidate set.
- Parameters:
model_registry (ray.actor.ActorHandle) – Actor that keeps metadata about each checkpoint and fixed opponent.
include_current_ckpt (bool) – Whether to include the current checkpoint as a possible opponent.
Methods
- sample_opponent() → tuple[str, str, None, str]
Return a randomly selected opponent.
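Example: a minimal sketch of the selection rule described for this sampler, written against plain lists rather than the registry actor. The function sample_fixed_opponent and its arguments are hypothetical and only illustrate uniform sampling with the optional current checkpoint; the real method also returns further fields.

```python
import random
from typing import List, Optional


def sample_fixed_opponent(
    fixed_names: List[str],
    current_ckpt_uid: Optional[str] = None,
    include_current_ckpt: bool = False,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one candidate uniformly at random."""
    candidates = list(fixed_names)
    # The current checkpoint joins the pool only when explicitly requested.
    if include_current_ckpt and current_ckpt_uid is not None:
        candidates.append(current_ckpt_uid)
    if not candidates:
        raise ValueError("no opponents registered")
    return (rng or random).choice(candidates)


baselines = ["baseline-a", "baseline-b"]  # made-up names
pick = sample_fixed_opponent(baselines, "ckpt-3", include_current_ckpt=True)
```

Passing a seeded random.Random instance as rng makes the choice reproducible in tests.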
AsynchronousModelSampler
- class AsynchronousModelSampler(model_registry)
Samples an opponent uniformly at random from checkpoint models. Designed to support asynchronous self-play where multiple recent checkpoints are active concurrently.
- Parameters:
model_registry (ray.actor.ActorHandle) – Actor that keeps metadata about each checkpoint and fixed opponent.
Methods
- sample_opponent() → tuple[str, str, None, str]
Return a randomly selected recent checkpoint as opponent along with its metadata.
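Example: a minimal sketch of sampling from a rolling window of recent checkpoints, assuming the window holds at most k UIDs (mirroring the k parameter of ModelRegistry). The class RecentCheckpointPool is hypothetical and is not part of the library.

```python
import random
from collections import deque
from typing import Deque, Optional


class RecentCheckpointPool:
    """Hypothetical helper that keeps the k most recent checkpoint UIDs."""

    def __init__(self, k: int = 1) -> None:
        self._active: Deque[str] = deque(maxlen=k)

    def add(self, uid: str) -> None:
        # Appending to a full deque evicts the oldest UID.
        self._active.append(uid)

    def active(self) -> list:
        return list(self._active)

    def sample(self, rng: Optional[random.Random] = None) -> str:
        """Pick one active checkpoint uniformly at random."""
        if not self._active:
            raise ValueError("no checkpoints registered")
        return (rng or random).choice(list(self._active))


pool = RecentCheckpointPool(k=2)
for uid in ("ckpt-1", "ckpt-2", "ckpt-3"):  # made-up UIDs
    pool.add(uid)
opponent = pool.sample()
```

With k=2, only the two newest checkpoints remain candidates after the third one is added.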
ModelRegistry (Ray actor)
- class ModelRegistry(tracker, beta: float = 4.0, k: int = 1)
Stores models, keeps running TrueSkill ratings, and logs periodic snapshots for analysis.
- Parameters:
tracker (ray.actor.ActorHandle) – The experiment logging tracker.
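beta (float) – TrueSkill performance noise parameter (β, the standard deviation of a player's per-game performance). In standard TrueSkill, higher values treat each result as noisier and therefore reduce the magnitude of rating updates.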
k (int) – Maximum number of active checkpoints kept in a rolling window.
Methods
- add_checkpoint(uid: str, path: str, iteration: int, inherit: bool = True) → None
Register a new checkpoint model. If inherit=True and a current checkpoint exists, initialize the new rating from the current checkpoint’s μ and 2×σ; otherwise use the default TrueSkill prior.
- Parameters:
uid (str) – Unique identifier for the checkpoint.
path (str) – Filesystem path or model identifier (LoRA path, HF name, etc.).
iteration (int) – Training iteration when this checkpoint was produced.
inherit (bool) – Whether to initialize rating from the current checkpoint.
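Example: a minimal sketch of the rating initialization described for add_checkpoint, assuming "2×σ" means the inherited σ is doubled and that the default prior is the usual TrueSkill one (μ = 25, σ = 25/3). The Rating dataclass and initial_rating function are hypothetical stand-ins, not the registry's own types.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed defaults: the conventional TrueSkill prior.
DEFAULT_MU = 25.0
DEFAULT_SIGMA = 25.0 / 3.0


@dataclass(frozen=True)
class Rating:
    mu: float = DEFAULT_MU
    sigma: float = DEFAULT_SIGMA


def initial_rating(current: Optional[Rating], inherit: bool = True) -> Rating:
    """Return the starting rating for a newly registered checkpoint."""
    if inherit and current is not None:
        # Keep the skill estimate but widen the uncertainty, since the
        # new checkpoint has not played any games yet.
        return Rating(mu=current.mu, sigma=2.0 * current.sigma)
    return Rating()


inherited = initial_rating(Rating(mu=30.0, sigma=2.0))
fresh = initial_rating(None)
```

Here inherited keeps μ = 30.0 with σ = 4.0, while fresh falls back to the default prior.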
- add_fixed(name: str, prior_mu: float = 25.0) → None
Register a fixed baseline model.
- Parameters:
name (str) – Name of the fixed baseline.
prior_mu (float) – Prior TrueSkill μ for the baseline.
- update_ratings(uids: List[str], scores: List[float], env_id: str, dummy_uid: str = 'fixed-env') → None
Update TrueSkill ratings after a match between the provided models.
- Parameters:
uids (List[str]) – Ordered list of participating model UIDs.
scores (List[float]) – Real-valued performance scores aligned with uids.
env_id (str) – Identifier of the environment/task where the match occurred (for logging).
dummy_uid (str) – UID of the synthetic opponent for single-player updates.
- get_all_models() → Dict[str, ModelMeta]
Return a deep copy of the internal registry of all models and their metadata.
- get_current_ckpt() → str | None
Return the UID of the current checkpoint (or None if none exists).
- get_name_or_lora_path(uid: str) → str
Return the model’s path_or_name for the given UID.
Static Methods
- _scores_to_ranks(scores: List[float]) → List[int]
Convert scores to ranking indices (lower rank value is better). Ties receive the same rank.
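Example: a minimal sketch of one way to implement the conversion, assuming that a higher score is better and that ranks are dense (0, 1, 2, ... with no gaps after ties). The source only states that lower ranks are better and that ties share a rank, so both assumptions and the function name scores_to_ranks are illustrative.

```python
from typing import List


def scores_to_ranks(scores: List[float]) -> List[int]:
    """Map scores to dense ranks, where rank 0 is the best score."""
    # Distinct scores ordered from best (highest) to worst.
    ordered = sorted(set(scores), reverse=True)
    rank_of = {score: rank for rank, score in enumerate(ordered)}
    # Equal scores look up the same rank, so ties are shared.
    return [rank_of[score] for score in scores]


win_loss = scores_to_ranks([1.0, -1.0])
with_tie = scores_to_ranks([0.5, 0.5, 0.0])
```

Under these assumptions win_loss is [0, 1] and with_tie is [0, 0, 1].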