Game Scheduler
The game scheduler starts games for both training and evaluation. It balances work across GPU actors and streams results to the replay buffer.
API Reference
- class unstable.collection.game_scheduler.GameScheduler(vllm_config, tracker, buffer, model_sampler, env_sampler, action_sampler: str = 'default')
- Parameters:
vllm_config (Mapping[str, Any]) – Configuration passed to each VLLMActor (see section vLLM Configuration below).
tracker (ray.actor.ActorHandle) – Actor that provides logging and Weights & Biases integration.
buffer (ray.actor.ActorHandle) – Actor that stores training trajectories and exposes.
model_sampler (BaseModelSampler) – Component that provides the models for the next game.
env_sampler (BaseEnvSampler) – Component that provides the environment for the next game.
action_sampler (str, optional) – Name of the action-sampling strategy to use for evaluation. Default is “default”, which samples a single action from the model.
Methods
- collect(num_train_workers: int, num_eval_workers: int | None = None)
Schedules training (and optionally evaluation) games until the buffer signals to stop.
- Parameters:
num_train_workers (int) – Maximum number of concurrent training episodes.
num_eval_workers (Optional[int]) – Maximum number of concurrent evaluation episodes. If None, no evaluation episodes are scheduled.
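The behaviour described for collect (bounded concurrency per game type, running until the buffer signals to stop) can be illustrated with a small stand-alone sketch. This is not the library's implementation: ToyBuffer, play_game, and collect_sketch are hypothetical names, a thread pool stands in for the Ray actors, and the stop condition (a fixed capacity) is an assumption made for illustration.

```python
import itertools
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import List, Optional, Tuple


class ToyBuffer:
    """Hypothetical stand-in for the replay buffer actor."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: List[Tuple[str, int]] = []

    def add(self, item: Tuple[str, int]) -> None:
        self.items.append(item)

    def should_stop(self) -> bool:
        # Assumed stop signal: the buffer has collected enough trajectories.
        return len(self.items) >= self.capacity


def play_game(kind: str, game_id: int) -> Tuple[str, int]:
    """Placeholder episode; a real game would query models and an environment."""
    return (kind, game_id)


def collect_sketch(
    buffer: ToyBuffer,
    num_train_workers: int,
    num_eval_workers: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Keep up to N games of each kind in flight until the buffer says stop."""
    if num_train_workers < 1:
        raise ValueError("num_train_workers must be at least 1")
    limits = {"train": num_train_workers}
    if num_eval_workers is not None:  # None disables evaluation games
        limits["eval"] = num_eval_workers

    running = {}  # maps each in-flight future to its game kind
    ids = itertools.count()
    eval_results: List[Tuple[str, int]] = []

    with ThreadPoolExecutor(max_workers=sum(limits.values())) as pool:
        while not buffer.should_stop():
            # Top up each kind to its concurrency limit.
            for kind, limit in limits.items():
                active = sum(1 for k in running.values() if k == kind)
                for _ in range(limit - active):
                    running[pool.submit(play_game, kind, next(ids))] = kind
            # Block until at least one game finishes, then route its result.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                kind = running.pop(fut)
                if kind == "train":
                    buffer.add(fut.result())
                else:
                    eval_results.append(fut.result())
    # Games still in flight when the loop ends are finished by the pool
    # shutdown, and their results are discarded in this sketch.
    return eval_results
```

Calling collect_sketch(ToyBuffer(capacity=5), num_train_workers=2) runs only training games. Passing num_eval_workers=1 additionally keeps one evaluation game in flight and returns the evaluation results separately, so they never enter the buffer.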
vLLM Configuration
- model_name
HuggingFace or local model identifier.
- temperature
Sampling temperature used during text generation (must be ≥ 0.0).
- max_tokens
Maximum number of tokens to generate per sequence.
- max_parallel_seq
Maximum number of sequences processed in parallel on a single actor.
- max_loras
Maximum number of concurrently loaded LoRA adapters.
- max_model_len
Maximum context window length for the underlying model.
- lora_config
Optional configuration for LoRA fine-tuning adapters.
lora_rank – Rank of the LoRA projection matrices.
lora_alpha – Scaling factor for the LoRA updates.
lora_dropout – Dropout probability applied to LoRA layers (range 0.0–1.0).
target_modules – List of target submodules where LoRA adapters are applied.
Typical target modules include:
q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
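As a rough illustration, the fields above might be assembled into a mapping like the one below. Every value is a placeholder rather than a recommended default, the model identifier is hypothetical, and nesting the LoRA fields under lora_config is an assumption based on how they are listed here. The helper check_vllm_config is also hypothetical; it only checks the two ranges this section states explicitly.

```python
from typing import Any, List, Mapping

# Illustrative values only; none of these are defaults from the library.
vllm_config = {
    "model_name": "my-org/my-model",  # hypothetical identifier
    "temperature": 0.7,
    "max_tokens": 512,
    "max_parallel_seq": 64,
    "max_loras": 4,
    "max_model_len": 4096,
    "lora_config": {  # assumed to be a nested mapping
        "lora_rank": 16,
        "lora_alpha": 32,
        "lora_dropout": 0.0,
        "target_modules": [
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj",
        ],
    },
}


def check_vllm_config(cfg: Mapping[str, Any]) -> List[str]:
    """Return a list of violations of the documented value ranges."""
    problems: List[str] = []
    temperature = cfg.get("temperature")
    if temperature is not None and temperature < 0.0:
        problems.append("temperature must be >= 0.0")
    # lora_config is optional, so treat a missing or None value as empty.
    lora = cfg.get("lora_config") or {}
    dropout = lora.get("lora_dropout")
    if dropout is not None and not 0.0 <= dropout <= 1.0:
        problems.append("lora_dropout must be within 0.0-1.0")
    return problems
```

Running check_vllm_config(vllm_config) before handing the mapping to the scheduler catches out-of-range values early, before any GPU actor is started.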