Game Scheduler
~~~~~~~~~~~~~~
The game scheduler is responsible for starting games for both training and evaluation. It balances work across GPU actors and streams results to the replay buffer.
API Reference
"""""""""""""
.. py:class:: GameScheduler(vllm_config, tracker, buffer, model_sampler, env_sampler, action_sampler: str = "default")
:module: unstable.collection.game_scheduler
:noindex:
:param vllm_config: Configuration passed to each :class:`VLLMActor` (see section **vLLM Configuration** below).
:type vllm_config: Mapping[str, Any]
:param tracker: Actor that provides logging and Weights & Biases integration.
:type tracker: ray.actor.ActorHandle
:param buffer: Actor that stores training trajectories.
:type buffer: ray.actor.ActorHandle
:param model_sampler: Component that provides the models for the next game.
:type model_sampler: BaseModelSampler
:param env_sampler: Component that provides the environment for the next game.
:type env_sampler: BaseEnvSampler
:param action_sampler: Name of the action-sampling strategy to use for evaluation. Defaults to ``"default"``, which samples a single action from the model.
:type action_sampler: str, optional
**Methods**
.. py:method:: collect(num_train_workers: int, num_eval_workers: Optional[int] = None)
Schedules training (and optionally evaluation) games until the buffer signals to stop.
:param num_train_workers: Maximum number of concurrent training episodes.
:type num_train_workers: int
:param num_eval_workers: Maximum number of concurrent evaluation episodes.
If ``None``, no evaluation episodes are scheduled.
:type num_eval_workers: Optional[int]
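
The scheduling behaviour described for ``collect`` can be illustrated with a small, self-contained sketch. It is not the library's implementation: the documented class works with Ray actor handles, whereas this sketch uses ``asyncio`` tasks, and ``ToyBuffer``, ``should_stop`` and ``play_episode`` are hypothetical names introduced only for illustration. It shows the general pattern of keeping each pool topped up to its concurrency limit and streaming finished training episodes to the buffer until the buffer asks to stop.

```python
import asyncio
from typing import Optional


class ToyBuffer:
    """Hypothetical stand-in for the buffer actor: asks to stop once full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: list = []

    def add(self, item: dict) -> None:
        self.items.append(item)

    def should_stop(self) -> bool:
        return len(self.items) >= self.capacity


async def play_episode(kind: str, idx: int) -> dict:
    """Placeholder for one game; a real episode would query model actors."""
    await asyncio.sleep(0)
    return {"kind": kind, "idx": idx}


async def collect(buffer: ToyBuffer, num_train_workers: int,
                  num_eval_workers: Optional[int] = None) -> dict:
    if num_train_workers < 1:
        raise ValueError("num_train_workers must be at least 1")
    # None means that no evaluation episodes are scheduled.
    limits = {"train": num_train_workers, "eval": num_eval_workers or 0}
    running = {}  # maps each in-flight task to its kind
    started = {"train": 0, "eval": 0}
    peak = {"train": 0, "eval": 0}

    while not buffer.should_stop():
        # Top up each pool to its concurrency limit.
        for kind, limit in limits.items():
            while sum(1 for k in running.values() if k == kind) < limit:
                task = asyncio.create_task(play_episode(kind, started[kind]))
                running[task] = kind
                started[kind] += 1
            active = sum(1 for k in running.values() if k == kind)
            peak[kind] = max(peak[kind], active)
        # Wait for at least one episode, then stream training results.
        done, _ = await asyncio.wait(set(running),
                                     return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if running.pop(task) == "train":
                buffer.add(task.result())

    # Stop signal received: cancel whatever is still in flight.
    for task in running:
        task.cancel()
    await asyncio.gather(*running, return_exceptions=True)
    return {"started": started, "peak": peak}


buffer = ToyBuffer(capacity=5)
stats = asyncio.run(collect(buffer, num_train_workers=2, num_eval_workers=1))
```

In this sketch the buffer may end up slightly above its capacity, because several episodes can finish in the same scheduling round before the stop condition is checked again.
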
vLLM Configuration
""""""""""""""""""
**model_name**
HuggingFace or local model identifier.
**temperature**
Sampling temperature used during text generation (must be ≥ 0.0).
**max_tokens**
Maximum number of tokens to generate per sequence.
**max_parallel_seq**
Maximum number of sequences processed in parallel on a single actor.
**max_loras**
Maximum number of concurrently loaded LoRA adapters.
**max_model_len**
Maximum context window length for the underlying model.
**lora_config**
Optional configuration for LoRA fine-tuning adapters.

- **lora_rank**: Rank of the LoRA projection matrices.
- **lora_alpha**: Scaling factor for the LoRA updates.
- **lora_dropout**: Dropout probability applied to LoRA layers (range 0.0–1.0).
- **target_modules**: List of target submodules where LoRA adapters are applied.

Typical target modules include:
- ``q_proj``
- ``k_proj``
- ``v_proj``
- ``o_proj``
- ``gate_proj``
- ``up_proj``
- ``down_proj``
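
To show how the keys above fit together, here is a minimal sketch of a ``vllm_config`` mapping. The key names come from this section; every value, including the model identifier ``my-org/my-model``, is an illustrative placeholder rather than a recommended or default setting. The helper ``check_vllm_config`` is hypothetical and not part of the library; it only checks the two value ranges stated above.

```python
from typing import Any, List, Mapping

# All values below are illustrative assumptions, not library defaults.
vllm_config = {
    "model_name": "my-org/my-model",  # hypothetical HuggingFace or local identifier
    "temperature": 0.7,
    "max_tokens": 1024,
    "max_parallel_seq": 64,
    "max_loras": 4,
    "max_model_len": 8192,
    "lora_config": {
        "lora_rank": 16,
        "lora_alpha": 32,
        "lora_dropout": 0.05,
        "target_modules": [
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj",
        ],
    },
}


def check_vllm_config(cfg: Mapping[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of range violations (empty if none)."""
    problems = []
    if cfg["temperature"] < 0.0:
        problems.append("temperature must be >= 0.0")
    lora = cfg.get("lora_config")  # lora_config is optional
    if lora is not None and not 0.0 <= lora["lora_dropout"] <= 1.0:
        problems.append("lora_dropout must lie in [0.0, 1.0]")
    return problems


problems = check_vllm_config(vllm_config)
```

Running a check like this before constructing the scheduler is one way to surface an out-of-range value early, instead of discovering it when the actors start.
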