An Async, Online, Multi-Turn, Multi-Agent RL library for training reasoning models on TextArena games.

Unstable Baselines is a **lightweight reinforcement-learning research library** focused on self-play for text-based games.
Through its deep integration with TextArena, it supports a wide range of single- and multi-player games.
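
The interface is simple and hackable, making it easy to experiment with, extend, and customize:

.. code-block:: python

   from unstable import train, get_algorithm_config

   # Start from the default configuration for the REINFORCE algorithm
   config = get_algorithm_config("reinforce")

   # Override individual learner and replay-buffer settings in place
   config['learner']['learning_rate'] = 1e-5
   config['learner']['grad_clip'] = 0.2
   config['replay_buffer']['max_buffer_size'] = 800

   # Launch training with the modified configuration
   checkpoint_path = train(config)
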
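
When you keep several experiment variants, it can help to store the overrides as data instead of as a sequence of assignments. The sketch below is a minimal illustration, assuming only that ``get_algorithm_config`` returns a nested ``dict`` (as the indexing in the example above suggests). ``merge_overrides`` and ``REINFORCE_OVERRIDES`` are hypothetical names used for illustration; they are not part of the Unstable Baselines API.

.. code-block:: python

   from copy import deepcopy
   from typing import Any, Mapping


   def merge_overrides(base: dict, overrides: Mapping[str, Any]) -> dict:
       """Return a copy of ``base`` with ``overrides`` merged in recursively.

       Hypothetical helper: nested mappings are merged key by key, and any
       other value replaces the existing entry. ``base`` is left unchanged.
       """
       merged = deepcopy(base)
       for key, value in overrides.items():
           if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
               # Both sides are nested sections: merge them key by key
               merged[key] = merge_overrides(merged[key], value)
           else:
               # Leaf value (or new section): replace it outright
               merged[key] = deepcopy(value)
       return merged


   # The same overrides as in the example above, kept as plain data
   REINFORCE_OVERRIDES = {
       "learner": {"learning_rate": 1e-5, "grad_clip": 0.2},
       "replay_buffer": {"max_buffer_size": 800},
   }

   # Example usage (assumes the unstable package is installed):
   #   config = merge_overrides(get_algorithm_config("reinforce"), REINFORCE_OVERRIDES)
   #   checkpoint_path = train(config)

Because ``merge_overrides`` returns a copy, the default configuration stays untouched, so several variants can be derived from the same base.
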

----


.. admonition:: Why "Unstable"?
   :class: tip

   The project is meant for rapid prototyping of new research ideas.

.. note::

   Feel free to extend this documentation and open a PR on the `GitHub repository `_.

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Introduction

   introduction/overview
   introduction/installation
   introduction/quickstart
   config/index

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Collection

   configuration/game_scheduler
   configuration/buffer
   configuration/esampler
   configuration/msampler
   configuration/asampler

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Algorithms

   algorithms/reinforce
   algorithms/ppo

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Tutorials

   tutorials/ppo

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Our Projects

   TextArena
   Unstable Baselines
   Contribute!