An Async, Online, Multi-Turn, Multi-Agent RL library for training reasoning models on TextArena games.

Unstable Baselines is a **lightweight reinforcement-learning research library** focused on self-play for text-based games. Through its deep integration with TextArena, it supports a wide range of single- and multi-player games. The interface is simple and hackable, making it easy to experiment, extend, and customize:

.. code-block:: python

   from unstable import train, get_algorithm_config

   # Load the configuration for the chosen algorithm
   config = get_algorithm_config("reinforce")

   # Override individual hyperparameters before training
   config['learner']['learning_rate'] = 1e-5
   config['learner']['grad_clip'] = 0.2
   config['replay_buffer']['max_buffer_size'] = 800

   # Run training with the modified configuration
   checkpoint_path = train(config)

----

.. admonition:: Why "Unstable"?
   :class: tip

   Our project is meant for rapid prototyping of new research ideas.

.. note::

   Feel free to extend this documentation and open a PR on the GitHub repository.

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Introduction

   introduction/overview
   introduction/installation
   introduction/quickstart
   config/index

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Collection

   configuration/game_scheduler
   configuration/buffer
   configuration/esampler
   configuration/msampler
   configuration/asampler

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Algorithms

   algorithms/reinforce
   algorithms/ppo

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Tutorials

   tutorials/ppo

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Our Projects

   TextArena
   Unstable Baselines
   Contribute!