An Async, Online, Multi-Turn, Multi-Agent RL library for training reasoning models on TextArena games.

Unstable Baselines is a lightweight reinforcement-learning research library focused on self-play for text-based games. Through its deep integration with TextArena, it supports a wide range of single- and multi-player games. The interface is simple and hackable, making it easy to experiment, extend, and customize:

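from unstable import train, get_algorithm_config

# Fetch the configuration for the chosen algorithm.
config = get_algorithm_config("reinforce")

# Override individual settings before training.
config['learner']['learning_rate'] = 1e-5
config['learner']['grad_clip'] = 0.2
config['replay_buffer']['max_buffer_size'] = 800

# Start training with the modified configuration.
checkpoint_path = train(config)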
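
When many settings are changed at once, the per-key assignments above can be collected into a single mapping. The following is a minimal sketch that assumes the configuration behaves like a plain nested dict, as the subscript syntax suggests. The helper `apply_overrides` is hypothetical and not part of the library, and the starting values in `example_config` are placeholders, not the library's defaults.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted-path overrides applied.

    Hypothetical helper (not part of the library); assumes `config`
    is a plain nested dict.
    """
    updated = deepcopy(config)  # leave the caller's config untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = updated
        for key in parents:
            node = node[key]  # raises KeyError for an unknown section
        if leaf not in node:
            # Reject unknown keys so a typo does not silently add a setting.
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = value
    return updated


# Stand-in config with placeholder values (NOT the library's defaults).
example_config = {
    "learner": {"learning_rate": 1e-4, "grad_clip": 1.0},
    "replay_buffer": {"max_buffer_size": 1000},
}

tuned = apply_overrides(
    example_config,
    {
        "learner.learning_rate": 1e-5,
        "learner.grad_clip": 0.2,
        "replay_buffer.max_buffer_size": 800,
    },
)
```

Rejecting unknown keys is a design choice for this sketch: a misspelled setting fails immediately instead of being ignored during training.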

Why “Unstable”?

Our project is meant for rapid prototyping of new research ideas.

Note

Feel free to extend this documentation and open a PR on the GitHub repository.