An Async, Online, Multi-Turn, Multi-Agent RL library for training reasoning models on TextArena games.
Unstable Baselines is a lightweight reinforcement-learning research library focused on self-play for text-based games. Through its deep integration with TextArena, it supports a wide range of single-player and multi-player games. The interface is simple and hackable, making it easy to experiment, extend, and customize:
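from unstable import train, get_algorithm_config

# Load the configuration for the chosen algorithm.
config = get_algorithm_config("reinforce")

# Override individual settings before training.
config['learner']['learning_rate'] = 1e-5
config['learner']['grad_clip'] = 0.2
config['replay_buffer']['max_buffer_size'] = 800

# Start training with the customized configuration.
checkpoint_path = train(config)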
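The snippet above indexes into the config like a nested dictionary. Assuming that is the case, several overrides can be applied in one step with a small merge helper. The sketch below is not part of the library: `apply_overrides` is a hypothetical helper, and `base_config` is a stand-in with invented default values so that the example runs without `unstable` installed.

```python
from copy import deepcopy


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of `config` with `overrides` merged in recursively."""
    merged = deepcopy(config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge the nested section key by key.
            merged[key] = apply_overrides(merged[key], value)
        else:
            # Leaf value (or new key): overwrite or add it.
            merged[key] = value
    return merged


# Stand-in for the object returned by get_algorithm_config("reinforce").
# The default values here are invented for illustration only.
base_config = {
    "learner": {"learning_rate": 1e-4, "grad_clip": 1.0},
    "replay_buffer": {"max_buffer_size": 1000},
}

config = apply_overrides(
    base_config,
    {
        "learner": {"learning_rate": 1e-5, "grad_clip": 0.2},
        "replay_buffer": {"max_buffer_size": 800},
    },
)
```

Because the helper works on a deep copy, the base configuration is left untouched, which makes it easier to derive several experiment variants from the same starting point.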
Why “Unstable”?
Our project is meant for rapid prototyping of new research ideas.
Note
Feel free to extend this documentation and open a PR on the GitHub repository.