Examples
This section contains examples demonstrating how to use rLLM to train agents for various tasks.
Available Examples
🧮 DeepScaleR & 💻 DeepCoder
Train reasoning models that ace math competitions (e.g. DeepScaleR) and coding contests (e.g. DeepCoder).
🛠️ DeepSWE
Train an autonomous SWEAgent that can write software patches to resolve real-world GitHub issues.
🔍 Search Agent
Build agents that can search and retrieve information effectively.
🎮 FrozenLake Agent
Classic RL examples using environments like FrozenLake.
📚 Math SFT Training
Supervised fine-tuning of base math models (e.g. Qwen/Qwen2.5-Math-1.5B) using high-quality trajectories generated by teacher models (e.g. DeepScaleR).
⚖️ Solver-Judge Workflow
Train a multi-agent workflow that samples multiple candidate solutions, then uses a judge to select the best one.
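As a rough illustration of the solver-judge pattern, the sketch below samples several candidates and lets a judge pick one. It is plain Python and does not use the rLLM API: `solve_with_judge`, `make_toy_solver`, and `toy_judge` are hypothetical names, and the toy solver and judge stand in for what would be model calls in a real workflow.

```python
from itertools import count
from typing import Callable, List

# Hypothetical type aliases for illustration; not part of rLLM.
Solver = Callable[[str], str]            # problem -> candidate solution
Judge = Callable[[str, List[str]], int]  # problem, candidates -> index of the best


def solve_with_judge(problem: str, solver: Solver, judge: Judge,
                     n_candidates: int = 4) -> str:
    """Sample n_candidates solutions, then return the one the judge selects."""
    candidates = [solver(problem) for _ in range(n_candidates)]
    best_index = judge(problem, candidates)
    return candidates[best_index]


def make_toy_solver() -> Solver:
    """Stand-in for a sampled model call: guesses 3, 4, 5, ... on successive calls."""
    attempts = count()

    def solver(problem: str) -> str:
        return str(3 + next(attempts))

    return solver


def toy_judge(problem: str, candidates: List[str]) -> int:
    """Stand-in for a judge model: picks the first candidate equal to the sum
    in an 'a+b' problem, falling back to the first candidate if none match."""
    target = str(sum(int(term) for term in problem.split("+")))
    for index, candidate in enumerate(candidates):
        if candidate == target:
            return index
    return 0


print(solve_with_judge("2+3", make_toy_solver(), toy_judge))  # prints 5
```

In a trained workflow, both the solver and the judge would be learned policies, and the judge's selection would feed into the reward signal; the sketch only shows the control flow.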