Examples

This section contains examples demonstrating how to use rLLM to train agents for various tasks.

Available Examples

🧮 DeepScaleR & 💻 DeepCoder

Train reasoning models that ace math competitions (e.g., DeepScaleR) and coding contests (e.g., DeepCoder).

🛠️ DeepSWE

Train an autonomous SWEAgent that writes software patches to resolve real-world GitHub issues.

🔍 Search Agent

Build agents that can search and retrieve information effectively.

🎮 FrozenLake Agent

Classic RL examples using environments like FrozenLake.

📚 Math SFT Training

Supervised fine-tuning of base math models (e.g., Qwen/Qwen2.5-Math-1.5B) on high-quality trajectories generated by teacher models (e.g., DeepScaleR).

⚖️ Solver-Judge Workflow

Train a multi-agent workflow to sample multiple candidate solutions, then use a judge to select the best.