Examples

This section contains examples demonstrating how to use rLLM to train agents for various tasks.

Available Examples

🧩 rLLM SDK

Train agents using the rLLM SDK, including tutorials for a math agent, solver-judge workflow, and LangGraph RAG agent.

🚀 RL Training with Tinker

Train a solver‑judge RL workflow using Tinker's hosted GPU service.

💡 LoRA Training with Verl

Fine-tune a math reasoning agent on GSM8K with LoRA using verl as training backend.

⚖️ Solver-Judge Workflow

Train a multi-agent workflow to sample multiple candidate solutions, then use a judge to select the best.

👁️ Vision-Language Models (VLM)

Train multimodal agents that can reason over both images and text, demonstrated with geometry problem solving.

🧮 DeepScaler & 💻 DeepCoder

Train reasoning models that aces math competition (e.g. DeepScaleR) and coding contests (e.g. DeepCoder)

🛠️ DeepSWE

Train an autonomous SWEAgent that can write software patches to resolve real-world Github issues.

🔍 Search Agent

Build agents that can search and retrieve information effectively.

🎮 Frozenlake Agent

Classic RL examples using environments like FrozenLake.

🧊 Eval Protocol Integration (FrozenLake)

Use Eval Protocol benchmarks as rLLM workflows for evaluation and RL training.

📚 Math SFT Training

Supervised fine-tuning of base math models(e.g. Qwen/Qwen2.5-Math-1.5B) using high-quality trajectories generated from teacher models (e.g. DeepScaleR)