Examples
This section contains examples demonstrating how to use rLLM to train agents for various tasks.
Available Examples
rLLM SDK
Train agents using the rLLM SDK, including tutorials for a math agent, solver-judge workflow, and LangGraph RAG agent.
RL Training with Tinker
Train a solver-judge RL workflow using Tinker's hosted GPU service.
LoRA Training with Verl
Fine-tune a math reasoning agent on GSM8K with LoRA, using verl as the training backend.
Solver-Judge Workflow
Train a multi-agent workflow that samples multiple candidate solutions, then uses a judge to select the best one.
Vision-Language Models (VLM)
Train multimodal agents that can reason over both images and text, demonstrated with geometry problem solving.
DeepScaleR & DeepCoder
Train reasoning models that ace math competitions (e.g. DeepScaleR) and coding contests (e.g. DeepCoder).
DeepSWE
Train an autonomous SWEAgent that can write software patches to resolve real-world GitHub issues.
Search Agent
Build agents that can search and retrieve information effectively.
FrozenLake Agent
Classic RL examples using environments like FrozenLake.
Eval Protocol Integration (FrozenLake)
Use Eval Protocol benchmarks as rLLM workflows for evaluation and RL training.
Math SFT Training
Supervised fine-tuning of base math models (e.g. Qwen/Qwen2.5-Math-1.5B) on high-quality trajectories generated by teacher models (e.g. DeepScaleR).