Examples

This section contains examples demonstrating how to use rLLM to train agents for various tasks.

Available Examples

🧩 rLLM SDK

Train agents using the rLLM SDK, including tutorials for a math agent, a solver-judge workflow, and a LangGraph RAG agent.

🚀 RL Training with Tinker

Train a solver-judge RL workflow using Tinker's hosted GPU service.

💡 LoRA Training with Verl

Fine-tune a math reasoning agent on GSM8K with LoRA, using verl as the training backend.

โš–๏ธ Solver-Judge Workflow

Train a multi-agent workflow to sample multiple candidate solutions, then use a judge to select the best.

๐Ÿ‘๏ธ Vision-Language Models (VLM)

Train multimodal agents that can reason over both images and text, demonstrated with geometry problem solving.

🧮 DeepScaleR & 💻 DeepCoder

Train reasoning models that excel at math competitions (e.g. DeepScaleR) and coding contests (e.g. DeepCoder).

๐Ÿ› ๏ธ DeepSWE

Train an autonomous SWEAgent that can write software patches to resolve real-world Github issues.

๐Ÿ” Search Agent

Build agents that can search and retrieve information effectively.

🎮 FrozenLake Agent

Classic RL examples using environments like FrozenLake.

🧊 Eval Protocol Integration (FrozenLake)

Use Eval Protocol benchmarks as rLLM workflows for evaluation and RL training.

📚 Math SFT Training

Supervised fine-tuning of base math models (e.g. Qwen/Qwen2.5-Math-1.5B) using high-quality trajectories generated by teacher models (e.g. DeepScaleR).