Quick Start with rLLM
This guide walks you through using rLLM to build AI agents with tool usage capabilities. We'll use the math tool agent example to demonstrate the complete workflow from dataset preparation through model training.
Overview
In this tutorial, you'll create a math reasoning agent that can:
- Access a Python interpreter to solve mathematical problems
- Perform step-by-step reasoning with interleaved tool usage
- Learn and improve its math problem solving ability through reinforcement learning
The example uses:
- Base Model: Qwen3-4B
- Training Data: DeepScaleR-Preview-Math dataset
- Evaluation Data: AIME 2024 mathematics competition problems
- Tools: Python interpreter for mathematical computations
Prerequisites
Before starting, ensure you have:
- rLLM Installation: Follow the installation guide
- GPU Requirements: At least 1 GPU with 16GB+ memory for inference, 8+ GPUs for training
- Model Server: We'll use vLLM or SGLang to serve the base model
Step 1: Dataset Preparation
rLLM's DatasetRegistry provides a centralized way to manage datasets. Let's prepare the math datasets:
from datasets import load_dataset
from rllm.data.dataset import DatasetRegistry
def prepare_math_data():
train_dataset = load_dataset("agentica-org/DeepScaleR-Preview-Dataset", split="train")
test_dataset = load_dataset("HuggingFaceH4/aime_2024", split="train")
def preprocess_fn(example, idx):
return {
"question": example["problem"],
"ground_truth": example["answer"],
"data_source": "math",
}
train_dataset = train_dataset.map(preprocess_fn, with_indices=True)
test_dataset = test_dataset.map(preprocess_fn, with_indices=True)
train_dataset = DatasetRegistry.register_dataset("deepscaler_math", train_dataset, "train")
test_dataset = DatasetRegistry.register_dataset("aime2024", test_dataset, "test")
return train_dataset, test_dataset
if __name__ == "__main__":
train_dataset, test_dataset = prepare_math_data()
print(train_dataset)
print(test_dataset)
This registers the training dataset deepscaler_math and the testing dataset aime2024. Under the hood, rLLM stores the processed data as parquet files in a format suitable for both inference and training. Later, you can easily load the registered datasets using DatasetRegistry.load_dataset.
Run the preparation script:
Step 2: Model Server Setup
rLLM requires a model server for inference. Choose one of these options:
Option A: vLLM Server
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3-4B \
--host 0.0.0.0 \
--port 30000 \
--dtype bfloat16
Option B: SGLang Server
The server provides an OpenAI-compatible API at http://localhost:30000/v1.
Step 3: Model Inference
Now let's run inference to see how agents solve math problems using tools:
import asyncio
from transformers import AutoTokenizer
from rllm.agents import ToolAgent
from rllm.data.dataset import DatasetRegistry
from rllm.engine.agent_execution_engine import AgentExecutionEngine
from rllm.environments.tools.tool_env import ToolEnvironment
from rllm.rewards.reward_fn import math_reward_fn
from rllm.utils import compute_pass_at_k
if __name__ == "__main__":
import os
os.environ["TOKENIZERS_PARALLELISM"] = "true"
n_parallel_agents = 64
model_name = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
agent_args = {"tools": ["python"], "parser_name": "qwen", "system_prompt": "You are a math assistant that can write python to solve math problems."}
env_args = {
"tools": ["python"],
"reward_fn": math_reward_fn,
}
sampling_params = {"temperature": 0.6, "top_p": 0.95, "model": model_name}
engine = AgentExecutionEngine(
agent_class=ToolAgent,
agent_args=agent_args,
env_class=ToolEnvironment,
env_args=env_args,
engine_name="openai",
rollout_engine_args={"base_url": "http://localhost:30000/v1", "api_key": "None"},
tokenizer=tokenizer,
sampling_params=sampling_params,
max_response_length=16384,
max_prompt_length=2048,
n_parallel_agents=n_parallel_agents,
)
test_dataset = DatasetRegistry.load_dataset("aime2024", "test")
if test_dataset is None:
print("Dataset not found, preparing dataset...")
from prepare_math_data import prepare_math_data
_, test_dataset = prepare_math_data()
tasks = test_dataset.repeat(n=8) # repeat to evaluate pass@k
results = asyncio.run(engine.execute_tasks(tasks))
compute_pass_at_k(results)
Run the inference script:
The script above configures a ToolAgent from rLLM with access to the python tool for solving math problems in AIME2024, and a ToolEnvironment for handling Python tool calls and returning results.
The AgentExecutionEngine orchestrates the interaction between the ToolAgent and ToolEnvironment. The execute_tasks function launches 64 agent-environment pairs in parallel (n_parallel_agents=64) for rollout generation and returns results after all problems from the AIME2024 dataset are processed. Finally, the Pass@1 and Pass@K metrics for AIME are computed and printed.
Step 4: Agent Training with GRPO
Training improves the agent's ability to use tools effectively. rLLM uses verl as its training backend, which supports training language models with GRPO and various other RL algorithms.
import hydra
from rllm.agents import ToolAgent
from rllm.data.dataset import DatasetRegistry
from rllm.environments.tools.tool_env import ToolEnvironment
from rllm.rewards.reward_fn import math_reward_fn
from rllm.trainer.agent_trainer import AgentTrainer
@hydra.main(config_path="pkg://rllm.trainer.config", config_name="agent_ppo_trainer", version_base=None)
def main(config):
train_dataset = DatasetRegistry.load_dataset("deepscaler_math", "train")
test_dataset = DatasetRegistry.load_dataset("aime2024", "test")
agent_args = {"tools": ["python"], "parser_name": "qwen", "system_prompt": "You are a math assistant that can write python to solve math problems."}
env_args = {
"tools": ["python"],
"reward_fn": math_reward_fn,
}
trainer = AgentTrainer(
agent_class=ToolAgent,
env_class=ToolEnvironment,
agent_args=agent_args,
env_args=env_args,
config=config,
train_dataset=train_dataset,
val_dataset=test_dataset,
)
trainer.train()
if __name__ == "__main__":
main()
Run the training script:
The script above launches an RL training job for our ToolAgent, using deepscaler_math as the training set and aime2024 as the test set. Under the hood, rLLM handles agent trajectory generation using our AgentExecutionEngine and transforms the trajectories into verl's format for model training using FSDP or Megatron. The training process works as follows:
- Rollout Generation: A batch of data is passed to
AgentExecutionEngine, which launches multiple agent-environment pairs in parallel to process the batch. The engine returns all trajectories along with rewards computed by the environment. - Transform Trajectories: Agent trajectories are transformed into the corresponding format for our training backend
verl. - Advantage Calculation with GRPO:
verluses GRPO for advantage calculation. - Model Update:
verlupdates the model parameters to increase the probability of successful actions. The updated model is then used to generate trajectories for the next batch of data.
Key rLLM Components in This Example
| Component | Purpose | Example Usage |
|---|---|---|
ToolAgent |
Agent with tool usage capabilities | Reasoning + Python execution |
ToolEnvironment |
Safe tool execution environment | Sandboxed Python interpreter |
DatasetRegistry |
Centralized dataset management | Load/register math datasets |
AgentExecutionEngine |
Parallel agent execution | Efficient batch inference |
AgentTrainer |
RL training orchestration | PPO-based agent improvement |
Next Steps
Congratulations! You've successfully used rLLM to run and train a ToolAgent for math problem solving. For a deeper dive into rLLM's main components, check out Core Concepts in rLLM.