DeepScaler Math Agent Example
This example demonstrates training and running DeepScaleR, a reasoning LLM finetuned from Deepseek-R1-Distill-1.5B on math competition problems using RL. The model achieves >40% Pass@1 on AIME2024, reaching o1-preview performance despite its small size.
Overview
The DeepScaler examples demonstrate:
- How to use rLLM's MathAgent for mathematical reasoning
- How to train agents with iterative context lengthening (8K -> 16K -> 24K)
- How to evaluate mathematical reasoning with Pass@K metrics
Quick Start
Setup Math Data
First, prepare your mathematical datasets:
Model Hosting
Start a model server (choose one option):
Option 1: Using vLLM
python -m vllm.entrypoints.openai.api_server \
--model agentica-org/DeepScaleR-1.5B-Preview \
--host 0.0.0.0 \
--port 30000 \
--dtype bfloat16
Option 2: Using SGLang
python -m sglang_router.launch_server \
--model-path agentica-org/DeepScaleR-1.5B-Preview \
--dp-size 1 \
--dtype bfloat16
Run DeepScaler Agent
Execute the math reasoning agent:
Train DeepScaler Agent
Train your own DeepScaler agent with iterative context lengthening:
# Train with 8K context
bash train_deepscaler_8k.sh
# Train with 16K context (modify MODEL_PATH to 8k checkpoint)
bash train_deepscaler_16k.sh
# Train with 24K context (modify MODEL_PATH to 16k checkpoint)
bash train_deepscaler_24k.sh
Code Reference
Math Agent Runner
Main script for running mathematical reasoning:
examples/deepscaler/run_deepscaler.py
import asyncio
from transformers import AutoTokenizer
from rllm.agents.math_agent import MathAgent
from rllm.data.dataset import DatasetRegistry
from rllm.engine.agent_execution_engine import AgentExecutionEngine
from rllm.environments.base.single_turn_env import SingleTurnEnvironment
from rllm.rewards.reward_fn import math_reward_fn
from rllm.utils import compute_pass_at_k
if __name__ == "__main__":
import os
os.environ["TOKENIZERS_PARALLELISM"] = "true"
n_parallel_agents = 64
model_name = "agentica-org/DeepScaleR-1.5B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
reward_fn = math_reward_fn
env_args = {
"reward_fn": reward_fn,
}
sampling_params = {"temperature": 0.6, "top_p": 0.95, "model": model_name}
engine = AgentExecutionEngine(
agent_class=MathAgent,
env_class=SingleTurnEnvironment,
agent_args={},
env_args=env_args,
engine_name="openai",
tokenizer=tokenizer,
sampling_params=sampling_params,
rollout_engine_args={
"base_url": "http://localhost:30000/v1",
"api_key": "None",
},
max_response_length=32768,
max_prompt_length=2048,
n_parallel_agents=n_parallel_agents,
)
test_dataset = DatasetRegistry.load_dataset("aime2024", "test")
if test_dataset is None:
print("Dataset not found, preparing dataset...")
from prepare_math_data import prepare_math_data
_, test_dataset = prepare_math_data()
tasks = test_dataset.repeat(n=16) # repeat to evaluate pass@k
results = asyncio.run(engine.execute_tasks(tasks))
compute_pass_at_k(results)
Training Script
DeepScaler training configuration:
examples/deepscaler/train_deepscaler.py
import hydra
from rllm.agents.math_agent import MathAgent
from rllm.data.dataset import DatasetRegistry
from rllm.environments.base.single_turn_env import SingleTurnEnvironment
from rllm.rewards.reward_fn import math_reward_fn
from rllm.trainer.agent_trainer import AgentTrainer
@hydra.main(config_path="pkg://rllm.trainer.config", config_name="agent_ppo_trainer", version_base=None)
def main(config):
train_dataset = DatasetRegistry.load_dataset("deepscaler_math", "train")
test_dataset = DatasetRegistry.load_dataset("aime2024", "test")
env_args = {"reward_fn": math_reward_fn}
trainer = AgentTrainer(
agent_class=MathAgent,
agent_args={},
env_args=env_args,
env_class=SingleTurnEnvironment,
config=config,
train_dataset=train_dataset,
val_dataset=test_dataset,
)
trainer.train()
if __name__ == "__main__":
main()
For detailed setup instructions, see the README in the deepscaler example directory.