DeepCoder Programming Agent Example
This example demonstrates training and running DeepCoder, a code reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B on coding competition problems with RL. The model achieves 60.6% Pass@1 accuracy on LiveCodeBench v5, representing an 8% improvement over the base model.
Overview
The DeepCoder examples demonstrate:
- How to use rLLM's CompetitionCodingAgent for programming tasks
- How to train agents with iterative context lengthening (16K -> 32K)
- How to evaluate coding performance on LiveCodeBench
Quick Start
Setup Coding Data
First, prepare your coding datasets:
Model Hosting
Start a model server (choose one option):
Option 1: Using vLLM
python -m vllm.entrypoints.openai.api_server \
--model agentica-org/DeepCoder-14B-Preview \
--host 0.0.0.0 \
--port 30000 \
--dtype bfloat16 \
--max-model-len 32768
Option 2: Using SGLang
python -m sglang_router.launch_server \
--model-path agentica-org/DeepCoder-14B-Preview \
--dp-size 1 \
--dtype bfloat16
Run DeepCoder Agent
Execute the coding agent for evaluation:
Train DeepCoder Agent
Train your own DeepCoder agent with iterative context lengthening:
# Train with 16K context
bash train_deepcoder_16k.sh
# Train with 32K context (modify MODEL_PATH to 16k checkpoint)
bash train_deepcoder_32k.sh
Code Reference
Code Agent Evaluator
Main script for evaluating coding performance:
examples/deepcoder/run_deepcoder.py
import asyncio
import os
from datetime import datetime
from transformers import AutoTokenizer
from rllm.agents.code_agent import CompetitionCodingAgent
from rllm.data.dataset import DatasetRegistry
from rllm.engine.agent_execution_engine import AgentExecutionEngine
from rllm.environments.base.single_turn_env import SingleTurnEnvironment
from rllm.rewards.reward_fn import code_reward_fn
from rllm.utils import save_trajectories
if __name__ == "__main__":
os.environ["TOKENIZERS_PARALLELISM"] = "true"
n_parallel_agents = 64
model_name = "agentica-org/DeepCoder-14B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
reward_fn = code_reward_fn
env_args = {
"reward_fn": reward_fn,
}
sampling_params = {"temperature": 0.6, "top_p": 0.95, "model": model_name}
engine = AgentExecutionEngine(
agent_class=CompetitionCodingAgent,
env_class=SingleTurnEnvironment,
agent_args={},
env_args=env_args,
engine_name="openai",
tokenizer=tokenizer,
sampling_params=sampling_params,
rollout_engine_args={
"base_url": "http://localhost:30000/v1",
"api_key": "None",
},
max_response_length=65536,
max_prompt_length=4096,
n_parallel_agents=n_parallel_agents,
)
test_dataset = DatasetRegistry.load_dataset("deepcoder", "test")
if test_dataset is None:
print("Dataset not found, preparing dataset...")
from prepare_deepcoder_data import prepare_deepcoder_data
_, test_dataset = prepare_deepcoder_data()
tasks = test_dataset.get_data()
results = asyncio.run(engine.execute_tasks(tasks))
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
save_trajectories(results, filename=f"deepcoder_trajectories_{len(tasks)}_{timestamp}.pt")
Training Script
DeepCoder training configuration:
examples/deepcoder/train_deepcoder.py
import hydra
from rllm.agents.code_agent import CompetitionCodingAgent
from rllm.data.dataset import DatasetRegistry
from rllm.environments.base.single_turn_env import SingleTurnEnvironment
from rllm.rewards.reward_fn import code_reward_fn
from rllm.trainer.agent_trainer import AgentTrainer
@hydra.main(config_path="pkg://rllm.trainer.config", config_name="agent_ppo_trainer", version_base=None)
def main(config):
train_dataset = DatasetRegistry.load_dataset("deepcoder", "train")
test_dataset = DatasetRegistry.load_dataset("deepcoder", "test")
env_args = {"reward_fn": code_reward_fn}
trainer = AgentTrainer(
agent_class=CompetitionCodingAgent,
agent_args={},
env_args=env_args,
env_class=SingleTurnEnvironment,
config=config,
train_dataset=train_dataset,
val_dataset=test_dataset,
)
trainer.train()
if __name__ == "__main__":
main()
For detailed setup instructions, see the README in the deepcoder example directory.