DeepSWE Software Engineering Agent Example

This example demonstrates training and running DeepSWE, a software-engineering agent trained on top of Qwen3-32B to search, view, and navigate codebases. The model achieves 59.0% on SWE-Bench-Verified, which, at the time of writing, is #1 in the open-weights category.

Overview

The DeepSWE examples demonstrate:

How to use rLLM's SWEAgent for software engineering tasks.
How to train DeepSWE with compact filtering.
How to evaluate DeepSWE over SWE-Bench-Verified.

Quick Start

Setup Coding Data

First, prepare your coding datasets:

cd examples/swe
python prepare_swe_data.py

Model Hosting

Start a model server using vLLM:

# Start a vLLM server with tensor parallelism across 8 GPUs.
# Port 30000 matches the base_url used by run_deepswe.py (vLLM's default port is 8000).
export MAX_CONTEXT_LEN=65536
export TENSOR_PARALLEL_SIZE=8
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve agentica-org/DeepSWE-Preview \
    --tensor-parallel-size $TENSOR_PARALLEL_SIZE \
    --max-model-len $MAX_CONTEXT_LEN \
    --hf-overrides '{"max_position_embeddings": '$MAX_CONTEXT_LEN'}' \
    --enable-prefix-caching \
    --port 30000
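If you launch the server from a Python script rather than a shell, building the argument list directly avoids the nested quoting around --hf-overrides. The sketch below mirrors the flags of the command above. The helper name build_vllm_command is hypothetical, and port 30000 is an assumption chosen to match the base_url in run_deepswe.py (vLLM listens on 8000 by default).

```python
import json
import shlex


def build_vllm_command(
    model: str = "agentica-org/DeepSWE-Preview",
    tensor_parallel_size: int = 8,
    max_context_len: int = 65536,
    port: int = 30000,  # assumption: must match the client's base_url
) -> list[str]:
    """Return the argv for `vllm serve`, mirroring the shell command above."""
    # json.dumps yields the same override string the shell quoting builds,
    # without the fragile '...'$VAR'...' concatenation.
    hf_overrides = json.dumps({"max_position_embeddings": max_context_len})
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--max-model-len", str(max_context_len),
        "--hf-overrides", hf_overrides,
        "--enable-prefix-caching",
        "--port", str(port),
    ]


if __name__ == "__main__":
    cmd = build_vllm_command()
    # shlex.join quotes the JSON argument so the line can be pasted into a shell.
    print("VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 " + shlex.join(cmd))
```

To actually start the server from Python, pass the list to subprocess.run together with env={**os.environ, "VLLM_ALLOW_LONG_MAX_MODEL_LEN": "1"}, since the environment variable is not part of the argv.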

Run/Evaluate DeepSWE Agent on SWE-Bench-Verified

python run_deepswe.py

To fully reproduce DeepSWE's evaluation, see the official R2E-Gym repo.
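The runner script shown under Code Reference evaluates tasks concurrently, capped by n_parallel_agents=48. The sketch below illustrates that kind of bounded concurrency with asyncio.Semaphore; it is not the AgentExecutionEngine implementation, and run_one_task is a hypothetical stand-in for a real rollout (which would call the model server and a sandboxed environment).

```python
import asyncio


async def run_one_task(task_id: int) -> dict:
    """Hypothetical stand-in for one agent rollout (no model or sandbox here)."""
    # Simulate variable rollout latency.
    await asyncio.sleep(0.001 * (task_id % 3))
    return {"task_id": task_id, "resolved": task_id % 2 == 0}


async def run_all(task_ids: list[int], n_parallel: int = 48) -> list[dict]:
    """Run every task, with at most n_parallel rollouts in flight at once."""
    sem = asyncio.Semaphore(n_parallel)

    async def bounded(task_id: int) -> dict:
        # A rollout only starts once it acquires one of the n_parallel slots.
        async with sem:
            return await run_one_task(task_id)

    # gather returns results in the same order as task_ids.
    return await asyncio.gather(*(bounded(t) for t in task_ids))


if __name__ == "__main__":
    results = asyncio.run(run_all(list(range(10)), n_parallel=4))
    resolved = sum(r["resolved"] for r in results)
    print(f"{resolved} of {len(results)} tasks resolved")
```

Capping concurrency matters here because each rollout holds a container and a share of the server's KV cache; raising the cap only helps while both have headroom.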

Train DeepSWE Agent

To train DeepSWE, we suggest deploying a Kubernetes (K8s) cluster on AWS, GCP, or Azure. Each node needs many CPUs and a large amount of disk space: each node in our K8s cluster has 200 CPUs and over 6 TB of disk space to store thousands of Docker images.
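Before adding a machine to the cluster, a quick preflight check can compare it against the node spec above. The sketch below uses only the Python standard library. The helper name check_node_resources and the default path "/" are assumptions: point it at the filesystem where your container runtime stores images (for example /var/lib/docker). The 6 TB threshold uses decimal terabytes, and os.cpu_count reports the host's logical CPUs, which can differ from a pod's CPU limit.

```python
import os
import shutil

# Thresholds taken from the node spec described above (200 CPUs, 6 TB of disk).
MIN_CPUS = 200
MIN_DISK_BYTES = 6 * 10**12  # 6 TB in decimal units


def check_node_resources(
    path: str = "/",  # assumption: filesystem that holds the Docker images
    min_cpus: int = MIN_CPUS,
    min_disk_bytes: int = MIN_DISK_BYTES,
) -> dict:
    """Report logical CPU count and disk capacity against the given minimums."""
    cpus = os.cpu_count() or 0  # cpu_count() may return None
    disk = shutil.disk_usage(path)
    return {
        "cpus": cpus,
        "total_disk_bytes": disk.total,
        "free_disk_bytes": disk.free,
        "cpus_ok": cpus >= min_cpus,
        "disk_ok": disk.total >= min_disk_bytes,
    }


if __name__ == "__main__":
    report = check_node_resources()
    for key, value in report.items():
        print(f"{key}: {value}")
```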

To run Kubernetes locally, we suggest installing kind and launching a cluster with kind create cluster. Note, however, that a local kind cluster is not sufficient for a full training run.

Then launch training with the provided script:

# Train with 16K context
bash train_deepswe_32b.sh
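The overview mentions training with compact filtering. Assuming it works as described in the DeepSWE release, that is, trajectories that hit the maximum context length, the maximum number of steps, or a timeout are masked out of the loss rather than scored as failures, the idea can be sketched on plain Python records as follows. The Trajectory class, the termination labels, and both helpers are hypothetical and are not rLLM's implementation.

```python
from dataclasses import dataclass

# Hypothetical termination labels; rLLM's own names may differ.
TRUNCATED_REASONS = {"max_context", "max_steps", "timeout"}


@dataclass
class Trajectory:
    reward: float
    termination_reason: str  # e.g. "env_done", "max_context", "max_steps", "timeout"


def compact_filter_mask(trajs: list[Trajectory]) -> list[float]:
    """Per-trajectory loss mask: 0.0 drops a truncated rollout, 1.0 keeps it."""
    return [0.0 if t.termination_reason in TRUNCATED_REASONS else 1.0 for t in trajs]


def masked_mean_loss(per_traj_loss: list[float], mask: list[float]) -> float:
    """Average the loss over kept trajectories only (0.0 if none are kept)."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(loss * m for loss, m in zip(per_traj_loss, mask)) / kept


batch = [
    Trajectory(reward=1.0, termination_reason="env_done"),
    Trajectory(reward=0.0, termination_reason="max_context"),
    Trajectory(reward=0.0, termination_reason="env_done"),
]
mask = compact_filter_mask(batch)
print(mask)  # [1.0, 0.0, 1.0]
```

The design point is that a truncated rollout says little about whether the agent would eventually have solved the task, so masking it avoids teaching the policy to finish early just to escape a penalty.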

Code Reference

SWE Agent Runner

Main script for evaluating SWE-Bench performance:

examples/swe/run_deepswe.py

import asyncio
import os

from transformers import AutoTokenizer

from rllm.agents.swe_agent import SWEAgent
from rllm.data.dataset import DatasetRegistry
from rllm.engine.agent_execution_engine import AgentExecutionEngine
from rllm.environments.swe.swe import SWEEnv
from rllm.utils import compute_pass_at_k


def load_swe_data():
    """Load the SWE-Bench-Verified test split registered by prepare_swe_data.py."""
    if DatasetRegistry.dataset_exists("SWE_Bench_Verified", "test"):
        test_dataset = DatasetRegistry.load_dataset("SWE_Bench_Verified", "test")
        return test_dataset.get_data()
    raise ValueError("SWE_Bench_Verified dataset not found. Please run `python prepare_swe_data.py` to create the dataset.")


if __name__ == "__main__":
    # Enable Hugging Face tokenizers parallelism explicitly.
    os.environ["TOKENIZERS_PARALLELISM"] = "true"

    model_name = "agentica-org/DeepSWE-Preview"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # "model" must match the model name the vLLM server is serving.
    sampling_params = {"temperature": 1, "model": model_name}

    engine = AgentExecutionEngine(
        agent_class=SWEAgent,
        env_class=SWEEnv,
        agent_args={},
        env_args={},
        engine_name="openai",  # query the server through its OpenAI-compatible API
        tokenizer=tokenizer,
        sampling_params=sampling_params,
        rollout_engine_args={
            # Must match the host and port the vLLM server listens on.
            "base_url": "http://localhost:30000/v1",
            # Placeholder; vLLM only checks keys when started with --api-key.
            "api_key": "None",
        },
        n_parallel_agents=48,  # number of agent/environment pairs run in parallel
        max_response_length=65536,
        max_prompt_length=4096,
    )

    tasks = load_swe_data()

    # Roll out all tasks, then compute pass@k over the results.
    results = asyncio.run(engine.execute_tasks(tasks))
    compute_pass_at_k(results)
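This page does not show how compute_pass_at_k computes its metric. For reference, the standard unbiased pass@k estimator from the Codex paper (Chen et al., 2021) is pass@k = 1 - C(n-c, k) / C(n, k) for a task with n samples of which c are correct, averaged over tasks. A minimal sketch follows; the function names are ours, and rLLM's compute_pass_at_k may aggregate differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if k > n:
        raise ValueError("k must not exceed n")
    # If fewer than k samples failed, every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains only failures, subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(per_task: dict[str, list[bool]], k: int) -> float:
    """Average pass@k over tasks, each given as a list of per-sample pass/fail."""
    scores = [pass_at_k(len(s), sum(s), k) for s in per_task.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    samples = {"task_a": [True, False, False, False], "task_b": [False] * 4}
    print(mean_pass_at_k(samples, k=1))
```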

Training Script

DeepSWE training configuration:

examples/swe/train_deepswe_agent.py

import hydra

from rllm.agents.swe_agent import SWEAgent
from rllm.data import DatasetRegistry
from rllm.environments.swe.swe import SWEEnv
from rllm.trainer.agent_trainer import AgentTrainer


@hydra.main(config_path="pkg://rllm.trainer.config", config_name="agent_ppo_trainer", version_base=None)
def main(config):
    # Load SWE datasets - using names from prepare_swe_data.py
    train_dataset = DatasetRegistry.load_dataset("R2E_Gym_Subset", "train")
    val_dataset = DatasetRegistry.load_dataset("SWE_Bench_Verified", "test")

    trainer = AgentTrainer(
        agent_class=SWEAgent,
        env_class=SWEEnv,
        config=config,
        train_dataset=train_dataset,
        val_dataset=val_dataset,
    )
    trainer.train()


if __name__ == "__main__":
    main()

For detailed setup instructions, see the README in the examples/swe directory.