Base Environment
Core environment interface and base functionality that all rLLM environments inherit from.
Base Environment
rllm.environments.base.base_env
BaseEnv
Bases: ABC
Source code in rllm/environments/base/base_env.py
idx
property
writable
The index or identifier of the environment, often used within a batch.
Returns:
| Type | Description |
|---|---|
| `Any` | The assigned index or identifier, or None if not set. |
reset
abstractmethod
Standard Gym reset method. Resets the environment to an initial state.
Returns:
| Type | Description |
|---|---|
| `tuple[dict, dict]` | A tuple typically containing the initial observation and auxiliary info. |
step
abstractmethod
Standard Gym step method. Executes one time step within the environment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `action` | `Any` | An action provided by the agent. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[Any, float, bool, dict]` | A tuple containing (observation, reward, done, info). |
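The `reset`/`step` contract above can be illustrated with a minimal sketch. `EnvInterface`, `CountdownEnv`, and `rollout` are hypothetical names used only for this example; `EnvInterface` is a stand-in that mirrors the documented signatures and is not the rLLM `BaseEnv` class itself.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Callable


class EnvInterface(ABC):
    """Stand-in mirroring the documented reset/step signatures (not the rLLM class)."""

    @abstractmethod
    def reset(self) -> tuple[dict, dict]: ...

    @abstractmethod
    def step(self, action: Any) -> tuple[Any, float, bool, dict]: ...


class CountdownEnv(EnvInterface):
    """Hypothetical toy environment: the episode ends after `n` steps."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.remaining = n

    def reset(self) -> tuple[dict, dict]:
        # Restore the initial state and return (observation, info).
        self.remaining = self.n
        return {"remaining": self.remaining}, {}

    def step(self, action: Any) -> tuple[Any, float, bool, dict]:
        # Advance one time step; reward 1.0 only on the final step.
        self.remaining -= 1
        done = self.remaining <= 0
        reward = 1.0 if done else 0.0
        return {"remaining": self.remaining}, reward, done, {"last_action": action}


def rollout(env: EnvInterface, policy: Callable[[Any], Any]) -> float:
    """Run one episode and return the summed reward."""
    obs, info = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total


total_reward = rollout(CountdownEnv(n=3), policy=lambda obs: "noop")
```

The loop relies only on the four-value tuple returned by `step`, so any environment that follows the same contract could be passed to `rollout` unchanged.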
close
from_dict
abstractmethod
staticmethod
Creates an environment instance from a dictionary.
This method should be implemented by concrete subclasses to handle environment-specific initialization from serialized data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `info` | `dict` | A dictionary containing the necessary information to initialize the environment. | required |
Returns:
| Type | Description |
|---|---|
| `BaseEnv` | An instance of the specific BaseEnv subclass. |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the subclass does not implement this method. |
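As a rough sketch of how `from_dict` and the writable `idx` property might be used together when building a batch: `EnvSketch` below is a stand-in for the documented interface, and `QuestionEnv` with its `question` and `answer` fields is a hypothetical subclass invented for this example.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class EnvSketch(ABC):
    """Stand-in for the documented interface (idx and from_dict only)."""

    def __init__(self) -> None:
        self._idx: Any = None  # None until an index is assigned

    @property
    def idx(self) -> Any:
        return self._idx

    @idx.setter
    def idx(self, value: Any) -> None:
        self._idx = value

    @staticmethod
    @abstractmethod
    def from_dict(info: dict) -> "EnvSketch":
        raise NotImplementedError("Subclasses must implement from_dict")


class QuestionEnv(EnvSketch):
    """Hypothetical subclass that is rebuilt from serialized task data."""

    def __init__(self, question: str, answer: str) -> None:
        super().__init__()
        self.question = question
        self.answer = answer

    @staticmethod
    def from_dict(info: dict) -> "QuestionEnv":
        # Pull only the fields this environment needs from the dictionary.
        return QuestionEnv(question=info["question"], answer=info["answer"])


records = [
    {"question": "1 + 1 = ?", "answer": "2"},
    {"question": "2 + 2 = ?", "answer": "4"},
]

envs = []
for i, record in enumerate(records):
    env = QuestionEnv.from_dict(record)
    env.idx = i  # position of this environment within the batch
    envs.append(env)
```

Keeping construction behind `from_dict` means the code that builds the batch only needs the serialized dictionaries, not each subclass's constructor arguments.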
Single Turn Environment
rllm.environments.base.single_turn_env
SingleTurnEnvironment
Bases: MultiTurnEnvironment
A simple environment for single-turn interactions with LLMs. This is a special case of MultiTurnEnvironment where max_turns=1. The environment provides a question/prompt and evaluates the response using a custom reward function.
Source code in rllm/environments/base/single_turn_env.py
__init__
Initialize the single turn environment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task` | `dict \| None` | Dictionary containing the task information, including at least a "question" field. | `None` |
get_reward_and_next_obs
Compute the reward based on the task and action.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task` | `dict` | The task dictionary containing relevant information. | required |
| `action` | `Any` | The action taken by the agent. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[float, dict]` | Tuple of (reward: float, next_observation: Dict) |
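To make the single-turn flow concrete, here is a minimal sketch under stated assumptions. `SingleTurnSketch` is a stand-in and not the rLLM `SingleTurnEnvironment`; the `reward_fn` constructor argument, the `exact_match_reward` function, and the `"answer"` task field are all hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable


def exact_match_reward(task: dict, action: Any) -> float:
    """Hypothetical reward function: 1.0 when the response equals the answer."""
    expected = str(task.get("answer", "")).strip()
    return 1.0 if str(action).strip() == expected else 0.0


class SingleTurnSketch:
    """Stand-in for a single-turn environment (a multi-turn one with max_turns=1)."""

    def __init__(
        self,
        task: dict | None = None,
        reward_fn: Callable[[dict, Any], float] = exact_match_reward,
    ) -> None:
        self.task = task or {}
        self.reward_fn = reward_fn
        self.done = False

    def reset(self) -> tuple[dict, dict]:
        self.done = False
        return self.task, {}

    def get_reward_and_next_obs(self, task: dict, action: Any) -> tuple[float, dict]:
        # A single turn has no follow-up prompt, so the next observation is empty.
        return self.reward_fn(task, action), {}

    def step(self, action: Any) -> tuple[dict, float, bool, dict]:
        reward, next_obs = self.get_reward_and_next_obs(self.task, action)
        self.done = True  # the first step always ends the episode
        return next_obs, reward, self.done, {}


env = SingleTurnSketch(task={"question": "2 + 2 = ?", "answer": "4"})
observation, _ = env.reset()
next_obs, reward, done, info = env.step(" 4 ")
```

Passing the reward function in as a plain callable keeps the scoring logic separate from the environment mechanics, so the same environment could be reused with a different scorer.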
Multi Turn Environment
rllm.environments.base.multi_turn_env
MultiTurnEnvironment
Bases: BaseEnv, ABC
An environment for multi-turn interactions with LLMs. The environment provides a series of questions/prompts and evaluates responses using a custom reward function. The interaction terminates after reaching the maximum number of turns.
Source code in rllm/environments/base/multi_turn_env.py
__init__
Initialize the multi-turn environment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task` | `dict \| None` | Dictionary containing the task information, including at least a "questions" field with a list of questions for each turn. | `None` |
| `max_turns` | `int` | Maximum number of turns before terminating the interaction. | `3` |
step
Take a step in the environment based on the action.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `action` | | Response string from the LLM. | required |
Returns:
| Type | Description |
|---|---|
| | next_observation, reward, terminated, truncated, info |
get_reward_and_next_obs
abstractmethod
Abstract method to compute the reward based on the task and action.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `task` | `dict` | The task dictionary containing relevant information. | required |
| `action` | `Any` | The action taken by the agent. | required |
Returns:
| Type | Description |
|---|---|
| `tuple[float, dict]` | Tuple of (reward: float, metadata: Dict) |
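The following sketch shows one way a subclass might implement `get_reward_and_next_obs` so that the interaction ends after `max_turns`. `MultiTurnSketch` is a stand-in and not the rLLM class; `QuizEnv` and the `"answers"` task field are hypothetical, and `step` here is assumed to return the four-value (observation, reward, done, info) tuple documented for `BaseEnv.step`.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class MultiTurnSketch(ABC):
    """Stand-in for a multi-turn environment that stops after max_turns."""

    def __init__(self, task: dict | None = None, max_turns: int = 3) -> None:
        self.task = task or {"questions": [""]}
        self.max_turns = max_turns
        self.current_turn = 0

    def reset(self) -> tuple[dict, dict]:
        self.current_turn = 0
        return {"question": self.task["questions"][0]}, {}

    def step(self, action: Any) -> tuple[dict, float, bool, dict]:
        reward, next_obs = self.get_reward_and_next_obs(self.task, action)
        self.current_turn += 1
        done = self.current_turn >= self.max_turns
        # No further observation is needed once the turn budget is spent.
        return ({} if done else next_obs), reward, done, {"turn": self.current_turn}

    @abstractmethod
    def get_reward_and_next_obs(self, task: dict, action: Any) -> tuple[float, dict]: ...


class QuizEnv(MultiTurnSketch):
    """Hypothetical subclass: one question per turn, exact-match scoring."""

    def get_reward_and_next_obs(self, task: dict, action: Any) -> tuple[float, dict]:
        turn = self.current_turn
        correct = str(action).strip() == task["answers"][turn]
        upcoming = turn + 1
        if upcoming < len(task["questions"]):
            next_obs = {"question": task["questions"][upcoming]}
        else:
            next_obs = {}
        return (1.0 if correct else 0.0), next_obs


task = {
    "questions": ["1 + 1 = ?", "2 + 2 = ?", "3 + 3 = ?"],
    "answers": ["2", "4", "6"],
}
env = QuizEnv(task=task, max_turns=3)
obs, _ = env.reset()

rewards = []
done = False
for response in ["2", "5", "6"]:
    obs, reward, done, info = env.step(response)
    rewards.append(reward)
```

In this sketch the base class owns turn counting and termination, while the subclass decides only how a response is scored and which prompt comes next.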