Environment Utils
Utility functions and helpers for environment implementations and management.
rllm.environments.env_utils
compute_trajectory_reward
Add the trajectory reward to the dict of each step (interaction) in the trajectory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `trajectory` | `Trajectory` | List of dictionaries representing each step in the trajectory. | *required* |
Returns:
| Type | Description |
|---|---|
| `Trajectory` | The updated trajectory with `trajectory_reward` added to each step. |
Source code in rllm/environments/env_utils.py
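A minimal sketch of the idea, assuming each step is a dict with a `reward` key and that the trajectory reward is the plain (undiscounted) sum of step rewards. The summation, the `reward` key, and the name `add_trajectory_reward_sketch` are illustrative assumptions, not rllm's actual implementation.

```python
from typing import Any


def add_trajectory_reward_sketch(trajectory: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical sketch: attach the total reward to every step dict."""
    # Assumption: the trajectory reward is the plain sum of per-step "reward" values.
    total = sum(step.get("reward", 0.0) for step in trajectory)
    for step in trajectory:
        # Every step receives the same trajectory-level value.
        step["trajectory_reward"] = total
    return trajectory


steps = [{"reward": 0.0}, {"reward": 0.5}, {"reward": 1.0}]
add_trajectory_reward_sketch(steps)
print([s["trajectory_reward"] for s in steps])  # [1.5, 1.5, 1.5]
```

Because the dicts are mutated in place, the returned list is the same object that was passed in.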
compute_mc_return
Compute Monte Carlo returns in place for a `Trajectory` dataclass, using the recursion:
$$G_t = R_{t+1} + \gamma \, G_{t+1}$$
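Unrolling the recursion gives the discounted sum of future rewards, assuming the return after the last of $T$ steps is zero ($G_T = 0$). As a worked example with $\gamma = 0.95$ and rewards $R_1 = 0$, $R_2 = 0$, $R_3 = 1$ (so $T = 3$):

$$
G_t = \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1}, \qquad
G_2 = 1, \quad G_1 = 0 + 0.95 \cdot 1 = 0.95, \quad G_0 = 0 + 0.95 \cdot 0.95 = 0.9025
$$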
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `trajectory` | `Trajectory` | Trajectory object whose `.steps` is a list of `Step` objects. | *required* |
| `gamma` | `float` | Discount factor. | `0.95` |
Returns:
| Type | Description |
|---|---|
| `Trajectory` | The same `Trajectory`, with each `step.mc_return` filled in. |
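The recursion can be evaluated in a single backward pass over the steps. This is a sketch under stated assumptions: `Step` and `Trajectory` below are minimal hypothetical stand-ins for rllm's dataclasses, and the `reward` field is assumed to hold the reward $R_{t+1}$ received after step $t$. Only the `mc_return` attribute name comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    # Hypothetical stand-in: reward received after this step (R_{t+1}).
    reward: float
    mc_return: float = 0.0


@dataclass
class Trajectory:
    # Hypothetical stand-in for rllm's Trajectory dataclass.
    steps: list[Step] = field(default_factory=list)


def compute_mc_return_sketch(trajectory: Trajectory, gamma: float = 0.95) -> Trajectory:
    """Fill step.mc_return in place with G_t = R_{t+1} + gamma * G_{t+1}."""
    g = 0.0  # Assumption: the return after the final step is zero.
    for step in reversed(trajectory.steps):
        g = step.reward + gamma * g
        step.mc_return = g
    return trajectory


traj = Trajectory(steps=[Step(0.0), Step(0.0), Step(1.0)])
compute_mc_return_sketch(traj, gamma=0.95)
print([round(s.mc_return, 4) for s in traj.steps])  # [0.9025, 0.95, 1.0]
```

Iterating in reverse lets each $G_t$ reuse $G_{t+1}$, so the pass is linear in the number of steps rather than quadratic.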
parallel_task_manager
parallel_task_manager(func: Callable, items: list[Any], max_workers: int = 32) -> Iterator[list[tuple[int, Any]]]
Execute a function in parallel for all items and collect results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `func` | `Callable` | Function to execute. | *required* |
| `items` | `list[Any]` | List of items to process. | *required* |
| `max_workers` | `int` | Maximum number of workers. | `32` |
Yields:
| Type | Description |
|---|---|
| `list[tuple[int, Any]]` | List of `(idx, result)` tuples. |
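A function with this signature could be sketched on top of the standard library's `concurrent.futures`. The reference does not say whether rllm uses threads or processes, or whether it yields results incrementally; this hypothetical `parallel_task_manager_sketch` assumes a thread pool and yields a single list of `(idx, result)` pairs, sorted by item index, once every task has finished.

```python
from collections.abc import Callable, Iterator
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any


def parallel_task_manager_sketch(
    func: Callable[[Any], Any], items: list[Any], max_workers: int = 32
) -> Iterator[list[tuple[int, Any]]]:
    """Hypothetical sketch: run func over items in a thread pool."""
    results: list[tuple[int, Any]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the index of the item it was submitted for.
        futures = {executor.submit(func, item): idx for idx, item in enumerate(items)}
        for future in as_completed(futures):
            # future.result() re-raises any exception thrown by func.
            results.append((futures[future], future.result()))
    # as_completed yields in completion order, so restore input order.
    results.sort(key=lambda pair: pair[0])
    yield results


for batch in parallel_task_manager_sketch(lambda x: x * x, [1, 2, 3, 4], max_workers=2):
    print(batch)  # [(0, 1), (1, 4), (2, 9), (3, 16)]
```

Carrying the index alongside each result is what lets callers match outputs to inputs even though tasks finish in arbitrary order.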