Gradient-Based Methods
Warning: If you encounter issues, please report them on our GitHub issues page.
In addition to classic RL interfaces, FluidGym environments also support gradient-based methods for policy optimization. This allows techniques such as differentiable predictive control (DPC) to be applied directly within FluidGym environments.
Here is a simple example from examples/interfaces/gradient_based_methods.py:
import fluidgym

# Create a FluidGym environment
env = fluidgym.make(
    "CylinderJet2D-easy-v0",
    differentiable=True,  # This flag enables backpropagation through the environment
)

obs, info = env.reset(seed=42)

for _ in range(50):
    action = env.sample_action()
    # Enable gradient tracking for the action
    action.requires_grad_(True)

    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset(seed=42)

    reward.backward()
    print("Action gradients:", action.grad)

    # To detach the environment from the computation graph, use:
    env.detach()

env.render()
env.save_gif("cylinder.gif")
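The gradients obtained this way can also drive policy optimization directly, for example by backpropagating an episode return into the parameters of a controller. Below is a minimal sketch of such a loop. It assumes that observations, actions, and rewards are PyTorch tensors; the linear policy, the float32 cast, the optimizer settings, and the episode and step counts are illustrative assumptions, not part of the FluidGym API.

import torch
import fluidgym

env = fluidgym.make("CylinderJet2D-easy-v0", differentiable=True)
obs, info = env.reset(seed=42)

sample_action = env.sample_action()  # used only to infer the action shape

# Hypothetical linear policy mapping flattened observations to actions
policy = torch.nn.Linear(obs.numel(), sample_action.numel())
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for episode in range(10):
    obs, info = env.reset(seed=42)
    episode_return = torch.zeros(())

    for _ in range(50):
        # Cast to float32 in case the environment returns float64 observations
        action = policy(obs.flatten().float()).reshape(sample_action.shape)
        obs, reward, terminated, truncated, info = env.step(action)
        episode_return = episode_return + reward
        if terminated or truncated:
            break

    # Gradient ascent on the episode return: minimize its negative
    optimizer.zero_grad()
    (-episode_return).backward()
    optimizer.step()

    # Detach the environment from the old graph before the next rollout,
    # as in the example above
    env.detach()

    print(f"Episode {episode}: return = {episode_return.item():.3f}")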