Planning Across Controls, Reinforcement Learning, and Diffusion
Background: Diffusion Models for Reinforcement Learning Survey
"Current works applying Diffusion models on Reinforcement Learning mainly fall into three categories: as planners, as policies, and as data synthesizers." [1]
Planning in RL: Learning to Combat Compounding-Error in Model-Based Reinforcement Learning
"Planning in RL refers to using a dynamic model to make decisions imaginarily and selecting the appropriate action to maximize cumulative rewards." [1:1]
"Planning is commonly used in the MBRL framework with a learned dynamic model. However, the planning sequences are usually simulated auto-regressively, which may lead to severe compounding errors, especially in the offline setting due to limited data support." [1:2]
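The compounding-error point can be seen in a toy sketch. The linear dynamics below are purely illustrative stand-ins (not from the survey): a learned model with a small one-step bias is rolled out autoregressively, and the gap to the true trajectory grows with the horizon.

```python
import numpy as np

def true_step(s):
    return 0.9 * s + 1.0          # ground-truth dynamics (hypothetical)

def learned_step(s):
    return 0.9 * s + 1.0 + 0.05   # learned model with a small one-step bias

def rollout(step_fn, s0, horizon):
    # Autoregressive simulation: each prediction is fed back as input,
    # so per-step errors accumulate instead of averaging out.
    states, s = [], s0
    for _ in range(horizon):
        s = step_fn(s)
        states.append(s)
    return np.array(states)

true_traj = rollout(true_step, 0.0, 30)
model_traj = rollout(learned_step, 0.0, 30)
errors = np.abs(model_traj - true_traj)
# errors[0] is the one-step bias (0.05); by step 30 the gap is ~10x larger
```

In the offline setting the bias term is typically worse in regions with little data support, which is exactly where a long rollout tends to drift.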
Planning in Diffusion: Planning with Diffusion for Flexible Behavior Synthesis
"Diffusion models are designed to generate clips of the trajectory τ = (s_1, a_1, r_1, …, s_H, a_H, r_H), denoted as x(τ) = (e_1, e_2, …, e_H). H is the planning horizon. Here e_t represents the selected elements from (s_t, a_t, r_t)" [1:3]
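The key structural difference from autoregressive planning is that the diffusion planner denoises all H steps of x(τ) jointly. A minimal sketch, with the trained noise predictor replaced by an illustrative closed-form stand-in (the schedule values and the `denoiser` are assumptions, not the survey's):

```python
import numpy as np

rng = np.random.default_rng(0)

H, D = 16, 3            # planning horizon, dims of each element e_t
T = 50                  # diffusion steps
betas = np.linspace(1e-4, 0.1, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x, t):
    # Placeholder for a trained noise-prediction network eps_theta(x, t);
    # this toy version just reports the expected noise magnitude toward
    # a zero-mean trajectory, purely for illustration.
    return x * np.sqrt(1.0 - alpha_bars[t])

def sample_trajectory():
    # Reverse diffusion over the *whole* trajectory x(tau) at once,
    # rather than rolling a dynamics model out one step at a time.
    x = rng.standard_normal((H, D))
    for t in reversed(range(T)):
        eps = denoiser(x, t)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal((H, D))
    return x

plan = sample_trajectory()   # shape (H, D): all H steps generated jointly
```

Because every e_t is refined simultaneously, there is no single-step prediction being fed back into itself, which is the mechanism behind the compounding errors noted above.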
"In order to make the diffusion planner generate high rewarded trajectories during evaluations, guided sampling techniques are widely adopted." [1:4]
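One common guided-sampling recipe (classifier-style guidance, as in Diffuser) nudges each denoising step along the gradient of a learned return predictor. The sketch below is a hedged toy: `return_grad` and both scale parameters are hypothetical stand-ins, and the unguided mean update is collapsed to the identity for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def return_grad(x):
    # Placeholder for grad_x J(x), the gradient of a learned return
    # predictor; here a made-up objective preferring values near 1.0.
    return 1.0 - x

def guided_denoise_step(x, guidance_scale=0.1, noise_scale=0.05):
    # One reverse step: (trivialized) unguided update plus a
    # reward-gradient nudge toward high-return trajectories.
    x = x + guidance_scale * return_grad(x)
    return x + noise_scale * rng.standard_normal(x.shape)

x = rng.standard_normal((16, 2))     # noisy trajectory x(tau)
for _ in range(100):
    x = guided_denoise_step(x)
# samples drift toward the high-return region around 1.0
```

The same slot can instead hold classifier-free guidance (conditioning the denoiser on returns directly), but the gradient-nudge view is the easiest to see in a few lines.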
"When deploying the trained diffusion planner for online evaluations, fast sampling methods are usually adopted to reduce the inference time." [1:5]
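A standard fast-sampling trick is to stride the reverse process DDIM-style: visit a short subsequence of the training timesteps with a deterministic update, so inference costs ~20 network calls instead of ~1000. The schedule and the closed-form `predict_eps` stand-in below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

T = 1000                                   # training diffusion steps
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def predict_eps(x, t):
    # Placeholder for the trained noise predictor eps_theta(x, t).
    return x * np.sqrt(1.0 - alpha_bars[t])

def ddim_sample(shape, num_steps=20):
    # Deterministic DDIM update over a strided subset of timesteps.
    steps = np.linspace(T - 1, 0, num_steps).astype(int)
    x = rng.standard_normal(shape)
    for t, t_prev in zip(steps[:-1], steps[1:]):
        eps = predict_eps(x, t)
        # Reconstruct x0, then jump directly to the earlier timestep.
        x0 = (x - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
        x = np.sqrt(alpha_bars[t_prev]) * x0 + np.sqrt(1.0 - alpha_bars[t_prev]) * eps
    return x

plan = ddim_sample((16, 3))   # 19 denoising calls instead of 999
```

This matters for planners in particular because a fresh plan is typically sampled at every (or every few) environment steps, so sampling latency sits directly on the control loop.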
Safe Planning and Offline Reinforcement Learning
Multi-agent Reinforcement Learning
Multi-Task Reinforcement Learning
Imitation Learning and Robotics