🚀 Phase 6 — SOTA RL & Research Trends
Recent reinforcement learning methods used for LLM alignment, world-model-based control, and generative policy learning.
Algorithms
- GRPO — Group Relative Policy Optimization
- DPO — Direct Preference Optimization
- DreamerV3 — Latent world-model-based imagination training
- Decision Transformer — Sequence modeling for RL trajectories
- GFlowNets — Sampling objects with probability proportional to reward
- Diffusion Policies — Diffusion-based action generation
- RLHF — Reward modeling + policy optimization with human feedback
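
To make the GRPO entry above concrete: its core idea is to drop PPO's learned value baseline and instead score each sampled completion against the other completions drawn for the same prompt. The snippet below is a minimal sketch of that group-relative advantage only, not a full training loop. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are illustrative assumptions; implementations differ on these details.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` (an assumed constant) avoids division by zero
    when every completion in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four completions of a single prompt.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group_rewards)
print(advantages)
```

Completions that beat the group average get a positive advantage and the rest get a negative one, so no separate critic network is needed to form the baseline.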
Mini Projects
- PPO → GRPO comparison experiment
- DreamerV3 pixel-control demo
- DPO fine-tuning on preference data
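
As a starting point for the DPO mini project listed above, the per-pair DPO objective can be sketched in plain Python. It assumes you already have summed log-probabilities of the chosen and rejected responses under both the policy and a frozen reference model. The function names, argument names, and the default `beta=0.1` are illustrative assumptions, not taken from this repository.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response.
    `beta` scales how strongly the policy is pushed away from the
    reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(logits)) is the same as softplus(-logits).
    return softplus(-logits)


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Policy has shifted probability toward the chosen response: lower loss.
improved = dpo_loss(-10.0, -17.0, -12.0, -15.0)
print(baseline, improved)
```

In a real fine-tuning run the same expression would be computed on batched tensors and averaged, with gradients flowing only through the policy log-probabilities.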
📁 Source folder:
06-SOTA-RL