🚀 Phase 6 — SOTA RL & Research Trends

Recent reinforcement learning methods, from LLM preference optimization to world models and generative policies.

Algorithms

  • GRPO — Group Relative Policy Optimization
  • DPO — Direct Preference Optimization
  • DreamerV3 — Latent world-model-based imagination training
  • Decision Transformer — Sequence modeling for RL trajectories
  • GFlowNets — Sampling objects with probability proportional to reward
  • Diffusion Policies — Diffusion-based action generation
  • RLHF — Reward modeling + policy optimization with human feedback
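
As a rough illustration of what separates GRPO from PPO, the sketch below computes group-relative advantages: several responses are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group instead of a learned value baseline. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for this example, not code from the source folder.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds the scalar rewards of several responses sampled
    for the same prompt. `eps` (an assumption of this sketch) avoids
    division by zero when every response gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one prompt: two rewarded, two not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # roughly [1, -1, -1, 1]
```

These advantages would then stand in for the critic-based advantage estimates in a PPO-style clipped objective, which is why no value network is needed.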

Mini Projects

  • PPO → GRPO comparison experiment
  • DreamerV3 pixel-control demo
  • DPO fine-tuning on preference data
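
For the DPO fine-tuning project, the per-pair loss can be sketched in a few lines. The version below works on plain floats so the arithmetic is easy to follow; a real training loop would use tensors and summed token log-probabilities. The function name, argument names, and the default `beta=0.1` are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probs.

    Computes -log(sigmoid(beta * margin)), where the margin is how
    much more the policy prefers the chosen response over the
    rejected one, relative to the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable form of -log(sigmoid(x)) = softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)

# Policy shifts probability toward the chosen response: loss drops.
improved = dpo_loss(-0.5, -3.0, -1.0, -2.0)
print(baseline, improved)
```

No reward model appears anywhere in the computation: the reference log-probabilities act as the anchor, and `beta` controls how strongly the policy is kept close to the reference.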

📁 Source folder:
06-SOTA-RL