🧭 Phase 3 — Policy-Based Methods

Learn a parameterized policy directly, by gradient ascent on expected return, instead of deriving actions from a discrete value table.

Topics

  • Policy Gradient Theorem
  • REINFORCE Algorithm
  • Variance Reduction (Baselines)
  • Advantage Actor-Critic (A2C)
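
As a rough illustration of how REINFORCE and a baseline fit together, the sketch below computes discounted returns and a single-episode policy-gradient estimate for a softmax policy. It is a minimal, standard-library-only sketch and not the code in 03-Policy-Based: the stateless policy (one logit per action, no state input), the function names, and the constant `baseline` argument are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert logits to action probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):  # accumulate from the last step backwards
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_gradient(logits, actions, rewards, gamma=0.99, baseline=0.0):
    """Single-episode REINFORCE estimate of the gradient w.r.t. the logits.

    Assumption: the policy is stateless, so pi(a) = softmax(logits)[a].
    For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
    Each step contributes (G_t - baseline) * grad log pi(a_t).
    """
    probs = softmax(logits)
    returns = discounted_returns(rewards, gamma)
    grad = [0.0] * len(logits)
    for action, g in zip(actions, returns):
        weight = g - baseline  # subtracting a baseline lowers variance
        for k, p in enumerate(probs):
            indicator = 1.0 if k == action else 0.0
            grad[k] += weight * (indicator - p)
    return grad


if __name__ == "__main__":
    logits = [0.0, 0.0]  # two actions, initially equally likely
    actions = [0, 1, 0]  # hypothetical episode
    rewards = [1.0, 0.0, 1.0]
    print(discounted_returns(rewards, gamma=0.9))
    print(reinforce_gradient(logits, actions, rewards, gamma=0.9))
    print(reinforce_gradient(logits, actions, rewards, gamma=0.9, baseline=1.0))
```

A gradient-ascent step would then add a small multiple of this estimate to the logits. In an actor-critic method such as A2C, the constant baseline is replaced by a learned state-value estimate, which this sketch does not attempt to model.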

Mini Projects

  • MountainCarContinuous-v0 (REINFORCE)
  • CartPole (A2C)

📁 Source folder:
03-Policy-Based