🧭 Phase 3 — Policy-Based Methods
Learn a parameterized policy directly by optimizing its parameters, rather than deriving a policy from a discrete value table.
Topics
- Policy Gradient Theorem
- REINFORCE Algorithm
- Variance Reduction (Baselines)
- Actor-Critic (A2C)
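
To make the REINFORCE and baseline topics concrete, here is a minimal sketch of the per-step weights that scale the gradient of log π(a_t | s_t). The function names `discounted_returns` and `reinforce_weights` are hypothetical, and using the episode's mean return as the baseline is an assumption chosen for brevity; a learned value function is the more common choice.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_weights(rewards, gamma=0.99):
    """Returns minus a constant baseline (assumes a non-empty episode).

    The baseline here is the mean return of the episode, an
    illustrative assumption rather than a recommended estimator.
    """
    returns = discounted_returns(rewards, gamma)
    baseline = sum(returns) / len(returns)
    return [g - baseline for g in returns]
```

In a training loop, each weight would multiply the log-probability of the action taken at that step before summing into the loss. Subtracting the baseline centers the weights around zero, which is the variance-reduction idea the baseline topic refers to.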
Mini Projects
- MountainCarContinuous-v0 (REINFORCE)
- CartPole (A2C)
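
For the A2C project, one common advantage estimate is the critic's one-step TD error. The sketch below assumes the critic's value predictions are already available as plain floats; `td_advantage` is a hypothetical helper name, not part of any library.

```python
def td_advantage(reward, value, next_value, done, gamma=0.99):
    """One-step advantage estimate: r + gamma * V(s') - V(s).

    `value` and `next_value` are the critic's predictions for the
    current and next state. On a terminal step there is no next
    state to bootstrap from, so the bootstrap term is zero.
    """
    bootstrap = 0.0 if done else next_value
    return reward + gamma * bootstrap - value
```

The actor would be updated with this advantage in place of the raw return, while the critic is typically trained to shrink the same TD error.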
📁 Source folder:
03-Policy-Based