Policy Iteration
This experiment demonstrates the policy iteration algorithm in a Gridworld setting. Policy iteration alternates between policy evaluation (computing the value function of the current policy) and policy improvement (making the policy greedy with respect to those values), repeating until the policy stops changing. The experiment offers an interactive platform where users can observe and analyze how the policy evolves at each iteration of the algorithm on the underlying Markov Decision Process (MDP).
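To make the evaluation-improvement loop concrete, below is a minimal sketch in Python, not the experiment's actual implementation. It assumes a deterministic 4x4 gridworld with two terminal corner cells, a step cost of -1, and a discount factor of 0.9; the grid size, rewards, and helper names such as step, policy_evaluation, and policy_improvement are illustrative assumptions.

```python
import numpy as np

# --- Hypothetical 4x4 gridworld used only for illustration ---
# States are cells 0..15; the top-left (0) and bottom-right (15) cells are terminal.
# Actions: 0=up, 1=right, 2=down, 3=left. Each move costs -1, and moves that
# would leave the grid keep the agent in place.
N = 4
N_STATES, N_ACTIONS = N * N, 4
TERMINALS = {0, N * N - 1}
GAMMA = 0.9          # discount factor (assumed)
THETA = 1e-8         # policy-evaluation convergence threshold (assumed)

def step(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    if state in TERMINALS:
        return state, 0.0
    row, col = divmod(state, N)
    if action == 0:   row = max(row - 1, 0)
    elif action == 1: col = min(col + 1, N - 1)
    elif action == 2: row = min(row + 1, N - 1)
    else:             col = max(col - 1, 0)
    return row * N + col, -1.0

def policy_evaluation(policy, V):
    """Iteratively compute V^pi until the largest update falls below THETA."""
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s in TERMINALS:
                continue
            s2, r = step(s, policy[s])
            v_new = r + GAMMA * V[s2]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < THETA:
            return V

def policy_improvement(policy, V):
    """Make the policy greedy with respect to V; report whether it changed."""
    stable = True
    for s in range(N_STATES):
        if s in TERMINALS:
            continue
        q = [r + GAMMA * V[s2] for s2, r in (step(s, a) for a in range(N_ACTIONS))]
        best = int(np.argmax(q))
        if best != policy[s]:
            stable = False
        policy[s] = best
    return policy, stable

def policy_iteration():
    """Alternate evaluation and improvement until the policy is stable."""
    V = np.zeros(N_STATES)
    policy = np.zeros(N_STATES, dtype=int)  # initial policy: always move up
    iteration = 0
    while True:
        iteration += 1
        V = policy_evaluation(policy, V)
        policy, stable = policy_improvement(policy, V)
        print(f"iteration {iteration}:\n{policy.reshape(N, N)}")
        if stable:
            return policy, V

if __name__ == "__main__":
    policy_iteration()
```

Printing the policy grid after each improvement step mirrors the per-iteration view the experiment provides, and editing the step function or GAMMA is the kind of change to the MDP dynamics that the interactive platform lets users explore.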
Objectives:
- Interactive Policy Visualization: Enable users to visually track policy changes in the Gridworld at each iteration.
- Customizable MDP Dynamics: Allow users to modify the MDP dynamics, facilitating a deeper understanding of how policy adaptation occurs under different conditions.
- Demonstrate Convergence: Clearly illustrate how iterative policy refinement converges to an optimal policy for the Gridworld.
The experiment is tailored for learners interested in reinforcement learning and decision theory, providing an engaging and educational exploration of policy iteration in a controlled, yet dynamic environment.