Policy Iteration

This experiment demonstrates the policy iteration algorithm in a Gridworld setting. Policy iteration alternates between evaluating the current policy and greedily improving it, and repeats until the policy no longer changes. The experiment offers an interactive platform where users can observe and analyze how the policy evolves at each iteration within the Markov Decision Process (MDP).

Objectives:

  • Interactive Policy Visualization: Enable users to visually track policy changes in the Gridworld at each iteration.
  • Customizable MDP Dynamics: Allow users to modify the MDP dynamics (such as transition behavior and rewards), facilitating a deeper understanding of how the policy adapts under different conditions.
  • Demonstrate Convergence: Clearly illustrate how iterative policy refinement converges to optimal decision-making in the Gridworld (a minimal code sketch of this loop follows the list).
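
To make the evaluation-improvement loop concrete, here is a minimal sketch of policy iteration on a small Gridworld, written in Python with NumPy. The 4x4 layout, terminal corner states, step reward of -1, and discount factor are illustrative assumptions, not the exact dynamics used in the experiment.

```python
import numpy as np

# --- Illustrative setup (assumed, not the experiment's exact configuration) ---
# Deterministic 4x4 gridworld: states 0..15, terminal states in two corners,
# reward of -1 per step, undiscounted episodic task.
N = 4
TERMINALS = {0, N * N - 1}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 1.0
THETA = 1e-6  # convergence threshold for policy evaluation


def step(state, action):
    """Deterministic transition: move if possible, otherwise stay in place."""
    if state in TERMINALS:
        return state, 0.0
    r, c = divmod(state, N)
    nr, nc = r + action[0], c + action[1]
    if 0 <= nr < N and 0 <= nc < N:
        return nr * N + nc, -1.0
    return state, -1.0  # bumping into a wall keeps the agent in place


def policy_evaluation(policy, values):
    """Iteratively compute V^pi for the current (deterministic) policy."""
    while True:
        delta = 0.0
        for s in range(N * N):
            if s in TERMINALS:
                continue
            ns, reward = step(s, ACTIONS[policy[s]])
            new_v = reward + GAMMA * values[ns]
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < THETA:
            return values


def policy_improvement(policy, values):
    """Make the policy greedy with respect to the current value function."""
    stable = True
    for s in range(N * N):
        if s in TERMINALS:
            continue
        returns = [reward + GAMMA * values[ns]
                   for ns, reward in (step(s, a) for a in ACTIONS)]
        best = int(np.argmax(returns))
        if best != policy[s]:
            policy[s] = best
            stable = False
    return policy, stable


def policy_iteration():
    values = np.zeros(N * N)
    policy = np.zeros(N * N, dtype=int)  # start with "always move up"
    while True:
        values = policy_evaluation(policy, values)
        policy, stable = policy_improvement(policy, values)
        if stable:  # policy unchanged => optimal for this MDP
            return policy, values


if __name__ == "__main__":
    policy, values = policy_iteration()
    print(values.reshape(N, N))   # converged state values
    print(policy.reshape(N, N))   # greedy action index per cell
```

Running the sketch prints the converged state values and the greedy policy for each cell; editing the rewards or transition rule in `step` mirrors the "Customizable MDP Dynamics" objective above.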

The experiment is tailored for learners interested in reinforcement learning and decision theory, providing an engaging and educational exploration of policy iteration in a controlled yet dynamic environment.