Value Iteration

This experiment demonstrates the value iteration algorithm in a Gridworld setting. It provides an interactive, educational platform where users can observe and analyze how the value function, and the policy derived from it, evolve at each iteration of the algorithm on the underlying Markov Decision Process (MDP).
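
At its core, value iteration repeatedly applies the Bellman optimality update, V(s) <- max_a sum_{s'} P(s'|s,a) [R(s,a,s') + gamma * V(s')], until the values stop changing. The experiment's actual grid, rewards, and discount factor are not specified here; the following is a minimal sketch of the algorithm on a small deterministic Gridworld, where every named parameter (GRID, GOAL, GAMMA, THETA, the -1 step cost) is an illustrative assumption:

    # Minimal value-iteration sketch on a small deterministic Gridworld.
    # All parameters below (grid size, goal location, rewards, discount,
    # threshold) are illustrative assumptions, not the experiment's settings.

    GRID = 4                                   # 4x4 grid; states are (row, col)
    GOAL = (3, 3)                              # absorbing goal state (assumed)
    GAMMA = 0.9                                # discount factor (assumed)
    THETA = 1e-6                               # stop when values change less than this
    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def step(state, action):
        """Deterministic transition: move if in bounds, otherwise stay put."""
        if state == GOAL:                      # the goal is absorbing, reward 0
            return state, 0.0
        r, c = state
        dr, dc = ACTIONS[action]
        nr, nc = r + dr, c + dc
        nxt = (nr, nc) if 0 <= nr < GRID and 0 <= nc < GRID else state
        return nxt, -1.0                       # -1 living cost per move (assumed)

    def value_iteration():
        """Sweep the Bellman optimality update until the values converge."""
        V = {(r, c): 0.0 for r in range(GRID) for c in range(GRID)}
        sweeps = 0
        while True:
            delta = 0.0
            for s in V:
                # V(s) <- max_a [ R(s, a) + gamma * V(s') ]
                best = max(reward + GAMMA * V[nxt]
                           for nxt, reward in (step(s, a) for a in ACTIONS))
                delta = max(delta, abs(best - V[s]))
                V[s] = best                    # in-place (Gauss-Seidel) update
            sweeps += 1
            if delta < THETA:                  # converged
                return V, sweeps

    V, sweeps = value_iteration()
    print(f"converged after {sweeps} sweeps; V[(0, 0)] = {V[(0, 0)]:.3f}")

Because each sweep feeds updated values into the next, reward information propagates one more step outward from the goal per sweep, which is exactly the evolution the visualization makes visible.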

Objectives:

  • Interactive Value Function Visualization: Enable users to visually track how the value function changes across the Gridworld at each iteration, building intuition for the value iteration process.
  • Customizable MDP Dynamics: Allow users to adjust MDP parameters (e.g., rewards, transition noise, discount factor) and environmental conditions, offering insight into how different settings influence the value iteration process and the resulting policies (a stochastic variant is sketched after this list).
  • Demonstrate Convergence and Policy Extraction: Clearly illustrate how the value function converges and how an optimal policy is extracted from the converged values; this shows how iterative value estimation leads to optimal decision-making in the Gridworld (see the first sketch after this list).
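
Once the values have converged, policy extraction is a one-step greedy lookahead: in each state, pick the action that maximizes R(s, a) + gamma * V(s'). Continuing the sketch above (reusing its step, ACTIONS, GAMMA, GOAL, and the converged V), this might look like:

    def extract_policy(V):
        """Greedy extraction: pi(s) = argmax_a [ R(s, a) + gamma * V(s') ]."""
        def lookahead(s, a):
            nxt, reward = step(s, a)
            return reward + GAMMA * V[nxt]
        return {s: max(ACTIONS, key=lambda a: lookahead(s, a))
                for s in V if s != GOAL}

    policy = extract_policy(V)
    for r in range(GRID):
        # print the first letter of each greedy action; 'G' marks the goal
        print(" ".join("G" if (r, c) == GOAL else policy[(r, c)][0]
                       for c in range(GRID)))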

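One illustrative example of customizable dynamics is adding transition noise. The experiment's actual noise model is not specified; as an assumption for this sketch, a hypothetical slip probability SLIP sends the agent in a random other direction, and the deterministic lookahead becomes an expectation over successor states:

    SLIP = 0.2   # hypothetical probability of slipping to another direction

    def transition_dist(state, action):
        """Transition distribution [(prob, next_state, reward), ...] with slip."""
        if state == GOAL:
            return [(1.0, state, 0.0)]         # goal stays absorbing
        others = [a for a in ACTIONS if a != action]
        moves = [(action, 1.0 - SLIP)] + [(a, SLIP / len(others)) for a in others]
        return [(p,) + step(state, a) for a, p in moves]

    # Inside value_iteration, the deterministic lookahead then becomes an
    # expectation over successor states:
    #   best = max(sum(p * (reward + GAMMA * V[nxt])
    #                  for p, nxt, reward in transition_dist(s, a))
    #              for a in ACTIONS)
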
The experiment is designed for learners and researchers interested in reinforcement learning, decision theory, and algorithmic processes. It offers a hands-on, insightful way to explore value iteration in a simulated yet dynamic environment.