Policy Iteration Practice

Instructions
  • Reset: Reinitializes the simulation to its default state.
  • Next Value: Computes the value for the subsequent state in the grid.
  • Next Iteration: Completes the current iteration by calculating values for all states (see the sketch after this list).
  • Adjustments: Modify grid size, rewards, and discount factor using the dropdowns provided.
  • Create Obstacles: Click any cell in the left grid to make it an obstacle.
  • Set Rewards: Double-click a cell to toggle its reward state: green (reward: +1), red (reward: -1), or neutral.
  • Visualization: The animation on the left grid shows the calculation for the current state, while the right grid displays the corresponding state value.
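
For reference, the sweep behind the Next Value and Next Iteration buttons can be sketched as below. This is a minimal Python sketch, assuming a deterministic four-action gridworld and an equiprobable random policy; the names next_value, next_iteration, rewards, obstacles, and GAMMA are illustrative and are not taken from the page itself.

```python
GAMMA = 0.9                                   # discount factor (dropdown-adjustable; 0.9 is an assumption)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def next_value(values, rewards, obstacles, state):
    """One 'Next Value' click: Bellman expectation update for a single state."""
    rows, cols = len(values), len(values[0])
    r, c = state
    if (r, c) in obstacles:
        return 0.0
    total = 0.0
    for dr, dc in ACTIONS:
        nr, nc = r + dr, c + dc
        # Moves off the grid or into an obstacle leave the agent in place.
        if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in obstacles:
            nr, nc = r, c
        # Equiprobable random policy: each action has probability 0.25.
        total += 0.25 * (rewards[nr][nc] + GAMMA * values[nr][nc])
    return total

def next_iteration(values, rewards, obstacles):
    """One 'Next Iteration' click: update every state once (a full sweep)."""
    rows, cols = len(values), len(values[0])
    return [[next_value(values, rewards, obstacles, (r, c))
             for c in range(cols)]
            for r in range(rows)]
```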

The calculation of the value function for the current state appears here.

Policy Representation

Calculation of State Values

Controls


The arrows represent the randomly initialized (or improved) policy. We will evaluate this policy by calculating the values of the states until they converge.
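
A minimal sketch of this policy-evaluation loop, assuming the hypothetical next_iteration helper sketched above and an illustrative convergence tolerance THETA (the page does not expose one):

```python
THETA = 1e-4   # illustrative convergence tolerance

def evaluate_policy(values, rewards, obstacles):
    """Repeat full sweeps until the largest value change falls below THETA."""
    while True:
        new_values = next_iteration(values, rewards, obstacles)
        delta = max(abs(new_values[r][c] - values[r][c])
                    for r in range(len(values))
                    for c in range(len(values[0])))
        values = new_values
        if delta < THETA:
            return values   # converged values shown on the right grid
```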

CONVERGED! The values shown are the converged state values. We will compute the improved policy using these state values.
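
A minimal sketch of this improvement step, assuming the new policy is taken greedily with respect to the converged values; the arrow characters and GAMMA here are illustrative:

```python
GAMMA = 0.9                                                      # discount factor (assumption)
ARROWS = {(-1, 0): '^', (1, 0): 'v', (0, -1): '<', (0, 1): '>'}  # display symbols

def improve_policy(values, rewards, obstacles):
    """Greedy improvement: for each state, pick the action whose one-step
    return (successor reward plus discounted successor value) is largest;
    the chosen actions become the new arrows on the left grid."""
    rows, cols = len(values), len(values[0])
    policy = {}
    for r in range(rows):
        for c in range(cols):
            if (r, c) in obstacles:
                continue
            best_action, best_return = None, float('-inf')
            for dr, dc in ARROWS:
                nr, nc = r + dr, c + dc
                # Blocked moves leave the agent in place, as in evaluation.
                if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in obstacles:
                    nr, nc = r, c
                ret = rewards[nr][nc] + GAMMA * values[nr][nc]
                if ret > best_return:
                    best_action, best_return = (dr, dc), ret
            policy[(r, c)] = ARROWS[best_action]
    return policy
```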
