Policy Iteration Demo

Instructions
  • Start: Begins the simulation at the default 1x speed. Ensure all inputs are selected before starting.
  • Speed Adjustment: Modify simulation speed using the slider.
  • Reset: Resets the simulation to default settings.
  • Grid Size: Alter the grid dimensions using the dropdown menu.
  • Obstacles: Click any cell in the left-hand matrix to toggle it as an obstacle within the MDP.
  • Reward Cells: Double-click to cycle a cell through reward states: green (+1 reward), red (-1 reward), or normal.
  • Animation: The left grid visualizes the current cell's value calculation, while the right grid highlights the updated state values.
  • Note: The arrows represent the randomly initialised/improved policy. We will evaluate this policy by computing the value of each state until the values converge.
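The evaluation step the demo animates can be sketched as code. This is a minimal illustration, not the demo's actual implementation: the grid layout, the discount factor, the convergence threshold, and the fixed policy below are all made-up assumptions, and transitions are taken to be deterministic for simplicity.

```python
# Hypothetical 3x3 gridworld: 'N' normal cell, 'O' obstacle,
# '+' = +1 reward cell, '-' = -1 reward cell (all assumed, not from the demo).
grid = [['N', 'N', '+'],
        ['N', 'O', '-'],
        ['N', 'N', 'N']]
GAMMA = 0.9    # discount factor (assumed)
THETA = 1e-4   # stop when the largest value change falls below this (assumed)

ACTIONS = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

# A fixed deterministic policy (e.g. randomly initialised): one arrow per
# normal cell; obstacles and reward cells carry no action.
policy = [['right', 'right', None],
          ['up',    None,    None],
          ['up',    'right', 'up']]

def step(r, c, action):
    """Deterministic move; bumping into a wall or an obstacle stays in place."""
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])) or grid[nr][nc] == 'O':
        return r, c
    return nr, nc

def reward(r, c):
    """Reward received on entering a cell."""
    return {'+': 1.0, '-': -1.0}.get(grid[r][c], 0.0)

def evaluate(policy):
    """Sweep over the states, applying V(s) <- r(s') + GAMMA * V(s')
    for the policy's action, until the values converge."""
    V = [[0.0] * len(grid[0]) for _ in grid]
    while True:
        delta = 0.0
        for r in range(len(grid)):
            for c in range(len(grid[0])):
                if grid[r][c] != 'N':
                    continue   # obstacles and reward cells are not updated
                nr, nc = step(r, c, policy[r][c])
                v_new = reward(nr, nc) + GAMMA * V[nr][nc]
                delta = max(delta, abs(v_new - V[r][c]))
                V[r][c] = v_new
        if delta < THETA:
            return V

V = evaluate(policy)
```

With this layout, the cell left of the +1 cell converges to 1.0 and values decay by a factor of GAMMA for each extra step along the policy's path.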

The calculation of a state's value function appears here

Policy Representation

Calculation of State Values

Observations


The arrows represent the randomly initialised/improved policy. We will evaluate this policy by computing the value of each state until the values converge.

CONVERGED! The values shown are the converged values of the states. We will compute the improved policy using these state values.
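The improvement step that follows convergence can be sketched as a greedy one-step lookahead. Again this is an illustrative assumption, not the demo's code: the grid layout and the converged values in `V` are made up (each step away from the +1 cell scales the value by GAMMA), and transitions are deterministic.

```python
# Hypothetical 3x3 gridworld (assumed layout, same conventions as the demo:
# 'N' normal, 'O' obstacle, '+' / '-' are +1 / -1 reward cells).
grid = [['N', 'N', '+'],
        ['N', 'O', '-'],
        ['N', 'N', 'N']]
GAMMA = 0.9
ACTIONS = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

# Converged state values from a previous evaluation pass (illustrative numbers).
V = [[0.81,   0.9,  0.0],
     [0.729,  0.0,  0.0],
     [0.6561, 0.59, 0.0]]

def step(r, c, action):
    """Deterministic move; bumping into a wall or an obstacle stays in place."""
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])) or grid[nr][nc] == 'O':
        return r, c
    return nr, nc

def reward(r, c):
    """Reward received on entering a cell."""
    return {'+': 1.0, '-': -1.0}.get(grid[r][c], 0.0)

def q_value(V, r, c, action):
    """One-step lookahead: value of taking `action` in (r, c) under V."""
    nr, nc = step(r, c, action)
    return reward(nr, nc) + GAMMA * V[nr][nc]

def improve(V):
    """Greedy improvement: in each normal cell, pick the action that
    maximises the one-step lookahead value."""
    return [[max(ACTIONS, key=lambda a: q_value(V, r, c, a))
             if grid[r][c] == 'N' else None
             for c in range(len(grid[0]))]
            for r in range(len(grid))]

improved = improve(V)
```

Note how the improved policy now routes cells toward the +1 cell (for example, the bottom-right cell turns away from the -1 cell), which is exactly the arrow update the demo draws after convergence.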