Policy Iteration Demo

Instructions
  • Start: Begins the simulation at the default 1x speed. Ensure all inputs are selected before starting.
  • Speed Adjustment: Modify simulation speed using the slider.
  • Reset: Resets the simulation to default settings.
  • Grid Size: Alter the grid dimensions using the dropdown menu.
  • Obstacles: Click any cell in the left-hand matrix to toggle it as an obstacle within the MDP.
  • Reward Cells: Double-click to cycle a cell through reward states: green (+1 reward), red (-1 reward), or normal.
  • Animation: The left grid visualizes the current cell's value calculation, while the right grid highlights the updated state values.
  • Note: The arrows represent the randomly initialised/improved policy. We will evaluate this policy by computing the value of each state until the values converge.
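The evaluation step the demo animates can be sketched as code. This is a minimal illustration, not the demo's actual implementation: the grid layout, the discount factor, the convergence threshold, and the fixed policy below are all made-up assumptions, and transitions are taken to be deterministic for simplicity.

```python
# Hypothetical 3x3 gridworld: 'N' normal cell, 'O' obstacle,
# '+' = +1 reward cell, '-' = -1 reward cell (all assumed, not from the demo).
grid = [['N', 'N', '+'],
        ['N', 'O', '-'],
        ['N', 'N', 'N']]
GAMMA = 0.9    # discount factor (assumed)
THETA = 1e-4   # stop when the largest value change falls below this (assumed)

ACTIONS = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

# A fixed deterministic policy (e.g. randomly initialised): one arrow per
# normal cell; obstacles and reward cells carry no action.
policy = [['right', 'right', None],
          ['up',    None,    None],
          ['up',    'right', 'up']]

def step(r, c, action):
    """Deterministic move; bumping into a wall or an obstacle stays in place."""
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])) or grid[nr][nc] == 'O':
        return r, c
    return nr, nc

def reward(r, c):
    """Reward received on entering a cell."""
    return {'+': 1.0, '-': -1.0}.get(grid[r][c], 0.0)

def evaluate(policy):
    """Sweep over the states, applying V(s) <- r(s') + GAMMA * V(s')
    for the policy's action, until the values converge."""
    V = [[0.0] * len(grid[0]) for _ in grid]
    while True:
        delta = 0.0
        for r in range(len(grid)):
            for c in range(len(grid[0])):
                if grid[r][c] != 'N':
                    continue   # obstacles and reward cells are not updated
                nr, nc = step(r, c, policy[r][c])
                v_new = reward(nr, nc) + GAMMA * V[nr][nc]
                delta = max(delta, abs(v_new - V[r][c]))
                V[r][c] = v_new
        if delta < THETA:
            return V

V = evaluate(policy)
```

With this layout, the cell left of the +1 cell converges to 1.0 and values decay by a factor of GAMMA for each extra step along the policy's path.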

The calculation of a state's value function appears here

Policy Representation

Calculation of State Values

Observations


The arrows represent the randomly initialised/improved policy. We will evaluate this policy by computing the value of each state until the values converge.

CONVERGED! The values shown are the converged values of the states. We will compute the improved policy using these state values.
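The improvement step that follows convergence can be sketched as a greedy one-step lookahead. Again this is an illustrative assumption, not the demo's code: the grid layout and the converged values in `V` are made up (each step away from the +1 cell scales the value by GAMMA), and transitions are deterministic.

```python
# Hypothetical 3x3 gridworld (assumed layout, same conventions as the demo:
# 'N' normal, 'O' obstacle, '+' / '-' are +1 / -1 reward cells).
grid = [['N', 'N', '+'],
        ['N', 'O', '-'],
        ['N', 'N', 'N']]
GAMMA = 0.9
ACTIONS = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1)}

# Converged state values from a previous evaluation pass (illustrative numbers).
V = [[0.81,   0.9,  0.0],
     [0.729,  0.0,  0.0],
     [0.6561, 0.59, 0.0]]

def step(r, c, action):
    """Deterministic move; bumping into a wall or an obstacle stays in place."""
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])) or grid[nr][nc] == 'O':
        return r, c
    return nr, nc

def reward(r, c):
    """Reward received on entering a cell."""
    return {'+': 1.0, '-': -1.0}.get(grid[r][c], 0.0)

def q_value(V, r, c, action):
    """One-step lookahead: value of taking `action` in (r, c) under V."""
    nr, nc = step(r, c, action)
    return reward(nr, nc) + GAMMA * V[nr][nc]

def improve(V):
    """Greedy improvement: in each normal cell, pick the action that
    maximises the one-step lookahead value."""
    return [[max(ACTIONS, key=lambda a: q_value(V, r, c, a))
             if grid[r][c] == 'N' else None
             for c in range(len(grid[0]))]
            for r in range(len(grid))]

improved = improve(V)
```

Note how the improved policy now routes cells toward the +1 cell (for example, the bottom-right cell turns away from the -1 cell), which is exactly the arrow update the demo draws after convergence.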