Policy Iteration Demo
Instructions
- Start: Initiates the simulation with default speed set at 1x. Ensure all inputs are selected before starting.
- Speed Adjustment: Modify simulation speed using the slider.
- Reset: Resets the simulation to default settings.
- Grid Size: Alter the grid dimensions using the dropdown menu.
- Obstacles: Click any cell in the left-hand matrix to toggle it as an obstacle within the MDP.
- Reward Cells: Double-click to cycle a cell through reward states: green (+1 reward), red (-1 reward), or normal.
- Animation: The left grid visualizes the current cell's calculation process, while the right grid highlights the state value.
- Note: The arrows represent the randomly initialised (or subsequently improved) policy. We will evaluate this policy by calculating the values of the states until they converge.
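The evaluation step described in the note can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code. It assumes deterministic moves, reward cells that act as terminal states, a discount factor of 0.9, and a convergence tolerance of 1e-6. The function name `evaluate_policy` and all of its parameters are hypothetical.

```python
# Illustrative sketch only: names, constants, and dynamics are assumptions,
# not taken from the demo's implementation.
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def evaluate_policy(policy, rewards, obstacles, rows, cols, gamma=0.9, tol=1e-6):
    """Iterative policy evaluation: sweep all states until the values stop changing."""
    values = {(r, c): 0.0 for r in range(rows) for c in range(cols)}
    while True:
        delta = 0.0
        for state in values:
            # Obstacles are never occupied; reward cells are assumed terminal (value 0).
            if state in obstacles or state in rewards:
                continue
            dr, dc = ACTIONS[policy[state]]
            nxt = (state[0] + dr, state[1] + dc)
            if nxt not in values or nxt in obstacles:
                nxt = state  # hitting a wall or obstacle leaves the agent in place
            # Reward is collected on entering a reward cell (assumed convention).
            new_value = rewards.get(nxt, 0.0) + gamma * values[nxt]
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        if delta < tol:
            return values


# A 1x3 corridor with a +1 cell at the right end and a policy that always moves right.
policy = {(0, 0): "R", (0, 1): "R"}
values = evaluate_policy(policy, rewards={(0, 2): 1.0}, obstacles=set(), rows=1, cols=3)
print(values)
```

In this corridor the cell next to the reward settles at 1.0 and the cell one step further away at 0.9, which shows how the discount factor shrinks value with distance from the reward.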
Calculation of value function of a state appears here
Policy Representation
Calculation of State values
Observations