Policy Iteration

Step 1: Modifying the Grid

  • Double Click/Single Click on any cell in the grid to change its state.
    • Terminal States: These are your target or end points (like a charging station).
    • Blocked States: These are obstacles or no-go areas.

Step 2: Adjusting Settings

  • Use the Control Menu to change grid size and algorithm parameters.
    • Adjust things like grid dimensions or algorithm settings to see how they affect the outcome.

Step 3: Understanding the Grid

  • The grid shows the State Value Function for each cell.
    • Each cell's value represents the expected reward for moving in the directions: Left, Up, Right, Down.

Step 4: Iteration and Sub-Iterations

  • Click "Next Value" to progress through the current iteration step-by-step.
    • The Sub-Iterations count increases with each step.
    • When a terminal state or the maximum steps per iteration are reached, the iteration count increases and the steps reset to 0.

Step 5: Moving to the Next Iteration

  • Click "Next Iteration" to proceed to the next cycle of the algorithm.
    • This allows you to see how the algorithm refines its strategy over time.

Step 6: Learning the Policy

  • The Arrows in the Left Grid indicate the currently learned policy.
    • These arrows guide you towards the most rewarding actions in each state.

Step 7: Reaching the Optimal Policy

  • When the State Value Functions of all cells stabilize, an Optimal Policy is achieved.
    • A message will be displayed indicating that the best strategy has been found.