Q-Learning Practice

Instructions
  • Reset: Resets the simulation to its original state.
  • Next Value: Updates the Q value for the selected state-action pair.
  • Next Iteration: Updates the Q values repeatedly until the episode reaches a terminal state or a preset number of steps has elapsed.
  • Grid Customization: Adjust the grid size, reward values, and discount factor using the dropdown menus provided.
  • Modifying State Rewards: Double-click a cell to cycle its reward status: green (reward: +1), then red (penalty: -1), then back to the normal state.
  • Grid Animation: The left grid shows the current cell being used to calculate the Q value, which corresponds to the highlighted state value in the right grid.
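
The Next Value and Next Iteration buttons can be pictured with a small sketch. This is not the simulation's own code: it assumes the standard tabular Q-learning update with a learning rate `alpha` (the controls above mention only a discount factor), treats green and red cells as terminal, and uses epsilon-greedy action selection. The names `make_q`, `step`, `q_update`, and `run_episode` are hypothetical, chosen for illustration only.

```python
import random

# Hypothetical action set: (row delta, column delta) for each move.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def make_q(size):
    """Create a Q table with every (state, action) value set to 0."""
    return {((r, c), a): 0.0
            for r in range(size) for c in range(size) for a in ACTIONS}


def step(state, action, size):
    """Move one cell; a move off the grid leaves the agent in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    return (nr, nc) if 0 <= nr < size and 0 <= nc < size else state


def q_update(q, state, action, reward, next_state, terminal, gamma, alpha):
    """One update for a single (state, action) pair ("Next Value")."""
    # Assumption: a terminal next state contributes no future value.
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def run_episode(q, rewards, size, start, gamma=0.9, alpha=0.5,
                epsilon=0.1, max_steps=50, rng=None):
    """Repeat updates until a terminal cell or max_steps ("Next Iteration")."""
    rng = rng or random.Random(0)
    state = start
    for steps in range(1, max_steps + 1):
        # Epsilon-greedy choice (an assumption about how actions are picked).
        if rng.random() < epsilon:
            action = rng.choice(list(ACTIONS))
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = step(state, action, size)
        reward = rewards.get(next_state, 0.0)
        terminal = next_state in rewards  # assumption: coloured cells end the episode
        q_update(q, state, action, reward, next_state, terminal, gamma, alpha)
        state = next_state
        if terminal:
            return steps
    return max_steps


if __name__ == "__main__":
    q = make_q(3)
    rewards = {(0, 2): 1.0, (1, 2): -1.0}  # green and red cells
    for _ in range(200):
        run_episode(q, rewards, size=3, start=(2, 0))
    print(round(q[((0, 1), "right")], 3))
```

Setting `alpha=1.0` reduces the update to `reward + gamma * best_next`, which may be closer to what a demo with no learning-rate control computes.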

The calculation of the Q value for the selected (state, action) pair appears here.

Previous Iteration

Present Iteration

Controls
