Q-Learning Practice
Instructions
- Reset: Resets the simulation to its original state.
- Next Value: Updates the Q value for the selected state-action pair.
- Next Iteration: Updates the Q values repeatedly until the episode reaches a terminal state or a preset number of steps has been taken.
- Grid Customization: Adjust the grid size, reward values, and discount factor using the dropdown menus provided.
- Modifying State Rewards: Double-click a cell to cycle its reward status from normal to green (reward: +1), then to red (penalty: -1), and back to normal.
- Grid Animation: The left grid shows the current cell being used to calculate the Q value, which corresponds to the highlighted state value in the right grid.
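To make the "Next Value" step concrete, here is a minimal Python sketch of a single Q-value update on a small grid. It is not the simulation's own code. It assumes deterministic moves, a reward collected on entering a green or red cell, coloured cells acting as terminal states, and a learning rate of 1 (the page lists only a discount factor). The names `step`, `update_q`, `rewards`, and `alpha` are hypothetical.

```python
# Hypothetical sketch of one Q-value update; not the simulation's actual code.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action, size):
    """Move inside a size x size grid; hitting a wall leaves the state unchanged."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row, 0), size - 1)
    new_col = min(max(col + d_col, 0), size - 1)
    return (new_row, new_col)


def update_q(q, state, action, rewards, size, gamma, alpha=1.0):
    """Apply one Q-learning update to (state, action) and return the new value."""
    next_state = step(state, action, size)
    reward = rewards.get(next_state, 0.0)
    if next_state in rewards:
        # Assumption: green/red cells are terminal, so no future value is added.
        future = 0.0
    else:
        # Value of the next state is its best available Q value.
        future = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * future
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


size, gamma = 3, 0.9
rewards = {(0, 2): 1.0, (1, 2): -1.0}  # green cell and red cell
q = {((r, c), a): 0.0 for r in range(size) for c in range(size) for a in ACTIONS}

first = update_q(q, (0, 1), "right", rewards, size, gamma)   # steps into the green cell
second = update_q(q, (0, 0), "right", rewards, size, gamma)  # one step further from it
print(first, second)
```

Under these assumptions, the first update assigns the full +1 reward to the move into the green cell, and the second update propagates that value one cell back, scaled by the discount factor. Repeating such updates over every state-action pair is roughly what one press of "Next Iteration" would do.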
The calculation of the Q value for the selected (state, action) pair appears here.
Previous Iteration
Present Iteration
Controls