Q-Learning Practice

Instructions

Reset: Resets the simulation to its original state.
Next Value: Updates the Q value for the selected state-action pair.
Next Iteration: Repeats the Q-value update until the episode reaches a terminal state or the preset step limit.
Grid Customization: Adjust the grid size, reward values, and discount factor using the dropdown menus provided.
Modifying State Rewards: Double-click a cell to cycle its reward status from normal to green (reward: +1), then red (penalty: -1), then back to normal.
Grid Animation: The left grid highlights the cell currently being used to calculate the Q value; the same cell's state value is highlighted in the right grid.
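
The page does not state which update rule the "Next Value" button applies. The sketch below assumes the standard tabular Q-learning update. The names q_update, QTable and ACTIONS are hypothetical and not taken from the simulator, and treating a green cell as terminal is also an assumption.

```python
from typing import Dict, Tuple

State = Tuple[int, int]                     # (row, col) of a grid cell
QTable = Dict[Tuple[State, str], float]     # hypothetical Q-table layout

ACTIONS = ("up", "down", "left", "right")   # assumed action set


def q_update(q: QTable, state: State, action: str, reward: float,
             next_state: State, alpha: float, gamma: float,
             terminal: bool = False) -> float:
    """Apply one tabular Q-learning update and return the new Q value."""
    # Best value reachable from the next state; zero if the episode ended.
    best_next = 0.0 if terminal else max(
        q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the target by a fraction alpha.
    new = old + alpha * (reward + gamma * best_next - old)
    q[(state, action)] = new
    return new


q: QTable = {}
# Step right from (0, 0) into a green (+1) cell at (0, 1), assumed terminal.
first = q_update(q, (0, 0), "right", 1.0, (0, 1),
                 alpha=0.5, gamma=0.9, terminal=True)
second = q_update(q, (0, 0), "right", 1.0, (0, 1),
                  alpha=0.5, gamma=0.9, terminal=True)
print(first, second)  # 0.5 0.75
```

Pressing "Next Iteration" would then correspond to calling such an update once per step until a terminal cell or the step limit is reached.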

The Q-value calculation for the selected (state, action) pair appears here.
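
The exact formula the page displays is not given in this text. Assuming it is the standard Q-learning rule, with α as the Learning Rate, γ as the Discount Factor, r as the Reward, and s' as the state reached after taking action a in state s, the calculation would have this form:

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```

As an illustrative example with made-up values: if Q(s,a) = 0, α = 0.5, r = +1, γ = 0.9 and every Q(s',a') is 0, the update gives 0 + 0.5 × (1 + 0.9 × 0 − 0) = 0.5.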

Previous Iteration

Iterations:

Steps:

Discount Factor:

Epsilon:

Learning Rate:

Reward:

Grid Size:
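
The Epsilon setting above is typically the exploration rate of an epsilon-greedy policy. The page does not say how the simulator chooses actions, so the following is only a sketch under that assumption. The function epsilon_greedy and the Q-table layout are hypothetical.

```python
import random
from typing import Dict, Optional, Sequence, Tuple

State = Tuple[int, int]  # (row, col) of a grid cell


def epsilon_greedy(q: Dict[Tuple[State, str], float], state: State,
                   actions: Sequence[str], epsilon: float,
                   rng: Optional[random.Random] = None) -> str:
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng is None:
        rng = random.Random()
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.choice(list(actions))
    # Exploit: choose the action with the highest Q value (unseen pairs are 0.0).
    return max(actions, key=lambda a: q.get((state, a), 0.0))


actions = ("up", "down", "left", "right")
q_table = {((0, 0), "right"): 0.75}
greedy_choice = epsilon_greedy(q_table, (0, 0), actions, epsilon=0.0)
print(greedy_choice)  # right
```

With epsilon set to 0 the agent always exploits its current estimates. With epsilon set to 1 it always explores at random.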