Q-Learning

This experiment is designed to illustrate the Q-Learning algorithm in a Gridworld setting. It serves as an educational and interactive tool, allowing users to observe and analyze the update process of Q-values and the development of policies in real time within a Reinforcement Learning (RL) framework.
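At the heart of the experiment is the standard tabular Q-Learning update rule: Q(s, a) ← Q(s, a) + α · [r + γ · max_a' Q(s', a') − Q(s, a)], where α is the learning rate and γ the discount factor. The sketch below shows one such update step; the (state, action)-keyed table and the parameter values are illustrative assumptions, not this experiment's actual interface.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed default)
GAMMA = 0.9   # discount factor (assumed default)

# Q-table: maps (state, action) pairs to estimated returns; unseen pairs default to 0.0.
Q = defaultdict(float)

def q_update(state, action, reward, next_state, actions):
    """Apply one tabular Q-Learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```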

Objectives

  • Interactive Q-Value Visualization: This feature lets users visually monitor Q-value updates in the Gridworld after each action. The visualization clarifies the Q-Learning process, highlighting how state-action pairs are evaluated and revised and thereby reflecting the agent's learning progress.

  • Customizable RL Scenarios: Users can adjust key parameters and environmental conditions relevant to Q-Learning, such as the learning rate, discount factor, and exploration strategies (see the exploration sketch after this list). This customization provides insight into how various settings influence the learning process and the efficacy of the resulting policy.

  • Demonstration of Policy Development and Optimization: The experiment demonstrates how policies evolve as the Q-Learning algorithm progresses through its iterations. It emphasizes how the algorithm explores the environment, learns from interactions, and gradually converges towards an optimal policy, underscoring the algorithm's adaptive nature in decision-making within a Gridworld context (see the policy-extraction sketch below).
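As one concrete exploration strategy, the sketch below shows ε-greedy action selection, in which the agent takes a random action with probability ε and otherwise exploits its current Q-value estimates. The action names and the epsilon value are illustrative assumptions.

```python
import random

ACTIONS = ["up", "down", "left", "right"]  # assumed Gridworld action set

def epsilon_greedy(Q, state, actions=ACTIONS, epsilon=0.1):
    """With probability epsilon, take a random action (explore);
    otherwise take the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

Lowering ε over the course of training is a common variant, shifting the agent from exploration toward exploitation as its Q-value estimates improve.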
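Once the Q-values have (approximately) converged, the learned policy can be read directly off the table by acting greedily in every state. A minimal sketch, assuming the same (state, action)-keyed Q-table as above and a hypothetical 4x4 grid of (row, col) states:

```python
def greedy_policy(Q, states, actions):
    """Derive the greedy policy: in each state, pick the action
    with the highest estimated Q-value."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}

# Example usage: enumerate the states of an assumed 4x4 Gridworld.
states = [(r, c) for r in range(4) for c in range(4)]
policy = greedy_policy(Q, states, ACTIONS)
```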

This experiment offers a practical and comprehensive approach to understanding Q-Learning, a foundational algorithm that underpins many contemporary AI applications.