Q-Learning
What is Q-learning?
What is the purpose of the Q-value in Q-learning?
What is the purpose of the epsilon-greedy policy in Q-learning?
Is Q-learning an off-policy method?
What value of the discount factor in Q-learning prioritizes long-term rewards?