Value Iteration
What does MDP stand for?
What is the reward hypothesis?
Which of the following is true regarding an MDP? \ 1. The environment is fully observable. \ 2. The future depends on both the present and the past states. \ 3. The probability of reaching the successor state depends only on the current state.
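As a reminder when working through statements 2 and 3, statement 3 describes the Markov property, which can be written as:

$$
P(S_{t+1} = s' \mid S_t, S_{t-1}, \dots, S_0) = P(S_{t+1} = s' \mid S_t)
$$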
An agent receives a representation of the environment at time step 't' and performs an action 'a'. The agent then receives a reward 'r' at time step ____ for the action 'a'.
Imagine an agent in a maze-like gridworld. You would like the agent to find the goal as quickly as possible, so you want its path to be short. You give the agent a reward of +1 when it reaches the goal, and the discount rate is 1.0 because this is an episodic task. When you run the agent, it finds the goal but does not seem to care how long it takes to complete each episode. How could you fix this?
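One way to reason about this question: with a discount rate of 1.0 and a reward given only at the goal, every path that eventually reaches the goal earns the same return, so the agent has no incentive to be quick. The minimal sketch below (the helper episode_return is hypothetical, written only for illustration) compares the return of a short and a long episode under the original reward design and under one possible fix, a per-step reward of -1.

```python
def episode_return(num_steps, step_reward, goal_reward, gamma=1.0):
    """Sum of discounted rewards for an episode lasting `num_steps` steps.

    The goal reward is received on the final step; every earlier step
    receives `step_reward`.
    """
    total = 0.0
    for t in range(num_steps):
        r = goal_reward if t == num_steps - 1 else step_reward
        total += (gamma ** t) * r
    return total

# Original design: 0 per step, +1 at the goal -> short and long paths
# earn the same return, so episode length does not matter to the agent.
print(episode_return(5, 0.0, 1.0))    # 1.0
print(episode_return(50, 0.0, 1.0))   # 1.0

# Possible fix: -1 per step (a discount rate below 1 has a similar effect),
# so shorter paths now earn strictly higher returns.
print(episode_return(5, -1.0, 1.0))   # -3.0
print(episode_return(50, -1.0, 1.0))  # -48.0
```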