Activation Functions & Optimization

1. Which activation function is zero-centred with output range (-1, 1)?
Answer: Tanh.

Explanation: Tanh maps any input into (-1, 1) and is symmetric about the origin, so its outputs are zero-centred. Sigmoid, by contrast, outputs values in (0, 1) centred around 0.5, which biases downstream gradients. A quick numerical check follows below.
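A minimal NumPy sketch (illustrative inputs, not from the source) confirming that Tanh outputs stay strictly inside (-1, 1) and are symmetric about zero:

```python
import numpy as np

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
y = np.tanh(x)

print(y)  # values stay strictly inside (-1, 1)
print(np.allclose(np.tanh(-x), -y))  # True: tanh is odd, so outputs are zero-centred
```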

2. What is a potential drawback of using ReLU activation?
Answer: The 'dying ReLU' problem.

Explanation: ReLU outputs zero for all negative inputs. If a neuron's weights are pushed into a regime where its pre-activation is negative for every input, the neuron outputs zero everywhere and receives zero gradient, so it stops learning entirely; see the sketch below.
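A minimal sketch (illustrative values, plain NumPy) of how a ReLU unit 'dies': once its pre-activations are all negative, both the output and the gradient are zero, so its weights never move:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Derivative of ReLU: 1 for z > 0, 0 otherwise.
    return (z > 0).astype(float)

# A neuron pushed into the negative regime (e.g. by one overly large update).
pre_activations = np.array([-3.2, -0.7, -5.1, -1.4])

print(relu(pre_activations))       # [0. 0. 0. 0.]  -- output is always zero
print(relu_grad(pre_activations))  # [0. 0. 0. 0.]  -- no gradient, no learning
```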

3. Why can Sigmoid activation slow down training?
Answer: Its gradients saturate.

Explanation: Sigmoid squashes inputs into (0, 1) and flattens out for large |x|. Its derivative, σ'(x) = σ(x)(1 - σ(x)), peaks at 0.25 and approaches zero in the saturated regions, so gradients shrink as they are propagated backward through many layers (the vanishing gradient problem), which slows training. Its non-zero-centred output compounds the effect.
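A short sketch of the saturation effect: the sigmoid derivative never exceeds 0.25 and is effectively zero for large |x|, which is what starves early layers of gradient:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigma'(x)={sigmoid_grad(x):.6f}")
# x=  0.0  sigma'(x)=0.250000
# x=  2.0  sigma'(x)=0.104994
# x=  5.0  sigma'(x)=0.006648
# x= 10.0  sigma'(x)=0.000045
```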

4. What is the consequence of the 'dying ReLU' problem?

Answer: Affected neurons stop learning permanently.

Explanation: A 'dead' ReLU unit outputs zero for every input in the data distribution, so its incoming weights receive zero gradient and are never updated. The neuron is effectively removed from the network, reducing model capacity. Common mitigations include smaller learning rates or variants such as Leaky ReLU, PReLU, and ELU; a sketch of the first follows.
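A minimal sketch of the Leaky ReLU mitigation, which keeps a small slope for negative inputs so the gradient never vanishes completely (the 0.01 slope is the conventional default, not taken from the source):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

z = np.array([-3.2, -0.7, 1.4])
print(leaky_relu(z))       # [-0.032 -0.007  1.4  ]
print(leaky_relu_grad(z))  # [0.01 0.01 1.  ]  -- negative inputs still learn
```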

5. Which optimizer combines momentum and adaptive learning rates?
Answer: Adam.

Explanation: Adam (Adaptive Moment Estimation) keeps an exponentially decaying average of past gradients (the first moment, acting as momentum) and of past squared gradients (the second moment, used to scale a per-parameter learning rate). The sketch below shows one update step.
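A minimal NumPy sketch of a single Adam step, following the standard Kingma & Ba (2015) update rule; the hyperparameter defaults shown are the conventional ones, not values from the source:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters `theta` given gradient `grad` at step t."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (adaptive scale)
    m_hat = m / (1 - beta1 ** t)             # bias correction for the
    v_hat = v / (1 - beta2 ** t)             # zero-initialised averages
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimise f(theta) = theta^2, whose gradient is 2*theta.
theta = np.array([5.0])
m = v = np.zeros_like(theta)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # approaches 0
```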

6. How does the Adam optimizer differ from standard SGD?
Answer: Adam adapts a separate learning rate per parameter, while standard SGD applies one global learning rate to the raw mini-batch gradient.

Explanation: SGD moves every parameter by the same step size times its gradient. Adam instead maintains bias-corrected estimates of the first and second moments of each parameter's gradient and divides the momentum term by the square root of the second moment, so noisy or rarely-updated parameters receive proportionally larger steps (see the sketch under question 5).

7. What is the main advantage of using adaptive optimizers like Adam?
Answer: Faster, more robust convergence with less manual learning-rate tuning.

Explanation: Because adaptive optimizers such as Adam scale each parameter's step size by its own gradient history, they tend to make good progress out of the box across a wide range of problems, and they handle sparse gradients well: parameters that are updated rarely receive relatively larger steps.

8. What is the parameter update rule in Stochastic Gradient Descent (SGD)?

Answer: θ ← θ − η ∇L(θ), where the gradient of the loss L is computed on a single example or a small mini-batch and η is the learning rate.

Explanation: At each step, SGD moves the parameters a small distance in the direction of steepest descent of the loss estimated from the current batch. The stochasticity comes from using a random subset of the data, rather than the full dataset, for each gradient estimate. A one-step sketch follows.
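A minimal sketch of the vanilla SGD update; `lr` is the learning rate η and `grad` the gradient of the loss on the current example or mini-batch:

```python
import numpy as np

def sgd_step(theta, grad, lr=0.01):
    # theta <- theta - lr * grad(L)
    return theta - lr * grad

# Toy usage: one step on f(theta) = theta^2 (gradient 2*theta).
theta = np.array([3.0])
theta = sgd_step(theta, 2 * theta, lr=0.1)
print(theta)  # [2.4]
```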

9. What does the term 'moment' refer to in Adam optimizer?
Answer: The exponentially weighted moving averages of the gradient and of the squared gradient.

Explanation: The first moment, m_t = β1·m_{t−1} + (1 − β1)·g_t, estimates the mean of recent gradients (momentum); the second moment, v_t = β2·v_{t−1} + (1 − β2)·g_t², estimates their uncentred variance and drives the per-parameter learning-rate scaling. Both are bias-corrected before use, as in the sketch under question 5.

10. Which activation function is non-differentiable at zero?
Answer: ReLU.

Explanation: ReLU(x) = max(0, x) has a kink at x = 0: the derivative is 0 just to the left and 1 just to the right, so no single derivative exists at that point. In practice, frameworks simply pick a subgradient (usually 0) at x = 0. The sketch below makes the mismatch visible numerically.
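A numerical sketch of the kink: the one-sided difference quotients of ReLU at 0 disagree, so the two-sided limit defining the derivative does not exist there:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

h = 1e-6
left = (relu(0.0) - relu(-h)) / h  # slope approaching 0 from the left
right = (relu(h) - relu(0.0)) / h  # slope approaching 0 from the right
print(left, right)  # 0.0 1.0 -- the one-sided derivatives differ at x = 0
```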