Objective: To understand and implement a Multi-Layer Perceptron (MLP), with a focus on its architecture, activation functions, and training.

In an MLP, what is the role of the input layer?
What role does the learning rate play in the training of an MLP?
What is the purpose of the output layer in an MLP?
Which of the following statements is true regarding the XOR problem and MLPs?
The perceptron learning rule updates the weights based on:
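The questions above touch on the input, hidden, and output layers, the learning rate, and the XOR problem. A minimal sketch tying these together is a small MLP trained on XOR with plain NumPy: a single perceptron cannot solve XOR because the classes are not linearly separable, but one hidden layer with a nonlinear activation can. The layer sizes, learning rate, and iteration count below are illustrative choices, not prescribed by the material.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    h = sigmoid(X @ W1 + b1)        # hidden layer activations
    return h, sigmoid(h @ W2 + b2)  # output layer prediction

# XOR: not linearly separable, so a hidden layer is required
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# 2 inputs -> 4 hidden units -> 1 output (sizes chosen for illustration)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

lr = 0.5  # learning rate: scales the size of each weight update
for _ in range(20000):
    h, out = forward(X, W1, b1, W2, b2)
    d_out = out - y                     # delta for sigmoid + cross-entropy
    d_h = (d_out @ W2.T) * h * (1 - h)  # backpropagate through hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

_, out = forward(X, W1, b1, W2, b2)
print((out.ravel() > 0.5).astype(int))  # thresholded predictions
```

Lowering the learning rate slows convergence; raising it too far can make training oscillate or diverge, which is the trade-off the learning-rate question asks about.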