MLE: Learning the Classifier from Data

In the previous experiment, we saw how to construct a classifier from given models of each class using Bayes' Rule. The next goal is to learn such a classifier automatically, given a set of labelled samples from each of the possible classes.

Before doing this experiment, you should ensure that you have understood the previous experiment on Bayesian Classification as well as the experiment on Mean and Covariance Estimation from Data.

The idea behind maximum likelihood parameter estimation is to determine the parameters that maximize the probability (likelihood) of the observed sample data. From a statistical point of view, the method of maximum likelihood is considered more robust than alternative estimation methods (with some exceptions) and yields estimators with good statistical properties. MLE methods are also versatile: they apply to most models and to different types of data, and they provide efficient means of quantifying uncertainty through confidence bounds. Although the methodology of maximum likelihood estimation is conceptually simple, the implementation can be mathematically intensive; with today's computing power, however, this complexity is not a significant obstacle.
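Concretely, for independent samples x_1, ..., x_N drawn from a density p(x | theta), the maximum likelihood estimate is theta_hat = argmax_theta sum_i log p(x_i | theta). For a univariate Gaussian this maximization has a closed form: mu_hat = (1/N) sum_i x_i and sigma2_hat = (1/N) sum_i (x_i - mu_hat)^2. The sketch below is a minimal illustration (assuming NumPy; the data and names are only for demonstration) that recovers the parameters of a Gaussian from samples and checks that the estimates sit at a likelihood maximum:

```python
import numpy as np

def gaussian_log_likelihood(x, mu, sigma2):
    """Log-likelihood of i.i.d. samples x under a N(mu, sigma2) model."""
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

# Draw samples from a known Gaussian, then recover its parameters by MLE.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=1000)

# Closed-form maximum likelihood estimates for a Gaussian:
mu_hat = np.mean(x)                      # sample mean
sigma2_hat = np.mean((x - mu_hat) ** 2)  # 1/N variance (the MLE, not the 1/(N-1) unbiased form)

print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.3f}")
print("log-likelihood at the MLE:    ",
      gaussian_log_likelihood(x, mu_hat, sigma2_hat))
# Perturbing the parameters lowers the likelihood, as expected at a maximum:
print("log-likelihood nearby (lower):",
      gaussian_log_likelihood(x, mu_hat + 0.5, sigma2_hat))
```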

The MLE is an important type of estimator for the following reasons:

  1. The MLE implements the likelihood principle.
  2. MLEs are often simple and easy to compute.
  3. MLEs have asymptotic optimality properties (consistency and efficiency).
  4. MLEs are invariant under reparameterization.
  5. If an efficient estimator (one that attains the Cramér-Rao lower bound) exists, it is the MLE.
  6. In signal detection with unknown parameters (composite hypothesis testing), MLEs are used in implementing the generalized likelihood ratio test (GLRT).
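To connect these properties back to the goal of this experiment, the sketch below is a minimal illustration (assuming NumPy and SciPy; the synthetic data and helper names are hypothetical) of the plug-in approach: fit a Gaussian to each class by maximum likelihood from labelled samples, estimate the priors from the class frequencies, and classify new points with Bayes' Rule as in the previous experiment:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian_mle(X):
    """MLE of a multivariate Gaussian: sample mean and 1/N covariance."""
    mu = X.mean(axis=0)
    centered = X - mu
    cov = centered.T @ centered / len(X)  # 1/N covariance, the MLE
    return mu, cov

rng = np.random.default_rng(1)
# Hypothetical labelled training data: two 2-D Gaussian classes.
X0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X1 = rng.multivariate_normal([2.0, 2.0], [[1.0, -0.2], [-0.2, 1.0]], size=300)

# Learn the class-conditional densities and priors by maximum likelihood.
params = [fit_gaussian_mle(X) for X in (X0, X1)]
priors = np.array([len(X0), len(X1)], dtype=float)
priors /= priors.sum()

def classify(x):
    """Bayes' Rule with the MLE plug-in models: argmax_k p(x | k) * P(k)."""
    scores = [multivariate_normal.pdf(x, mean=mu, cov=cov) * p
              for (mu, cov), p in zip(params, priors)]
    return int(np.argmax(scores))

print(classify([0.1, -0.2]))  # near the first class mean -> 0
print(classify([2.5, 1.8]))   # near the second class mean -> 1
```

With enough labelled samples per class, this plug-in classifier approaches the Bayes classifier built from the true class models, by the consistency property in item 3 above.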