N-Gram Smoothing
What is the main purpose of an N-gram language model?
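(For reference, and not as the intended answer key: an N-gram model scores a word sequence by applying the Markov assumption, conditioning each word only on the previous N−1 words. The bigram case is shown below as a common textbook formulation.)

```latex
% Bigram (N = 2) approximation: each word is conditioned only on its predecessor.
P(w_1, w_2, \dots, w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})
```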
Which of the following is a bigram?
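(As a quick illustration, a bigram is a pair of adjacent tokens. The sketch below, with a function name of our own choosing, shows how bigrams are read off a token sequence.)

```python
def bigrams(tokens):
    """Return the list of adjacent token pairs (bigrams) in a sequence."""
    return list(zip(tokens, tokens[1:]))

# Example: the sentence "I love natural language" yields three bigrams.
print(bigrams(["I", "love", "natural", "language"]))
# [('I', 'love'), ('love', 'natural'), ('natural', 'language')]
```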
Why do we need smoothing in N-gram language models?
What does Add-One (Laplace) Smoothing do to the counts in an N-gram model?
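(For background: Add-One smoothing adds one to every bigram count before normalizing. The standard textbook form of the smoothed estimate, with V the vocabulary size, is shown below.)

```latex
% Add-One (Laplace) smoothed bigram estimate; V is the vocabulary size.
P_{\text{Laplace}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}, w_i) + 1}{C(w_{i-1}) + V}
```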
If a bigram never appears in the training corpus, what probability does the maximum likelihood estimate (without smoothing) assign to it?
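(A worked instance of the unsmoothed case: an unseen bigram has a count of zero, so the maximum likelihood estimate assigns it zero probability.)

```latex
% Unsmoothed MLE for an unseen bigram: the numerator count is zero.
P_{\text{MLE}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}, w_i)}{C(w_{i-1})} = \frac{0}{C(w_{i-1})} = 0
```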
What is the effect of smoothing on the probability distribution of N-grams?
Which of the following is a limitation of Add-One Smoothing?
How does the vocabulary size (V) affect the denominator in Add-One Smoothing for bigram probabilities?
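(A minimal Python sketch of the Add-One bigram estimate on a toy corpus; the names `laplace_bigram_prob` and the example tokens are ours, for illustration only. Note that the vocabulary size V is added to the denominator so the smoothed distribution still sums to one.)

```python
from collections import Counter

def laplace_bigram_prob(w_prev, w, bigram_counts, unigram_counts, V):
    """Add-One (Laplace) smoothed bigram probability P(w | w_prev).

    The +1 in the numerator raises every bigram count by one;
    the +V in the denominator keeps the distribution normalized
    over the whole vocabulary of size V.
    """
    return (bigram_counts[(w_prev, w)] + 1) / (unigram_counts[w_prev] + V)

# Tiny example corpus.
tokens = ["i", "love", "nlp", "i", "love", "ngrams"]
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
V = len(unigram_counts)

# Seen vs. unseen bigram: the unseen one now gets a small, nonzero probability.
print(laplace_bigram_prob("i", "love", bigram_counts, unigram_counts, V))    # (2+1)/(2+4) = 0.5
print(laplace_bigram_prob("i", "ngrams", bigram_counts, unigram_counts, V))  # (0+1)/(2+4) ≈ 0.167
```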
Which of the following is a real-world application of N-gram smoothing?