N-Gram Smoothing
What is the main problem that smoothing solves in N-gram language models?
In Add-One Smoothing, what is added to each bigram count?
Given a vocabulary size V = 5, a bigram count C('the', 'cat') = 2, and a unigram count C('the') = 4, what is the Add-One smoothed probability P('cat'|'the')?
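A worked check, assuming the standard Add-One (Laplace) estimate of count plus one over context count plus vocabulary size: P('cat'|'the') = (C('the', 'cat') + 1) / (C('the') + V) = (2 + 1) / (4 + 5) = 3/9 = 1/3.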
Which smoothing technique is considered a simple baseline for N-gram models?
Why is Add-One Smoothing not always preferred for real-world language modeling?
Which of the following is the correct formula for Add-One smoothed bigram probability?
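For reference, the textbook Add-One (Laplace) bigram estimate is
\[
P_{\text{Add-1}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i) + 1}{C(w_{i-1}) + V},
\]
where V is the vocabulary size.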
If the bigram ('she', 'likes') appears 0 times in the corpus, C('she') = 2, and V = 5, what is the Add-One smoothed probability P('likes'|'she')?
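Following the same formula, assuming Add-One smoothing: P('likes'|'she') = (C('she', 'likes') + 1) / (C('she') + V) = (0 + 1) / (2 + 5) = 1/7 ≈ 0.143.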
Which of the following tasks would most benefit from N-gram smoothing?
What happens to the probability of seen N-grams after smoothing is applied?
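The contrast can be seen in a minimal sketch, assuming a toy corpus and only Python's standard library (the corpus and names below are illustrative, not from the original): seen bigrams lose some probability mass to smoothing, while unseen bigrams gain a small non-zero share.

```python
# A minimal sketch of Add-One (Laplace) bigram smoothing on a toy corpus.
# The corpus, vocabulary, and function names here are assumptions for illustration.
from collections import Counter

toy_corpus = "the cat sat on the mat the cat ran".split()
vocab = set(toy_corpus)
V = len(vocab)

unigram_counts = Counter(toy_corpus)
bigram_counts = Counter(zip(toy_corpus, toy_corpus[1:]))

def mle_prob(prev, word):
    # Unsmoothed maximum-likelihood estimate: zero for unseen bigrams.
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def add_one_prob(prev, word):
    # Add-One smoothing: add 1 to every bigram count and V to the denominator.
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + V)

# A seen bigram's probability decreases after smoothing...
print(mle_prob("the", "cat"), add_one_prob("the", "cat"))
# ...while an unseen bigram moves from zero to a small non-zero probability.
print(mle_prob("the", "ran"), add_one_prob("the", "ran"))
```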