N-Grams
What is an N-Gram in natural language processing?
Which of the following is a bigram?
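A quick illustration of the two questions above: an N-gram is a sequence of N consecutive tokens, and a bigram is the N = 2 case. A minimal sketch (the helper name `ngrams` is my own, not from any particular library):

```python
def ngrams(tokens, n):
    """Return all n-grams: tuples of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))  # bigrams: consecutive word pairs
print(ngrams(tokens, 3))  # trigrams: consecutive word triples
```

For the sentence above, the bigrams are ("the", "cat"), ("cat", "sat"), and so on.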
Why do N-Gram models use the Markov assumption?
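The Markov assumption referenced in the question above is standardly stated as follows: the probability of a word given its entire history is approximated by its probability given only the previous N−1 words.

```latex
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-N+1}, \dots, w_{i-1})
```

For a bigram model (N = 2), the right-hand side reduces to P(w_i | w_{i-1}).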
Which of the following is a common application of N-Gram models?
What is the main advantage of using trigrams over bigrams?
What is a major limitation of N-Gram models as N increases?
A bigram model assigns P(A|B) = 0.5 and P(B|START) = 0.4. What is the probability of the sequence START B A?
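A worked check for the question above: under a bigram model, the probability of a sequence is the product of its bigram probabilities, so P(START B A) = P(B|START) · P(A|B) = 0.4 × 0.5 = 0.2.

```python
# Bigram probabilities taken from the question statement.
p_B_given_START = 0.4
p_A_given_B = 0.5

# Sequence probability is the product of the bigram probabilities.
p_sequence = p_B_given_START * p_A_given_B
print(p_sequence)  # 0.2
```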
What does smoothing accomplish in N-Gram models?
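One common answer to the smoothing question above is add-one (Laplace) smoothing, which gives unseen N-grams a small nonzero probability instead of zero. A minimal sketch with toy counts (the counts and vocabulary below are illustrative, not from the original text):

```python
from collections import Counter

# Toy counts from a tiny corpus (illustrative values).
bigram_counts = Counter({("the", "cat"): 2, ("the", "dog"): 1})
unigram_counts = Counter({"the": 3, "cat": 2, "dog": 1})
vocab_size = len(unigram_counts)

def smoothed_prob(prev_word, word):
    """P(word | prev_word) with add-one smoothing: add 1 to every
    bigram count and the vocabulary size to the denominator."""
    return (bigram_counts[(prev_word, word)] + 1) / (unigram_counts[prev_word] + vocab_size)

print(smoothed_prob("the", "cat"))  # seen bigram: (2+1)/(3+3) = 0.5
print(smoothed_prob("the", "mat"))  # unseen bigram: still nonzero
```

Without smoothing, any sentence containing an unseen bigram would receive probability zero, which is the failure mode smoothing addresses.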
Which of the following is NOT a use case for N-Gram models?
Which of the following best describes the chain rule in the context of N-Gram models?
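For the chain-rule question above: the chain rule factors a sequence probability into a product of conditional probabilities, P(w1…wn) = Π P(wi | w1…w_{i-1}), and an N-gram model then truncates each history via the Markov assumption. A sketch for the bigram case (the probability table below is illustrative, not from the original text):

```python
# Illustrative bigram probability table; "<s>" marks the sentence start.
bigram_prob = {("<s>", "I"): 0.5, ("I", "like"): 0.3, ("like", "tea"): 0.2}

def sentence_prob(words):
    """P(w1..wn) approximated as the product of P(wi | w_{i-1}),
    i.e. the chain rule with each history truncated to one word."""
    p = 1.0
    prev = "<s>"
    for w in words:
        p *= bigram_prob.get((prev, w), 0.0)  # unseen bigrams get 0 here
        prev = w
    return p

print(sentence_prob(["I", "like", "tea"]))  # 0.5 * 0.3 * 0.2
```

Note that an unseen bigram zeroes out the whole product here, which is exactly the problem the smoothing question above points at.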