N-Grams
Q1. A trigram is a second-order Markov model. Derive the formula for the trigram probability. Then calculate the trigram probabilities for the corpus below, where (eos) marks a sentence boundary.
(eos) Can I sit near you (eos) You can sit (eos) Sit near him (eos) I can sit you (eos)
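As a starting point for Q1: the chain rule gives P(w_1 ... w_n) as the product over i of P(w_i | w_1 ... w_{i-1}). The second-order Markov assumption truncates each history to the two preceding words, and the maximum-likelihood estimate is P(w_i | w_{i-2}, w_{i-1}) = C(w_{i-2} w_{i-1} w_i) / C(w_{i-2} w_{i-1}). The counting step can be sketched in Python as below. The sketch assumes that tokens are lowercased (so "Can" and "can" count as the same word), that (eos) is treated as an ordinary token, and that no smoothing is applied. The names tokens, trigram_counts, bigram_counts, and trigram_prob are illustrative and not part of the exercise.

```python
from collections import Counter

corpus = ("(eos) Can I sit near you (eos) You can sit (eos) "
          "Sit near him (eos) I can sit you (eos)")

# Assumption: case-insensitive matching; "(eos)" is kept as a regular token.
tokens = corpus.lower().split()

# Count every run of three and of two consecutive tokens.
trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
bigram_counts = Counter(zip(tokens, tokens[1:]))

def trigram_prob(w1, w2, w3):
    """Unsmoothed MLE of P(w3 | w1, w2) = C(w1 w2 w3) / C(w1 w2)."""
    context = bigram_counts[(w1, w2)]
    if context == 0:
        return 0.0  # unseen context: the MLE is undefined, reported here as 0
    return trigram_counts[(w1, w2, w3)] / context

# Print the probability of every trigram observed in the corpus.
for (w1, w2, w3) in sorted(trigram_counts):
    print(f"P({w3} | {w1} {w2}) = {trigram_prob(w1, w2, w3):.3f}")
```

For example, trigram_prob("can", "sit", "you") divides the count of "can sit you" (1) by the count of "can sit" (2), which gives 0.5 under these assumptions.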
Q2. A character-based N-gram is a sequence of n consecutive characters extracted from a word. It is commonly used to measure the similarity of character strings. Applications include spell checkers, stemming, and OCR error correction.
Given four valid words:
(a) quote
(b) patient
(c) patent
(d) impatient
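To make the definition concrete, here is a minimal Python sketch that extracts character n-grams from a word and compares two words by the overlap of their n-gram sets. The helper names char_ngrams and jaccard are hypothetical, and the Jaccard coefficient is only one possible similarity measure; the exercise does not prescribe one.

```python
def char_ngrams(word, n=2):
    """Return the list of n consecutive characters found in `word`."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def jaccard(word_a, word_b, n=2):
    """Jaccard overlap of the two words' n-gram sets (an assumed measure)."""
    grams_a = set(char_ngrams(word_a, n))
    grams_b = set(char_ngrams(word_b, n))
    union = grams_a | grams_b
    if not union:
        return 0.0  # both words are shorter than n
    return len(grams_a & grams_b) / len(union)

print(char_ngrams("quote", 3))        # ['quo', 'uot', 'ote']
print(jaccard("patient", "patent"))   # 4 shared bigrams out of 7 distinct
```

With n = 2, "patient" and "patent" share the bigrams pa, at, en, and nt out of seven distinct bigrams, so the overlap is 4/7.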
Q3. Calculate the probability of occurrence of each word given below. Which of these represents the correct spelling?
(a) qotient
(b) quotent
(c) quotient
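One way to approach the calculation is to train a character bigram model on the four valid words and multiply the conditional probabilities along each candidate. The sketch below assumes n = 2, a "#" boundary marker on both sides of each word, and unsmoothed maximum-likelihood estimates; the exercise fixes none of these, and all function names are illustrative.

```python
from collections import Counter

VALID_WORDS = ["quote", "patient", "patent", "impatient"]
BOUNDARY = "#"  # assumed marker for the start and end of a word

def train_char_bigrams(words):
    """Count character pairs and their left-hand contexts."""
    pair_counts, context_counts = Counter(), Counter()
    for word in words:
        padded = BOUNDARY + word + BOUNDARY
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return pair_counts, context_counts

def word_probability(word, pair_counts, context_counts):
    """Product of unsmoothed P(b | a) over adjacent characters of `word`."""
    padded = BOUNDARY + word + BOUNDARY
    prob = 1.0
    for a, b in zip(padded, padded[1:]):
        if context_counts[a] == 0:
            return 0.0  # character never seen as a context in training
        prob *= pair_counts[(a, b)] / context_counts[a]
    return prob

pair_counts, context_counts = train_char_bigrams(VALID_WORDS)
scores = {w: word_probability(w, pair_counts, context_counts)
          for w in ["qotient", "quotent", "quotient"]}
for word, score in scores.items():
    print(f"{word}: {score:.5f}")
```

Under these assumptions "qotient" scores 0 because the pair "qo" never occurs in the valid words. Note that an unsmoothed product favors shorter strings, since each extra character adds a factor of at most 1, so the scores should be interpreted with the chosen n and normalization in mind.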