POS Tagging - Viterbi Decoding

1. How is Viterbi decoding different from the forward algorithm? Explain the key differences in terms of:

  • Objective (what each algorithm computes)
  • Mathematical formulation
  • Practical applications in NLP

2. How is Viterbi decoding more efficient than a brute force approach for finding the optimal tag sequence? Calculate and compare:

  • Time complexity of brute force method for a sentence of length N with T possible tags
  • Time complexity of Viterbi algorithm
  • Provide a concrete example with N=5 words and T=10 tags

3. Given the following emission and transition probability matrices, manually compute the Viterbi table for the sentence "The dog runs":

Emission Matrix P(word|tag):

         The    dog    runs
          Noun     0.1    0.6    0.1
          Verb     0.0    0.1    0.8
          Det      0.9    0.0    0.0
          

Transition Matrix P(tag_j|tag_i):

         Noun   Verb   Det
          Noun     0.3    0.4    0.1
          Verb     0.4    0.1    0.2
          Det      0.7    0.2    0.1
          

Assume equal initial probabilities π[tag] = 1/3 for all tags.

4. Analyze the role of transition vs. emission probabilities:

  • Create a scenario where high emission probability conflicts with low transition probability
  • Explain how the Viterbi algorithm resolves such conflicts
  • Discuss which type of probability is more important for overall tagging accuracy

5. Implement smoothing for unseen events:

  • How would you handle a word that doesn't appear in your emission matrix?
  • How would you handle a tag transition that has zero probability in your transition matrix?
  • Propose specific smoothing techniques and justify your choices

6. Consider the sentence "Bank the money in the bank" where "bank" can be both a noun (financial institution) and a verb (to deposit):

  • Explain how context helps the Viterbi algorithm choose the correct POS tag for each occurrence
  • Discuss what happens if your training data has unbalanced occurrences of ambiguous words

7. Dynamic Programming Properties:

  • Prove that the POS tagging problem exhibits optimal substructure
  • Explain why the Markov assumption is crucial for the Viterbi algorithm's correctness
  • What would happen if we violated the Markov assumption?

8. Compare and contrast Viterbi decoding with modern neural approaches:

  • What are the advantages of HMM-based tagging?
  • What are the limitations that neural networks address?
  • In what scenarios might you still prefer Viterbi decoding?