Building Chunker

  1. Explain the difference between chunking and full parsing in Natural Language Processing. Illustrate your answer with an example.
  2. Describe the main steps involved in building a chunker using machine learning. What are the roles of training and testing data in this process?
  3. Compare Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs) for chunking. What are the strengths and limitations of each approach?
  4. How does the choice of features (e.g., lexicon, part-of-speech tags, or both) affect the performance of a chunker? Support your answer with observations from the simulation.
  5. What is the impact of increasing the size of the training corpus on chunking accuracy? Is there a point where adding more data provides little additional benefit? Explain.
  6. Suppose your chunker is not performing well. List three strategies you could try to improve its accuracy, and explain why each might help.
  7. In the simulation, what does the “Check Accuracy” step demonstrate? Why is it important to evaluate a model on unseen data?