Building Chunker
- Explain the difference between chunking and full parsing in Natural Language Processing. Illustrate your answer with an example.
- Describe the main steps involved in building a chunker using machine learning. What are the roles of training and testing data in this process?
- Compare Hidden Markov Models (HMM) and Conditional Random Fields (CRF) for chunking. What are the strengths and limitations of each approach?
- How does the choice of features (e.g., lexicon, part-of-speech tags, or both) affect the performance of a chunker? Support your answer with observations from the simulation.
- What is the impact of increasing the size of the training corpus on chunking accuracy? Is there a point where adding more data provides little additional benefit? Explain.
- Suppose your chunker is not performing well. List three strategies you could try to improve its accuracy, and explain why each might help.
- In the simulation, what does the “Check Accuracy” step demonstrate? Why is it important to evaluate a model on unseen data?
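
As a starting point for the questions above, here is a minimal sketch of the train-then-evaluate workflow. It is not the simulation's method and it is neither an HMM nor a CRF: it is a simple baseline that assigns each part-of-speech tag its most frequent chunk tag. The sentences, the IOB chunk labels (B-NP, I-NP, B-VP, O), and the function names `train`, `predict`, and `accuracy` are all invented for illustration.

```python
from collections import Counter, defaultdict

# Toy data invented for illustration: (word, POS tag, IOB chunk tag).
TRAIN = [
    [("the", "DT", "B-NP"), ("dog", "NN", "I-NP"), ("barked", "VBD", "B-VP")],
    [("a", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("saw", "VBD", "B-VP"),
     ("the", "DT", "B-NP"), ("bird", "NN", "I-NP")],
]
# Held-out sentences the model never sees during training.
TEST = [
    [("the", "DT", "B-NP"), ("bird", "NN", "I-NP"), ("sang", "VBD", "B-VP")],
    [("dogs", "NNS", "B-NP"), ("barked", "VBD", "B-VP")],
]


def train(sentences):
    """Learn the most frequent chunk tag for each POS tag (baseline only)."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for _word, pos, chunk in sent:
            counts[pos][chunk] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def predict(model, sent, default="O"):
    """Tag each token by its POS tag; unseen POS tags fall back to 'O'."""
    return [model.get(pos, default) for _word, pos, _chunk in sent]


def accuracy(model, sentences):
    """Fraction of tokens whose predicted chunk tag matches the gold tag."""
    correct = total = 0
    for sent in sentences:
        for pred, (_word, _pos, gold) in zip(predict(model, sent), sent):
            correct += pred == gold
            total += 1
    return correct / total


model = train(TRAIN)
train_acc = accuracy(model, TRAIN)  # accuracy on data the model has seen
test_acc = accuracy(model, TEST)    # accuracy on unseen data
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

On this toy data the baseline labels every training token correctly but misses the unseen tag NNS in the held-out set, so the two scores differ. That gap is why accuracy is measured on data that was not used for training.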