Building Chunker

This experiment aims to explain and demonstrate chunking in Natural Language Processing (NLP) by building a chunker with machine learning models. It shows how the choice of features and the size of the training corpus affect chunking accuracy, and lets learners visualize and analyze these effects through an interactive simulation.

What is Chunking?

Chunking is the task of segmenting a sentence into multi-token sequences, such as noun groups and verb groups, and labeling each one. Chunks are non-overlapping, non-recursive groups of words that form meaningful units, such as noun phrases (NP), verb phrases (VP), and prepositional phrases (PP).

For example, the sentence:

'He reckons the current account deficit will narrow to only 1.8 billion in September.'

can be divided into the following chunks:

[NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only 1.8 billion ] [PP in ] [NP September ]
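
Machine learning chunkers usually do not predict brackets directly. A common approach, assumed here rather than taken from this experiment, is to give every token a tag: B-X for the first token of a chunk of type X and I-X for each later token of the same chunk. The sketch below converts the bracketed example into such (token, tag) pairs. The function name brackets_to_iob is hypothetical.

```python
import re
from typing import List, Tuple

def brackets_to_iob(chunked: str) -> List[Tuple[str, str]]:
    """Convert '[NP He ] [VP reckons ]' style text into (token, tag) pairs."""
    pairs = []
    # Each match captures a chunk label and the text inside the brackets.
    for label, body in re.findall(r"\[(\w+)\s+([^\]]*?)\s*\]", chunked):
        for position, token in enumerate(body.split()):
            # The first token of a chunk gets B-, the remaining tokens get I-.
            prefix = "B" if position == 0 else "I"
            pairs.append((token, f"{prefix}-{label}"))
    return pairs

chunked = (
    "[NP He ] [VP reckons ] [NP the current account deficit ] "
    "[VP will narrow ] [PP to ] [NP only 1.8 billion ] [PP in ] [NP September ]"
)
iob = brackets_to_iob(chunked)
for token, tag in iob:
    print(f"{token}\t{tag}")
```

In this tagging scheme, tokens that fall outside every chunk, such as the final period of the sentence, conventionally receive the tag O. The sketch only reads bracketed chunks and does not handle such tokens.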

This experiment allows you to explore how different algorithms and features influence the chunking process and its accuracy.
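
To make the role of features concrete, here is a minimal sketch of a lookup-table chunker. It is not the model used in this experiment. It simply memorizes, for each feature value, the chunk tag seen most often in training. The part-of-speech tags attached to the example sentence, the B-/I- chunk tags, and all function names (pos_only, pos_with_previous, train, accuracy) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# (word, POS tag, chunk tag) for the example sentence.
# The POS tags and B-/I- chunk tags are assumed for illustration.
SENTENCE = [
    ("He", "PRP", "B-NP"), ("reckons", "VBZ", "B-VP"),
    ("the", "DT", "B-NP"), ("current", "JJ", "I-NP"),
    ("account", "NN", "I-NP"), ("deficit", "NN", "I-NP"),
    ("will", "MD", "B-VP"), ("narrow", "VB", "I-VP"),
    ("to", "TO", "B-PP"), ("only", "RB", "B-NP"),
    ("1.8", "CD", "I-NP"), ("billion", "CD", "I-NP"),
    ("in", "IN", "B-PP"), ("September", "NNP", "B-NP"),
]

def pos_only(sent, i):
    """Feature set 1: the POS tag of the current token."""
    return (sent[i][1],)

def pos_with_previous(sent, i):
    """Feature set 2: the POS tags of the previous and current tokens."""
    previous = sent[i - 1][1] if i > 0 else "<START>"
    return (previous, sent[i][1])

def train(sentences, feature_fn):
    """Map each feature value to its most frequent chunk tag."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for i in range(len(sent)):
            counts[feature_fn(sent, i)][sent[i][2]] += 1
    return {feat: tags.most_common(1)[0][0] for feat, tags in counts.items()}

def accuracy(model, sentences, feature_fn, default="O"):
    """Fraction of tokens whose predicted chunk tag matches the gold tag."""
    correct = total = 0
    for sent in sentences:
        for i in range(len(sent)):
            predicted = model.get(feature_fn(sent, i), default)
            correct += predicted == sent[i][2]
            total += 1
    return correct / total

corpus = [SENTENCE]
for feature_fn in (pos_only, pos_with_previous):
    model = train(corpus, feature_fn)
    print(feature_fn.__name__, len(model), "feature values learned")
```

Scoring this model on the sentence it was trained on only confirms that the code runs. A meaningful comparison of feature sets or corpus sizes needs a larger annotated corpus with a held-out test portion, passing growing slices of the training part to train and scoring each model on the same test part.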