POS Tagging - Viterbi Decoding
A
Algorithm - A step-by-step procedure for solving a problem or completing a task, such as the Viterbi algorithm for finding the most likely POS tag sequence.
Ambiguity - The property of words that can belong to multiple grammatical categories depending on context (e.g., "run" can be a noun or verb).
B
Backtracking - The final step in the Viterbi algorithm where the most likely path is traced backwards to determine the complete POS tag sequence.
Beam Search - An approximation to Viterbi that keeps only the top-K most probable paths at each step, trading accuracy for computational efficiency.
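A minimal sketch of one beam-search step, assuming log-space scores and a hypothetical score_fn that returns the score gain of extending a partial tag sequence with one more state (both names are illustrative, not from a specific library):

    import heapq

    def beam_step(paths, states, score_fn, beam_width=3):
        # `paths` holds (score, tag_sequence) pairs for the partial
        # hypotheses kept so far; score_fn(seq, state) is a hypothetical
        # callable giving the log-score gain of appending `state`.
        candidates = []
        for score, seq in paths:
            for state in states:
                candidates.append((score + score_fn(seq, state), seq + [state]))
        # keep only the beam_width highest-scoring extensions
        return heapq.nlargest(beam_width, candidates, key=lambda c: c[0])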
Bigram - A sequence of two adjacent elements; in the HMM context, a pair of consecutive POS tags, as used in transition probabilities.
C
Conditional Probability - The probability of an event occurring given that another event has occurred, fundamental to HMM calculations.
Corpus - A large collection of written or spoken texts used for linguistic analysis and training statistical models.
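A tiny worked illustration of the conditional-probability definition above in the HMM setting, using invented counts:

    # Hypothetical counts from a training corpus (invented for illustration):
    count_DT = 1000          # occurrences of the tag DT
    count_DT_NN = 800        # occurrences of DT immediately followed by NN
    p_NN_given_DT = count_DT_NN / count_DT   # P(NN | DT) = 800/1000 = 0.8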
D
Decoding - The process of finding the most likely sequence of hidden states (POS tags) given the observed sequence (words).
Dynamic Programming - An algorithmic technique that solves complex problems by breaking them down into simpler subproblems, used in the Viterbi algorithm.
E
Emission Probability - The probability of observing a particular word given a specific POS tag, denoted as P(word|tag).
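A minimal sketch of estimating emission probabilities by relative frequency from a list of (word, tag) pairs; the function name and data layout are assumptions for illustration, and real systems add smoothing for unseen words:

    from collections import Counter

    def emission_probs(tagged_corpus):
        # tagged_corpus: iterable of (word, tag) pairs
        tag_counts = Counter()
        pair_counts = Counter()
        for word, tag in tagged_corpus:
            tag_counts[tag] += 1
            pair_counts[(word, tag)] += 1
        # relative frequency: count(word, tag) / count(tag)
        return {(w, t): c / tag_counts[t] for (w, t), c in pair_counts.items()}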
End of Sentence (EOS) - A special marker used to denote sentence boundaries in corpus annotation and HMM training.
F
First-order Markov Model - A model where the probability of the next state depends only on the current state, not on the entire history.
Forward Algorithm - An algorithm for computing the total probability of an observation sequence under an HMM, summing over all possible hidden state sequences.
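A sketch of the forward recursion, assuming the HMM parameters are stored as nested dicts (start_p[tag], trans_p[prev][tag], emit_p[tag][word]); note that it sums over paths where Viterbi maximizes:

    def forward(obs, states, start_p, trans_p, emit_p):
        # alpha[s] = probability of emitting the observations so far
        # and ending in state s (summed over all paths, unlike Viterbi)
        alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
        for word in obs[1:]:
            alpha = {s: emit_p[s][word] *
                        sum(alpha[prev] * trans_p[prev][s] for prev in states)
                     for s in states}
        return sum(alpha.values())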
G
Grammar - The set of structural rules governing the composition of clauses, phrases, and words in a language.
Grammatical Category - A class of words that have similar grammatical properties (noun, verb, adjective, etc.).
H
Hidden Markov Model (HMM) - A statistical model where the system being modeled is assumed to be a Markov process with unobserved states.
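As a sketch of how an HMM scores a tagged sentence, combining the initial, transition, and emission probabilities defined elsewhere in this glossary; the nested-dict parameter layout is an assumption:

    import math

    def joint_log_prob(words, tags, start_p, trans_p, emit_p):
        # P(words, tags) = P(t1) P(w1|t1) * prod_i P(t_i|t_{i-1}) P(w_i|t_i),
        # computed in log space to avoid underflow
        logp = math.log(start_p[tags[0]]) + math.log(emit_p[tags[0]][words[0]])
        for i in range(1, len(words)):
            logp += math.log(trans_p[tags[i - 1]][tags[i]])
            logp += math.log(emit_p[tags[i]][words[i]])
        return logp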
Hidden States - The unobserved states in an HMM, which in POS tagging correspond to the grammatical categories.
I
Independence Assumption - The assumption that the probability of observing a word depends only on its POS tag, not on other words or tags.
Initial State Distribution - The probability distribution over the possible starting states in an HMM.
L
Lexical Category - Another term for part-of-speech or grammatical category of a word.
Likelihood - The probability of observing the given data under a particular model or set of parameters.
Log-Space Computation - A numerical technique used in the Viterbi algorithm to prevent underflow by working with logarithms of probabilities instead of the probabilities themselves.
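A small demonstration of why log-space computation matters; the numbers are invented:

    import math

    # Multiplying many small probabilities underflows in floating point:
    probs = [1e-5] * 80
    product = 1.0
    for p in probs:
        product *= p
    print(product)        # 0.0 -- the true value 1e-400 underflows

    # Summing log-probabilities keeps the same information safely:
    log_product = sum(math.log(p) for p in probs)
    print(log_product)    # about -921.03, no underflow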
M
Markov Assumption - The assumption that the probability of the next state depends only on the current state.
Markov Chain - A mathematical system that undergoes transitions from one state to another according to certain probabilistic rules.
Maximum Likelihood Estimation - A method of estimating model parameters by finding values that maximize the likelihood of the observed data.
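A sketch of maximum likelihood estimation for transition probabilities from tagged sentences: each tag-bigram count is divided by the count of the conditioning tag. No smoothing is applied, and the function name is illustrative:

    from collections import Counter

    def transition_probs(tag_sequences):
        # P(t2 | t1) = count(t1, t2) / count(t1), counting only
        # positions where t1 actually has a successor
        unigrams, bigrams = Counter(), Counter()
        for tags in tag_sequences:
            unigrams.update(tags[:-1])
            bigrams.update(zip(tags, tags[1:]))
        return {(t1, t2): c / unigrams[t1] for (t1, t2), c in bigrams.items()}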
N
Natural Language Processing (NLP) - A field of computer science and artificial intelligence concerned with interactions between computers and human language.
N-gram - A contiguous sequence of n items from a given sequence of text or speech.
O
Observation - In HMM context, the visible outputs (words) that are generated by the hidden states (POS tags).
Optimal Substructure - A key property of dynamic programming problems where optimal solutions contain optimal solutions to subproblems, enabling the Viterbi algorithm's efficiency.
Out-of-Vocabulary (OOV) - Words that appear in test data but were not seen during training.
P
Part-of-Speech (POS) - A category of words that have similar grammatical properties (noun, verb, adjective, etc.).
POS Tagging - The process of assigning part-of-speech tags to words in a sentence.
Probability Matrix - A matrix containing probability values, such as transition or emission probabilities in an HMM.
Pruning - Optimization techniques in the Viterbi algorithm that discard low-probability paths to reduce computational complexity.
S
Sequence Labeling - The task of assigning labels to elements in a sequence, such as POS tags to words.
Smoothing - Techniques used to handle zero probabilities in statistical models by redistributing probability mass.
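A sketch of one simple option, Laplace (add-one) smoothing for emission probabilities; the names are illustrative, and real taggers often prefer finer-grained smoothing schemes:

    def laplace_emission(pair_counts, tag_counts, vocab_size):
        # add 1 to every count so unseen (word, tag) pairs keep
        # a small nonzero probability
        def prob(word, tag):
            return (pair_counts.get((word, tag), 0) + 1) / (tag_counts[tag] + vocab_size)
        return prob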
State - A condition or configuration of a system; in an HMM, one of the hidden grammatical categories.
Statistical Model - A mathematical model that uses probability distributions to represent data and make predictions.
T
Training Data - Annotated data used to learn the parameters of a statistical model.
Transition Probability - The probability of moving from one state to another in a sequence, denoted as P(tag₂|tag₁).
U
Unigram - A single word or token, used in calculating base probabilities for words.
Unsupervised Learning - Machine learning where the algorithm learns patterns from data without labeled examples.
V
Viterbi Algorithm - A dynamic programming algorithm for finding the most likely sequence of hidden states in an HMM.
Viterbi Table - The matrix used to store intermediate probability calculations during Viterbi decoding, where each cell represents the maximum probability of any path ending at a specific state and time.
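A compact sketch tying the last few entries together: it fills a Viterbi table, records backpointers, and backtracks to recover the best tag sequence. The nested-dict parameter layout is an assumption, and a real implementation would work in log space and smooth unseen events:

    def viterbi(obs, states, start_p, trans_p, emit_p):
        # V[t][s]: maximum probability of any path that emits obs[:t+1]
        # and ends in state s (the Viterbi table)
        V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
        back = [{}]  # backpointers: best predecessor of each cell
        for t in range(1, len(obs)):
            V.append({})
            back.append({})
            for s in states:
                prev = max(states, key=lambda p: V[t - 1][p] * trans_p[p][s])
                V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
                back[t][s] = prev
        # backtracking: start at the best final state, follow pointers
        best = max(states, key=lambda s: V[-1][s])
        path = [best]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

For example, viterbi(["time", "flies"], ["NOUN", "VERB"], start_p, trans_p, emit_p) would return the most probable two-tag sequence under the supplied parameters.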
Vocabulary - The set of all unique words in a corpus or dataset.
W
Word Sense Disambiguation - The process of determining which meaning of a word is used in a particular context.
Word Tokenization - The process of breaking text into individual words or tokens.
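A minimal regex-based tokenizer sketch; production tokenizers also handle contractions, numbers, and other edge cases:

    import re

    def tokenize(text):
        # match runs of word characters (\w+) or single punctuation marks
        return re.findall(r"\w+|[^\w\s]", text)

    # tokenize("Time flies, fast.") -> ['Time', 'flies', ',', 'fast', '.']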