Word Analysis
Morphological analysis is a foundational task in Natural Language Processing (NLP). It involves breaking down a word into its root and affix(es), which is essential for understanding word structure and meaning in any language.
1. Morphological Analysis
Morphological analysis involves breaking a word into its root and affix(es). Identifying the root of a word is a fundamental step in many NLP tasks.
Example: Word Forms Across Languages
- English:
- Root: 'play'
- Forms: 'play', 'plays', 'played', 'playing'
- Hindi:
- Root: 'खेल' (khela)
- Forms: खेल, खेला, खेली, खेलूंगा, खेलूंगी, खेलेगा, खेलेगी, खेलते, खेलती, खेलने, खेलकर
- Telugu:
- Root: ఆడడం (Adadam)
- Forms: ఆడుతాను, ఆడుతున్నాను, ఆడేను, ఆడేవా, ...
2. Morphological Richness
The number of forms a root word can take varies across languages. Indian languages are generally considered morphologically rich, meaning words can have many forms due to inflections and derivations.
3. Types of Morphology
Inflectional Morphology
- Deals with word forms of a root where there is no change in lexical category.
- Example: 'played' is an inflection of 'play' (both are verbs).
Derivational Morphology
- Deals with word forms of a root where there is a change in lexical category.
- Example: 'happiness' is a derivation of 'happy' (adjective → noun).
4. Morphological Features
During morphological analysis, each word is assigned a lexical category and may take suffixes for features such as gender, number, person, case, tense, aspect, and modality.
Nouns & Pronouns
- Can take suffixes for: gender, number, person, case
Example Analyses
Language | Input Word | Output Analysis |
---|---|---|
Hindi | लड़के (ladake) | rt=लड़का(ladakaa), cat=n, gen=m, num=sg, case=obl |
Hindi | लड़के (ladake) | rt=लड़का(ladakaa), cat=n, gen=m, num=pl, case=dir |
Hindi | लड़कों (ladakoM) | rt=लड़का(ladakaa), cat=n, gen=m, num=pl, case=obl |
English | boy | rt=boy, cat=n, gen=m, num=sg |
English | boys | rt=boy, cat=n, gen=m, num=pl |
Verbs
- Can take suffixes for: tense, aspect, modality, gender, number, person
Example Analyses
Language | Input Word | Output Analysis |
---|---|---|
Hindi | हंसी(hansii) | rt=हंस(hans), cat=v, gen=fem, num=sg/pl, per=1/2/3 tense=past, aspect=pft |
English | toys | rt=toy, cat=n, num=pl, per=3 |
5. Feature References
- rt: root
- cat: lexical category (noun, verb, adjective, pronoun, adverb, preposition)
- gen: gender (masculine or feminine)
- num: number (singular (sg) or plural (pl))
- per: person (1, 2, or 3)
- tense: present, past, or future (for verbs)
- aspect: perfect (pft), continuous (cont), or habitual (hab) (for verbs)
- case: direct or oblique (for nouns)
Note:
- A case is oblique when a postposition occurs after a noun. If no postposition can occur after a noun, the case is direct.
- Some Hindi postpositions: का(kaa), की(kii), के(ke), को(ko), में(meM)
6. Summary
Morphological analysis is crucial for understanding and processing natural language, especially in morphologically rich languages like Hindi and Telugu. By breaking words into their roots and features, we can better analyze, translate, and generate language computationally.