Fine-grained POS Tagging
Part-of-speech tagging (POS tagging) is the process of marking up a word in a text as corresponding to a particular part of speech, based on both its definition and its context. POS taggers play an increasingly important role in speech recognition, natural language parsing and information retrieval.
In traditional grammars there are generally only a few parts of speech (refer to Experiment 5). However, further distinctions within these categories are often needed. For example, fine-grained tagsets distinguish between possessive pronouns (my, your, his, her, its) and personal pronouns (I, you, he, me). Knowing whether a word is a possessive pronoun or a personal pronoun can tell us which words are likely to occur in its vicinity. For example, possessive pronouns are likely to be followed by a noun, and personal pronouns by a verb:
- his(Possessive) pen(Noun)
- he(Personal_Pronoun) went(Verb)
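The claim that a fine-grained tag predicts its neighbours can be sketched by counting which tag follows each pronoun tag. The snippet below is a minimal illustration, not a real tagger: the phrase list is hypothetical hand-tagged data using Penn-style tags such as PRP$ (possessive pronoun) and PRP (personal pronoun), and `following_tags` is an illustrative helper name, not a library function.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged phrases (word, tag); not taken from a real corpus.
phrases = [
    [("his", "PRP$"), ("pen", "NN")],
    [("he", "PRP"), ("went", "VBD")],
    [("my", "PRP$"), ("table", "NN")],
    [("I", "PRP"), ("take", "VBP")],
]

def following_tags(tagged_phrases):
    """For every tag, count the tags that immediately follow it within a phrase."""
    counts = defaultdict(Counter)
    for phrase in tagged_phrases:
        for (_, tag), (_, next_tag) in zip(phrase, phrase[1:]):
            counts[tag][next_tag] += 1
    return counts

counts = following_tags(phrases)
print("After PRP$:", dict(counts["PRP$"]))  # only noun tags in this toy data
print("After PRP: ", dict(counts["PRP"]))   # only verb tags in this toy data
```

With a single coarse "pronoun" tag, the two distributions would be merged and this difference in what follows would be lost.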
Therefore, each of the traditional parts of speech can be made more fine-grained so that it conveys additional information. Such a tagset (adapted from the Penn tagset) is shown below:
Penn tagset
| POS Tag | Description | Example |
|---|---|---|
| CC | Coordinating Conjunction | and, but, or |
| CD | Cardinal Number | 1, one, three |
| DT | Determiner | the, some |
| EX | Existential There | there is |
| IN | Preposition/Subordinating conjunction | in, of, like, that |
| JJ | Adjective | green, good |
| JJR | Adjective, Comparative | greener, better |
| JJS | Adjective, Superlative | greenest, best |
| MD | Modal | could, will |
| NN | Noun, Singular or Mass | table |
| NNS | Noun Plural | tables |
| NNP | Proper Noun, Singular | John, Google |
| NNPS | Proper Noun, Plural | Vikings, Bachchans |
| PDT | Predeterminer | both the boys |
| POS | Possessive Ending | friend's |
| PRP | Personal Pronoun | I, he, it |
| PRP$ | Possessive Pronoun | my, his |
| RB | Adverb | however, usually, naturally, here, very |
| RBR | Adverb, Comparative | more |
| RBS | Adverb, Superlative | most |
| RP | Particle | give up |
| TO | To | to go, to him |
| UH | Interjection | uh, uh-huh |
| VB | Verb, base form | take |
| VBD | Verb, Past Tense | took |
| VBG | Verb, Gerund/Present Participle | taking |
| VBN | Verb, Past Participle | taken |
| VBP | Verb, Present, non-3rd person singular | take |
| VBZ | Verb, Present, 3rd person singular | takes |
| WDT | Wh-determiner | which |
| WP | Wh-pronoun | who, what |
| WP$ | Possessive Wh-pronoun | whose |
| WRB | Wh-adverb | where, when |
For example:
A child in the play liked to place the green bill on the red flower, and used to wonder which of them was beautiful.
A/DT child/NN in/IN the/DT play/NN liked/VBD to/TO place/VB the/DT green/JJ bill/NN on/IN the/DT red/JJ flower/NN and/CC used/VBD to/TO wonder/VB which/WDT of/IN them/PRP was/VBD beautiful/JJ
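To make the `word/TAG` notation concrete, here is a small sketch that splits the tagged sentence above into (word, tag) pairs and collapses each fine-grained tag back into a coarse class. The prefix-based mapping and the names `parse_tagged` and `coarse` are assumptions made for illustration, not part of the Penn tagset itself; tags without a listed prefix are left unchanged.

```python
from collections import Counter

TAGGED = ("A/DT child/NN in/IN the/DT play/NN liked/VBD to/TO place/VB "
          "the/DT green/JJ bill/NN on/IN the/DT red/JJ flower/NN and/CC "
          "used/VBD to/TO wonder/VB which/WDT of/IN them/PRP was/VBD beautiful/JJ")

# Illustrative prefix-to-class mapping (an assumption, not an official scheme).
COARSE_PREFIXES = [
    ("NN", "noun"), ("VB", "verb"), ("JJ", "adjective"),
    ("RB", "adverb"), ("PRP", "pronoun"),
]

def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, _, tag = token.rpartition("/")  # split on the last slash
        pairs.append((word, tag))
    return pairs

def coarse(tag):
    """Map a fine-grained tag to a coarse class, or return it unchanged."""
    for prefix, label in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return label
    return tag

pairs = parse_tagged(TAGGED)
coarse_counts = Counter(coarse(tag) for _, tag in pairs)
print(pairs[:3])
print(coarse_counts["noun"], coarse_counts["verb"])
```

Collapsing the tags this way discards exactly the information the fine-grained tagset adds, for example the difference between liked/VBD and place/VB.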
A fine-grained POS tagset for Indian languages
| Sl. no. | Category | Tag name | Example |
|---|---|---|---|
| 1.1 | Common Noun | NN | किताब, कुर्सी |
| 1.2 | Noun denoting spatial and temporal expressions | NST | ऊपर, सामने |
| 2 | Proper Noun | NNP | राम, राधा |
| 3.1 | Pronoun | PRP | वह |
| 3.2 | Demonstrative | DEM | इस किताब, वह लड़का |
| 4 | Verb Main | VM | करूँगा, खाते |
| 5 | Verb Aux | VAUX | किया था, खाते हुए |
| 6 | Adjective | JJ | सुन्दर, छोटा |
| 7 | Adverb | RB | जल्दी, धीरे (only manner adverbs) |
| 8 | Post position | PSP | ने, के लिए |
| 9 | Particles | RP | भी, ही, जी |
| 10 | Conjuncts | CC | और, या |
| 11 | Question Words | WQ | क्या, कौन |
| 12.1 | Quantifiers | QF | बहुत, थोड़ा, कम |
| 12.2 | Cardinal | QC | तीन |
| 12.3 | Ordinal | QO | तीसरा |
| 13 | Intensifier | INTF | बहुत अच्छा |
| 14 | Interjection | INJ | अरे!, वाह! |
| 15 | Negation | NEG | नहीं |
| 16 | Compounds | C | स्कूल शिक्षा |
For example:
सभी बच्चे बाहर से आये हुए अतिथियों का स्वागत करेंगे।
सभी/QF बच्चे/NN बाहर/NST से/PSP आये/VM हुए/VAUX अतिथियों/NN का/PSP स्वागत/NN करेंगे/VM
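When annotating text by hand with this tagset, it can help to check that every tag actually belongs to it. The sketch below assumes the tag names from the table above; the function name `unknown_tags` and the second test phrase are hypothetical and only serve as an illustration.

```python
# Tag names copied from the Indian-language tagset table above.
TAGSET = {
    "NN", "NST", "NNP", "PRP", "DEM", "VM", "VAUX", "JJ", "RB", "PSP",
    "RP", "CC", "WQ", "QF", "QC", "QO", "INTF", "INJ", "NEG", "C",
}

def unknown_tags(tagged_text, tagset=TAGSET):
    """Return the (word, tag) pairs whose tag is missing or not in the tagset."""
    bad = []
    for token in tagged_text.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            bad.append((token, None))   # token has no '/TAG' part at all
        elif tag not in tagset:
            bad.append((word, tag))     # tag is not defined in this tagset
    return bad

sentence = ("सभी/QF बच्चे/NN बाहर/NST से/PSP आये/VM हुए/VAUX "
            "अतिथियों/NN का/PSP स्वागत/NN करेंगे/VM")
print(unknown_tags(sentence))            # every tag is in the tagset

# Hypothetical phrase that wrongly uses the Penn tag VBD instead of VM.
print(unknown_tags("राम/NNP गया/VBD"))
```

Tags such as NN, NNP, PRP, JJ, RB, RP and CC appear in both tagsets, so a check like this only catches tags that exist in one tagset but not the other, such as VBD versus VM.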