Tokens and Types

Definition

The number of tokens in a text is its total word count, repetitions included, while the number of types is the count of distinct words in the text. In other words, a language consists of various word types, and each particular occurrence of a type is called a token.

For example: Do not waste time as wasting time does a lot of harm.

Here, #tokens=12 and #types=11 ('time' occurs twice, so it contributes two tokens but only one type).
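The count above can be reproduced with a short Python sketch. It assumes that a token is a run of letters or apostrophes and that types are compared case-insensitively; the function name count_tokens_and_types is illustrative and does not come from the source.

```python
import re


def count_tokens_and_types(text):
    """Return (number of tokens, number of types) for a piece of text.

    Assumption: a token is a run of letters or apostrophes, and two tokens
    belong to the same type when they match after lowercasing.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    types = {token.lower() for token in tokens}
    return len(tokens), len(types)


sentence = "Do not waste time as wasting time does a lot of harm."
n_tokens, n_types = count_tokens_and_types(sentence)
print(n_tokens, n_types)  # 12 11
```

A set is used for the types because it discards duplicates, which is exactly the difference between counting occurrences and counting distinct words.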

Type vs token distinction

The type/token distinction is related to the distinction between universals and particulars. Tokens are concrete, particular instances of a general, abstract type. There is only one word 'the' (the type), but there are many instances of it on this page (the tokens).

The type/token distinction applies beyond language as well. For example:

  • Beethoven's Fifth Symphony and performances of it

  • The white elephant and specimens of it

  • Kentucky Fried Chicken and its outlets

    Types - (continued)

    Study this example again: Do not waste time as wasting time does a lot of harm.

    Notice that 'waste' and 'wasting' share a common root, and so do 'do' and 'does'. Should we count them as different types? The first approach, used above, counts every distinct word form as its own type. The second approach treats inflections (different grammatical forms) of the same word as a single type. Under the second approach,

    #tokens=12 #types(root)=9
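A minimal sketch of this second approach follows. Mapping a word to its root normally requires a stemmer or lemmatizer; here a small hand-written lookup table (ROOTS, a hypothetical stand-in that covers only this sentence) plays that role.

```python
import re

# Hypothetical lookup table: inflected form -> root form.
# It covers only the words in the example sentence; a real system would use
# a stemmer or lemmatizer instead.
ROOTS = {"wasting": "waste", "does": "do"}


def count_root_types(text, roots):
    """Count types after replacing each word with its root, when one is known."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    return len({roots.get(w, w) for w in words})


sentence = "Do not waste time as wasting time does a lot of harm."
print(count_root_types(sentence, ROOTS))  # 9
print(count_root_types(sentence, {}))     # 11, no grouping applied
```

Passing an empty table reproduces the plain type count, which shows that the drop from 11 to 9 comes entirely from merging 'wasting' into 'waste' and 'does' into 'do'.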

    Types Root

    This is the number of unique types (words) that remain after words with the same root are grouped together.

    For example, consider the following text: She sells seashells on the seashore. The shells she sells are surely seashells.

    • Tokens: 13 (total word count): She, sells, seashells, on, the, seashore, The, shells, she, sells, are, surely, seashells.
    • Types: 9 (unique words, ignoring case): she, sells, seashells, on, the, seashore, shells, are, surely.
    • Types Root: 9 (unique root words after grouping): she, sell, seashell, on, the, seashore, shell, are, sure. The count matches the type count here because no two distinct words in this text share a root.
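All three counts for this text can be checked with one more sketch. The helper crude_root is a toy suffix stripper written only for this example (an assumption, not a real stemming algorithm): it removes a final "ly" or a plural-style "s" and would mishandle words such as "this" or "does".

```python
import re


def crude_root(word):
    """Toy suffix stripper (an assumption for this example, not a real stemmer)."""
    if word.endswith("ly") and len(word) > 4:
        return word[:-2]  # surely -> sure
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # sells -> sell, shells -> shell
    return word


def text_counts(text):
    """Return (tokens, types, root types) for the given text."""
    tokens = re.findall(r"[A-Za-z']+", text)
    types = {t.lower() for t in tokens}
    root_types = {crude_root(t) for t in types}
    return len(tokens), len(types), len(root_types)


text = "She sells seashells on the seashore. The shells she sells are surely seashells."
print(text_counts(text))  # (13, 9, 9)
```

Lowercasing happens before the roots are computed, so 'She'/'she' and 'The'/'the' are already merged at the type stage.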

    Summary

    • Tokens: Total number of words in the text, including repetitions.
    • Types: Total number of unique words (case-insensitive).
    • Types Root: Total number of unique root words (grouping words with the same root).