What is WordNet?
WordNet is a digital dictionary and a thesaurus in one, containing about 155k words grouped into roughly 117k synsets. It is also described as a lexical knowledge base, in which synonymous words are grouped into a synonym set (synset for short). Each synset comes with a definition (gloss) and some example phrases. Each sense of a polysemous word belongs to the synset that best fits its meaning. WordNet is considered one of the most extensive digital dictionaries available for English, and it covers most of the English vocabulary.
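To make the grouping concrete, here is a minimal sketch of how synsets and polysemous words relate. It does not use WordNet's real data structures: `ToySynset` and `senses_by_word` are hypothetical names, and the glosses are paraphrased for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySynset:
    """Hypothetical stand-in for a WordNet synset (not NLTK's Synset class)."""
    name: str             # identifier such as "car.n.01"
    lemmas: tuple         # synonymous words that share this sense
    gloss: str            # short definition (paraphrased here)
    examples: tuple = ()  # optional example phrases


# A handful of illustrative synsets; the word "car" appears in two of them.
synsets = [
    ToySynset("car.n.01",
              ("car", "auto", "automobile", "machine", "motorcar"),
              "a motor vehicle with four wheels",
              ("he needs a car to get to work",)),
    ToySynset("cable_car.n.01",
              ("cable_car", "car"),
              "a compartment suspended from a cable"),
    ToySynset("happiness.n.01",
              ("happiness", "felicity"),
              "a state of well-being"),
]

# Index every word to the synsets that hold one of its senses.
senses_by_word = defaultdict(list)
for syn in synsets:
    for lemma in syn.lemmas:
        senses_by_word[lemma].append(syn.name)

print(senses_by_word["car"])       # ['car.n.01', 'cable_car.n.01']
print(senses_by_word["felicity"])  # ['happiness.n.01']
```

A polysemous word such as "car" simply shows up in more than one synset, which is the same idea that `wn.synsets("car")` exposes in NLTK.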
WordNet also links words in a semantic network, so terms can be searched for conceptually rather than alphabetically, as in a conventional dictionary. Nouns and verbs are organized hierarchically (relations include hyponymy-hypernymy, meronymy-holonymy, entailment, and troponymy). Adjectives are laid out as clusters in a multidimensional hyperspace, with antonymy as the main relation between them, while adverbs link to adjectives through the pertainym relation.
Synsets are the nodes of the network, and the semantic relations among synsets are the edges between them. WordNet is used in many NLP tasks, such as sentiment analysis, word sense disambiguation, and machine translation. Semantic similarity between words can also be measured by the distance between their synsets in the WordNet network.
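The idea of measuring similarity by network distance can be sketched without WordNet at all. The example below builds a tiny hand-made graph (the edges are toy data, not WordNet's real structure), finds the shortest path with a breadth-first search, and turns the distance d into a score with 1 / (1 + d), a common form for path-based measures. The helper names `shortest_distance` and `path_similarity` are made up for this sketch.

```python
from collections import deque

# Hypothetical hypernym -> hyponym edges between a few synset names (toy data).
edges = [
    ("animal.n.01", "mammal.n.01"),
    ("mammal.n.01", "dog.n.01"),
    ("mammal.n.01", "cat.n.01"),
    ("animal.n.01", "arthropod.n.01"),
    ("arthropod.n.01", "spider.n.01"),
]

# Store the edges in both directions so paths can go up and down the hierarchy.
graph = {}
for parent, child in edges:
    graph.setdefault(parent, set()).add(child)
    graph.setdefault(child, set()).add(parent)


def shortest_distance(start, goal):
    """Return the number of edges on the shortest path, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(a, b):
    """Map a path length d to a score in (0, 1]; identical nodes score 1.0."""
    dist = shortest_distance(a, b)
    return None if dist is None else 1 / (1 + dist)


print(shortest_distance("dog.n.01", "cat.n.01"))     # 2
print(shortest_distance("dog.n.01", "spider.n.01"))  # 4
print(path_similarity("dog.n.01", "spider.n.01"))    # 0.2
```

Closer nodes get higher scores, which is why "dog" ends up more similar to "cat" than to "spider" in this toy graph.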
Accessing WordNet using NLTK
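WordNet can be easily accessed using NLTK, as shown in the code below. The examples assume that NLTK is installed and that the WordNet corpus has been downloaded, for example with nltk.download('wordnet').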
Accessing all of the synsets that contain senses of the word “car”.
from nltk.corpus import wordnet as wn

syns = wn.synsets("car")
for syn in syns:
    print(syn)
Output:
Synset('car.n.01')
Synset('car.n.02')
Synset('car.n.03')
Synset('car.n.04')
Synset('cable_car.n.01')
Accessing the lemmas, or synonymous words in the synset “car.n.01” (the first sense of the noun “car”).
print(wn.synset('car.n.01').lemma_names())
Output:
['car', 'auto', 'automobile', 'machine', 'motorcar']
Iterating over the synsets for the word “good”, and printing their definitions. As you can see, there are 27 different senses of good (4 nouns, 21 adjectives, and 2 adverbs!).
syns = wn.synsets("good")
# enumerate() supplies the running index that the manual counter provided
for counter, syn in enumerate(syns):
    print(counter, syn, "---", syn.definition())
Output:
0 Synset('good.n.01') --- benefit
1 Synset('good.n.02') --- moral excellence or admirableness
2 Synset('good.n.03') --- that which is pleasing or valuable or useful
3 Synset('commodity.n.01') --- articles of commerce
4 Synset('good.a.01') --- having desirable or positive qualities especially those suitable for a thing specified
5 Synset('full.s.06') --- having the normally expected amount
6 Synset('good.a.03') --- morally admirable
7 Synset('estimable.s.02') --- deserving of esteem and respect
8 Synset('beneficial.s.01') --- promoting or enhancing well-being
9 Synset('good.s.06') --- agreeable or pleasing
10 Synset('good.s.07') --- of moral excellence
11 Synset('adept.s.01') --- having or showing knowledge and skill and aptitude
12 Synset('good.s.09') --- thorough
13 Synset('dear.s.02') --- with or in a close or intimate relationship
14 Synset('dependable.s.04') --- financially sound
15 Synset('good.s.12') --- most suitable or right for a particular purpose
16 Synset('good.s.13') --- resulting favorably
17 Synset('effective.s.04') --- exerting force or influence
18 Synset('good.s.15') --- capable of pleasing
19 Synset('good.s.16') --- appealing to the mind
20 Synset('good.s.17') --- in excellent physical condition
21 Synset('good.s.18') --- tending to promote physical well-being; beneficial to health
22 Synset('good.s.19') --- not forged
23 Synset('good.s.20') --- not left to spoil
24 Synset('good.s.21') --- generally admired
25 Synset('well.r.01') --- (often used as a combining form) in a good or proper or satisfactory manner or to a high standard (`good' is a nonstandard dialectal variant for `well')
26 Synset('thoroughly.r.02') --- completely and absolutely (`good' is sometimes used informally for `thoroughly')
Printing the senses (lemmas) in the synset happiness.n.01, and then just the lemma names.
lemmas = wn.synset('happiness.n.01').lemmas()
lemma_names = wn.synset('happiness.n.01').lemma_names()
for lemma in lemmas:
    print(lemma)
print("Lemma names:", lemma_names)
Output:
Lemma('happiness.n.01.happiness')
Lemma('happiness.n.01.felicity')
Lemma names: ['happiness', 'felicity']
Iterating over all 117k synsets.
# all_synsets() already returns an iterable, so no list() wrapper is needed
for syn in wn.all_synsets():
    print(syn)
Iterating over only verbs.
# pos='v' restricts the iteration to verb synsets
for syn in wn.all_synsets(pos='v'):
    print(syn)
Collecting the synonyms and antonyms of the word “good” across all of its synsets. Antonymy is a lexical relation, so it is defined on lemmas rather than on synsets.
synonyms = []
antonyms = []
for syn in wn.synsets("good"):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
        # keep the first antonym of the lemma, if it has any
        if lemma.antonyms():
            antonyms.append(lemma.antonyms()[0].name())
print("Synonyms of good:", set(synonyms))
print("Antonyms of good:", set(antonyms))
Output:
Synonyms of good: {'soundly', 'sound', 'beneficial', 'adept', 'right', 'unspoiled', 'full', 'respectable', 'secure', 'well', 'honest', 'serious', 'practiced', 'undecomposed', 'good', 'expert', 'thoroughly', 'salutary', 'in_force', 'safe', 'goodness', 'trade_good', 'commodity', 'skillful', 'effective', 'honorable', 'in_effect', 'unspoilt', 'proficient', 'just', 'estimable', 'upright', 'dear', 'skilful', 'near', 'dependable', 'ripe'}
Antonyms of good: {'evil', 'badness', 'evilness', 'bad', 'ill'}
Accessing the synsets linked to the adjective good.a.01 via some semantic relations (“also see” and “similar to”).
print(wn.synset('good.a.01').also_sees())
print(wn.synset('good.a.01').similar_tos())
Output:
[Synset('best.a.01'), Synset('better.a.01'), Synset('favorable.a.01'), Synset('good.a.03'), Synset('obedient.a.01'), Synset('respectable.a.01')]
[Synset('bang-up.s.01'), Synset('good_enough.s.01'), Synset('goodish.s.01'), Synset('hot.s.15'), Synset('redeeming.s.02'), Synset('satisfactory.s.02'), Synset('solid.s.01'), Synset('superb.s.02'), Synset('well-behaved.s.01')]
Accessing the synsets linked to the noun dog.n.01 via the hyponymy relation.
print(wn.synset('dog.n.01').hyponyms())
Output:
[Synset('cur.n.01'), Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'), Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'), Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'), Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), Synset('pug.n.01'), Synset('puppy.n.01'), Synset('spitz.n.01'), Synset('toy_dog.n.01'), Synset('working_dog.n.01')]
Checking semantic similarity between synsets using Wu-Palmer (wup) similarity. Higher scores mean the two synsets are closer in the hierarchy.
word1 = wn.synset('mammal.n.01')
word2 = wn.synset('dog.n.01')
print(word1.wup_similarity(word2))

word1 = wn.synset('mammal.n.01')
word2 = wn.synset('spider.n.01')
print(word1.wup_similarity(word2))

word1 = wn.synset('mammal.n.01')
word2 = wn.synset('car.n.01')
print(word1.wup_similarity(word2))
Output:
0.8333333333333334
0.6666666666666666
0.38095238095238093
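To see where scores like these come from, here is a rough sketch of the Wu-Palmer formula, 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is the deepest ancestor the two concepts share. It runs on a tiny hand-made taxonomy rather than on WordNet, so the `parent` table and all helper names are assumptions for illustration, and the numbers differ from NLTK's output because the toy hierarchy is much shallower.

```python
# Hypothetical child -> parent links (toy taxonomy, far shallower than WordNet's).
parent = {
    "animal": "entity",
    "artifact": "entity",
    "mammal": "animal",
    "arthropod": "animal",
    "dog": "mammal",
    "spider": "arthropod",
    "car": "artifact",
}


def path_to_root(node):
    """Return [node, its parent, ..., root]."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    # Walking up from b, the first node that is also above a is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    return None


def wup_similarity(a, b):
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


for other in ("dog", "spider", "car"):
    print(other, round(wup_similarity("mammal", other), 3))
# dog 0.857
# spider 0.571
# car 0.333
```

The ordering matches the NLTK result above: "mammal" is closest to "dog", further from "spider", and furthest from "car", because the shared ancestor sits higher and higher in the hierarchy.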
WordNet has many more features to offer, and more information can be found on its official website. The Global WordNet Association also lists wordnets for many other languages. The examples above should give you an idea of how powerful WordNet is, especially for NLP and text manipulation.