How to Access WordNet from NLTK

What is WordNet?

WordNet is both a digital dictionary and a thesaurus, containing about 155k words grouped into roughly 117k synsets. It is also regarded as a lexical knowledge base in which synonymous words are grouped into synonym sets (synsets for short). Each synset is associated with a formal definition (gloss) and some example phrases. Each sense of a polysemous word belongs to the synset that best fits its meaning. WordNet is considered one of the most extensive digital dictionaries available for English and covers most of the English vocabulary.

WordNet also contains a semantic network among words, which makes it easy to search for terms conceptually rather than alphabetically, as in a printed dictionary. Nouns and verbs are organized hierarchically (relations include hyponymy-hypernymy, meronymy-holonymy, entailment, and troponymy). Adjectives are laid out as clusters in a multidimensional hyperspace, with antonymy as the main relation between them, while adverbs link to adjectives through the pertainym relation.

Synsets are the nodes of the network, and the semantic relations among synsets are the edges between them. WordNet is widely used in NLP tasks such as sentiment analysis, word sense disambiguation, and machine translation. Semantic similarity between words can also be measured using the distance between them in the WordNet ontological network.

 

Accessing WordNet using NLTK

WordNet can be easily accessed using NLTK, as shown in the code below.

 

Accessing all of the synsets that contain senses of the word “car”.

from nltk.corpus import wordnet as wn

syns = wn.synsets("car")

for syn in syns:
    print(syn)

 

Output:

Synset('car.n.01')
Synset('car.n.02')
Synset('car.n.03')
Synset('car.n.04')
Synset('cable_car.n.01')

 

Accessing the lemmas, or synonymous words in the synset “car.n.01” (the first sense of the noun “car”).

print(wn.synset('car.n.01').lemma_names())

 

Output:

['car', 'auto', 'automobile', 'machine', 'motorcar']

 

Iterating over the synsets for the word “good”, and printing their definitions. As you can see, there are 27 different senses of good (4 nouns, 21 adjectives, and 2 adverbs!).

syns = wn.synsets("good")

# enumerate() supplies the running index, so no manual counter is needed
for counter, syn in enumerate(syns):
    print(counter, syn, "---", syn.definition())

 

Output:

0 Synset('good.n.01') --- benefit
1 Synset('good.n.02') --- moral excellence or admirableness
2 Synset('good.n.03') --- that which is pleasing or valuable or useful
3 Synset('commodity.n.01') --- articles of commerce
4 Synset('good.a.01') --- having desirable or positive qualities especially those suitable for a thing specified
5 Synset('full.s.06') --- having the normally expected amount
6 Synset('good.a.03') --- morally admirable
7 Synset('estimable.s.02') --- deserving of esteem and respect
8 Synset('beneficial.s.01') --- promoting or enhancing well-being
9 Synset('good.s.06') --- agreeable or pleasing
10 Synset('good.s.07') --- of moral excellence
11 Synset('adept.s.01') --- having or showing knowledge and skill and aptitude
12 Synset('good.s.09') --- thorough
13 Synset('dear.s.02') --- with or in a close or intimate relationship
14 Synset('dependable.s.04') --- financially sound
15 Synset('good.s.12') --- most suitable or right for a particular purpose
16 Synset('good.s.13') --- resulting favorably
17 Synset('effective.s.04') --- exerting force or influence
18 Synset('good.s.15') --- capable of pleasing
19 Synset('good.s.16') --- appealing to the mind
20 Synset('good.s.17') --- in excellent physical condition
21 Synset('good.s.18') --- tending to promote physical well-being; beneficial to health
22 Synset('good.s.19') --- not forged
23 Synset('good.s.20') --- not left to spoil
24 Synset('good.s.21') --- generally admired
25 Synset('well.r.01') --- (often used as a combining form) in a good or proper or satisfactory manner or to a high standard (`good' is a nonstandard dialectal variant for `well')
26 Synset('thoroughly.r.02') --- completely and absolutely (`good' is sometimes used informally for `thoroughly')

 

Printing the senses (lemmas) in the synset happiness.n.01, and then just the lemma names.

lemmas = wn.synset('happiness.n.01').lemmas()
lemma_names = wn.synset('happiness.n.01').lemma_names()

for lemma in lemmas:
    print(lemma)

print("Lemma names:", lemma_names)

 

Output:

Lemma('happiness.n.01.happiness')
Lemma('happiness.n.01.felicity')
Lemma names: ['happiness', 'felicity']

 

Iterating over all 117k synsets.

# all_synsets() is already iterable, so there is no need to build a list first
for syn in wn.all_synsets():
    print(syn)

 

Iterating over only verbs.

for syn in wn.all_synsets(pos='v'):
    print(syn)

 

Accessing the lexical relations (synonyms and antonyms) of the word “good” across all of its synsets.

synonyms = []
antonyms = []

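for syn in wn.synsets("good"):
    for lemma in syn.lemmas():
        # every lemma in any synset of "good" counts as a synonym
        synonyms.append(lemma.name())
        # antonymy is defined between lemmas; keep the first antonym, if any
        if lemma.antonyms():
            antonyms.append(lemma.antonyms()[0].name())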

print("Synonyms of good: ", set(synonyms))
print("Antonyms of good: ", set(antonyms))

 

Output:

Synonyms of good:  {'soundly', 'sound', 'beneficial', 'adept', 'right', 'unspoiled', 'full', 'respectable', 'secure', 'well', 'honest', 'serious', 'practiced', 'undecomposed', 'good', 'expert', 'thoroughly', 'salutary', 'in_force', 'safe', 'goodness', 'trade_good', 'commodity', 'skillful', 'effective', 'honorable', 'in_effect', 'unspoilt', 'proficient', 'just', 'estimable', 'upright', 'dear', 'skilful', 'near', 'dependable', 'ripe'}
Antonyms of good:  {'evil', 'badness', 'evilness', 'bad', 'ill'}

 

Accessing the synsets linked to the adjective good.a.01 via some semantic relations.

print(wn.synset('good.a.01').also_sees()) 
print(wn.synset('good.a.01').similar_tos()) 

 

Output:

[Synset('best.a.01'), Synset('better.a.01'), Synset('favorable.a.01'), Synset('good.a.03'), Synset('obedient.a.01'), Synset('respectable.a.01')]
[Synset('bang-up.s.01'), Synset('good_enough.s.01'), Synset('goodish.s.01'), Synset('hot.s.15'), Synset('redeeming.s.02'), Synset('satisfactory.s.02'), Synset('solid.s.01'), Synset('superb.s.02'), Synset('well-behaved.s.01')]

 

Accessing the synsets linked to the noun dog.n.01 via the hyponymy relation.

print(wn.synset('dog.n.01').hyponyms()) 

 

Output:

[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'), Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'), Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'), Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), Synset('pug.n.01'), Synset('puppy.n.01'), Synset('spitz.n.01'), Synset('toy_dog.n.01'), Synset('working_dog.n.01')]

 

Checking semantic similarity between synsets using Wu-Palmer (wup) similarity.

word1 = wn.synset('mammal.n.01') 
word2 = wn.synset('dog.n.01') 
print(word1.wup_similarity(word2)) 

word1 = wn.synset('mammal.n.01') 
word2 = wn.synset('spider.n.01') 
print(word1.wup_similarity(word2)) 

word1 = wn.synset('mammal.n.01') 
word2 = wn.synset('car.n.01') 
print(word1.wup_similarity(word2)) 

 

Output:

0.8333333333333334
0.6666666666666666
0.38095238095238093

 

WordNet has many more features to offer, and more information can be found on the official WordNet website. The Global WordNet Association also lists wordnets for many other languages. The examples shown above should give you an idea of how powerful WordNet is, especially in NLP and text manipulation.

 


Mohammad D.

Mohammad D. works with sentiment analysis, NLP and Python. He loves to blog about these and other related topics in his free time.