What is word sense disambiguation?
Word sense disambiguation (WSD) is the NLP task of determining which meaning of an ambiguous word is intended, based on its surrounding context. Humans do this effortlessly, but machines find it hard. They typically rely on a semantic knowledge base to map a polysemous word to its correct sense. WordNet, a lexical database that groups words into sets of synonyms (synsets) according to their meaning, is especially helpful for WSD. Each WordNet synset comes with a gloss (a short definition), and synsets are linked to one another through semantic relations such as hypernymy (is-a) and meronymy (part-of), forming a network of words, hence the name ‘WordNet’.
In English, words can be highly ambiguous, and their meaning depends on the context in which they are used. Consider “I deposited money into my bank account” and “I saw some fish swim by in the water when walking along the river bank.” The meaning of bank differs in each sentence (a financial institution vs. the land alongside a river). The aim of WSD is to identify the intended meaning of bank computationally, using the words around it. Several WSD algorithms are available, including the following:
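A minimal way to make this idea concrete is a simplified Lesk approach: score each candidate sense by how many content words its gloss shares with the sentence. The sketch below is plain Python and only considers two senses of bank, using the WordNet glosses that appear in the NLTK output later in this post. The stop-word list and function names are illustrative assumptions, and there is no stemming, so "deposited" does not match "deposits".

```python
import re

# Two candidate senses of "bank" with their WordNet glosses.
# A real system would pull every sense from WordNet instead.
SENSES = {
    "bank.n.01": "sloping land (especially the slope beside a body of water)",
    "depository_financial_institution.n.01": (
        "a financial institution that accepts deposits and channels "
        "the money into lending activities"
    ),
}

# Illustrative stop-word list (an assumption, not taken from any library).
STOP_WORDS = {"a", "an", "the", "of", "in", "into", "and", "or", "that",
              "i", "my", "some", "by", "when", "along", "at", "to"}


def content_words(text):
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def simplified_lesk(sentence, senses=SENSES):
    """Return the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    # On a tie, max() keeps the first sense in dictionary order.
    return max(senses, key=lambda name: len(context & content_words(senses[name])))


for s in ["I deposited money into my bank account",
          "I saw some fish swim by in the water when walking along the river bank."]:
    print(simplified_lesk(s), "<-", s)
```

With these two glosses, the first sentence overlaps the financial sense on "money" and the second overlaps the riverside sense on "water", so each sentence gets a different sense.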
Some foundational WSD algorithms
- Lesk (Lesk, 1986)
- Extended Lesk (Banerjee and Pedersen, 2002/2003)
- Simplified Lesk
- Maximizing Similarity
- Path similarity (Wu and Palmer, 1994; Leacock and Chodorow, 1998)
- Information Content (Resnik, 1995; Jiang and Conrath, 1997; Lin, 1998)
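The path-based measures and the maximizing-similarity strategy in this list can be illustrated on a tiny hand-made taxonomy. The sketch below uses a hypothetical tree, not real WordNet (which allows several parents per synset), and common formulations: path similarity 1/(1 + d), where d is the number of edges between two concepts; Wu-Palmer 2·depth(lcs)/(depth(a) + depth(b)), where lcs is the lowest common subsumer and depth counts nodes from the root; and Leacock-Chodorow -log((d + 1)/(2D)), with D the maximum depth. Depth conventions differ between implementations. `max_similarity_sense()` then picks the sense of bank most similar to the context, assuming each context word maps to one known concept.

```python
import math

# Hypothetical toy taxonomy (NOT real WordNet): each concept maps to its parent.
PARENT = {
    "physical_object": "entity",
    "abstraction": "entity",
    "land": "physical_object",
    "body_of_water": "physical_object",
    "slope": "land",
    "bank_river": "slope",                    # 'bank' as sloping land
    "river": "body_of_water",
    "organization": "abstraction",
    "financial_institution": "organization",
    "bank_finance": "financial_institution",  # 'bank' as a financial institution
    "possession": "abstraction",
    "money": "possession",
}


def ancestors(concept):
    """Chain from `concept` up to the root, starting with the concept itself."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(concept):
    """Number of nodes from the root down to `concept` (the root has depth 1)."""
    return len(ancestors(concept))


MAX_DEPTH = max(depth(c) for c in PARENT)


def lcs_and_distance(a, b):
    """Lowest common subsumer of a and b, plus the edge count of the path through it."""
    up_a, up_b = ancestors(a), ancestors(b)
    for i, node in enumerate(up_a):
        if node in up_b:  # first shared ancestor in a tree is the lowest one
            return node, i + up_b.index(node)
    raise ValueError("concepts share no root")


def path_similarity(a, b):
    _, dist = lcs_and_distance(a, b)
    return 1.0 / (1 + dist)


def wu_palmer(a, b):
    lcs, _ = lcs_and_distance(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def leacock_chodorow(a, b):
    _, dist = lcs_and_distance(a, b)
    return -math.log((dist + 1) / (2.0 * MAX_DEPTH))


def max_similarity_sense(senses, context, sim=wu_palmer):
    """Pick the sense with the highest summed similarity to the context concepts."""
    return max(senses, key=lambda s: sum(sim(s, c) for c in context))


BANK_SENSES = ["bank_river", "bank_finance"]
print(max_similarity_sense(BANK_SENSES, ["river"]))  # bank_river
print(max_similarity_sense(BANK_SENSES, ["money"]))  # bank_finance
```

In this toy tree the riverside sense shares the deeper subsumer physical_object with river, while the financial sense shares abstraction with money, so each context pulls toward a different sense. The information-content measures follow the same pattern but weight the subsumer by how rare it is in a corpus instead of by its depth.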
Lesk WSD Implementation in NLTK
The following code disambiguates the word bank based on its context of use. It first lists every WordNet synset for bank, then applies NLTK's implementation of the Lesk algorithm, which maps the word to the noun synset whose definition shares the most words with the context sentence.
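```python
import nltk
from nltk.wsd import lesk
from nltk.corpus import wordnet as wn

nltk.download('wordnet', quiet=True)  # fetch the WordNet data if it is missing

# List every WordNet synset (nouns and verbs) for 'bank' with its definition.
print("All available synsets for 'bank':")
for counter, syn in enumerate(wn.synsets('bank'), start=1):
    print(counter, syn, syn.definition())

# Build the context: drop the final period and split on whitespace.
text = "I deposited some money in cash at the bank."
tokens = text.strip(".").split()

# Choose the noun sense of 'bank' whose definition overlaps most with the context.
# lesk() expects plain word strings, so the tokens are not POS-tagged.
disambiguated_word = lesk(tokens, 'bank', 'n')
print("\n***bank most closely resembles: ", disambiguated_word, disambiguated_word.definition())
```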
```
All available synsets for 'bank':
1 Synset('bank.n.01') sloping land (especially the slope beside a body of water)
2 Synset('depository_financial_institution.n.01') a financial institution that accepts deposits and channels the money into lending activities
3 Synset('bank.n.03') a long ridge or pile
4 Synset('bank.n.04') an arrangement of similar objects in a row or in tiers
5 Synset('bank.n.05') a supply or stock held in reserve for future use (especially in emergencies)
6 Synset('bank.n.06') the funds held by a gambling house or the dealer in some gambling games
7 Synset('bank.n.07') a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
8 Synset('savings_bank.n.02') a container (usually with a slot in the top) for keeping money at home
9 Synset('bank.n.09') a building in which the business of banking transacted
10 Synset('bank.n.10') a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning)
11 Synset('bank.v.01') tip laterally
12 Synset('bank.v.02') enclose with a bank
13 Synset('bank.v.03') do business with a bank or keep an account at a bank
14 Synset('bank.v.04') act as the banker in a game or in gambling
15 Synset('bank.v.05') be in the banking business
16 Synset('deposit.v.02') put into a bank account
17 Synset('bank.v.07') cover with ashes so to control the rate of burning
18 Synset('trust.v.01') have confidence or faith in

***bank most closely resembles: Synset('savings_bank.n.02') a container (usually with a slot in the top) for keeping money at home
```
Please share your thoughts in the comments! 🙂