Unsupervised Lexicon-based Sentiment Analysis with TextBlob

In this article, we will construct a very simple end-to-end unsupervised sentiment analysis model. It is completely unsupervised because it requires no training data: the model relies only on a lexicon of predefined positive and negative words. It also applies a few contextual syntactic rules, such as handling negation and increasing the sentiment strength of words that are preceded by intensifiers.

 

Problem definition

Taking a small corpus of movie reviews, the aim is to classify them as “positive”, “negative”, or “neutral”. We will use TextBlob, which is a Python library that has a readily available built-in sentiment classifier. It uses a sentiment lexicon, along with some syntactic rules, to mark a text with a polarity and a subjectivity score. We will then graphically display the number of reviews labeled positive, negative and neutral in a pie chart using matplotlib.

 

How TextBlob works

TextBlob scores each word for polarity (-1 to +1) and subjectivity (0 to 1). It negates any polar word that is preceded by a negation such as never or not (multiplying it by -0.5). It boosts the strength of a polar word that is preceded by an intensifier such as very or extremely (multiplying it by a value greater than 1, roughly 1.1 to 2.0 depending on the intensifier). Finally, it diminishes the strength of a polar word that is preceded by a diminisher such as hardly or barely (multiplying it by a value less than 1, roughly 0.5 to 0.9 depending on the diminisher). Punctuation such as exclamation marks is also considered, giving a stronger overall polarity.
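
To make these rules concrete, here is a minimal toy scorer. It is a sketch under stated assumptions, not TextBlob's actual implementation: the function name toy_polarity, the tiny lexicon, the 1.3 multiplier for "very", the 1.25 boost for an exclamation mark, and the choice to invert the intensifier under negation are all assumptions, picked so that the toy agrees with the polarity values printed in this article. Only the -0.5 negation factor comes from the description above.

```python
# Toy re-creation of the rules described above. This is NOT TextBlob's code.
# All lexicon values are assumptions chosen to match this article's outputs.
POLARITY = {"good": 0.7, "great": 0.8, "love": 0.5, "hate": -0.8}
INTENSIFIERS = {"very": 1.3}   # assumed multiplier
NEGATIONS = {"not", "never"}
NEGATION_FACTOR = -0.5         # from the description above
EXCLAMATION_FACTOR = 1.25      # assumed boost for "!"


def clamp(value):
    """Keep a score inside the polarity range [-1, 1]."""
    return max(-1.0, min(value, 1.0))


def toy_polarity(text):
    """Average the adjusted polarity of every lexicon word found in text."""
    tokens = text.lower().replace("!", " ! ").replace(".", " ").split()
    scores = []
    negated, intensity = False, 1.0
    for token in tokens:
        if token in NEGATIONS:
            negated = True
        elif token in INTENSIFIERS:
            intensity *= INTENSIFIERS[token]
        elif token in POLARITY:
            # Assumption: under negation the intensifier weakens the word
            # ("not very good" is milder than "not good").
            factor = 1.0 / intensity if negated else intensity
            score = POLARITY[token] * factor
            if negated:
                score *= NEGATION_FACTOR
            scores.append(clamp(score))
            negated, intensity = False, 1.0
        elif token == "!" and scores:
            # Assumption: "!" boosts the most recently scored word.
            scores[-1] = clamp(scores[-1] * EXCLAMATION_FACTOR)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, toy_polarity("It is not a very good tool!") works out to 0.7 / 1.3 x -0.5 x 1.25, which is about -0.337. The toy keeps a negation active until the next lexicon word and ignores subjectivity, diminishers, and part-of-speech information, so it should not be expected to agree with TextBlob on arbitrary text.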

 

Quick examples

Let us try a quick polarity classification example using TextBlob.

from textblob import TextBlob
text = TextBlob("I love to use TextBlob. It is a great tool.")
print(text)
print("Polarity is: ", text.sentiment.polarity)
print("Subjectivity is: ", text.sentiment.subjectivity)

 

Output:

I love to use TextBlob. It is a great tool.
Polarity is:  0.65
Subjectivity is:  0.675

 

Let’s try a negative text.

text = TextBlob("I hate to use TextBlob. It is not a great tool.")
print(text)
print("Polarity is: ", text.sentiment.polarity)
print("Subjectivity is: ", text.sentiment.subjectivity)

 

Output:

I hate to use TextBlob. It is not a great tool.
Polarity is:  -0.6000000000000001
Subjectivity is:  0.825

 

Now let’s try a few more examples to see how TextBlob handles them.

text = TextBlob("It is a good tool.")
print(text)
print("Polarity is: ", text.sentiment.polarity)
print("Subjectivity is: ", text.sentiment.subjectivity, "\n")

text = TextBlob("It is a good tool!") # with exclamation mark
print(text)
print("Polarity is: ", text.sentiment.polarity)
print("Subjectivity is: ", text.sentiment.subjectivity, "\n")

text = TextBlob("It is a very good tool!") # with an intensifier
print(text)
print("Polarity is: ", text.sentiment.polarity)
print("Subjectivity is: ", text.sentiment.subjectivity, "\n")

text = TextBlob("It is not a very good tool!") # with a negator
print(text)
print("Polarity is: ", text.sentiment.polarity)
print("Subjectivity is: ", text.sentiment.subjectivity)

 

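The output shows that TextBlob handles them pretty well: the exclamation mark and the intensifier each raise the polarity, and the negator flips it to a weaker negative value.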

It is a good tool.
Polarity is:  0.7
Subjectivity is:  0.6000000000000001 

It is a good tool!
Polarity is:  0.875
Subjectivity is:  0.6000000000000001 

It is a very good tool!
Polarity is:  1.0
Subjectivity is:  0.7800000000000001 

It is not a very good tool!
Polarity is:  -0.33653846153846145
Subjectivity is:  0.46153846153846156

 

The full code

So now we can apply this to a dataset (link here) that consists of some short movie reviews, and output the polarity label, polarity weight, and subjectivity of each. After iterating over all the reviews, we use matplotlib to generate a pie chart that summarizes the number of positive, negative, and neutral reviews.
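
A minimal sketch of such a pipeline might look like the following. It is an illustration under assumptions rather than the exact script used for this article: the helper names (label_polarity, analyze, textblob_score, plot_counts, main), the one-review-per-line file format, and the rule that a polarity of exactly zero counts as neutral are all hypothetical choices. The TextBlob and matplotlib imports sit inside the functions that need them so that the labeling logic can be tried without either library installed.

```python
from collections import Counter

LABELS = ["Positive", "Negative", "Neutral"]


def label_polarity(polarity):
    """Map a polarity score to a label (zero threshold is an assumption)."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


def analyze(reviews, score):
    """Print a report per review and return the label counts.

    score is any function mapping text -> (polarity, subjectivity).
    """
    counts = Counter()
    for number, review in enumerate(reviews, start=1):
        polarity, subjectivity = score(review)
        label = label_polarity(polarity)
        counts[label] += 1
        print(f"Doc {number}:\n {review}")
        print("Polarity:", label)
        print("Polarity weight:", polarity)
        print("Subjectivity weight:", subjectivity, "\n")
    return counts


def textblob_score(text):
    """Score text with TextBlob's default lexicon-based analyzer."""
    from textblob import TextBlob  # requires: pip install textblob
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def plot_counts(counts):
    """Draw a pie chart of the label counts with matplotlib."""
    import matplotlib.pyplot as plt
    plt.pie([counts[label] for label in LABELS], labels=LABELS,
            autopct="%1.1f%%")
    plt.axis("equal")  # keep the pie circular
    plt.title("Review sentiment")
    plt.show()


def main(path):
    """Run the whole pipeline on a text file with one review per line."""
    with open(path, encoding="utf-8") as handle:
        reviews = [line.strip() for line in handle if line.strip()]
    counts = analyze(reviews, textblob_score)
    print("Final labeled docs:")
    print(" ", " ".join(f"{label} docs: {counts[label]}" for label in LABELS))
    plot_counts(counts)
```

Calling main with the path to a local copy of the reviews (for instance a hypothetical reviews.txt) runs everything end to end. Passing the scorer into analyze as a parameter also makes it easy to swap in a different lexicon-based scorer later.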

 

The full output is too long to display here, so we just take the output for the final two reviews, and the pie chart. The final summary in the pie chart consists of 31 reviews labeled positive, 12 labeled negative, and 12 labeled neutral.

Doc 54:
 I was not happy with this movie. What a waste of time!
Polarity: Negative
Polarity weight: -0.325
Subjectivity weight: 0.5

Doc 55:
 I was happy with the plot twist. Overall a great and enjoyable movie!
Polarity: Positive
Polarity weight: 0.55625
Subjectivity weight: 0.5875

Final labeled docs:
 Positive docs: 31 Negative docs: 12 Neutral docs: 12
 Pie chart... 

 

[Figure: pie chart of the positive, negative, and neutral review counts]

 

Final words

This was a brief lexicon-based sentiment analysis model using TextBlob. As shown, TextBlob is very easy and straightforward to use. Although we did not test for accuracy, we can see, out of the box, that it performs fairly well and does a decent job of taking advantage of context to give a final polarity and subjectivity score to text.

 


Mohammad D.

Mohammad D. works with sentiment analysis, NLP and Python. He loves to blog about these and other related topics in his free time.