In this article, we will construct a very simple end-to-end unsupervised sentiment analysis model. It is completely unsupervised because there is no requirement for any training data. The model only relies on a lexicon of predefined positive and negative words. The model also includes some contextual syntactic rules like handling negation, and increasing the overall sentiment strength of words that are preceded by intensifier words.
Problem definition
Taking a small corpus of movie reviews, the aim is to classify them as “positive”, “negative”, or “neutral”. We will use TextBlob, which is a Python library that has a readily available built-in sentiment classifier. It uses a sentiment lexicon, along with some syntactic rules, to mark a text with a polarity and a subjectivity score. We will then graphically display the number of reviews labeled positive, negative and neutral in a pie chart using matplotlib.
How TextBlob works
TextBlob scores each word for polarity (-1 to +1) and subjectivity (0 to 1). It negates any polar word that is preceded by a negation such as never or not (multiplying it by -0.5). It boosts the strength of a polar word that is preceded by an intensifier such as very or extremely (multiplying it by a value from 1.1 to 2.0, depending on the intensifier). Finally, it diminishes the strength of a polar word that is preceded by a diminisher such as hardly or barely (multiplying it by a value from 0.5 to 0.9, depending on the diminisher). Punctuation such as exclamation marks is also taken into account and increases the magnitude of the overall polarity.
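To make these rules concrete, here is a minimal toy scorer that applies them to a tiny word list. It is only a sketch and not TextBlob's implementation: the lexicon, the modifier tables, the factor values, and the name toy_polarity are all hypothetical, and only the word immediately before a polar word is treated as a modifier.

```python
import re

# Hypothetical mini-lexicon and modifier tables, for illustration only.
# TextBlob ships a much larger lexicon with its own values.
LEXICON = {"good": 0.7, "great": 0.8, "bad": -0.7}
NEGATIONS = {"not", "never"}
INTENSIFIERS = {"very": 1.3, "extremely": 2.0}  # factors above 1 boost
DIMINISHERS = {"hardly": 0.5, "barely": 0.5}    # factors below 1 weaken
NEGATION_FACTOR = -0.5


def toy_polarity(text):
    """Average the adjusted polarity of every lexicon word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    for index, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        score = LEXICON[token]
        previous = tokens[index - 1] if index > 0 else ""
        if previous in NEGATIONS:
            score *= NEGATION_FACTOR
        elif previous in INTENSIFIERS:
            score *= INTENSIFIERS[previous]
        elif previous in DIMINISHERS:
            score *= DIMINISHERS[previous]
        # Keep every word score inside the [-1, +1] polarity range.
        scores.append(max(-1.0, min(1.0, score)))
    return sum(scores) / len(scores) if scores else 0.0


for sentence in ["A good movie.", "Not good.", "Very good.", "Barely good."]:
    print(sentence, round(toy_polarity(sentence), 2))
```

The point of the sketch is that each rule is a single multiplication applied to the lexicon value of the polar word, followed by an average over all polar words in the text.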
Quick examples
Let us try a quick polarity classification example using TextBlob.
    from textblob import TextBlob

    text = TextBlob("I love to use TextBlob. It is a great tool.")
    print(text)
    print("Polarity is: ", text.sentiment.polarity)
    print("Subjectivity is: ", text.sentiment.subjectivity)
Output:
    I love to use TextBlob. It is a great tool.
    Polarity is: 0.65
    Subjectivity is: 0.675
Let’s try a negative text.
    text = TextBlob("I hate to use TextBlob. It is not a great tool.")
    print(text)
    print("Polarity is: ", text.sentiment.polarity)
    print("Subjectivity is: ", text.sentiment.subjectivity)
Output:
    I hate to use TextBlob. It is not a great tool.
    Polarity is: -0.6000000000000001
    Subjectivity is: 0.825
Now let’s try a few more examples to see how TextBlob handles them.
    text = TextBlob("It is a good tool.")
    print(text)
    print("Polarity is: ", text.sentiment.polarity)
    print("Subjectivity is: ", text.sentiment.subjectivity, "\n")

    text = TextBlob("It is a good tool!")  # with exclamation mark
    print(text)
    print("Polarity is: ", text.sentiment.polarity)
    print("Subjectivity is: ", text.sentiment.subjectivity, "\n")

    text = TextBlob("It is a very good tool!")  # with an intensifier
    print(text)
    print("Polarity is: ", text.sentiment.polarity)
    print("Subjectivity is: ", text.sentiment.subjectivity, "\n")

    text = TextBlob("It is not a very good tool!")  # with a negator
    print(text)
    print("Polarity is: ", text.sentiment.polarity)
    print("Subjectivity is: ", text.sentiment.subjectivity)
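The output shows that TextBlob handles each case sensibly: the exclamation mark raises the polarity from 0.7 to 0.875, the intensifier pushes it to the maximum of 1.0, and the negation flips it to about -0.34 while also lowering the subjectivity.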
    It is a good tool.
    Polarity is: 0.7
    Subjectivity is: 0.6000000000000001

    It is a good tool!
    Polarity is: 0.875
    Subjectivity is: 0.6000000000000001

    It is a very good tool!
    Polarity is: 1.0
    Subjectivity is: 0.7800000000000001

    It is not a very good tool!
    Polarity is: -0.33653846153846145
    Subjectivity is: 0.46153846153846156
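The four results above can be reproduced with a few multiplications. The sketch below is a back-of-the-envelope check, not TextBlob's code: the factors 1.25 for the exclamation mark and 1.3 for "very" are assumptions inferred from the printed outputs (0.875 / 0.7 and 0.78 / 0.6), and the observation that the intensifier divides rather than multiplies under negation is likewise inferred from the last result.

```python
# Values taken from the outputs above.
good_polarity = 0.7
good_subjectivity = 0.6

# Assumed factors, inferred from the outputs rather than read from TextBlob.
exclamation = 1.25  # 0.875 / 0.7
very = 1.3          # 0.78 / 0.6
negation = -0.5     # the negation factor described earlier

# "It is a good tool!"
with_exclamation = good_polarity * exclamation

# "It is a very good tool!" (the product exceeds 1, so it is clipped to 1.0)
with_very = min(1.0, good_polarity * very * exclamation)

# "It is not a very good tool!" (under negation the intensifier appears to divide)
with_not_very = good_polarity * negation / very * exclamation
not_very_subjectivity = good_subjectivity / very

print(round(with_exclamation, 4))       # 0.875
print(round(with_very, 4))              # 1.0
print(round(with_not_very, 4))          # -0.3365
print(round(not_very_subjectivity, 4))  # 0.4615
```

Under these assumptions, "not very good" comes out weaker than a plain "not good" would, which is a reasonable reading of the phrase.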
The full code
We can now apply this to a dataset (link here) of short movie reviews and output the polarity label, polarity weight, and subjectivity of each review. After iterating over all the reviews, we use matplotlib to generate a pie chart that summarizes the number of positive, negative and neutral reviews.
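A minimal sketch of such a pipeline might look like the following. It is not the original code of this article: the function names are hypothetical, the rule that a polarity above zero is positive, below zero is negative and exactly zero is neutral is an assumption, and loading the reviews from the dataset is left out because its format is not described here.

```python
from collections import Counter


def textblob_sentiment(text):
    """Return (polarity, subjectivity) for a text using TextBlob."""
    # Imported here so the helpers below can be used without TextBlob installed.
    from textblob import TextBlob
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def label_polarity(polarity):
    """Map a polarity weight to a label (the zero threshold is an assumption)."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


def classify_reviews(reviews, sentiment_fn=textblob_sentiment):
    """Print the result for each review and return a Counter of labels."""
    counts = Counter()
    for number, review in enumerate(reviews, start=1):
        polarity, subjectivity = sentiment_fn(review)
        label = label_polarity(polarity)
        counts[label] += 1
        print(f"Doc {number}: {review}")
        print("Polarity:", label)
        print("Polarity weight:", polarity)
        print("Subjectivity weight:", subjectivity)
    return counts


def plot_summary(counts):
    """Draw a pie chart of the label counts with matplotlib."""
    import matplotlib.pyplot as plt
    labels = [name for name in ("Positive", "Negative", "Neutral") if counts[name] > 0]
    sizes = [counts[name] for name in labels]
    plt.pie(sizes, labels=labels, autopct="%1.1f%%")
    plt.title("Sentiment of movie reviews")
    plt.show()
```

With TextBlob and matplotlib installed, and reviews being a list of review strings, calling plot_summary(classify_reviews(reviews)) prints the per-review results and then shows the chart.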
The full output is too long to display here, so we just take the output for the final two reviews, and the pie chart. The final summary in the pie chart consists of 31 reviews labeled positive, 12 labeled negative, and 12 labeled neutral.
    Doc 54: I was not happy with this movie. What a waste of time!
    Polarity: Negative
    Polarity weight: -0.325
    Subjectivity weight: 0.5

    Doc 55: I was happy with the plot twist. Overall a great and enjoyable movie!
    Polarity: Positive
    Polarity weight: 0.55625
    Subjectivity weight: 0.5875

    Final labeled docs:
    Positive docs: 31
    Negative docs: 12
    Neutral docs: 12

    Pie chart...
Final words
This was a brief walkthrough of a lexicon-based sentiment analysis model using TextBlob. As shown, TextBlob is easy and straightforward to use. Although we did not measure its accuracy, the examples suggest that, out of the box, it performs fairly well and makes reasonable use of context when assigning a final polarity and subjectivity score to a text.