What is Sentiment Analysis?
Sentiment analysis (SA), often called opinion mining, is a natural language processing (NLP) subtask that involves automatically detecting sentiment, emotion and subjectivity in text. The aim is to take a set of texts and classify each one as positive, negative or neutral based on its syntactic and/or semantic properties.
Sentiment Analysis Approaches
There are three main approaches to SA: unsupervised (lexicon-based), supervised, and deep learning. The unsupervised approach typically uses a sentiment lexicon (a predefined list of positive and negative words) together with hand-crafted syntactic rules to classify a text according to the sentiment it expresses. The supervised approach trains machine learning classifiers (Support Vector Machines, Naive Bayes, Random Forest, etc.) on labeled examples. The deep learning approach uses artificial neural networks, whose design is loosely inspired by the human brain. A hybrid approach is also possible (e.g. using a lexicon of words tagged with sentiment weights as features for a supervised classifier). Regardless of the approach, sentiment cannot be measured accurately by relying only on syntactic information and lexical affinity, such as word matching against a lexicon or a supervised classifier built on surface features. Some level of context-aware semantic interpretation is needed to capture the true underlying sentiment expressed.
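To make the lexicon-based (unsupervised) approach concrete, here is a minimal Python sketch. The toy lexicon, the negator list and the "flip the next sentiment word" rule are illustrative assumptions, not a published lexicon or rule set; real lexicons contain thousands of entries and far richer rules. The last example shows the limitation described above: word matching alone labels a sarcastic sentence as positive.

```python
import re

# Hypothetical toy lexicon: word -> sentiment weight.
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0, "excellent": 2.0,
    "bad": -1.0, "terrible": -2.0, "hate": -2.0, "boring": -1.0,
}
# Hypothetical hand-crafted rule: a negator flips the polarity
# of the next sentiment-bearing word that follows it.
NEGATORS = {"not", "no", "never", "isn't", "wasn't", "don't"}


def lexicon_score(text: str) -> float:
    """Sum lexicon weights over the tokens, applying the negation rule."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        weight = LEXICON.get(token)
        if weight is not None:
            score += -weight if negate else weight
            negate = False  # the negator only affects one sentiment word
    return score


def classify(text: str) -> str:
    """Map the summed score to positive / negative / neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


examples = [
    "The plot was great",                    # positive
    "The plot was not good",                 # negative (negation rule)
    "Oh great, another three-hour meeting",  # positive, although it is sarcastic
]
for sentence in examples:
    print(f"{sentence!r} -> {classify(sentence)}")
```

A supervised classifier would replace the hand-written lexicon and rule with weights learned from labeled data, but it would still miss the sarcasm unless it had access to wider context.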
History of Sentiment Analysis
Before the widespread use of the Internet, people relied on word-of-mouth recommendations, and organizations relied on expensive strategies to gauge their customers' sentiment. Since the rise of Web 2.0 in the early 2000s, however, people have generated an enormous volume of text online in the form of blog articles, forum discussions, social media posts, microblogs, etc. They use the Web and social media networks heavily to express their opinions, sentiments and emotions, producing continuous real-time streams of valuable sentiment-carrying text. Many natural language processing disciplines have grown out of the availability of this rich source of subjective text, and sentiment analysis is no exception: it is one of the most active NLP tasks today.
Why is it so Important?
This subjective data is useful to people who intend to buy a product or service: they can see what others have said about it before buying. For-profit companies can use the same data to monitor what consumers say about their products and make marketing decisions based on customer sentiment. Government organizations can use it to gauge public sentiment toward new policies, elected officials, and so on. In raw form, however, text is messy, noisy and unstructured, especially social-media-style text. Applying sentiment analysis to this sentiment-laden text and deriving insights for business decision-making, brand monitoring, customer support, product development and marketing campaigns has proven highly valuable for organizations.
SA has been applied in almost every domain and industry. Some examples are:
- product/service reviews
- movie/book reviews
- customer relationship management
- stock market prediction
- government intelligence and politics
- recommendation systems
- cyberbullying and flame detection
- search engine optimization and ad placement
- disaster recovery and management
- health care
Depending on the desired output, the sentiment analysis task can take several forms:
- Binary classification: Involves classifying text documents into either the positive or the negative class.
- Three-class: Involves positive, negative and neutral classes.
- Multi-class: Involves a set of discrete classes that can also correspond to sentiment strength, e.g. very positive, mildly positive, neutral (no sentiment), mildly negative, very negative.
- Numerical output: Involves a numerical score that often represents the intensity or strength of the sentiment expressed. A value closer to 0 indicates low strength, while a value closer to 1 indicates high strength. This can also be treated as a regression problem.
- Emotion classification: Involves classifying text into a set of classes representative of the fundamental human emotions, e.g. anger, sadness, fear, joy, surprise, etc.
- Appraisal classification: Involves classifying text into Affect, Appreciation and Judgement.
- Subjectivity classification: Involves first categorizing text as subjective or objective (factual), and then categorizing the subjective text as positive or negative.
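Several of these output types differ mainly in how a continuous sentiment score is discretized. The sketch below assumes a hypothetical signed score in [-1, 1] (the sign gives polarity and the magnitude gives strength, as in the numerical-output case) and maps it onto the binary, three-class and five-class label sets. The cut-offs `NEUTRAL_BAND = 0.05` and `STRONG = 0.6` are arbitrary illustrative values, not standard thresholds.

```python
# Hypothetical cut-offs; in practice they would be tuned on labeled data.
NEUTRAL_BAND = 0.05
STRONG = 0.6


def strength(score: float) -> float:
    """Numerical output: sentiment strength in [0, 1], ignoring polarity."""
    return abs(score)


def to_binary(score: float) -> str:
    """Binary: every text is forced into positive or negative (0 counts as positive here)."""
    return "positive" if score >= 0 else "negative"


def to_three_class(score: float) -> str:
    """Three-class: scores close to zero are treated as neutral."""
    if score > NEUTRAL_BAND:
        return "positive"
    if score < -NEUTRAL_BAND:
        return "negative"
    return "neutral"


def to_five_class(score: float) -> str:
    """Multi-class: split the positive and negative ranges by strength."""
    label = to_three_class(score)
    if label == "neutral":
        return label
    prefix = "very" if strength(score) >= STRONG else "mildly"
    return f"{prefix} {label}"


for s in (-0.8, -0.3, 0.0, 0.3, 0.8):
    print(s, to_binary(s), to_three_class(s), to_five_class(s), strength(s))
```

Note how the binary scheme has no room for neutral text, which is one reason the three-class and subjectivity-first formulations exist.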
The granularity of the text to be analyzed can vary depending on the task:
- Document-level analysis: Involves measuring the sentiment polarity of the document as a whole. This assumes the document discusses a single topic.
- Sentence-level analysis: Involves measuring the polarity of each sentence in a document. However, a sentence can comprise mixed sentiment (“The movie was good but it was too long”).
- Clause-level analysis: A finer-grained analysis than the above, which works on one clause at a time.
- Aspect-based analysis: Involves detecting an entity (e.g. a laptop) and its aspects (e.g. weight, battery life, screen size, CPU), linking sentiment expressions to each aspect, and then aggregating the overall sentiment for each aspect. This is sometimes called aspect-based sentiment summarization, and is popular for product reviews.
- Term-level analysis: Involves classification of individual words, and can be used to compile a sentiment dictionary of predefined positive and negative words.
- Sense-level analysis: Involves classification of the individual senses of polysemous words. This is useful when disambiguation of words appearing in text is needed before the text can be classified. This can be used to compile a sense-level sentiment dictionary.
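The effect of granularity can be seen by scoring the same short review at the document, sentence and clause levels. This is a minimal sketch under simplifying assumptions: the toy lexicon is hypothetical, sentences are split on end punctuation, and clauses are split on commas, semicolons, "but" and "however"; real systems would use a proper sentence segmenter or parser.

```python
import re

# Hypothetical toy lexicon for movie reviews. "long" is negative here,
# but would be positive for an aspect such as battery life.
LEXICON = {"good": 1, "great": 1, "enjoyable": 1, "boring": -1, "long": -1}

DOC = "The movie was good but it was too long. The soundtrack was great."


def score(text: str) -> int:
    """Sum of lexicon weights over the tokens of a span of text."""
    return sum(LEXICON.get(tok, 0) for tok in re.findall(r"[a-z']+", text.lower()))


def sentences(text: str) -> list[str]:
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def clauses(sentence: str) -> list[str]:
    """Naive clause split on punctuation and contrastive conjunctions."""
    parts = re.split(r"\s*(?:[,;]|\bbut\b|\bhowever\b)\s*", sentence, flags=re.IGNORECASE)
    return [p.strip(" .!?") for p in parts if p.strip(" .!?")]


print("document:", score(DOC))
for sent in sentences(DOC):
    print("sentence:", repr(sent), score(sent))
    for clause in clauses(sent):
        print("    clause:", repr(clause), score(clause))
```

The mixed-sentiment sentence from the list above scores 0 at the sentence level because its positive and negative parts cancel out; only the clause level exposes both opinions.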
One step beyond measuring the sentiment polarity of a text is identifying the underlying emotions it expresses. This is called emotion recognition; it is closely tied to sentiment analysis and also overlaps with areas such as human-computer interaction and affective computing. The main goal is to classify text according to the fundamental human emotions (anger, sadness, fear, joy, etc.), which can be done using affective lexica or supervised machine learning algorithms. Emotions can be stated explicitly in a text or expressed implicitly or indirectly, which makes this a challenging area, but it provides more fine-grained information than polarity alone. Another field that overlaps with sentiment analysis is subjectivity detection, which classifies text as subjective (opinionated) or objective (factual). Multimodal sentiment and emotion analysis is a more recent topic that combines textual, visual and audio information to detect sentiment or emotion.
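A lexicon-based approach to emotion recognition can be sketched by counting how many words in a text are associated with each emotion. The mini lexicon below is a hypothetical stand-in for a real affective lexicon. The implicit example, which conveys anger without naming it, receives no emotion at all, which illustrates why implicitly expressed emotions are hard to detect this way.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical mini affective lexicon: word -> set of associated emotions.
EMOTION_LEXICON = {
    "furious": {"anger"}, "annoyed": {"anger"},
    "cried": {"sadness"}, "lonely": {"sadness"},
    "terrified": {"fear"}, "worried": {"fear"},
    "delighted": {"joy"}, "celebrate": {"joy"},
    "unexpected": {"surprise"},
}


def emotion_profile(text: str) -> Counter:
    """Count lexicon hits per emotion across all tokens of the text."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        counts.update(EMOTION_LEXICON.get(token, ()))
    return counts


def dominant_emotion(text: str) -> Optional[str]:
    """Return the most frequent emotion, or None if no lexicon word matched."""
    profile = emotion_profile(text)
    return profile.most_common(1)[0][0] if profile else None


print(emotion_profile("I was worried at first, but then delighted by the unexpected ending"))
print(dominant_emotion("I was furious and annoyed, then worried"))
print(dominant_emotion("He slammed the door and left without a word"))  # implicit anger is missed
```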
Sentiment analysis can be implemented in practice with many different tools. Python is a popular choice because it has a mature ecosystem of libraries for working with text. Libraries with built-in sentiment analysis functionality include NLTK (which includes the VADER analyzer), TextBlob and VADER (the vaderSentiment package); spaCy does not ship a sentiment model by default, but third-party extensions add one. Most of these libraries also provide the text preprocessing steps, such as tokenization, needed to clean text before applying sentiment analysis.
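The libraries above ship their own tokenizers and cleaning utilities. As a library-agnostic illustration of what cleaning messy social media text can involve, here is a minimal standard-library sketch. The chosen steps (unescaping HTML entities, removing URLs and @mentions, dropping the `#` from hashtags, collapsing elongated words, lowercasing) and the regular expressions are assumptions for illustration, not what any particular library does internally.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
ELONGATED_RE = re.compile(r"(\w)\1{2,}")  # a character repeated 3 or more times


def preprocess(text: str) -> list[str]:
    """Clean a social-media-style post and return a list of tokens."""
    text = html.unescape(text)               # "&amp;" -> "&"
    text = URL_RE.sub(" ", text)             # drop links
    text = MENTION_RE.sub(" ", text)         # drop @mentions
    text = text.replace("#", "")             # keep the hashtag word, drop the symbol
    text = ELONGATED_RE.sub(r"\1\1", text)   # collapse 3+ repeats to two: "soooo" -> "soo"
    text = text.lower()
    # Keep words plus "!" and "?", which can carry sentiment intensity.
    return re.findall(r"[a-z0-9']+|[!?]", text)


print(preprocess("@alex I looooove this phone!!! #happy https://t.co/xyz"))
print(preprocess("Fish &amp; chips were great"))
```

Which steps to apply depends on the downstream tool: VADER, for example, uses capitalization and exclamation marks as intensity cues, so lowercasing text before passing it to VADER would discard useful information.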
That’s all for now! Please drop a comment below with your thoughts! 🙂