I hope you’re enjoying the “Advanced Analytics Introduction” blog post series; here are links to the previous segments (Step One, Step Two, Step Three, Step Four and Step Five) to provide some helpful background. In the previous post, I reviewed the practice of “topic mining,” one of the steps in text mining and analysis used to get a deeper analytical view of text. In this post, I will explore the concept of sentiment analysis as part of word analysis and mining techniques.
Sentiment analysis is one of the most popular features of text analytics. As a search term in Google, it went from almost non-existent in 2004 to a peak last year.
According to a study by the University of Oulu (OYO) in Finland (Mäntylä et al., 2018), 99% of the papers on the topic of sentiment analysis were published after 2004. This study performed topic mining on 6,996 papers from the publisher Elsevier’s abstract and citation database called “Scopus.” According to Elsevier, this is the largest abstract and citation database of peer-reviewed literature in the world.
Furthermore, this study corroborates our findings from Google Trends. According to the researchers at OYO, sentiment analysis is “one of the fastest growing research areas in computer science.” The study also shows that the availability of online product reviews was one of the key drivers of this sudden growth. What is absolutely fascinating about this study is that only 101 papers about this topic were published in 2005, but in 2015 the number jumped to nearly 5,699. That is an increase of more than 5,500% in just ten years!
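For readers who want to check the growth figure, here is a minimal sketch of the arithmetic using the two paper counts quoted from the study. The variable names are my own and are only there for illustration.

```python
# Paper counts quoted from Mäntylä et al. (2018)
papers_2005 = 101
papers_2015 = 5699

# Percentage increase = (new - old) / old * 100
increase_pct = (papers_2015 - papers_2005) / papers_2005 * 100
print(f"Increase from 2005 to 2015: {increase_pct:.1f}%")
```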
Figure 1. Google Searches on Sentiment Analysis Over Time (Google Trends, 2019)
According to one of the most preeminent scholars in the area of text analytics, Professor Bing Liu, Ph.D. (UIC), sentiment analysis is defined as follows: “Sentiment analysis, also called opinion mining, is the field of study that analyzes people’s opinion, sentiments, appraisals, attitudes, and emotions toward entities and their attributes expressed in the written text. The term sentiment analysis is more often used in industry and the term opinion mining is most often used in the world of academia.”
The general process of sentiment analysis involves the steps shown below. Many of these items were discussed in previous blogs (Step One, Step Two, Step Three, Step Four, and Step Five) within our “Advanced Analytics” series.
Figure 2. Illustration of the Sentiment Analysis Process (medium.com, 2018)
The first few steps in this process are related to data pre-processing, and are the most critical to get correct (to avoid “garbage in, garbage out”). Here is a summary of each of the steps displayed above:
Text Input: Inputting text to begin the process
Tokenization: Isolating individual words from a body of text
Stop Word Filtering: Removing words such as “the,” “a,” “an,” and “in”
Negation Handling: Comprehending the effect negative words have on a sentence
Stemming: Reducing words to their root form so that variants of the same word (for example, “working” and “worked”) are treated as one
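To make the pre-processing steps above more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the word lists are tiny, the function names are my own, and the "stemmer" is a naive suffix stripper, not a real algorithm such as Porter stemming. A production pipeline would normally rely on an NLP library instead.

```python
import re

# Tiny illustrative word lists; real projects use much larger ones.
# Note that "not" is deliberately kept OUT of the stop words so that
# negation handling can still see it.
STOP_WORDS = {"the", "a", "an", "in", "of", "is", "are", "these", "this"}
NEGATIONS = {"not", "no", "never"}
SUFFIXES = ("ing", "ed", "s")  # naive suffix stripping, not a real stemmer


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little sentiment."""
    return [t for t in tokens if t not in STOP_WORDS]


def mark_negation(tokens):
    """Drop each negation word and prefix the token after it with 'NOT_'."""
    out, negate = [], False
    for t in tokens:
        if t in NEGATIONS:
            negate = True
            continue
        out.append("NOT_" + t if negate else t)
        negate = False
    return out


def stem(token):
    """Strip one common suffix, keeping any 'NOT_' marker in place."""
    prefix, word = ("NOT_", token[4:]) if token.startswith("NOT_") else ("", token)
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return prefix + word[: -len(suffix)]
    return prefix + word


def preprocess(text):
    """Run the steps in the same order as the list above."""
    tokens = tokenize(text)
    tokens = remove_stop_words(tokens)
    tokens = mark_negation(tokens)
    return [stem(t) for t in tokens]


print(preprocess("The headphones are not working"))
```

With these toy lists, the negation word is folded into the token that follows it, so a later classification step can treat “not working” differently from “working”.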
Once these steps are completed, then we’re able to move on to the actual process of analyzing sentiments. Here is a summary of each of the final steps:
Classification: Determining the positive or negative tone of text
Sentiment Class: Associating the text with a sentiment class
There are several types of sentiment analysis and this is by no means an exhaustive list. The point here is to make sure that you are aware that there are multiple dimensions involved in getting the complete picture of sentiment. We can have sentiments related to:
Polarity (positive, negative, neutral): This is a 30,000-foot view of sentiment
Emotions (angry, happy, sad, etc.): We can look for keywords that would give us a sense of the person’s emotions when they expressed their opinion; this could either be direct or inferred emotions
Intentions (e.g. interested vs. not interested): Determining true intent is perhaps one of the most difficult and most valuable tasks; there is a great article in the Harvard Business Review that questions the true predictive power of sentiment analysis in measuring intentions
Fine-grained (amount of polarity, emotion, or intention): It’s not enough to simply say a person was negative or angry because it may not provide the full context of their experience
Aspect-based sentiment analysis (product or service related): Lastly, it’s critical to assess, if possible, a person’s opinion of specific aspects of a product or service (for example, a phone’s battery life versus its screen).
There are also other subtleties related to opinions that must be examined. For instance, there are direct and comparative opinions:
“The picture quality of television A is poor”
“The picture quality of television A is better than that of television B.”
Many opinions in online customer reviews are explicit. These are the most straightforward types of opinions for performing sentiment analysis. For example:
“I’m disappointed in the sound quality of these headphones.”
On the other hand, implicit opinions are hidden and more difficult to extract. We can easily determine that the overall sentence is negative but may not specifically detect the emotion of “disappointment”.
“The headphones broke the day they arrived”
This second statement still expresses disappointment, which must be inferred by understanding the tone of the statement. To further illustrate this point, I took the second statement and tried it against a few online sentiment analysis solutions:
IBM Watson: Decided that the tone was “sadness”
Microsoft Azure: Was able to determine that the sentence was negative overall
Twinword: Selected “disgust” and “sadness”
Sentiment analysis is typically performed using either a rule-based system or machine learning. A rule-based system involves the creation of a lexicon of positive and negative words. A script then reads the text and checks it against the lexicon for the presence of any of the defined words.
While rule-based systems are fast and, on average, accurate, they have the major downside of not being able to detect complexities of language such as irony and figurative speech. In the example below, the same word, “killing,” signals a complaint in the first sentence and praise in the second:
“Your customer support is killing me!”
“You are killing it with your customer service!”
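The rule-based approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon is a hypothetical handful of words I picked for the examples in this post, and the scoring rule (sum of +1 and -1 values) is the simplest one possible.

```python
import re

# Hypothetical mini-lexicon for illustration only:
# +1 for positive words, -1 for negative words.
LEXICON = {
    "good": 1, "great": 1, "better": 1, "nice": 1,
    "poor": -1, "disappointed": -1, "broke": -1, "killing": -1,
}


def lexicon_score(text):
    """Sum the lexicon values of every word found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(t, 0) for t in tokens)


def polarity(text):
    """Map the numeric score to a polarity label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in (
    "The picture quality of television A is poor",
    "Your customer support is killing me!",
    "You are killing it with your customer service!",
):
    print(f"{polarity(sentence):8s} | {sentence}")
```

Because “killing” is listed as a negative word, this sketch labels both customer service sentences as negative, even though the second one is praise. That is exactly the kind of miss a pure lexicon lookup is prone to.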
The machine learning approach is the more “modern” approach. It involves training a statistical model to recognize sentiment from a large corpus of text examples, typically ones that have already been labeled with the correct sentiment. The downside of this approach is that you’re dependent on your sample text being representative of all the cases you may encounter. Cultural and social dynamics reflected in the training text may also influence what your machine learning model learns.
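As a rough illustration of the machine learning approach, here is a minimal sketch of a Naive Bayes classifier written in plain Python. Naive Bayes is just one common choice that I am swapping in for the example; the class name, the training sentences, and their labels are all made up, and six sentences are far too few for real use.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label in self.doc_counts:
            # Start from the log prior of the class.
            logp = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok] + 1
                logp += math.log(count / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Made-up labeled examples, for illustration only.
TRAIN = [
    ("great sound and great picture", "positive"),
    ("I love these headphones", "positive"),
    ("the service was excellent", "positive"),
    ("poor picture quality", "negative"),
    ("the headphones broke", "negative"),
    ("terrible service, very disappointed", "negative"),
]

model = NaiveBayesSentiment()
model.train(TRAIN)
print(model.predict("great headphones"))
print(model.predict("poor service, disappointed"))
```

The model only knows the words it saw during training, which is the dependence on representative sample text mentioned above: a review written in vocabulary the training set never covered gives the classifier nothing to work with.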
There are many more sentiment analysis challenges with regard to language. Here are a few more examples:
The level of subjectivity and tone
“The iPhone is nice” (it’s positive, but how can we qualify that?)
“The package is silver” (stating a fact)
The influence of context on sentence polarity
A travel review site may list either of the following as the title:
“What did you like about the trip?” versus “What did you NOT like about the trip?”
However, responses do not always contain the original question. As you can see, the same answer below can have the opposite meaning depending on which question it is responding to.
“Just about every aspect!” versus “I can’t think of a single thing!”
Using comparative opinions creates many shades of grey
“This truck is second to none”
“This truck is better than the old model”
“This truck is better than nothing”
There are many practical applications of sentiment analysis. These are just a few:
Social media monitoring
Voice of customer
We hope you have found this blog series on text analytics informative. Please contact firstname.lastname@example.org if you have any questions or need further help!