I hope you’re enjoying the “Advanced Analytics Introduction” blog post series; here is a link to the previous segments (Step One, Step Two, Step Three, Step Four, and Step Five) to provide some helpful background. In the previous post, I reviewed the practice of “topic mining,” one of the text mining and analysis techniques used to gain a deeper analytical view of text. In this post, I will explore sentiment analysis, another of these text analysis and mining techniques.
Sentiment analysis is one of the most popular features of text analytics. As a Google search term, it went from almost non-existent in 2004 to its peak last year.
According to a study by the University of Oulu (OYO) in Finland (Mäntylä et al., 2018), 99% of the sentiment analysis papers it examined were published after 2004. The study performed topic mining on 6,996 papers from Scopus, the abstract and citation database of the publisher Elsevier. According to Elsevier, Scopus is the largest abstract and citation database of peer-reviewed literature in the world.
Furthermore, this study corroborates our findings from Google Trends. According to the researchers at OYO, sentiment analysis is “one of the fastest growing research areas in computer science.” The study also shows that the availability of online product reviews was one of the key drivers of this sudden growth. What is absolutely fascinating about this study is that only 101 papers on this topic were published in 2005, but in 2015 the number jumped to 5,699. That is an increase of roughly 5,540% in just ten years!
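The percentage increase follows from the standard percent-change formula applied to the 2005 and 2015 paper counts quoted above:

$$
\text{increase} = \frac{5{,}699 - 101}{101} \times 100\% = \frac{5{,}598}{101} \times 100\% \approx 5{,}542.6\%
$$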
Figure 1. Google Searches on Sentiment Analysis Over Time (Google Trends, 2019)
According to Professor Bing Liu, Ph.D. (UIC), one of the preeminent scholars in the area of text analytics, sentiment analysis is defined as follows: “Sentiment analysis, also called opinion mining, is the field of study that analyzes people’s opinions, sentiments, appraisals, attitudes, and emotions toward entities and their attributes expressed in written text. The term sentiment analysis is more often used in industry and the term opinion mining is most often used in the world of academia.”
The general process of sentiment analysis involves the steps shown below. Many of these steps were discussed in previous posts (Step One, Step Two, Step Three, Step Four, and Step Five) within our “Advanced Analytics” series.
Figure 2. Illustration of the Sentiment Analysis Process (medium.com, 2018)
The first few steps in this process relate to data pre-processing and are the most critical to get right (to avoid “garbage in, garbage out”). Here is a summary of each of the steps displayed above:
Text Input: Inputting text to begin the process
Tokenization: Isolating individual words from a body of text
Stop Word Filtering: Removing words such as “the,” “a,” “an,” and “in”
Negation Handling: Accounting for the effect that negating words such as “not” or “never” have on the meaning of a sentence
Stemming: Reducing words to their root form (stem) so that variants of the same word, such as “loved” and “loving,” are treated as one term
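The pre-processing steps above can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the stop-word list, the negation cues, and the helper names (`tokenize`, `filter_stop_words`, `handle_negation`, `stem`, `preprocess`) are hypothetical, and the stemmer is a naive suffix stripper rather than a real algorithm such as Porter’s. Negation is handled with a common trick: words that follow a negation cue get a `NOT_` prefix until the next punctuation mark.

```python
import re

# Hypothetical stop-word list for illustration; real projects usually use a
# curated list. Negation words such as "not" are deliberately NOT included.
STOP_WORDS = {"the", "a", "an", "in", "is", "was", "and", "of", "to", "it"}

# Hypothetical negation cues and clause-ending punctuation.
NEGATIONS = {"not", "no", "never", "don't", "didn't", "isn't", "wasn't"}
PUNCTUATION = {".", ",", "!", "?", ";"}


def tokenize(text: str) -> list[str]:
    """Tokenization: lowercase the text and split it into words and punctuation."""
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def filter_stop_words(tokens: list[str]) -> list[str]:
    """Stop word filtering: drop very common words that carry little sentiment."""
    return [t for t in tokens if t not in STOP_WORDS]


def handle_negation(tokens: list[str]) -> list[str]:
    """Negation handling: prefix words after a negation cue with NOT_ until
    the next punctuation mark; the cue and punctuation are then dropped."""
    result, negated = [], False
    for t in tokens:
        if t in NEGATIONS:
            negated = True
        elif t in PUNCTUATION:
            negated = False
        else:
            result.append("NOT_" + t if negated else t)
    return result


def stem(word: str) -> str:
    """Stemming: a deliberately naive suffix stripper (not the Porter algorithm)."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Run tokenization, stop word filtering, negation handling, and stemming."""
    tokens = handle_negation(filter_stop_words(tokenize(text)))
    # Stem only the word part, keeping any NOT_ prefix intact.
    return ["NOT_" + stem(t[4:]) if t.startswith("NOT_") else stem(t) for t in tokens]


print(preprocess("The service was not helpful, but the food tasted amazing!"))
# → ['service', 'NOT_helpful', 'but', 'food', 'tast', 'amaz']
```

Stems such as `tast` and `amaz` are not dictionary words; that is expected, because a stemmer only needs to map variants like “tasted” and “tasting” to the same form. The stop-word list leaves out “not” on purpose: stop word filtering runs before negation handling here, so removing it would erase the negation.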
Once these steps are complete, we can move on to the actual analysis of sentiment. Here is a summary of each of the final steps:
Classification: Determining the positive or negative tone of text
Sentiment Class: Assigning the text to a sentiment class, such as positive, negative, or neutral
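The classification and sentiment-class steps can be illustrated with a simple lexicon-based approach. This is only one option; many systems instead train a machine-learning classifier on labeled examples. In this sketch, the word scores in `LEXICON`, the 0.5 threshold, and the function names are hypothetical, and the input is assumed to be already-tokenized text in which negated words carry a `NOT_` prefix.

```python
# Hypothetical sentiment lexicon for illustration; published lexicons
# contain thousands of scored entries.
LEXICON = {"good": 1.0, "great": 2.0, "amazing": 2.0, "helpful": 1.0,
           "bad": -1.0, "terrible": -2.0, "slow": -1.0, "rude": -1.5}


def classify(tokens: list[str]) -> float:
    """Classification: sum lexicon scores, flipping the sign of negated tokens."""
    score = 0.0
    for t in tokens:
        if t.startswith("NOT_"):
            score -= LEXICON.get(t[4:], 0.0)
        else:
            score += LEXICON.get(t, 0.0)
    return score


def sentiment_class(score: float, threshold: float = 0.5) -> str:
    """Sentiment class: map the numeric score onto a polarity label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


examples = [
    ["service", "NOT_helpful", "food", "great"],  # -1.0 + 2.0 = 1.0 -> positive
    ["food", "NOT_good", "service", "rude"],      # -1.0 - 1.5 = -2.5 -> negative
    ["food", "arrived"],                          # no lexicon hits = 0.0 -> neutral
]
for tokens in examples:
    print(tokens, sentiment_class(classify(tokens)))
```

Because this toy lexicon is keyed on unstemmed words, a real pipeline would need to stem its lexicon entries the same way it stems the input text.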
There are several types of sentiment analysis, and the list below is by no means exhaustive. The point here is to make you aware that getting the complete picture of sentiment involves multiple dimensions. We can have sentiments related to:
Polarity (positive, negative, neutral): This is a 30,000-foot view of sentiment
Emotions (angry, happy, sad, etc.): We can look for keywords that give us a sense of the person’s emotions when they expressed their opinion; these emotions may be stated directly or inferred
Intentions (e.g., interested vs. not interested): Determining true intent is perhaps the most difficult and the most valuable of these dimensions; there is a great article in the Harvard Business Review that questions how well sentiment analysis can actually predict people’s intentions
Fine-grained (degree of polarity, emotion, or intention): It’s not enough to say simply that a person was negative or angry; measuring how negative or how angry they were (for example, on a scale from very negative to very positive) gives fuller context about their experience
Aspect-based sentiment analysis (product or service related): Lastly, it’s critical to assess, where possible, how a person feels about specific aspects of a product or service (for example, a phone’s battery life versus its screen), not just about the product as a whole
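To make the aspect-based and fine-grained ideas more concrete, here is a minimal sketch that scores each aspect of a product review separately, using only the clauses that mention it. The aspect keywords, the lexicon, the function name `aspect_sentiment`, and the rule of splitting clauses on punctuation and “but” are all simplifying assumptions for illustration, not a production technique.

```python
import re

# Hypothetical aspect keywords and lexicon, for illustration only.
ASPECTS = {"battery": {"battery", "charge"}, "screen": {"screen", "display"}}
LEXICON = {"great": 2.0, "bright": 1.0, "good": 1.0, "poor": -1.0, "dies": -1.5}


def aspect_sentiment(review: str) -> dict[str, float]:
    """Score each aspect using only the clauses that mention it."""
    scores: dict[str, float] = {}
    # Treat sentence punctuation and the word "but" as clause boundaries.
    clauses = re.split(r"[.,;!?]|\bbut\b", review.lower())
    for clause in clauses:
        words = re.findall(r"[a-z]+", clause)
        clause_score = sum(LEXICON.get(w, 0.0) for w in words)
        for aspect, keywords in ASPECTS.items():
            if keywords & set(words):
                scores[aspect] = scores.get(aspect, 0.0) + clause_score
    return scores


print(aspect_sentiment("The screen is bright and great, but the battery dies by noon."))
# → {'screen': 3.0, 'battery': -1.5}
```

A single overall score for this review (3.0 + (-1.5) = 1.5) would label it mildly positive and hide the battery complaint. The per-aspect scores keep both opinions visible, and their magnitudes give a rough, fine-grained sense of how strongly each one was expressed.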