Making the Most of Sentiment Scores Using IKANOW and R
Despite the relative trendiness of sentiment analysis, most algorithms that extract sentiment fail to capture the intricacies of language and context when run against generic data sets. The raw scores they produce are, at best, approximations and, at worst, misleading.
Defenders of automated sentiment analysis argue that to be the most accurate, a sentiment algorithm needs to be trained on data to ensure that it is tuned for the subject domain and time-frame being investigated.
Unfortunately, many vendors and services that provide sentiment scoring do not support such customization, and it can be difficult to determine the impact this limitation may have on business decisions.
This post explores a way to maximize the strengths of sentiment scoring (e.g. automation, speed, consistency), while minimizing the weaknesses (e.g. difficulty understanding context, improper classification). It can be applied to any large body of textual information where sentiment scores are available for each document and where each document can be reasonably attributed to a single author.
Using data processing techniques similar to some of our previous examples, we constructed sentiment indicators from the Enron emails. This technique let us use sentiment scores from text in the same way we might use context clues or non-verbal cues in a conversation: not as an end in themselves, but as a smart trigger to investigate further with additional questions.
- We assumed for the purpose of this study that any sentiment expressed in an email can be attributed to the sender. There are instances where this may not be the case (e.g. an individual forwards an email they disagree with without providing context in the body of the email), but on the whole we expected these to be fringe cases which would be made clear during a final analyst review.
- While the Enron emails are the largest set of public emails available, they are not a true and complete representation of what a real organization’s emails may look like. Many of the original emails have been redacted or removed in response to individual privacy requests. Additionally, not all email senders were active targets of investigation, so their presence in the set is naturally uneven. Email coverage across the 211 individual senders ranges from late 1998 to mid 2002, with some email addresses represented for only a few months. This uneven distribution means that some judgments will rest on less data than others. Care should be taken in an operational setting to highlight these cases.
- To counteract the uneven coverage, we estimated some weekly sentiment values using R functionality. This step may not be necessary for data sets where you can expect a more constant stream from all users, but it was useful in this example. The process required us to make certain assumptions about missing values: we distributed an observed change in sentiment evenly across the weeks with missing values (i.e. the more missing values between two observations, the less we assume sentiment changed in any individual week). This proved adequate for our purposes.
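The gap-filling assumption in the last point amounts to linear interpolation between observed weeks. The sketch below illustrates the idea in plain Python rather than the R functionality we actually used; the function name and sample values are hypothetical, and leaving leading and trailing gaps unfilled is an assumption of the sketch, not something the study specifies.

```python
from typing import List, Optional


def fill_missing_weeks(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate None entries that lie between two observed weeks.

    Leading and trailing gaps stay None because there is no second
    observation to anchor the estimate (an assumption of this sketch).
    """
    filled = list(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(observed, observed[1:]):
        gap = right - left
        if gap < 2:
            continue  # adjacent observations, nothing to estimate
        # Spread the observed change evenly across the weeks in the gap.
        step = (values[right] - values[left]) / gap
        for offset in range(1, gap):
            filled[left + offset] = values[left] + step * offset
    return filled


# Hypothetical weekly averages for one sender; None marks a week with no emails.
weekly = [0.4, None, None, None, 0.0, None, -0.3]
print(fill_missing_weeks(weekly))
```

Because the change is divided by the length of the gap, a long silence between two observations yields small estimated weekly changes, which matches the assumption described above.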
Phase 1: Initial Analysis
- We started our analysis by processing all 500K+ Enron emails through the Ikanow analytic engine, a pipeline of entity extractors and processing logic that produced a metadata schema consisting of entities, associations, and some basic statistics for each email in the set. Sentiment measurements, where available, were included on each entity. This example focused specifically on keywords, but other entities and associations were also available.
- Next, the Ikanow Hadoop / MapReduce architecture allowed us to easily aggregate keywords with sentiment in each e-mail, giving each document an average sentiment value. The scored emails were then aggregated by the email address of the sender over a given week and normalized to produce a weekly average sentiment value.
- We accessed the MapReduce output via Ikanow’s REST API and performed analysis on the data using a variety of R packages and RStudio. A GET custom results API call produced JSON output, which we converted into an R data frame using the ‘rjson’ and ‘plyr’ packages.
- The process to this point still left us mostly looking at raw sentiment measurements. Aggregating the sentiment scores weekly helped to compensate for some of the potential error in individual measurements, but we wanted to take it a step further by looking for meaningful trends in the relative values of the scores. We used several R packages to perform the visualization and analysis necessary to accomplish these tasks.
- The resulting image (left image) showed that some shifts in sentiment were relatively gradual, while others were abrupt. We hypothesized that the weeks where abrupt negative shifts occurred would be most likely to point to significant problems within the organization.
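The two-stage aggregation described above (keyword scores to a per-document average, then per-document averages to a weekly value per sender) can be sketched as follows. This is an in-memory Python illustration, not the Hadoop / MapReduce job we ran; the record layout and field names are hypothetical, ISO weeks are an assumed binning choice, and the normalization step is left out because its details are not spelled out here.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical records: sender, sent date, and per-keyword sentiment scores.
emails = [
    {"sender": "a@example.com", "sent": date(2001, 10, 1), "keyword_sentiment": [0.5, -0.1]},
    {"sender": "a@example.com", "sent": date(2001, 10, 3), "keyword_sentiment": [-0.4]},
    {"sender": "a@example.com", "sent": date(2001, 10, 9), "keyword_sentiment": [-0.6, -0.2]},
    {"sender": "b@example.com", "sent": date(2001, 10, 2), "keyword_sentiment": []},
]


def weekly_sender_sentiment(records):
    """Return {(sender, iso_year, iso_week): mean of per-document mean sentiment}."""
    buckets = defaultdict(list)
    for rec in records:
        scores = rec["keyword_sentiment"]
        if not scores:
            continue  # no scored keywords, so the email contributes nothing
        iso = rec["sent"].isocalendar()
        # Stage 1: average the keyword scores within one document.
        buckets[(rec["sender"], iso[0], iso[1])].append(mean(scores))
    # Stage 2: average the document scores within each sender-week bin.
    return {key: mean(doc_scores) for key, doc_scores in buckets.items()}


for key, value in sorted(weekly_sender_sentiment(emails).items()):
    print(key, round(value, 3))
```

Averaging per document first keeps a single keyword-heavy email from dominating a sender's week.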
To isolate these weeks, we looked for records in the data frame where a) the individual expressed negative sentiment, and b) there was an above-average negative shift in sentiment from the previous week. Among weeks where sentiment shifted in the negative direction, the mean change was -0.161 points. For our purposes, we doubled this value (to -0.322) and used it as the threshold for a sharper-than-average decline.
The new visualization (right image) was produced with a few modifications to our original ggplot2 graphic.
Above: In both graphics, each horizontal bar represents one sender’s weekly average sentiment across the time period. On the left side, positive sentiment values are shaded green and negative sentiment values dark red. On the right, weeks that display characteristics targeted by our heuristic are shaded pink with all others shaded blue.
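A minimal sketch of this heuristic, written in Python rather than the R we used, might look like the following. The function name, input layout, and sample values are hypothetical, and the sketch assumes each sender's series is already gap-free and in chronological order.

```python
from statistics import mean


def flag_sharp_declines(series, multiplier=2.0):
    """Flag weeks with negative sentiment and a sharper-than-average drop.

    series: {sender: [weekly sentiment, ...]} in chronological order, no gaps.
    Returns (threshold, flagged), where flagged lists (sender, week_index).
    """
    # Week-over-week change for every sender.
    changes = {
        sender: [curr - prev for prev, curr in zip(vals, vals[1:])]
        for sender, vals in series.items()
    }
    negative = [c for cs in changes.values() for c in cs if c < 0]
    if not negative:
        return None, []
    # Threshold: a multiple of the mean change among negative shifts.
    threshold = multiplier * mean(negative)
    flagged = [
        (sender, i + 1)
        for sender, cs in changes.items()
        for i, c in enumerate(cs)
        if series[sender][i + 1] < 0 and c < threshold
    ]
    return threshold, flagged


# Hypothetical weekly averages for two senders.
sample = {
    "a@example.com": [0.2, 0.1, -0.5, -0.4],
    "b@example.com": [0.3, 0.2, 0.1, 0.0],
}
threshold, flagged = flag_sharp_declines(sample)
print(threshold, flagged)
```

The multiplier is the main tuning knob: a larger value flags fewer, steeper declines, while a value of 1.0 flags every below-average drop that ends in negative territory.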
Below: The two screenshots below show a subset of 20 emails from the larger view above.
Phase 2: Follow-On Analysis
- The sentiment indicators highlighted 801 of the 11,500 possible weeks’ worth of information. How you proceed from this point depends on the specific analytic task to be accomplished. It may be worth having a trained analyst perform a qualitative review of the flagged emails, or it may be prudent to perform some additional automated analysis to further whittle down the list. With either approach, using sentiment indicators provides a substantially reduced amount of data for additional processing.
- The properties of these flagged bins can be used to perform queries back on the document set. These query results provide a distilled set of documents that an analyst can review manually to find the major drivers of negative sentiment. For each flagged week, we can query for all emails that were sent by the email address and the keywords with most negative sentiment from that time period to zero in on emails for qualitative review.
- The Ikanow user interface offers these options graphically, but queries can also be performed via the API. The results, including the aggregations of entities and associations from the documents matching the query, can be brought into another platform like R, just as we did with the MapReduce output table.
- We chose the latter route and accessed the query results directly from RStudio for each of the flagged weeks. To each record we appended either the top negative keywords or, for weeks where we had estimated the drop, a warning message. A sample of the resulting output is below:
- The combinations of date, email, and negative keywords make the task of calling up relevant emails very simple. The third example from the list above points precisely to one email in a targeted week where the author, Larry Campbell, expressed negative sentiment around environmental inspections and their consequences. Depending on the analytic objective, this piece of information may be useful by itself or as part of a larger set of documents created by clustering the results by keywords or dates.
Caption: Querying the emails using the Ikanow interface is simple. The query builder (top images) allows users to search for specific associations and limit the results both geographically and temporally. Other widgets allow users to add the top negative keywords into the query to further refine the results.
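The drill-down described in Phase 2 (for each flagged week, pull the sender's emails and the most negative keywords from that period) can be illustrated with a small in-memory filter. This Python sketch does not call the Ikanow API or reproduce its query format; the function, record fields, and sample data are all hypothetical.

```python
from datetime import date


def emails_for_flagged_week(emails, sender, iso_year, iso_week, top_n=3):
    """Return (top_negative_keywords, matching_emails) for one flagged sender-week."""
    in_week = [
        e for e in emails
        if e["sender"] == sender
        and tuple(e["sent"].isocalendar())[:2] == (iso_year, iso_week)
    ]
    # Track the lowest score seen for each negatively scored keyword.
    worst = {}
    for e in in_week:
        for kw, score in e["keywords"].items():
            if score < 0:
                worst[kw] = min(score, worst.get(kw, 0.0))
    # Most negative keywords first.
    top = sorted(worst, key=worst.get)[:top_n]
    matches = [e for e in in_week if any(kw in e["keywords"] for kw in top)]
    return top, matches


# Hypothetical emails with per-keyword sentiment scores.
sample_emails = [
    {"id": 1, "sender": "a@example.com", "sent": date(2001, 10, 1),
     "keywords": {"inspection": -0.7, "meeting": 0.2}},
    {"id": 2, "sender": "a@example.com", "sent": date(2001, 10, 2),
     "keywords": {"lunch": 0.4}},
    {"id": 3, "sender": "a@example.com", "sent": date(2001, 10, 9),
     "keywords": {"inspection": -0.9}},
    {"id": 4, "sender": "b@example.com", "sent": date(2001, 10, 1),
     "keywords": {"penalty": -0.8}},
]

keywords, hits = emails_for_flagged_week(sample_emails, "a@example.com", 2001, 40)
print(keywords, [e["id"] for e in hits])
```

Filtering on sender and week first, then on the top negative keywords, mirrors the order in which the flagged bins narrow the document set for an analyst.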
Conclusion
There is certainly room for additional sophistication in this process, but it is sufficient to show the basic concept of using sentiment scores as an indicator to prioritize and streamline follow-on analysis, with this example reducing the possible workload by 93%.
While we only processed emails, the same tools could easily be applied to a variety of other textual content ranging from tweets to forum posts, or even a mix of the two thanks to Ikanow’s unified semi-structured data format. Mixing data sets such as tweets and forum posts would be particularly powerful if your data contains links between unique identifiers like forum account names and email addresses.
We only looked for negative shifts in sentiment, but by ingesting different data and changing some of the parameters, entirely new applications can be built from this template. Looking for individuals with consistently high sentiment who occasionally dip down could identify problems that loyal customers may be having, while looking for missing sentiment data altogether may help isolate areas where an established collection strategy is weak.
If you’re interested in replicating this process on your own data, we’ll be releasing a short how-to guide that goes into this example in more detail and includes samples of the code we used to produce these results.