In the process of using Vox you might ask yourself how the aggregation and other computations work behind the scenes. Here we describe some of the details so you can make more informed decisions about what’s being shown to you in the interface. For all of the details please see our VAST 2010 paper describing the system.
- The search box works by filtering for messages that contain all keywords entered (i.e. the keywords are ANDed).
- If you add the minus character, “-”, to the beginning of a keyword (e.g. -keyword), it filters for messages NOT containing that keyword.
- If you put several keywords in quotes (e.g. “keyword1 keyword2”), it filters for messages containing that exact phrase.
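The search rules above can be sketched in a few lines of Python. This is only an illustration, not the code Vox actually runs: the names `parse_query` and `matches` are hypothetical, and the case-insensitive substring matching is an assumption on our part.

```python
import re

# An optional leading "-", followed by either a quoted phrase or a bare word.
TOKEN = re.compile(r'(-?)(?:"([^"]+)"|(\S+))')

def parse_query(query):
    """Split a query into (required, excluded) lists of lowercase terms."""
    required, excluded = [], []
    for minus, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        (excluded if minus else required).append(term)
    return required, excluded

def matches(message, query):
    """True if the message has every required term and no excluded term."""
    text = message.lower()  # assumption: matching ignores case
    required, excluded = parse_query(query)
    has_all = all(term in text for term in required)   # keywords are ANDed
    has_none = not any(term in text for term in excluded)
    return has_all and has_none
```

For example, `matches("Obama talks health care reform", 'obama "health care"')` is true, while the query `obama -reform` would reject the same message.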
Various filters can be applied to the dataset by clicking on the filter toggle buttons. Here’s what the filters do:
- Quotes: Filters for any message containing the quotation mark character (")
- RTs: Filters for retweet messages containing “RT”
- Links: Filters for messages containing “http” or “www”
- Positive: Filters for messages that have been labeled as containing positive words (see below)
- Negative: Filters for messages that have been labeled as containing negative words (see below)
- Inquiries: Filters for messages containing a question
- More Novel: Filters for messages that differ somewhat from the other messages
- More Relevant to Transcript: Filters for messages containing words similar to those used in the event audio transcript
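The three purely text-based filters (Quotes, RTs, and Links) are simple enough to sketch directly. The snippet below is a rough illustration rather than the actual Vox code: the `FILTERS` table and `apply_filters` function are hypothetical names, and combining several active filters with AND is an assumption.

```python
# Each filter is a test on the raw message text, as described in the list above.
FILTERS = {
    "quotes": lambda m: '"' in m,
    "rts": lambda m: "RT" in m,
    "links": lambda m: "http" in m or "www" in m,
}

def apply_filters(messages, active):
    """Keep messages that pass every active filter (assumed to be ANDed)."""
    return [m for m in messages if all(FILTERS[name](m) for name in active)]

messages = [
    'RT @someone: "yes we can" http://example.com',
    "just watching the speech",
    "full text at www.example.com",
]
links_only = apply_filters(messages, ["links"])
```

Here `links_only` keeps the first and third messages, since only those contain “http” or “www”.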
Below is what the sentiment bar looks like in the interface. For each time interval the color of the bar represents the polarity (positivity or negativity) of the aggregate of all messages in that interval. If the ratio of positive to negative messages is between 0.45 and 0.55 for an interval, that interval is labeled as “controversial”. If there are no positive or negative messages for an interval it is marked as “neutral”.
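As a rough sketch, the labeling rule for one interval might look like the following. The function name `label_interval` is hypothetical, and we assume here that the “ratio” means the share of positive messages among all positive and negative messages in the interval; the system may compute it differently.

```python
def label_interval(positive, negative):
    """Label one time interval from its counts of positive and negative messages."""
    total = positive + negative
    if total == 0:
        return "neutral"  # no positive or negative messages at all
    share = positive / total  # assumption: share of positive among labeled messages
    if 0.45 <= share <= 0.55:
        return "controversial"  # positive and negative are nearly balanced
    return "positive" if share > 0.55 else "negative"
```

With this reading, an interval with 8 positive and 2 negative messages is labeled positive, while one with 5 of each is labeled controversial.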
The degree of positivity or negativity is computed by using a dictionary of words that was produced in previous research. This method of sentiment analysis is not 100% accurate. In fact, accuracy is substantially lower, in the 50-60% range. Keep in mind that Twitter messages are very short. There’s a lot of slang, abbreviations, and acronyms that make this kind of text analysis difficult to do accurately, even for humans judging the sentiment. We do not show the absolute graphs of positive and negative messages because of the limited accuracy. However, the sentiment bar can still be used to get an idea of the dominant sentiment trends.
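To give a feel for how dictionary-based labeling works, here is a minimal sketch. The word lists are tiny, made-up stand-ins for the research dictionary mentioned above, and `label_message` is a hypothetical name, so treat it as an illustration only.

```python
import re

# Hypothetical stand-ins for the sentiment dictionary; the real one is far larger.
POSITIVE_WORDS = {"great", "good", "hope", "love"}
NEGATIVE_WORDS = {"bad", "fail", "hate", "wrong"}

def label_message(message):
    """Report whether a message contains any positive or negative dictionary words."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return {
        "positive": bool(words & POSITIVE_WORDS),
        "negative": bool(words & NEGATIVE_WORDS),
    }
```

Note that a message such as “good idea, bad timing” gets both labels, which is one reason simple word matching has limited accuracy.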
Salient Key Words
Toward the bottom of the Vox interface you’ll see a panel of words, pictured below. These keywords are laid out so that they appear along the timeline at roughly the time when they occur in the aggregate social media. Some words may appear more than once: this means that they are salient at different points in the timeline. The words are extracted by looking at how often they occur in a given interval as well as how infrequently they occur in all other intervals.
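One common way to combine “frequent in this interval” with “infrequent elsewhere” is a TF-IDF-style score, sketched below. This is a standard technique swapped in for illustration; the exact weighting Vox uses may differ, and `salient_words` and its `top_n` parameter are hypothetical.

```python
import math
from collections import Counter

def salient_words(intervals, top_n=3):
    """intervals: one list of words per time interval. Returns top words per interval."""
    counts = [Counter(words) for words in intervals]
    n = len(counts)
    result = []
    for current in counts:
        scores = {}
        for word, tf in current.items():
            # Number of intervals in which the word appears at all.
            df = sum(1 for other in counts if word in other)
            # Frequent here (tf) and rare across intervals (log(n / df)) scores high.
            scores[word] = tf * math.log(n / df)
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:top_n])
    return result

intervals = [
    ["economy", "economy", "jobs", "the"],
    ["war", "war", "the", "jobs"],
    ["health", "health", "health", "the"],
]
top = salient_words(intervals, top_n=1)
```

A word such as “the” that appears in every interval scores zero, so it never outranks a word concentrated in a single interval.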