12 Days of Data Analytics: Day 7 – Ask Twitter

What does Ireland want for Christmas? A fun trick we like to use every now and then to answer this type of question is to ask Twitter. This is easily done using the Twitter streaming API and a little bit of very simple text processing. The Twitter streaming API allows you to collect a portion of the Twitter stream based on some filtering criteria – in particular search terms and locations (although very few tweets have exact locations associated with them, Twitter locates most tweets to their nearest city). There are really nice client libraries for most programming languages that make this especially easy – for example twitter4j for Java or tweepy for Python.

So, we might use the Twitter streaming API to collect tweets from Ireland (about latitude 51.28 to 55.32 and longitude -11.05 to -5.34) containing the text “all i want for christmas is” (we might also include some simple variants like “all i want for xmas is” and “all we want for christmas is”). A large collection builds up quickly – we ran this and collected 10,000 tweets over a few days. A simple word cloud (from the great tool Wordle) illustrating the frequency of words in this collection (with “all”, “i”, “want”, “for”, “christmas”, “is” and some other common words like “and”, “a”, and “the” left out) shows Ireland’s romantic soul – the most popular thing that Ireland wants for Christmas is “you”, whoever you are!
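The filtering criteria described above can be sketched in plain Python as a check applied to each collected tweet. This is only an illustration, not the code we ran: the tweet is assumed to be a simple dict with hypothetical "text", "lat" and "lon" keys, whereas a real client library such as tweepy delivers richer objects and can apply filters on the server side.

```python
import re

# Approximate bounding box for Ireland, using the figures quoted above.
IRELAND_BOX = {"lat_min": 51.28, "lat_max": 55.32,
               "lon_min": -11.05, "lon_max": -5.34}

# The root phrase and its simple variants as one case-insensitive pattern.
WISH_PATTERN = re.compile(
    r"\ball (?:i|we) want for (?:christmas|xmas) is\b", re.IGNORECASE
)


def in_box(lat, lon, box=IRELAND_BOX):
    """Return True if the point lies inside the bounding box."""
    return (box["lat_min"] <= lat <= box["lat_max"]
            and box["lon_min"] <= lon <= box["lon_max"])


def is_wish_from_ireland(tweet):
    """tweet is a hypothetical dict with 'text', 'lat' and 'lon' keys."""
    has_phrase = WISH_PATTERN.search(tweet["text"]) is not None
    return has_phrase and in_box(tweet["lat"], tweet["lon"])


# Made-up sample data for illustration only.
sample = [
    {"text": "All I want for Christmas is you", "lat": 53.35, "lon": -6.26},
    {"text": "All I want for Xmas is a new job", "lat": 51.51, "lon": -0.13},
    {"text": "Christmas shopping done", "lat": 53.27, "lon": -9.05},
]
wishes = [t for t in sample if is_wish_from_ireland(t)]
```

Only the first sample tweet passes both checks: the second has the phrase but lies outside the box, and the third is inside the box but lacks the phrase.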

After “you”, wishes are a little more mundane: “attention”, “sweaters” and a “third term” for some politically minded tweeters.
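The word frequencies behind a cloud like this are simple to compute. The sketch below is one way to do it with Python's standard library; the stop-word list mirrors the words we left out plus the variants "we" and "xmas", the tokenising rule is a simplifying assumption, and the sample tweets are invented.

```python
import re
from collections import Counter

# Words from the root phrase plus a few very common words to ignore.
STOP_WORDS = {"all", "i", "we", "want", "for", "christmas", "xmas",
              "is", "and", "a", "the"}


def word_counts(texts):
    """Count how often each non-stop word appears across the texts."""
    counts = Counter()
    for text in texts:
        # Lower-case, then keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up sample tweets for illustration only.
sample_texts = [
    "All I want for Christmas is you",
    "all i want for christmas is YOU and a nap",
    "All I want for Xmas is attention",
]
counts = word_counts(sample_texts)
top_word, top_count = counts.most_common(1)[0]
```

On this tiny sample the top word is "you" with a count of 2; a word cloud tool then simply scales each word by its count.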

Word clouds are useful but have their detractors. A more useful way to visualise this type of data is a word tree visualisation (invented by Martin Wattenberg and Fernanda Viégas). A word tree shows us the root sentence “all i want for christmas is” and the different paths that can follow it. D3 whizz Jason Davies has created a great online tool for generating word trees from any text. The image below shows the first level of our tree, and the interactive version can be accessed at Jason Davies’ wordtree page by pasting in the text from this text file (while we have made a serious effort to clean the text in this file, it is a relatively raw collection of thousands of tweets and features all that is good and bad about Twitter, so be a little careful with it).
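The idea behind a word tree can be sketched without any visualisation library: find the root phrase in each tweet, take the words that follow it, and count the distinct paths at a chosen depth. This is a rough illustration under simplifying assumptions (whitespace tokenising, no punctuation handling, invented sample tweets) and not how Jason Davies' tool is implemented.

```python
from collections import Counter

ROOT = "all i want for christmas is"


def continuations(texts, root=ROOT):
    """Yield the list of words that follows the root phrase in each text."""
    for text in texts:
        lowered = text.lower()
        start = lowered.find(root)
        if start != -1:
            yield lowered[start + len(root):].split()


def branch_counts(texts, depth=1, root=ROOT):
    """Count the distinct paths of `depth` words that follow the root."""
    counts = Counter()
    for words in continuations(texts, root):
        if len(words) >= depth:
            counts[" ".join(words[:depth])] += 1
    return counts


# Made-up sample tweets for illustration only.
sample_texts = [
    "All I want for Christmas is you",
    "all i want for christmas is a new job",
    "All I want for Christmas is a new car",
    "Honestly all i want for christmas is you",
    "Merry Christmas everyone",
]
first_level = branch_counts(sample_texts, depth=1)
third_level = branch_counts(sample_texts, depth=3)
```

Here the first level has two branches, "you" and "a", each seen twice, while at depth three the "a" branch splits into "a new job" and "a new car".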

Digging further into the tree we can see that people’s wants range from new jobs, to cars, to presidents!

And lastly, for the academically minded amongst you, we can see many people’s thoughts turning to exams!

The example here is a fun little trick, but this use of Twitter data can be genuinely useful for monitoring trends. The trick is to choose search phrases carefully and to avoid disappearing too far into the tree, where phrases may occur only once or twice.
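One way to guard against reading too much into rare phrases is to prune any branch of the tree that occurs fewer than some minimum number of times. The sketch below assumes the tweets have already been reduced to lists of words following the root phrase; the nested-dict tree layout, the function names and the threshold of 2 are all illustrative choices, and the sample paths are invented.

```python
def build_tree(paths):
    """Build a nested tree where every node records how many paths pass through it."""
    root = {"count": 0, "children": {}}
    for words in paths:
        node = root
        node["count"] += 1
        for word in words:
            node = node["children"].setdefault(
                word, {"count": 0, "children": {}}
            )
            node["count"] += 1
    return root


def prune(node, min_count):
    """Return a copy of the tree without branches seen fewer than min_count times."""
    kept = {
        word: prune(child, min_count)
        for word, child in node["children"].items()
        if child["count"] >= min_count
    }
    return {"count": node["count"], "children": kept}


# Made-up continuations of the root phrase, for illustration only.
paths = [
    ["a", "new", "job"],
    ["a", "new", "car"],
    ["a", "puppy"],
    ["you"],
    ["you"],
    ["you"],
]
tree = prune(build_tree(paths), min_count=2)
```

With a threshold of 2, the branches "you", "a" and "a new" survive, while the one-off endings "job", "car" and "puppy" are dropped. On a real collection the threshold would need tuning to the size of the data.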