What does Ireland want for Christmas? A fun trick we like to use every now and then to answer this type of question is to ask Twitter. This is easily done using the Twitter streaming API and a little very simple text processing. The Twitter streaming API allows you to collect a portion of the Twitter stream based on some filtering criteria, in particular search terms and locations (although very few tweets have exact locations associated with them, Twitter locates most tweets to their nearest city). There are really nice client libraries for most programming languages that make this especially easy, for example twitter4j for Java or tweepy for Python.
So, we might use the Twitter streaming API to collect tweets from Ireland (roughly latitude 51.28 to 55.32 and longitude -11.05 to -5.34) containing the text “all i want for christmas is” (we might also include some simple variants like “all i want for xmas is” and “all we want for christmas is”). We can collect a very large number of tweets quickly and easily: we ran this and collected 10,000 tweets over a few days. A simple word cloud (from the great tool Wordle) illustrating the frequency of words in this collection (with “all”, “i”, “want”, “for”, “christmas”, “is” and some other common words like “and”, “a”, and “the” left out) shows Ireland’s romantic soul: the most popular thing that Ireland wants for Christmas is “you”, whoever you are!
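Wordle does the counting for you, but the frequency table behind a word cloud like this is simple to compute yourself. Below is a minimal Python sketch, assuming the tweets have already been collected as a list of strings. The phrase list, the stop-word list and the function name `word_frequencies` are our own illustrative choices, and the tokeniser is deliberately crude. As far as we know, the streaming API's keyword tracking matches the individual terms of a phrase rather than the exact phrase, so the sketch re-checks the phrase locally before counting.

```python
import re
from collections import Counter

# Illustrative root phrases: the variants mentioned in the post.
ROOT_PHRASES = [
    "all i want for christmas is",
    "all i want for xmas is",
    "all we want for christmas is",
]

# Assumed stop-word list: the phrase words plus a few common words.
STOP_WORDS = {"all", "i", "we", "want", "for", "christmas", "xmas", "is",
              "and", "a", "the"}

# Crude tokeniser: runs of lowercase letters, digits and apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")


def word_frequencies(tweets):
    """Count non-stop-words across tweets that contain one of the root phrases."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        # Skip tweets that only matched the individual terms, not a full phrase.
        if not any(phrase in lowered for phrase in ROOT_PHRASES):
            continue
        for word in TOKEN.findall(lowered):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts
```

For example, `word_frequencies(tweets).most_common(50)` gives the fifty most frequent words, which is roughly the information a word cloud encodes as font size.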
After “you”, wishes are a little more mundane: “attention”, “sweaters” and a “third term” for some politically minded tweeters.
Word clouds are useful but have their detractors. A more informative way to visualise this type of data is a word tree (invented by Martin Wattenberg and Fernanda Viégas). A word tree shows the root sentence “all i want for christmas is” and the different paths that can follow it. D3 whizz Jason Davies has created a great online tool for generating word trees from any text. The image below shows the first level of our tree; the interactive version can be accessed at Jason Davies’ word tree page by pasting in the text from this text file (while we have made a serious effort to clean the text in this file, it is still a relatively raw collection of thousands of tweets and features all that is good and bad about Twitter, so treat it with a little care).
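The online tool builds the tree for you, but it helps to see what a word tree actually encodes: a prefix tree of the words that follow the root phrase, with a count on each branch. The Python sketch below is a minimal illustration under our own assumptions; it is not how Jason Davies’ tool is implemented, and the nested-dict node layout, the `max_depth` cut-off and the function names are hypothetical.

```python
import re

ROOT = "all i want for christmas is"
TOKEN = re.compile(r"[a-z0-9']+")


def make_node():
    """A tree node: how many tweets pass through it, plus children keyed by word."""
    return {"count": 0, "children": {}}


def build_word_tree(tweets, root=ROOT, max_depth=5):
    """Build a prefix tree of the (up to max_depth) words following `root`."""
    # Word boundaries stop e.g. "... christmas isn't" from matching.
    pattern = re.compile(r"\b" + re.escape(root) + r"\b")
    tree = make_node()
    for text in tweets:
        lowered = text.lower()
        match = pattern.search(lowered)
        if match is None:
            continue
        tree["count"] += 1
        node = tree
        # Walk (and extend) one branch per word after the root phrase.
        for word in TOKEN.findall(lowered[match.end():])[:max_depth]:
            node = node["children"].setdefault(word, make_node())
            node["count"] += 1
    return tree


def top_branches(node, n=5):
    """The n most frequent next words under a node, ties broken alphabetically."""
    ranked = sorted(
        ((word, child["count"]) for word, child in node["children"].items()),
        key=lambda wc: (-wc[1], wc[0]),
    )
    return ranked[:n]
```

Calling `top_branches(build_word_tree(tweets))` corresponds to the first level of the tree; passing a child node instead descends one level further.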
Digging further into the tree, we can see that people’s wants range from new jobs, to cars, to presidents!
And lastly, for the academically minded amongst you, we can see many people’s thoughts turning to exams!
The example here is a fun little trick, but this use of Twitter data can be genuinely useful for monitoring trends. The key is to choose search phrases carefully and to avoid reading too much into branches deep in the tree, which may only occur once or twice.
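One simple guard against over-reading rare branches is a minimum-count threshold. The Python sketch below is our own illustration, not part of the original analysis: it counts the first few words after the root phrase and keeps only continuations seen at least `min_count` times. The function name and the default values of `length` and `min_count` are arbitrary assumptions and would need tuning for a real collection.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9']+")


def frequent_continuations(tweets, root, length=2, min_count=3):
    """Continuations (first `length` words after `root`) seen at least min_count times."""
    pattern = re.compile(r"\b" + re.escape(root) + r"\b")
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        match = pattern.search(lowered)
        if match is None:
            continue
        words = TOKEN.findall(lowered[match.end():])[:length]
        if words:  # ignore tweets where nothing follows the root phrase
            counts[" ".join(words)] += 1
    # Drop rare continuations: a branch seen once or twice is weak evidence.
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]
```

Raising `length` mirrors digging deeper into the tree; counts shrink quickly with depth, which is exactly why a threshold is worth applying there.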