The 2010 World Cup proved to be one of the most tweeted-about events of 2010. Using the millions of tweets saved to my local Cloudera CDH3 Hadoop cluster, I wrote a quick Pig script to discover the ways that people are celebrating (OK, spelling) goals. Here are the top few variations of Goal/Gol. The full results can be found here. I’m happy to share the Pig code if anyone is interested.
158636 gol
126669 goal
31722 Goal
24735 Gol
19610 GOL
14317 GOAL
4178 gool
2981 ggol
2219 goll
1771 goool
1641 GOOOL
1564 Gooool
1498 Goool
1279 GOOOOL
1188 Goooool
1158 GOOOOOOOL
1124 GOOOOOL
1116 gooool
1075 GOOL
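The Pig script isn't reproduced here, but the counting idea can be sketched in a few lines of Python. This is only an illustration, not the original code: the regular expression, the function name `count_goal_variants`, and the sample tweets are all assumptions, and the real job ran over far more data on Hadoop.

```python
import re
from collections import Counter

# Assumed pattern (not from the original Pig script): one or more g's,
# one or more o's, an optional run of a's, then one or more l's,
# e.g. gol, goal, gool, ggol, goll, GOOOOL.
GOAL_RE = re.compile(r"\bg+o+a*l+\b", re.IGNORECASE)

def count_goal_variants(tweets):
    """Count each exact spelling (case preserved) across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(match.group(0) for match in GOAL_RE.finditer(text))
    return counts

# Made-up sample tweets, purely for demonstration.
sample = ["GOOOOL!!! what a strike", "gol gol gol", "Goal for Spain", "no score yet"]
for variant, n in count_goal_variants(sample).most_common():
    print(n, variant)
```

Keeping the original case is what separates gol, Gol, and GOL into different rows; lowercasing before counting would merge them.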
Here are a few rough visualizations; they are in need of an update. The first is a scatter plot of the length of each GOAL against the number of Os it contains. You’ll see that a lot of excited soccer fans like to use the full 140 characters in their celebrations, often using 138 Os. Huge thanks to @johnmyleswhite for the inspiration and the R/ggplot2 help. You don’t even want to see my original version!
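For the curious, the two numbers behind each point in that scatter plot are easy to compute. Here is a minimal Python sketch; the helper name `goal_shape` is made up for illustration, and the plot itself was done in R/ggplot2, not with this code.

```python
def goal_shape(celebration):
    """Return (total length, number of o/O characters) for one celebration string."""
    return len(celebration), celebration.lower().count("o")

# A maxed-out celebration: one G, 138 Os, one L fills all 140 characters.
print(goal_shape("G" + "O" * 138 + "L"))  # (140, 138)
```

That is also why 138 is the ceiling for a plain GOL: the G and the L take up the other two characters.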
Inspired by @peteskomoroch, who was inspired by the frequencies of the length of mentions of KAAAAAAHN!, here are the frequencies of the length of GOAL! celebrations found on Twitter.
A visualization of the hashtag mentions by country across all of the World Cup tweets. Click for full-sized version.
And my personal favorite visualization: the number of tweets mentioning vuvuzela, hourly, over the course of the World Cup.
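The hourly bucketing behind a chart like this can be sketched in Python as follows. It assumes each tweet is available as a `(created_at, text)` pair with a `datetime` timestamp; that shape, the function name `hourly_mentions`, and the sample data are hypothetical and not taken from the original pipeline.

```python
from collections import Counter
from datetime import datetime

def hourly_mentions(tweets, keyword="vuvuzela"):
    """Count tweets containing `keyword` (case-insensitive), grouped by hour."""
    counts = Counter()
    for created_at, text in tweets:
        if keyword in text.lower():
            # Truncate the timestamp to the start of its hour.
            hour = created_at.replace(minute=0, second=0, microsecond=0)
            counts[hour] += 1
    return counts

# Made-up sample data, purely for demonstration.
sample = [
    (datetime(2010, 6, 11, 14, 5), "That VUVUZELA noise is everywhere"),
    (datetime(2010, 6, 11, 14, 40), "vuvuzela vuvuzela vuvuzela"),
    (datetime(2010, 6, 11, 15, 2), "Great save!"),
]
for hour, n in sorted(hourly_mentions(sample).items()):
    print(hour, n)
```

Note that this counts tweets, not occurrences of the word, so a tweet that repeats vuvuzela three times still counts once.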