Through the magic of Hadoop, Pig, over 300 million (and counting) tweets, and the never-ending creativity of my fellow Twitter users, I thought I’d take a look at all of the hashtags containing the beloved f-word. Let’s get the technical details out of the way. Since the middle of June, I’ve been saving as many tweets [...]
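The post ran this over the full tweet archive with Hadoop and Pig. As a small single-machine illustration of the same filter-and-count step (not the author’s Pig script), here is a Python sketch. It assumes tweets arrive as plain strings and uses a simplified hashtag pattern (`#` followed by word characters). The helper name `hashtags_containing` is hypothetical, and the substring to search for is just a parameter.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: a hashtag is '#' followed by letters, digits or underscores.
# Twitter's real hashtag rules are more involved than this.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags_containing(tweets: Iterable[str], substring: str) -> Counter:
    """Count hashtags (lowercased) whose text contains `substring`."""
    needle = substring.lower()
    counts: Counter = Counter()
    for text in tweets:
        for tag in HASHTAG_RE.findall(text):
            tag = tag.lower()
            if needle in tag:
                counts[tag] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "loving the #WorldCup and #worldcupfever",
        "no hashtags here",
        "#worldcup again",
    ]
    print(hashtags_containing(sample, "cup").most_common())
    # → [('worldcup', 2), ('worldcupfever', 1)]
```

Lowercasing before counting folds variants like `#WorldCup` and `#worldcup` together, which is usually what you want when ranking hashtags.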
Using Twitter gardenhose access, remove stopwords and punctuation, sprinkle in a little bit of mapping and some reducing, and voilà! The result: the most frequently occurring words in tweets from June 23, 2010 that mentioned “earthquake”. I left “earthquake” out of the image itself because it appeared in every tweet and overwhelmed the rest of the words. [...]
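The steps in that paragraph map onto a classic word count. Below is a hedged, in-memory Python analogue, not the job that produced the image: `map_tweet` plays the mapper (lowercase, strip punctuation, drop stopwords and the query term) and `functools.reduce` sums the emitted counts. The tiny stopword list, the ASCII-only punctuation stripping, and all function names are assumptions for illustration.

```python
import string
from collections import Counter
from functools import reduce
from typing import Iterable, Iterator, Tuple

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to", "is", "it", "i", "was"}

# Translation table that deletes ASCII punctuation characters.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def map_tweet(text: str, exclude: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for each kept token in one tweet."""
    for word in text.lower().translate(STRIP_PUNCT).split():
        if word not in STOPWORDS and word != exclude:
            yield word, 1


def reduce_counts(acc: Counter, pair: Tuple[str, int]) -> Counter:
    """Reduce step: add each emitted count to the running total for its word."""
    word, n = pair
    acc[word] += n
    return acc


def word_frequencies(tweets: Iterable[str], exclude: str) -> Counter:
    """Word frequencies across tweets, leaving out stopwords and `exclude`."""
    needle = exclude.lower()
    pairs = (pair for t in tweets for pair in map_tweet(t, needle))
    return reduce(reduce_counts, pairs, Counter())


if __name__ == "__main__":
    tweets = ["Earthquake! Did you feel it?", "Felt the earthquake, wow."]
    print(word_frequencies(tweets, "earthquake").most_common(5))
```

Dropping the query term inside the mapper mirrors the choice described above: a word present in every tweet carries no information about the others and would dominate the chart.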
A really quick visualization I did while researching data for another project. Census.gov links to lists of the most frequently occurring first names and surnames from the 1990 census. Surely more current data must exist; I found this dataset by accident. The original data is tab-delimited in the format: Name, Frequency in percent, Cumulative Frequency [...]
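For anyone who wants to load a file in that layout, here is a minimal Python parser. It assumes the three named fields come first on each line, separated by tabs, and ignores any further columns. `NameRecord`, `parse_names`, and `cumulative_is_consistent` are hypothetical names. The consistency check reflects what a cumulative column means: each value should be the running total of the percentages above it, within a rounding tolerance you may need to loosen for long files.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class NameRecord:
    name: str
    percent: float     # "Frequency in percent"
    cumulative: float  # "Cumulative Frequency" (assumed to be a percentage too)


def parse_names(lines: Iterable[str]) -> List[NameRecord]:
    """Parse tab-delimited Name / Frequency in percent / Cumulative Frequency lines.

    Blank lines are skipped, extra fields are ignored, and a line with fewer
    than three fields raises ValueError.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, pct, cum = line.split("\t")[:3]
        records.append(NameRecord(name, float(pct), float(cum)))
    return records


def cumulative_is_consistent(records: List[NameRecord], tol: float = 0.01) -> bool:
    """Check each cumulative value against the running sum of percents."""
    running = 0.0
    for r in records:
        running += r.percent
        if abs(running - r.cumulative) > tol:
            return False
    return True


if __name__ == "__main__":
    # Made-up numbers for illustration only, not census figures.
    rows = ["ALPHA\t1.500\t1.500\n", "BETA\t0.500\t2.000\n"]
    recs = parse_names(rows)
    print(recs, cumulative_is_consistent(recs))
```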
While browsing through Pete Skomoroch’s Delicious bookmarks (which is a full-time job in and of itself), I learned that StackOverflow.com makes its underlying Q&A data available. Just for fun, I wrote a few quick queries against this dataset, centered around the R tag. Here are a handful of findings, using data through 31-Oct-2009. Some of [...]
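The post doesn’t show its queries, so the following is only a sketch of the kind of query involved: counting R-tagged questions per month up to the 31-Oct-2009 cutoff, run against an in-memory SQLite table through Python’s `sqlite3` module. The `posts` schema is a hypothetical simplification (real Stack Overflow dumps carry many more columns and distinguish questions by post type). It assumes tags are stored as `<tag1><tag2>` strings, so matching `<r>` with its brackets avoids false hits on tags such as `rails`.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, simplified schema; assumes tags are stored as "<tag1><tag2>".
SCHEMA = """
CREATE TABLE posts (
    id            INTEGER PRIMARY KEY,
    creation_date TEXT NOT NULL,  -- ISO-8601, e.g. '2009-10-31T12:00:00'
    tags          TEXT            -- e.g. '<r><plot>'; NULL for answers here
)
"""

# '%<r>%' requires the closing '>' right after 'r', so '<rails>' does not match.
# ISO dates compare correctly as strings, which gives the 31-Oct-2009 cutoff.
MONTHLY_R_QUESTIONS = """
SELECT substr(creation_date, 1, 7) AS month, COUNT(*) AS n
FROM posts
WHERE tags LIKE '%<r>%'
  AND creation_date < '2009-11-01'
GROUP BY month
ORDER BY month
"""


def monthly_r_questions(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Count questions tagged 'r' (possibly among other tags) per month."""
    return conn.execute(MONTHLY_R_QUESTIONS).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Made-up rows for illustration only.
    conn.executemany(
        "INSERT INTO posts (creation_date, tags) VALUES (?, ?)",
        [("2009-09-02T10:00:00", "<r><plot>"), ("2009-10-05T10:00:00", "<rails>")],
    )
    print(monthly_r_questions(conn))
```

SQLite’s `LIKE` is case-insensitive for ASCII letters by default, so a `<R>` tag would also be counted; rows with a NULL `tags` value never match.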