Tag Archives: data

And you thought you were the first to use #DONTFUCKWITHJUSTINBIEBER

09-Aug-10

Through the magic of Hadoop, Pig, over 300 million (and counting) tweets, and the never-ending creativity of my fellow Twitter users, I thought I’d take a look at all of the hashtags containing the beloved f-word. Let’s get the technical details out of the way. Since the middle of June, I’ve been saving as many tweets [...]
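For a rough idea of what "hashtags containing a given word" means in practice, here is a minimal single-machine sketch in Python. It is not the Hadoop/Pig job from the post; the function name `hashtags_containing`, the `needle` parameter, and the hashtag regex are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed hashtag pattern: '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")

def hashtags_containing(tweets, needle):
    """Count hashtags (lowercased, without '#') that contain `needle`."""
    needle = needle.lower()
    counts = Counter()
    for tweet in tweets:
        for tag in HASHTAG.findall(tweet):
            tag = tag.lower()
            if needle in tag:
                counts[tag] += 1
    return counts

if __name__ == "__main__":
    sample = ["I love #BigData and #data", "more #DATA please", "no tags here"]
    # Most frequent matching hashtags first.
    for tag, n in hashtags_containing(sample, "data").most_common():
        print(f"#{tag}\t{n}")
```

Lowercasing before counting merges variants such as `#DATA` and `#data`; whether the original analysis did the same is not stated in the excerpt.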

Words mentioned in 23-Jun-2010 Canadian Earthquake tweets

24-Jun-10

Using Twitter gardenhose access, remove stopwords and punctuation, sprinkle in a little bit of mapping, some reducing, and voila! The most frequently occurring words in tweets that mentioned earthquake from June 23, 2010. I left earthquake out of the image itself because it appeared in every tweet and overwhelmed the rest of the words. [...]
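The recipe above (filter tweets, strip punctuation and stopwords, map, reduce) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the code behind the post: the tiny `STOPWORDS` set and the names `map_words`, `reduce_counts`, and `top_words` are hypothetical, and a real run over gardenhose data would use a distributed framework and a fuller stopword list.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "in", "of", "and", "to", "is", "it", "was", "that"}

def map_words(tweet, skip=()):
    """Map step: emit (word, 1) for each word that survives filtering."""
    # Lowercase, then replace anything that is not a word character or whitespace.
    text = re.sub(r"[^\w\s]", " ", tweet.lower())
    for word in text.split():
        if word not in STOPWORDS and word not in skip:
            yield word, 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

def top_words(tweets, keyword="earthquake", n=10):
    """Count words in tweets mentioning `keyword`, leaving the keyword itself out."""
    pairs = (
        pair
        for tweet in tweets
        if keyword in tweet.lower()
        for pair in map_words(tweet, skip=(keyword,))
    )
    return reduce_counts(pairs).most_common(n)

if __name__ == "__main__":
    sample = [
        "Earthquake in Ottawa!",
        "Did you feel the earthquake? Ottawa shook.",
        "Nice lunch today",
    ]
    print(top_words(sample, n=3))
```

Passing the keyword in `skip` mirrors the choice to leave "earthquake" out of the picture, since a word present in every matching tweet would dominate the counts.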

Visualization of Frequently Occurring First Names and Surnames From the 1990 Census

19-Jan-10

A really quick visualization I did while researching data for another project. Census.gov has a link to the most frequently occurring first names and surnames from the 1990 census. Surely more current data must exist; I found this dataset by accident. The original data is tab-delimited in the format: Name Frequency in percent Cumulative Frequency [...]

A few quick observations on StackOverflow questions tagged R

12-Nov-09

While browsing through Pete Skomoroch’s delicious bookmarks (which is a full-time job in and of itself), I learned that StackOverflow.com makes its underlying Q&A data available. Just for fun, I wrote a few quick queries against this dataset, centered around the R tag. Here are a handful of findings (data is through 31-Oct-2009). Some of [...]