Category Archives: Uncategorized

An analysis of Steve Jobs tribute messages displayed by Apple

20-Oct-11

Two weeks have passed since Apple’s Co-Founder/CEO Steve Jobs passed away.  Upon his passing, Apple encouraged people to share their memories, thoughts, and feelings by emailing rememberingsteve@apple.com. Earlier this week, Apple posted a site (http://www.apple.com/stevejobs) in tribute to Steve Jobs. According to the site, over a million people have submitted messages. The site cycles through the submitted [...]

Looking for some new quotes for @HelloooooNewman

17-Mar-11

I’m looking for some new quotes for my Seinfeld-quote-spewing Twitter bot @hellooooonewman. There are currently 161 tweets in its database and the bot has sent over 246,000 tweets. If you take a simple average, each quote has been tweeted over 1,500 times. It’s time to update the bot with some fresh content. I’m admittedly not [...]

Chuck Norris doesn’t screen-scrape, the data runs scared to his hard drive.

01-Mar-11

Inspired by a tweet from Roger Ehrenberg and by my 11-year-old son, who’s crazy about Chuck Norris facts, I screen-scraped the contents of http://www.chucknorrisfacts.com. Code and data can be found here. The script uses Python and BeautifulSoup to loop through all of the pages on http://www.chucknorrisfacts.com and read the items displayed on each page. Output looks like Visit [...]

Fun with awk and dead people

24-Feb-11

Just playing around with some Freebase data in preparation for a ‘who died today’ Twitter bot. Get the data and determine the date on which the most people died. Surprised to see 1965-11-08 listed ahead of 2001-09-11. Why? Let’s look at where people died on 1965-11-08: Upon further investigation, it looks as if Freebasers have [...]
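The date-counting question in this excerpt can be sketched in Python instead of awk. This is only a minimal illustration, assuming a tab-delimited input with a name column followed by an ISO-formatted date-of-death column. The column layout, the sample rows, and the function name `busiest_death_dates` are hypothetical and are not taken from the Freebase dump or from the original awk commands.

```python
import csv
import io
from collections import Counter


def busiest_death_dates(tsv_text, top_n=3):
    """Count rows per full YYYY-MM-DD date of death and return the top_n dates."""
    counts = Counter()
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if len(row) < 2:
            continue  # skip malformed rows
        date = row[1].strip()
        # Keep only full dates; year-only or year-month values cannot be
        # attributed to a single day.
        if len(date) == 10 and date[4] == "-" and date[7] == "-":
            counts[date] += 1
    return counts.most_common(top_n)


# Made-up sample rows (hypothetical, for illustration only).
sample = "\n".join([
    "Person A\t1900-01-01",
    "Person B\t1900-01-01",
    "Person C\t1950-06-15",
    "Person D\t1999",
])
print(busiest_death_dates(sample))
```

Once a date stands out, the same counting approach applied to a place-of-death column (if the data has one) would show where those deaths were recorded.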

Visualizations of Canabalt scores scraped from Twitter

16-Feb-11

Canabalt, a ridiculously addictive web/iOS game, lets players post their high scores, and their not-so-high scores, to Twitter. Each of these tweets contains a bit of information: the score in meters, the method of death (hitting a wall and tumbling to my death), and the device (iPhone). Other useful information can [...]

Retrieving the US National Debt and Population using Python and BeautifulSoup

30-Jan-11

Update 12-May-2011: I cleaned up the code, added logging of the data to a tab-delimited file, and published it to GitHub. Happy Hacking! Someone suggested I create a bot that tweets the US National Debt.  Here’s how I’m retrieving the National Debt amount from the US Treasury site.  I then retrieve the US Population from [...]

World Cup 2010: Analysis of tweets celebrating goals

10-Jan-11

The 2010 World Cup proved to be one of the most tweeted-about events of 2010.  Using the millions of tweets saved to my local Cloudera CDH3 Hadoop cluster, I wrote a quick Pig script to discover the ways that people are celebrating (ok, spelling) goals.  Here are the top few variations of Goal/Gol.  The full [...]
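The post's analysis was done with a Pig script on Hadoop, which is not shown in this excerpt. As a rough single-machine illustration of the same idea, here is a Python sketch that tallies spelling variations of "goal"/"gol". The regular expression, the function name `count_goal_variants`, and the sample tweets are all assumptions made for this example, not the author's code or data.

```python
import re
from collections import Counter

# Assumed pattern: "gol" or "goal" with any letter stretched,
# e.g. "goool", "goaaal", "golllll". Word boundaries keep words
# such as "golazo" or "goalie" from being counted.
GOAL_PATTERN = re.compile(r"\bg+o+a*l+\b", re.IGNORECASE)


def count_goal_variants(tweets):
    """Return a Counter mapping each lowercased spelling to its frequency."""
    counts = Counter()
    for text in tweets:
        for match in GOAL_PATTERN.findall(text):
            counts[match.lower()] += 1
    return counts


# Made-up tweets (hypothetical, for illustration only).
sample_tweets = [
    "GOOOL!!! What a strike",
    "goal goal goal",
    "Gol de Espana",
    "nice golazo",
]

for spelling, n in count_goal_variants(sample_tweets).most_common():
    print(spelling, n)
```

On a real tweet archive the pattern would likely need tuning, for example to handle hashtags or accented characters.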

My Twitter bots: Tens of thousands of followers can’t be wrong

21-Dec-10

Edit (March 17, 2011): I need your help! If you have additional Seinfeld quotes to contribute, or want a list of all of the current Seinfeld quotes, please visit this post. My current army of Twitter bots and the keyword that each one responds to: @HelloooooNewman (seinfeld) Klout score 74; @TheBotLebowski (lebowski) Klout score 70 [...]

Word Cloud from 6,500 tweets mentioning Kanye West. From this morning

14-Dec-10

After removing a few stopwords and then clearing out a few other words (nowplaying, lastfm, and the like), here’s what’s left.  The data represents a half-day’s worth of tweets.   I’m sitting on about 90,000 tweets about Kanye and am looking forward to taking the time for some more in-depth analysis.  Huge thanks to @jrlevine and [...]

Visualization of top 150 (now 300!) tweeters at pubcon

09-Nov-10

I built http://www.pubcontweets.com to track the tweets about pubcon, a social media conference that’s being held in Las Vegas.  A little Python and the Twitter API, sprinkled with some Gephi, yield the intra-network connections of the 150 users with the most #pubcon posts.  The visualization is in SVG format – make sure you use your [...]