While generating a large item-item similarity matrix, I need to reduce the amount of data I’m returning to the calling program. In short, I’m computing the similarity between over 20,000 different ‘items’, and that results in a gigantic dataset, to the tune of about 3-4 million elements. I now need to reduce [...]
While I’ve had some success getting a few celebrities (Fred Durst, Talib Kweli) to respond or show off @TheBotLebowski to others, yesterday Twifficiency one-upped me and took Twitter and then the national media by storm. Fortunately for you, @jamescun, not too many people I know read your little Time Magazine. (I really hope you’re old [...]
Through the magic of Hadoop, Pig, over 300 million (and counting) tweets, and the never-ending creativity of my fellow Twitter users, I thought I’d take a look at all of the hashtags containing the beloved f-word. Let’s get the technical details out of the way. Since the middle of June, I’ve been saving as many tweets [...]