<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>neilkodner.com</title>
	<atom:link href="http://www.neilkodner.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.neilkodner.com</link>
	<description>Data Driven.  Since 1971.</description>
	<lastBuildDate>Sun, 23 Oct 2011 16:40:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>An analysis of Steve Jobs tribute messages displayed by Apple</title>
		<link>http://www.neilkodner.com/2011/10/an-analysis-of-steve-jobs-tribute-messages-displayed-by-apple/</link>
		<comments>http://www.neilkodner.com/2011/10/an-analysis-of-steve-jobs-tribute-messages-displayed-by-apple/#comments</comments>
		<pubDate>Thu, 20 Oct 2011 21:08:26 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[apple]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[nltk]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[stevejobs]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=569</guid>
		<description><![CDATA[Two weeks have passed since Apple co-founder and former CEO Steve Jobs passed away. Upon his passing, Apple encouraged people to share their memories, thoughts, and feelings by emailing rememberingsteve@apple.com. Earlier this week, Apple posted a site (http://www.apple.com/stevejobs) in tribute to Steve Jobs. According to the site, over a million people have submitted messages. The site cycles through the submitted [...]]]></description>
			<content:encoded><![CDATA[<p>Two weeks have passed since Apple co-founder and former CEO Steve Jobs passed away. Upon his passing, Apple encouraged people to share their memories, thoughts, and feelings by emailing <a href="https://mail.google.com/mail/?view=cm&amp;fs=1&amp;tf=1&amp;to=rememberingsteve@apple.com" target="_blank">rememberingsteve@apple.com</a>. Earlier this week, Apple posted a <a href="http://www.apple.com/stevejobs/" target="_blank">site</a> (<a href="http://www.apple.com/stevejobs/" target="_blank">http://www.apple.com/stevejobs</a>) in tribute to Steve Jobs. According to the site, over a million people have submitted messages. The site cycles through the submitted messages.</p>
<p>I decided to take a closer look at what people, as a whole, are saying about Steve Jobs. Judging by how the site updates, it appears to use Ajax to retrieve and display new messages. Using Chrome&#8217;s developer tools, I monitored the requests it made to fetch them.</p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2011/10/Apple-Remembering-Steve-Jobs2.png"><img class="alignnone size-large wp-image-574" title="Apple - Remembering Steve Jobs" src="http://www.neilkodner.com/wp-content/uploads/2011/10/Apple-Remembering-Steve-Jobs2-1024x892.png" alt="" width="819" height="714" /></a><br />
Once I found the location of the individual messages, it was trivial to download all of them. The message endpoint URLs are in the format</p>
<pre class="brush: xml; title: ; notranslate">

http://www.apple.com/stevejobs/messages/3679.json?28106802
</pre>
<p>and a sample message looks like</p>
<pre class="brush: jscript; title: ; notranslate">
{
mainText: &quot;This is equivalent to my mom's generation of Elvis dying for me. I am very
sadden and emotionally moved at the moment. He was more influential on my
life than my parents and friends. While my parents loved me and friends
shared fun times. Steve influenced me, motivated me to become the innovated,
creative technologist I have become. I got into computer technology in 1980
and moved to Silicon Valley because of him. I have been one of his biggest
admirers and looked to him as a mentor to push the boundaries of my own
creative abilities to develop technology solutions which I hope made a
difference and impact to the industries I worked in. We've lost a
significant influence and icon in technology. We won't see another person of
his innovation and foresight within my life time. He was the Edison of
technology. He was and is one of my biggest inspirations.

I feel I have lost a close family member&quot;
header: &quot;What Steve Jobs meant to me&quot;
author: &quot;Skip&quot;
location: &quot;&quot;
}
</pre>
<p>The site makes a request to <a href="http://www.apple.com/stevejobs/messages/main.json" target="_blank">http://www.apple.com/stevejobs/messages/main.json</a> which returns</p>
<pre class="brush: jscript; title: ; notranslate">
 {
 totalMessages: &quot;10975&quot;
 timestamp: &quot;28106802&quot;
 }
</pre>
<p>So the site appears to cycle through 10975 messages. I didn&#8217;t dig into the JavaScript powering the site to confirm this; I just made an assumption, and requests for message IDs greater than 10975 returned 404. I wrote a quick Python program to download the messages:</p>
<pre class="brush: python; title: ; notranslate">
#!/usr/bin/python
import urllib2
import simplejson as json
import time
import codecs

# a page on apple's site shows the # of messages available
# start with 0 and retrieve up to message_range messages
metadata = json.loads(urllib2.urlopen('http://www.apple.com/stevejobs/messages/main.json').read())
# totalMessages is returned as a string (e.g. &quot;10975&quot;), so convert it for range()
message_range = int(metadata['totalMessages'])

# the url for each message. i learned of this url by inspecting
# the network calls to http://www.apple.com/stevejobs
# using chrome's developer tools
url=&quot;http://www.apple.com/stevejobs/messages/%d.json&quot;

# create our destination file
# i'm using codecs because it does a better job at handling international characters
output_file = 'stevejobs_tribute.txt'
file_handle = codecs.open(output_file,'w','utf-8')

# helper function to remove tabs and linefeeds
def clean(txt):
  return txt.replace('\n','').replace('\t','')

# iterate from 0 to the max # of messages and download the message text
# for these purposes, I'm ignoring the other fields as they weren't always present
for i in range(0, message_range):
  req = url % i
  data = urllib2.urlopen(req).read()
  data = json.loads(data)
  file_handle.write(clean(data['mainText']) + '\n')
file_handle.close()
</pre>
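<p>The script above is Python 2 (urllib2, print statements). For reference, here is a rough Python 3 sketch of the same download loop. It converts totalMessages to an int, since main.json returns it as a string, and skips IDs that come back as HTTP errors instead of aborting the whole run. The function names and the injectable fetch parameter are illustrative choices, not part of the original script.</p>

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://www.apple.com/stevejobs/messages/"


def fetch_json(url):
    """Download url and decode the response body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def clean(txt):
    """Remove tabs and linefeeds so each message fits on one line."""
    return txt.replace("\n", "").replace("\t", "")


def download_messages(fetch=fetch_json, base_url=BASE_URL):
    """Yield the cleaned mainText of message IDs 0 .. totalMessages - 1."""
    meta = fetch(base_url + "main.json")
    total = int(meta["totalMessages"])  # returned as a string, e.g. "10975"
    for i in range(total):
        try:
            data = fetch(base_url + "%d.json" % i)
        except urllib.error.HTTPError:
            continue  # e.g. a 404 for an ID that doesn't exist
        yield clean(data.get("mainText", ""))
```

<p>To produce stevejobs_tribute.txt, iterate over download_messages() and write each message plus a newline to a UTF-8 file. Passing a fake fetch function makes the loop easy to exercise offline.</p>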
<p>So now we have over ten thousand tribute messages saved to the file <a href="https://github.com/neilkod/steve_jobs_tribute_messages/tree/master/data">stevejobs_tribute.txt</a>. What I was most interested in was how many of these messages refer to a particular Apple product.</p>
<p>I came up with a few search terms based on some legendary Apple product names, including:</p>
<ul>
<li>Newton</li>
<li>Macintosh</li>
<li>MacBook</li>
<li>iBook</li>
<li>Mac</li>
<li>iPhone</li>
<li>iPod</li>
<li>iMac</li>
<li>iPad</li>
<li>Apple II family</li>
<li>OSX</li>
<li>iMovie</li>
<li>Apple TV</li>
<li>iTunes</li>
<li>LaserWriter (yes, <a href="http://en.wikipedia.org/wiki/LaserWriter" target="_blank">Laserwriter</a>)</li>
</ul>
<div>Each product received an entry in a Python dictionary. The value is another dictionary containing a regex for the product name and a count for the running total. Some of the regular expressions simply test for an optional s at the end of the product name; others are a little more complex &#8211; check the Apple II regular expression, which tries to match the entire Apple II product line. As I&#8217;m OK but not great with regular expressions, I welcome your corrections.</div>
<pre class="brush: python; title: ; notranslate">
products = {'iPhone':{'regex':'iphones?','count':0},
	'iMac':{'regex':'imacs?','count':0},
	'iPad':{'regex':'ipads?','count':0},
	'iTunes':{'regex':'itunes','count':0},
	'iPod':{'regex':'ipods?','count':0},
	'cube':{'regex':'cubes?','count':0},
	'MacBook':{'regex':'macbooks?','count':0},
	'iBook':{'regex':'ibooks?','count':0},
	'Apple TV':{'regex':'apple ?tvs?','count':0},
	'Apple II Family':{'regex':r'(apple )?(2|ii|\]\[|\/\/)([ce\+|]|gs|s)?[^0-9]', 'count':0},
	'LaserWriter':{'regex':'laserwriters?','count':0},
	'PowerBook':{'regex':'powerbooks?','count':0},
	'Newton':{'regex':'newtons?','count':0},
	'OSX':{'regex':'osx','count':0},
	'iMovie':{'regex':'imovie','count':0},
	'Macintosh':{'regex':'macintosh','count':0},
	'Lisa':{'regex':'lisa','count':0},
	'Mac':{'regex':'mac','count':0},
}
</pre>
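<p>One caveat worth knowing about these patterns: re.search finds them anywhere in a message, so &#8216;mac&#8217; also matches inside words like &#8216;imac&#8217;, &#8216;macbook&#8217;, and &#8216;machine&#8217;. A minimal sketch of one possible correction wraps each pattern in \b word boundaries. The pattern subset and the count_mentions helper below are illustrative; the counts reported in this post come from the original, unanchored patterns.</p>

```python
import re

# Illustrative subset of the products dict, with each pattern wrapped in
# \b word boundaries so that e.g. "mac" no longer matches inside "machine".
PATTERNS = {
    "Mac": r"\bmacs?\b",
    "iMac": r"\bimacs?\b",
    "Newton": r"\bnewtons?\b",
    "PowerBook": r"\bpowerbooks?\b",
}


def count_mentions(messages, patterns=PATTERNS):
    """Count how many messages mention each product at least once."""
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    counts = dict.fromkeys(patterns, 0)
    for msg in messages:
        text = msg.lower()
        for name, rx in compiled.items():
            if rx.search(text):
                counts[name] += 1
    return counts


msgs = ["My first Mac changed my life",
        "What a machine he built",
        "I still have my iMac and my Newton"]
print(count_mentions(msgs))  # {'Mac': 1, 'iMac': 1, 'Newton': 1, 'PowerBook': 0}
```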
<p>Here&#8217;s a screenshot of me testing the Apple II regular expression, using the excellent <a href="http://gskinner.com/RegExr/" target="_blank">Regexr</a>.</p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2011/10/apple-2-regex-testing.png"><img class="alignnone size-full wp-image-623" title="apple 2 regex testing" src="http://www.neilkodner.com/wp-content/uploads/2011/10/apple-2-regex-testing.png" alt="" width="424" height="388" /></a></p>
<p>Overall, of the 10,975 messages downloaded (as of this writing), 2,186, or just under 20%, mentioned an Apple product by name. Here&#8217;s the breakdown of the products mentioned:</p>
<pre class="brush: plain; title: ; notranslate">
LaserWriter        1
iMovie             3
OSX                9
iBook             22
PowerBook         22
Lisa              24
Apple TV          31
Newton            33
iTunes            52
Macintosh        163
iMac             235
MacBook          366
Apple II Family  481
iPad             574
iPod             575
iPhone           875
Mac             1315
</pre>
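<p>The percentages in this post are simple ratios of the counts above to the 10,975 downloaded messages. A quick check in Python (the share helper is just for illustration):</p>

```python
TOTAL_MESSAGES = 10975  # messages downloaded for this post


def share(count, total=TOTAL_MESSAGES):
    """Percentage of all messages, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for name, count in [("any product", 2186), ("Mac", 1315), ("iPhone", 875)]:
    print("%-12s %5.1f%%  (about 1 in %.1f)" % (name, share(count), TOTAL_MESSAGES / count))
```

<p>That works out to about 19.9% of messages for any product, 12.0% for the Mac, and 8.0% for the iPhone.</p>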
<p>More than one out of every ten messages included a reference to a Mac! (The plain &#8216;mac&#8217; pattern also matches iMac, MacBook, and Macintosh, so that count overlaps with theirs.) About 8% of messages mentioned an iPhone &#8211; not bad for a device that&#8217;s been out a fraction of the time the Mac has been available. I&#8217;m pleased to see so many references to the Apple II, including several mentions of the //c, which was my first Apple product.</p>
<p>It&#8217;s also interesting to note that of the 33 messages mentioning Newton, only a handful were about the actual Apple product &#8211; most compared Steve Jobs to Isaac Newton himself. Check out my <a href="http://www.neilkodner.com/2010/10/fun-with-nltk-and-zoolander-part-1-concordance/" target="_blank">earlier post on NLTK concordance</a> for details on how I did this:</p>
<pre class="brush: python; title: ; notranslate">
import nltk
import string
f = open('stevejobs_tribute.txt').read()
f = f.translate(string.maketrans(&quot;&quot;,&quot;&quot;), string.punctuation)
foo=nltk.Text(f.split())
# concordance() prints its own output and returns None, so don't print the result
foo.concordance('newton')
</pre>
<p>result:</p>
<pre class="brush: plain; title: ; notranslate">
op If history misses men like Isaac Newton Graham Bell Galileu Thomas Edison a
mbered though his legacy Now he met Newton Einstein and other geniuses like hi
oday I was one of the few who had a Newton Today I have an iPhone 4 an iPad2 a
oduct that came thereafter from the Newton to the Cube to the iPhone 4S God Bl
with the likes of Edison Garcia and Newton for his impact and vision I wish hi
ntioned in the same breath as Isaac Newton Thomas Edison and Bill Gates The le
 off a tree we are thinking of Adam Newton and Steve Jobs He open new dimensio
Jobs will be missed Da Vinci Mozart Newton Franklin Jobs Nobody is out of plac
ged my life starting with the Apple Newton followed by the iPod and then the i
 sorely missed nbsp Da Vinci Mozart Newton Franklin Jobs Nobody is out of plac
ve dared to Einstein Freud Da Vinci Newton Galileo Darwin among others is prou
embered beside Einstein Pasteur and Newton The world is moving toward his crea
irst Apple Mac I remember the first Newton I willnbspremembernbspSteves creati
e to contact us againnbsp How Isaac Newton and Albert Einstein contributed gre
 world One seduced Eve One awakened Newton and One was in the hands of Steve J
the way you have influenced mine If Newton discovered something as remarkable
rld One seduced Eve second awakened Newton the third one was in the hands of S
lent to Leonardo Da Vinci Sir Issac Newton Albert Einstein and the like He was
t of the caliber of that of DaVinci Newton Pythagorous etc The list can go on
hen people say names like ie Edison Newton and Einstein I guarantee that the n
 Computers” The Apple II Lisa Mac Newton iPod iTunes store iPod Touch iPhone
ember Steve Jobs the way I remember Newton or Einstein I lived with Apple prod
set consultant who bought his first Newton MacBook 170 and all the dozens of o
 br 3 Apples change the world Adán Newton Steve Jobs 19552011 Rest in Peace t
back to the Apple IIGS I also had a Newton Steve Jobs death hurts me personall
ed the world apple to adam apple to newton and apple to steve jobs Steve was a
dam and Eva Second one that wake up newton third one that Steve Jobs create St
</pre>
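<p>Under the hood, a concordance is a keyword-in-context listing: every occurrence of the search term, printed with a fixed-width window of surrounding text. A rough pure-Python approximation is sketched below; it assumes the same punctuation stripping and whitespace tokenizing as the snippet above, and the kwic name and window width are illustrative. NLTK&#8217;s own concordance() formatting differs in its details.</p>

```python
import string


def kwic(text, term, width=30):
    """Return keyword-in-context lines for every token equal to term (case-insensitive)."""
    # Drop punctuation, then split on whitespace, as in the NLTK snippet.
    tokens = text.translate(str.maketrans("", "", string.punctuation)).split()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == term.lower():
            left = " ".join(tokens[:i])[-width:]       # context before the hit
            right = " ".join(tokens[i + 1:])[:width]   # context after the hit
            lines.append("%s %s %s" % (left.rjust(width), tok, right))
    return lines


sample = "I had a Newton. Today I have an iPhone. Isaac Newton would approve!"
for line in kwic(sample, "newton", width=12):
    print(line)
```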
<p>Also interesting was how often other historical figures came up in the Steve Jobs remembrance messages. According to the submitters, Steve Jobs is clearly in some elite company. I don&#8217;t know if I&#8217;d go so far as to group him with the men who brought automobiles and light bulbs to the masses, but hey, we all have our priorities. Each count is the number of messages (one per line in the file) matched by a simple grep command piped to wc -l. Here are a few examples:</p>
<ul>
<li>Einstein &#8211; 70</li>
<li>Ford &#8211; 189</li>
<li>Edison &#8211; 110</li>
<li>DaVinci &#8211; 15</li>
<li>Bill Gates &#8211; 8</li>
</ul>
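<p>Because the download script strips linefeeds, each message occupies one line of stevejobs_tribute.txt, so grep piped to wc -l counts messages that mention a name at least once, not total mentions. A rough Python equivalent is sketched below. It assumes a case-insensitive whole-word match (roughly grep -i -w); the exact grep flags used here aren&#8217;t recorded, so treat that as an assumption. Whole-word matching matters for a name like Ford, which a plain substring search would also find inside words such as &#8220;afford&#8221; or &#8220;Stanford&#8221;.</p>

```python
import re


def count_matching_lines(lines, name):
    """Count lines containing name as a whole word, ignoring case.

    Roughly equivalent to: grep -i -w NAME FILE | wc -l
    """
    pattern = re.compile(r"\b%s\b" % re.escape(name), re.IGNORECASE)
    return sum(1 for line in lines if pattern.search(line))


messages = [
    "Like Edison and Ford, he changed how we live.",
    "He gave his famous speech at Stanford.",
    "Einstein, Edison, Da Vinci... and now Jobs.",
]
for name in ["Edison", "Ford", "Einstein"]:
    print(name, count_matching_lines(messages, name))
```

<p>To run it on the real data, pass the open file object (opened with encoding="utf-8") in place of the sample list.</p>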
<p>Finally, I wanted to see how people were speaking about Steve Jobs, and especially what terms were being used to describe him. There was no point in performing sentiment analysis on this text, as the messages were not only obviously positive but had also been vetted by Apple. Using NLTK, I performed part-of-speech tagging on every word in each tribute message and then wrote some code to total the adjectives and adverbs used.</p>
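<p>Once the tagger has run, the counting step is small. The sketch below assumes each message has already been tokenized and tagged into (word, tag) pairs, the shape nltk.pos_tag returns, using Penn Treebank tags, where adjective tags start with JJ (JJ, JJR, JJS) and adverb tags start with RB (RB, RBR, RBS). The hand-tagged sample is illustrative, not real tagger output.</p>

```python
from collections import Counter


def count_by_tag_prefix(tagged_messages, prefix):
    """Tally lowercased words whose Penn Treebank tag starts with prefix."""
    counts = Counter()
    for tagged in tagged_messages:
        for word, tag in tagged:
            if tag.startswith(prefix):
                counts[word.lower()] += 1
    return counts


# Hand-tagged examples shaped like nltk.pos_tag output (illustrative only).
tagged_messages = [
    [("He", "PRP"), ("was", "VBD"), ("truly", "RB"), ("great", "JJ")],
    [("A", "DT"), ("great", "JJ"), ("and", "CC"), ("creative", "JJ"), ("visionary", "NN")],
]
adjectives = count_by_tag_prefix(tagged_messages, "JJ")
adverbs = count_by_tag_prefix(tagged_messages, "RB")
print(adjectives.most_common(2))  # [('great', 2), ('creative', 1)]
print(adverbs.most_common(1))     # [('truly', 1)]
```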
<p>The most commonly used adjectives are:</p>
<pre class="brush: plain; title: ; notranslate">
('great', 1961)
('steve', 1808)
('many', 1459)
('first', 917)
('sad', 862)
('better', 857)
('such', 727)
('best', 721)
('visionary', 645)
('new', 579)
('more', 556)
('true', 538)
('most', 476)
('creative', 471)
('apple', 435)
('other', 427)
('same', 415)
('good', 412)
('greatest', 376)
('wonderful', 373)
('sorry', 362)
('old', 325)
('brilliant', 283)
('able', 281)
('incredible', 267)
('big', 260)
</pre>
<p>Humorously, NLTK frequently considered &#8220;Steve&#8221; to be an adjective. This is likely because it is almost always followed by &#8220;Jobs,&#8221; and because I had lowercased the text before tagging: a <a href="http://twitter.com/#!/japerk/status/127054008060878848">tweet</a> from <a href="http://www.streamhacker.com">NLTK expert Jacob Perkins</a> reminded me that machines are dumb and proper nouns should be capitalized. I normalized the text to lowercase in order to aggregate the counts; I was interested only in adjectives and adverbs, not nouns, so proper nouns didn&#8217;t matter to me.<br />
The top adverbs, according to NLTK, were not as interesting, at least to me.</p>
<pre class="brush: plain; title: ; notranslate">
('so', 2220)
('never', 2111)
('not', 1897)
('always', 1798)
('just', 1402)
('now', 1028)
('truly', 989)
('only', 945)
('very', 919)
('much', 908)
('ever', 751)
('even', 743)
('really', 567)
('forever', 508)
('more', 486)
('still', 447)
('well', 398)
('most', 375)
('personally', 352)
</pre>
<p>Lastly, I ran a trigram analysis, again using NLTK.</p>
<pre class="brush: python; title: ; notranslate">
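from collections import defaultdict
import nltk

# data is one lowercased message, as in the full script below
text = nltk.WordPunctTokenizer().tokenize(data)
trigrams = defaultdict(int)
for itm in nltk.trigrams(text):
  trigrams[itm] += 1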
</pre>
<p>As one would expect, the leading trigram was &#8216;<strong>rest in peace</strong>&#8217; with 1838 occurrences, a number equal to about 16.7% of the 10,975 messages. &#8216;<strong>thank you for</strong>&#8217; occurred 1446 times and &#8216;<strong>will be missed</strong>&#8217; 827 times. Other interesting trigrams are &#8216;<strong>thank you steve</strong>&#8217; with 791 occurrences and &#8216;<strong>changed the world</strong>&#8217; with 551.</p>
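<p>The counter above tallies every occurrence, so a phrase repeated within one message is counted more than once. A minimal sketch that also counts how many distinct messages contain each trigram is below. It uses zip over the token list in place of nltk.trigrams (both yield consecutive three-token tuples), and a simple regex word split stands in for WordPunctTokenizer; both substitutions are assumptions made to keep the example dependency-free.</p>

```python
import re
from collections import Counter


def trigrams(tokens):
    """Consecutive three-token tuples, like nltk.trigrams."""
    return zip(tokens, tokens[1:], tokens[2:])


def count_trigrams(messages):
    """Return (occurrence counts, per-message counts) for every trigram."""
    occurrences = Counter()
    per_message = Counter()
    for msg in messages:
        grams = list(trigrams(re.findall(r"\w+", msg.lower())))
        occurrences.update(grams)
        per_message.update(set(grams))  # each message counts at most once
    return occurrences, per_message


msgs = ["Thank you Steve. Thank you for everything",
        "rest in peace steve",
        "Rest in peace. Rest in peace."]
occ, per_msg = count_trigrams(msgs)
print(occ[("rest", "in", "peace")], per_msg[("rest", "in", "peace")])  # 3 2
```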
<p>The full python code and resulting data can be found on <a href="https://github.com/neilkod/steve_jobs_tribute_messages" target="_blank">github</a>.</p>
<pre class="brush: python; title: ; notranslate">

#!/usr/bin/python
#nltk.help.upenn_tagset('RB')
from collections import defaultdict
from operator import itemgetter
import re
import urllib2
import string
import simplejson as json

import codecs
import nltk

OUTPUT_FILE = 'data/stevejobs_tribute.txt'

adverbs = defaultdict(int)
adjectives = defaultdict(int)
trigrams = defaultdict(int)

message_has_adjective = False
message_has_adverb = False
message_contains_product_mention = False
messages_with_adjective = 0
messages_with_adverb = 0
messages = 0
messages_with_product_mention = 0

exclude = set(string.punctuation)

products = {'iPhone':{'regex':'iphones?','count':0},
	'iMac':{'regex':'imacs?','count':0},
	'iPad':{'regex':'ipads?','count':0},
	'iTunes':{'regex':'itunes','count':0},
	'iPod':{'regex':'ipods?','count':0},
	'cube':{'regex':'cubes?','count':0},
	'MacBook':{'regex':'macbooks?','count':0},
	'iBook':{'regex':'ibooks?','count':0},
	'Apple TV':{'regex':'apple ?tvs?','count':0},
	'Apple II Family':{'regex':r'(apple )?(2|ii|\]\[|\/\/)([ce\+|]|gs|s)?[^0-9]', 'count':0},
	'LaserWriter':{'regex':'laserwriters?','count':0},
	'PowerBook':{'regex':'powerbooks?','count':0},
	'Newton':{'regex':'newtons?','count':0},
	'OSX':{'regex':'osx','count':0},
	'iMovie':{'regex':'imovie','count':0},
	'Macintosh':{'regex':'macintosh','count':0},
	'Lisa':{'regex':'lisa','count':0},
	'Mac':{'regex':'mac','count':0},
}

def top_n(dct,n = 10):
	srtd=sorted(dct.iteritems(), key=itemgetter(1), reverse=True)
	for x in srtd[:n]:
		print x

def nltk_concordance(term,text_file):
	f = open(text_file).read()
	# remove punctuation
	f = f.translate(string.maketrans(&quot;&quot;,&quot;&quot;), string.punctuation)
	split_text=nltk.Text(f.split())
	split_text.concordance(term,lines=100)

	# &gt;&gt;&gt; f = f.translate(string.maketrans(&quot;&quot;,&quot;&quot;), string.punctuation)
	# &gt;&gt;&gt; foo=nltk.Text(f.split())
	# &gt;&gt;&gt; print foo.concordance('newton')

def unescape(s):
	&quot;&quot;&quot;unescapes html codes&quot;&quot;&quot;
	s = s.replace(&quot;&amp;lt;&quot;, &quot;&lt;&quot;)
	s = s.replace(&quot;&amp;gt;&quot;, &quot;&gt;&quot;)
	s = s.replace(&quot;&amp;nbsp;&quot;, &quot; &quot;)
	# this has to be last:
	s = s.replace(&quot;&amp;amp;&quot;, &quot;&amp;&quot;)
	return s

for line in open(OUTPUT_FILE):
	message_has_adjective = False
	message_has_adverb = False
	message_contains_product_mention = False

	# remove the trailing linefeed and convert to lower-case
	# and remove html control characters
	messages += 1
	data = line.strip()
	data = data.lower()
	data = unescape(data)

	# check for product mentions
	for k,v in products.iteritems():
		if re.search(v['regex'],data):
			products[k]['count'] += 1
			message_contains_product_mention = True

	# if the message contains a product mention
	# increment the product mention counter
	if message_contains_product_mention:
		messages_with_product_mention += 1

	# tokenize the message using nltk's WordPunctTokenizer
	text = nltk.WordPunctTokenizer().tokenize(data)

	# compute trigrams
	nltk_trigrams = nltk.trigrams(text)
	for itm in nltk_trigrams:
		trigrams[itm] += 1

	# pos-tag each token. we're interested in adjectives and adverbs
	parts_of_speech = nltk.pos_tag(text)
	# test for adjectives and adverbs, increment the counters
	# when we find one.

	for (word,pos) in parts_of_speech:
		if pos.startswith('JJ'):
			message_has_adjective = True
			adjectives[word] += 1

		if pos.startswith('RB'):
			message_has_adverb = True
			adverbs[word] += 1

	# if the message contains an adverb or an adjective, increment a counter
	if message_has_adjective:
		messages_with_adjective += 1
	if message_has_adverb:
		messages_with_adverb += 1

# output the 25 most frequently-used adjectives and adverbs
n = 25
print &quot;top %s adverbs&quot; % n
top_n(adverbs, n)
print
print &quot;top %s adjectives&quot; % n
top_n(adjectives, n)

print &quot;messages with adjectives: %s&quot; % messages_with_adjective
print &quot;messages with adverbs: %s&quot; % messages_with_adverb
print &quot;total messages with product mentions: %s&quot; % messages_with_product_mention
print &quot;total messages: %s&quot; % messages

# output the top 50 most-common trigrams
n = 50
print &quot;top %s trigrams&quot; % n
top_n(trigrams, n)
# sort products by mention count, ascending
srtd = sorted(products.iteritems(), key=lambda item: item[1]['count'])
for x,y in srtd:
	print &quot;%s\t\t%s&quot; % (x,y['count'])

print
print
# concordance for newton
print &quot;concordance for newton:&quot;
nltk_concordance('newton',OUTPUT_FILE)
</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2011/10/an-analysis-of-steve-jobs-tribute-messages-displayed-by-apple/feed/</wfw:commentRss>
		<slash:comments>52</slash:comments>
		</item>
		<item>
		<title>Looking for some new quotes for @HelloooooNewman</title>
		<link>http://www.neilkodner.com/2011/03/looking-for-some-new-quotes-for-hellooooonewman/</link>
		<comments>http://www.neilkodner.com/2011/03/looking-for-some-new-quotes-for-hellooooonewman/#comments</comments>
		<pubDate>Thu, 17 Mar 2011 19:47:02 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=547</guid>
		<description><![CDATA[I&#8217;m looking for some new quotes for my Seinfeld-quote-spewing Twitter bot @hellooooonewman. There are currently 161 tweets in its database and the bot has sent over 246,000 tweets. If you take a simple average, each quote has been tweeted over 1,500 times. It&#8217;s time to update the bot with some fresh content. I&#8217;m admittedly not [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m looking for some new quotes for my Seinfeld-quote-spewing Twitter bot <a href="http://www.twitter.com/#!/neilkodsbots">@hellooooonewman</a>. There are currently 161 tweets in its database and the bot has sent over 246,000 tweets. If you take a simple average, each quote has been tweeted over 1,500 times. It&#8217;s time to update the bot with some fresh content.</p>
<p>I&#8217;m admittedly not the biggest Seinfeld fan so I need your help! Please leave some new quotes in the comments. The quotes will make it back into <a href="http://www.twitter.com/#!/hellooooonewman">@HelloooooNewman</a>&#8216;s replies. Ideally the quotes are no longer than 125 or so characters in length to allow for a username to be included in the reply. My program will truncate any tweets longer than 140 characters. Full quote database after the break.</p>
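<p>The 125-character guideline comes from simple arithmetic: Twitter usernames are capped at 15 characters, so a reply prefix of "@username " takes up to 17 of the 140 available characters, and anything past 140 gets cut off. A minimal sketch of that truncation is below; the bot&#8217;s actual code isn&#8217;t shown here, so the build_reply name and the "@username quote" format are hypothetical.</p>

```python
MAX_TWEET = 140  # tweet length limit at the time of this post


def build_reply(username, quote, limit=MAX_TWEET):
    """Hypothetical helper: prefix a quote with @username and cut it to the limit."""
    reply = "@%s %s" % (username, quote)
    return reply[:limit]


print(build_reply("jerry", "No soup for you."))
long_quote = "Yada, " * 30
print(len(build_reply("hellooooonewman", long_quote)))
```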
<p><span id="more-547"></span></p>
<pre class="brush: plain; title: ; notranslate">

I had a dream last night that a hamburger was eating ME!
If you've got a t-shirt with blood stains all over it, maybe laundry isn't your biggest problem right now.
Look, Vanessa, of course the market fluctuates. Everybody knows that. I just got fluctuated out of four thousand dollars!
I don't return fruit. Fruit's a gamble. I know that going in.
Sometimes the road less travelled is less travelled for a reason.
Two hundred seats on a plane, I gotta wind up next to Yukon Jack and his dog Cujo.
Oh, you're as pretty as them. You just need a nose job
No, no, I don't think I'm special. My mother always said I'm not special.
I'm speechless! I have no speech!
I hate asking for change. They always make a face. Like I'm asking them to donate a kidney.
You know a muffin can be very filling.
I don't wanna get a movie hot dog! I want a Papaya King hot dog!
I'm disturbed, I'm depressed, I'm inadequate, I've got it all!
Looking at a cleavage is like looking at the sun. You don't stare at it. It's too risky. You get a sense of it and then you look away.
Everyone thinks we're gay!... Not that there's anything wrong with that.
And one more thing; they're real, and they're spectacular.
You know, sometimes when I think you're the shallowest man I've ever met, you somehow manage to drain a little more out of the pool.
This is beyond B.O. This is B.B.O.
This is our best model: The Cougar 9000. It's the Rolls Royce of wheelchairs. This is like...you're almost glad to be handicapped.
When you control the mail, you control information!
Jerry...this woman hates me so much...I'm starting to like her.
A woman that hates me this much comes along once in a lifetime.
I'm doing a coffee-table book on coffee tables
He's a male bimbo...He's a mimbo!
The sea was angry that day, my friends, like an old man trying to return soup at a deli!
Jerry, you stand on the threshold to the magical world of sensual delights that most men dare not dream of.
You're in the kitchen. You see an eclair in the receptacle. So you think to yourself, &quot;What the hell, I'll just eat some trash.&quot;
He recycled this gift. He's a regifter
Jerry, just remember, it's not a lie if you believe it.
Newman, you magnificent bastard, you've done it!
You may know it better as Myanmar, but it'll always be Burma to me.
Why do they call it Ovaltine? The mug is round. The jar is round. They should call it Roundtine. That's gold, Jerry! Gold!
You can stuff your sorries in a sack, mister!
It's Christmas for some, a Festivus for the rest of us!!
Ew, Mr. Apple, you have a brown spot!
Hey, I'm on First and... First. How can the same street intersect with itself? I must be at the nexus of the universe.
My parents didn't want to move to Florida, but they turned sixty, and that's the law.
Sometimes the road less traveled is less traveled for a reason
Why do they call it a &quot;building&quot;? It looks like they're finished. Why isn't it a &quot;built&quot;?
People who read the tabloids deserve to be lied to
Men don't care what's on TV. They only care what else is on TV.
Boy, these pretzels are makin' me thirsty.
Yo Yo Ma.
You know I always wanted to pretend I was an architect
I'm not a lesbian. I hate men, but I'm not a lesbian.
I'm speechless. I have no speech.
God... it's like a sauna in here.
I have a bad feeling that whenever a lesbian looks at me they think &quot;That's why I'm not a heterosexual.&quot;
Hi, my name is George, I'm unemployed and I live with my parents.
I've driven women to lesbianism before but never to a mental institution.
Divorce is always hard. Especially on the kids. 'Course I am the result of my parents having stayed together so ya never know.
My father was a quitter, my grandfather was a quitter, I was raised to give up. It's one of the few things I do well.
Boy, a little too much chlorine in that gene pool.
Hoochie Mama.
People don't just bump into each other and have sex. This isn't Cinemax.
Here's to those who wish us well, and those who don't can go to hell.
That's a lotta potatoes.
Yada, Yada, Yada.
You have the rooster, the hen, and the chicken. The rooster goes with the chicken... So who's having sex with the hen?
I don't know if it's possible, but could you people conduct the psychopath convention down the hall?
You're killing independent George.
Helllllooooo.
That... is one magic loogie.
You, my friend, have crossed the line between man and bum.
Why don't you just get a pair of white shoes, move down to Miami Beach and get this whole thing over with?
See... I have two friends. You were up, he was down. Now he's up and you're down. See how it all evens out for me?
I'll be back. We'll make out.
What is this salty discharge?
You ever dream in 3-D? It's like the Boogie Man is coming RIGHT AT YOU.
The carpet sweeper is the biggest scam perpetrated on the American public since One Hour Martinizing.
Is this a gym or a fitness museum?
He fires people like it's a bodily function!
You very bad man, Jerry. Very bad man.
I don't know what it is about that mirror in that bathroom. I love the way I look in it... I feel like Robert Wagner.
The cat - mmrrrooowwwrr - is out of the bag!
This thing is like an onion: the more layers you peel, the more it stinks!
Somewhere in this hospital, the anguished squeal of Pigman cries out!
Produce section. Very provocative area. A lot of melons and shapes. Everyone's squeezing and smelling..
It pains me to say this, but I may be getting too mature for details.
I've always been a stall man.
It's one day. Half a day, really. I mean you subtract showers and meals, it's like twenty minutes.
I can't go to a bad movie by myself. What, am I gonna make sarcastic remarks to strangers?
You should've seen her face. It was the exact same look my father gave me when I told him I wanted to be a ventriloquist.
I hate asking for change. They always make a face. It's like asking them to donate a kidney.
The apples are mealy, the oranges are dry... I don't know what's going on with the papayas!
I can feel his blood inside of me. Borrowing things from my blood.
Women don't respect salad eaters.
Man, it's the nineties... It's Hammer time!
Why is nice bad? What kind of a sick society are we living in when nice is bad?
I'm much more comfortable criticizing people behind their backs.
There's nothing more sophisticated than diddling the maid and then chewing some gum.
I've never heard of a relationship being affected by punctuation.
Moles: Freckle's ugly cousin.
I'm in the unfortunate position of having to consider other people's feelings.
I would drape myself in velvet if it were socially acceptable.
She had man-hands!
You know what they say, 'You don't sell the steak, you sell the sizzle.
It's more like a full-body dry heave set to music.
If you can't say something bad about a relationship you shouldn't say anything at all.
Did you know that the original title for War and Peace was War, What Is It Good For?
This woman is bending my mind into a pretzel.
It's amazing that the amount of news that happens in the world every day always just exactly fits the newspaper.
There is no such thing as fun for the whole family.
Where lipstick is concerned, the important thing is not color, but to accept God's final word on where your lips end
A two-year old is kind of like having a blender, but you don't have a top for it.
Make no mistake about why these babies are here - they are here to replace us.
A bookstore is one of the only pieces of evidence we have that people are still thinking.
You think people will still be using napkins in the year 2000? Or is this mouth vacuum thing for real?
If I had a son, I would name him Isosceles. Isosceles Kramer.
My neck is one gargantuan monkey fist.
Vomiting is not a deal breaker. If Hitler had vomited on Chamberlain, Chamberlain still would have given him Czechoslovakia.
When you're shopping on Madison Avenue, you don't want to skimp on the swank.
Are you still master of your domain?
Well, I'm Out!
To the idiotmobile.
You're an anti-dentite.
You're killing independent George.
I was in the pool, I was in the pool.
I think I can sum up the show for you with one word; Nothing.
Yeah, I'm a great quitter: It's one of the few things I do well...I come from a long line of quitters.
I come from a long line of quitters. My father was a quitter, my grandfather was a quitter.
Hello Newman.
No soup for you.
You can stuff your sorrys in a sack, mister.
The sea was angry that day my friends, like an old man sending back soup in a deli.
So you're killing yourself, because your dreams of becoming a banker have gone unfulfilled!
Ted Danson makes $800,000 an episode. I can't live knowing that Ted Danson makes that much more than me.
No, I don't have a square to spare. I can't spare a square.
You win: I drop dead. I win: I don't drop dead, and I get 100% anti-drop-dead protection - forever!
You know you're not supposed to drink while you're keeping a secret.
You see, my dear, all certified mail is registered... but registered mail is not necessarily certified.
I'll tell you a little secret about zip codes: They're meaningless.
In any other shoe, I lose two inches; I can't have a drop-down. We were eye-to-eye; I can't go eye-to-chin.
You put the balm on? Who told you to put the balm on? I didn't tell you to put the balm on! Why'd you put the balm on?
Two months ago, I saw a provocative movie on cable TV. It was called 'The Net,' with that girl from the bus.
What kind of a man are you? The guy is unconscious in a coma and you don't have the guts to kiss his girlfriend?
I think that ginger ale at the coffee shop is just Coke and Sprite mixed together. How can I prove it? Ah! Can't, dammit.
Do you ever get down on your knees and thank God you know me and have access to my dementia?
Yeah, you better give me the insurance. Because I'm gonna beat the hell out of this car.
I need to talk to you about my friend, Dr. Tim Whatley. I think he's converted to Judaism just for the jokes!
Here's to those who wish us well, and those who don't can go to hell!
Speaking of ex's, my ex-boyfriend came over late last night, and, yada yada yada, anyway. I'm really tired today.
Too many people got their mail. Close to 80%. Nobody's ever cracked the 50% barrier.
I'm sorry, but I can't be with someone whose protege is a hack.
I don't know if it's possible, but could you people conduct the psychopath convention down the hall?
Did the medical journal mention anything about standing in a pool of somebody else's urine???
Have ya been to the Motor Vehicle Bureau? It's a leper colony there!
Who buys an umbrella anyway? You can get them for free at the coffee shop in those metal cans!
You say you're a dermatologist? Well, I call you Pimple Popper, MD.
Oh my God, I'm having an affair. That's so adult. It's like with stockings and martinis, and William Holden.
I don't know what it is about that mirror in that bathroom. I love the way I look in it... I feel like Robert Wagner.
Cheap fabric, and dim lighting. That's how you move merchandise.
The Dewey Decimal System... What a scam that was!
I don't judge a man by the length of his hair or the kind of music he listens to. Rock was never my bag.
You'd better not screw up again, Seinfeld, because if you do, I'll be all over you like a pit bull on a poodle.
It's Risk. It's a game of world domination being played by two guys who can barely run their own lives.
Let me just finish my coffee, and then we'll go watch them cut the fat bastard up.
Who's gonna turn down a Junior Mint? It's chocolate, it's peppermint -- it's delicious!
I think we really need to be in front of the television set. You take TV out of this relationship, it is just torture.
It's a reverse peephole. Now I can peek in and see if anyone is waiting to jack me with a sock full of pennies.
My boyfriend said I got gonorrhea from riding the tractor in my bathing suit.
</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2011/03/looking-for-some-new-quotes-for-hellooooonewman/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Chuck Norris doesn&#8217;t screen-scrape, the data runs scared to his hard drive.</title>
		<link>http://www.neilkodner.com/2011/03/chuck-norris-doesnt-screen-scrape-the-data-runs-scared-to-his-hard-drive/</link>
		<comments>http://www.neilkodner.com/2011/03/chuck-norris-doesnt-screen-scrape-the-data-runs-scared-to-his-hard-drive/#comments</comments>
		<pubDate>Tue, 01 Mar 2011 20:51:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[beautifulsoup]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[screen-scraping]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=534</guid>
		<description><![CDATA[Inspired by a tweet from Roger Ehrenberg and my 11-year-old son who&#8217;s crazy about Chuck Norris facts, I screen-scraped the contents of http://www.chucknorrisfacts.com. Code and data can be found here. Using Python and BeautifulSoup, it simply loops through all of the pages on http://www.chucknorrisfacts.com and reads the items displayed on the page. Output looks like Visit [...]]]></description>
			<content:encoded><![CDATA[<p>Inspired by a <a href="http://twitter.com/#!/infoarbitrage/status/42611272902115328">tweet</a> from <a href="http://twitter.com/#!/infoarbitrage">Roger Ehrenberg</a> and my 11-year-old son who&#8217;s crazy about Chuck Norris facts, I screen-scraped the contents of <a href="http://www.chucknorrisfacts.com/" target="_blank">http://www.chucknorrisfacts.com</a>. Code and data can be found <a href="https://github.com/neilkod/chucknorrisfacts">here</a>.</p>
<p>Using Python and <a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>, it simply loops through all of the pages on <a title="www.chucknorrisfacts.com" href="http://www.chucknorrisfacts.com" target="_blank">http://www.chucknorrisfacts.com</a> and reads the items displayed on the page.</p>
<pre class="brush: python; title: ; notranslate">

#!/usr/bin/python
import urllib2, time
from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup

# 674 pages last time I checked. Oddly enough, their pages seem zero-based. Also, if you
# request a page number outside that range, you get data back instead of a 404.
# I'm not sure why they do this.
for page_num in range(0,674):
	url = 'http://www.chucknorrisfacts.com/all-chuck-norris-facts?page=%d' % page_num
	html = urllib2.urlopen(url)
	soup = BeautifulSoup(html)

	entries = soup.findAll(&quot;li&quot;,&quot;views-row&quot;)
	for entry in entries:

		# use BeautifulStoneSoup to convert any HTML entities left in the text BS returns into plain characters.
		the_quote = BeautifulStoneSoup(entry.div.text,
		                   convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0]

		# print it to stdout. I just redirect the program's output to a file.
		print the_quote.encode('utf-8')
	# be a good citizen and wait a few seconds before visiting the next page
	time.sleep(6)
</pre>
<p>Output looks like</p>
<pre class="brush: plain; title: ; notranslate">
if the mountain won't come to Muhammad, Chuck Norris will bring it.
if you watch the ring you die in 7 days,if you look at Ckuck Norris you die instantly
in a real zombie apocalypse, Chuck Norris can roundhouse-kick 53,596 zombies dead.
in space no-one can hear you scream.....except chuck norris!
iphone 4? chuck norris has iphone 8
most kids pee their name into snow... Chuck Norris pisses his in concreate...
never say you can'thurt a fly to chuck norris because he will hurt you
new never-before-seen behind-the-scenes shots from Walker Texas Range shows Chuck Norris carrying his truck home after it broke down
no one has ever found where the smurfs live thats  becuase they live  in chuck norrises beard
...
</pre>
<p>Visit my <a href="https://github.com/neilkod/chucknorrisfacts" target="_blank">github</a> for the full dataset (5,500 entries).</p>
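<p>The script above is Python 2 with BeautifulSoup 3, both long retired. Here is a minimal sketch of the same loop for Python 3 using only the standard library&#8217;s html.parser, assuming the pages still mark each fact up as an li element with the views-row class, as the findAll call above expects (the site may well have changed since). FactParser, page_urls, and scrape are illustrative names, and unlike the original, which reads only entry.div, this collects all of the text inside each list item and assumes each item has a closing tag.</p>
<pre class="brush: python; title: ; notranslate">
import time
import urllib.request
from html.parser import HTMLParser

# URL pattern taken from the script above; pages are zero-based
BASE_URL = 'http://www.chucknorrisfacts.com/all-chuck-norris-facts?page=%d'

# elements with no closing tag must not change the nesting depth
VOID_TAGS = {'br', 'hr', 'img', 'input', 'meta', 'link'}


def page_urls(num_pages):
    return [BASE_URL % n for n in range(num_pages)]


class FactParser(HTMLParser):
    # collects the text of every li element whose class list includes views-row

    def __init__(self):
        # convert_charrefs=True decodes HTML entities in the text,
        # the job BeautifulStoneSoup does in the script above
        super().__init__(convert_charrefs=True)
        self.depth = 0      # nesting depth inside the current views-row item
        self.buffer = []    # text fragments of the current item
        self.facts = []     # finished, whitespace-normalized facts

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif tag == 'li' and 'views-row' in (dict(attrs).get('class') or '').split():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            text = ' '.join(''.join(self.buffer).split())
            if text:
                self.facts.append(text)

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def scrape(num_pages, delay=6):
    facts = []
    for url in page_urls(num_pages):
        with urllib.request.urlopen(url) as response:
            markup = response.read().decode('utf-8', errors='replace')
        parser = FactParser()
        parser.feed(markup)
        parser.close()
        facts.extend(parser.facts)
        time.sleep(delay)  # be a good citizen between requests
    return facts
</pre>
<p>Keeping the parsing in its own class means it can be tested against a saved page without touching the network; scrape(674) would mirror the original loop.</p>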
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2011/03/chuck-norris-doesnt-screen-scrape-the-data-runs-scared-to-his-hard-drive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fun with awk and dead people</title>
		<link>http://www.neilkodner.com/2011/02/fun-with-awk-and-dead-people/</link>
		<comments>http://www.neilkodner.com/2011/02/fun-with-awk-and-dead-people/#comments</comments>
		<pubDate>Thu, 24 Feb 2011 19:36:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[awk]]></category>
		<category><![CDATA[curl]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[freebase]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=521</guid>
		<description><![CDATA[Just playing around with some Freebase data in preparation for a &#8216;who died today&#8217; twitter bot. Get the data and determine on which date did the most people die? Surprised to see 1965-11-08 listed ahead of 2001-09-11. Why? Lets look at where people died on 1965-11-08: Upon further investigation, it looks as if Freebasers have [...]]]></description>
			<content:encoded><![CDATA[<p>Just playing around with some <a href="http://www.freebase.com">Freebase</a> data in preparation for a &#8216;who died today&#8217; twitter bot.</p>
<p><strong>Get the data: on which date did the most people die?</strong></p>
<pre class="brush: bash; title: ; notranslate">

hadoop3:Downloads nkodner$ curl -O &quot;http://download.freebase.com/datadumps/latest/browse/people/deceased_person.tsv&quot;
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16.3M  100 16.3M    0     0   209k      0  0:01:19  0:01:19 --:--:--  248k
hadoop3:Downloads nkodner$ awk -F'\t' '{print $4}' deceased_person.tsv|grep &quot;-&quot;|sort|uniq -c|sort -n|tail -11|head
  22 2008-01-03
  22 2008-02-21
  22 2008-05-20
  23 1989-06-07
  23 2009-01-13
  24 2009-01-11
  26 2009-04-03
  27 1912-04-15
  63 2001-09-11
  65 1965-11-08
</pre>
<p>I was surprised to see 1965-11-08 listed ahead of 2001-09-11. Why? <strong>Let&#8217;s look at where people died on 1965-11-08</strong>:</p>
<pre class="brush: bash; title: ; notranslate">
hadoop3:Downloads nkodner$ grep &quot;1965-11-08&quot; deceased_person.tsv |awk -F'\t' '{print $5}' |sort|uniq -c|sort -n
   1 Kenton County
   1 Latium
   1 Leicester
   1 New York City
   1 Toronto
   3
  57 American Airlines Flight 383 Crash Site
</pre>
<p>Upon further investigation, it looks as if Freebasers have set up a <a href="http://www.freebase.com/view/base/americanairlinesflight383/views/victims_of_aa_flight_383">Victims of AA Flight 383 page</a>, containing info on the deceased. Works for me.</p>
<p><strong>Which calendar day (month and day, ignoring the year) saw the most deaths?</strong></p>
<pre class="brush: bash; title: ; notranslate">
hadoop3:Downloads nkodner$ awk -F'\t' '{print $4}' deceased_person.tsv|grep &quot;-&quot;|awk -F'-' '{print $2&quot;-&quot;$3}'|sort|uniq -c|sort -n|tail -11|head
 668 02-08
 668 03-06
 672 01-06
 673 02-11
 676 01-28
 677 01-10
 683 01-04
 692 12-31
 702 01-22
 752 02-02
</pre>
<p><strong>Method of death?</strong></p>
<pre class="brush: bash; title: ; notranslate">
hadoop3:Downloads nkodner$ awk -F'\t' '{print $3}' deceased_person.tsv|sort|uniq -c|sort -n|tail -11|head
 505 Cardiovascular disease
 603 Tuberculosis
 742 Assassination
 745 Stroke
 799 Pneumonia
 832 Lung cancer
 913 Murder
1618 Suicide
1978 Cancer
2503 Myocardial infarction
</pre>
<p><strong>And finally, the most common names of the deceased people listed on Freebase</strong></p>
<pre class="brush: bash; title: ; notranslate">
hadoop3:Downloads nkodner$ awk -F '\t' '{print $1}' deceased_person.tsv |sort|uniq -c|sort -n|tail -11|head
  21 William Anderson
  23 John White
  25 John Campbell
  25 John Wilson
  29 George Smith
  30 John Anderson
  32 William Smith
  34 John Williams
  35 John Taylor
  36 John Smith
</pre>
<p>Nothing too deep today; this data might be worth a closer look in R someday.</p>
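<p>Each pipeline above follows the same pattern: pull one tab-separated column with awk, count with sort | uniq -c, and rank with sort -n. A rough Python equivalent might look like this, assuming the same deceased_person.tsv layout with columns numbered from 1 as in awk. top_values and month_day are illustrative names. Instead of the tail -11 | head trick, which drops the single most frequent value from each list, this sketch skips empty fields explicitly, on the assumption that blanks are what that trick was filtering out.</p>
<pre class="brush: python; title: ; notranslate">
from collections import Counter


def top_values(lines, column, n=10, require=None, key=None):
    # column is 1-based, like awk's $N; require mimics grep (a substring
    # the value must contain); key optionally transforms the value first
    counts = Counter()
    for line in lines:
        fields = line.rstrip('\r\n').split('\t')
        try:
            value = fields[column - 1].strip()
        except IndexError:
            continue  # short row: awk would print an empty field here
        if not value:
            continue  # skip blanks rather than trimming them with tail | head
        if require is not None and require not in value:
            continue
        if key is not None:
            value = key(value)
        counts[value] += 1
    # most frequent first (the shell version prints ascending order)
    return counts.most_common(n)


def month_day(date):
    # turns 1965-11-08 into 11-08, like awk -F'-' '{print $2"-"$3}'
    parts = date.split('-')
    return '-'.join(parts[1:3])
</pre>
<p>Read the file with open('deceased_person.tsv', encoding='utf-8') and pass its lines in: top_values(lines, 4, require='-') for dates of death, top_values(lines, 4, require='-', key=month_day) for calendar days, top_values(lines, 3) for the method of death, and top_values(lines, 1) for names. To reproduce the 1965-11-08 breakdown, filter the lines for that date first and count column 5.</p>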
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2011/02/fun-with-awk-and-dead-people/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualizations of Canabalt scores scraped from twitter</title>
		<link>http://www.neilkodner.com/2011/02/visualizations-of-canabalt-scores-scraped-from-twitter/</link>
		<comments>http://www.neilkodner.com/2011/02/visualizations-of-canabalt-scores-scraped-from-twitter/#comments</comments>
		<pubDate>Wed, 16 Feb 2011 22:56:46 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[canabalt]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=479</guid>
		<description><![CDATA[Canabalt, a ridiculously addicting web/IOS-device game allows one to show off their high scores, and their not-so-high scores to Twitter. Each of these tweets contains a bit of information &#8211; The score represented in meters, the method of death (hitting a wall and tumbling to my death) and the device (iPhone). Other useful information can [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.canabalt.com/">Canabalt</a>, a ridiculously addicting web/IOS-device game allows one to show off their high scores, and their <a href="http://twitter.com/#!/neilkod/status/37964035903324160">not-so-high scores</a> to Twitter.</p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2011/02/canabaltscore.png"><img class="alignnone size-medium wp-image-485" title="canabaltscore" src="http://www.neilkodner.com/wp-content/uploads/2011/02/canabaltscore-300x105.png" alt="" width="300" height="105" /></a></p>
<p>Each of these tweets contains a few pieces of information: the score in meters, the method of death (hitting a wall and tumbling to my death), and the device (iPhone). Other useful information can easily be extracted, such as the date and time played and details about the user (name, location, friend count, follower count, etc.). Over the next few weeks I aim to see which features, if any, influence Canabalt scores.</p>
<p>The first thing I needed to do was capture the tweeted Canabalt scores. I have a process running on an EC2 micro instance that downloads tweets from the Twitter Streaming API that match certain keywords, one of them being canabalt. The process loads each matching tweet into a MongoDB instance hosted on <a href="http://www.mongohq.com">MongoHQ.com</a>.</p>
<pre class="brush: bash; title: ; notranslate">

curl -s -u $TWITTER_USERNAME:$TWITTER_PASSWORD -d @/home/ec2-user/trackingkeywords http://stream.twitter.com/1/statuses/filter.json |/home/ec2-user/mongodb/bin/mongoimport &amp;
</pre>
<p>Here, trackingkeywords is a file containing the POST body for the filter endpoint: track= followed by a comma-separated list of the keywords I track on Twitter. Additionally, I left connection details out of the mongoimport command; you&#8217;ll need to give mongoimport a host, port, database, and collection.</p>
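<p>mongoimport takes care of parsing the stream here. If you&#8217;d rather handle that step in Python, here is a minimal sketch. It assumes the stream delivers one JSON object per line, with blank keep-alive lines in between and occasional non-tweet notices such as deletions. iter_tweets is an illustrative name, and writing each tweet to MongoDB is left out.</p>
<pre class="brush: python; title: ; notranslate">
import json


def iter_tweets(lines):
    # the stream is newline-delimited JSON, one message per line
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line
        try:
            message = json.loads(line)
        except ValueError:
            continue  # partial message, e.g. the connection dropped mid-line
        # notices such as deletions or rate-limit messages carry no 'text' field
        if isinstance(message, dict) and 'text' in message:
            yield message
</pre>
<p>Each yielded dict could then be written to MongoDB with pymongo&#8217;s insert_one(), or printed back out as one JSON line for mongoimport.</p>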
<p>I then run some Python code to query the MongoDB instance and retrieve tweets mentioning Canabalt, based on a simple regular expression: the tweet must begin with &#8216;I ran&#8217; and contain the word canabalt. Pretty naive, but it worked fine; anything that isn&#8217;t a true Canabalt score fails the second, stricter regular expression and gets skipped. From there, I use regular expressions to extract (for now) the score, the method of death, and the device name.</p>
<pre class="brush: python; title: ; notranslate">
import re

def canabalt_tweets():

	# connect to MongoDB (create_connection() is a helper defined elsewhere in my code)
	tweets = create_connection(False)

	# regular expression to extract the components of a canabalt score:
	# the score in meters, the method of death, and the device
	canabalt_regexp = re.compile(r'I ran (\d{3,7})m before (.*) on my ([^.]+)\.')

	# regular expression to match tweets that begin with 'I ran' and mention canabalt
	regexp = re.compile('^I ran .*canabalt')

	# create a MongoDB cursor (query) that returns only the text of each matching tweet
	cur = tweets.conftweets.find({'text': regexp}, {'text': 1})

	# iterate through the cursor. If a tweet fits the score pattern, print it as CSV.
	for item in cur:
		match = canabalt_regexp.search(item['text'])
		if match is None:
			# mentions canabalt but isn't a score tweet; skip it
			continue
		(score, death, device) = match.groups()
		print ','.join([strip_text(score), strip_text(death), strip_text(device)])
</pre>
<p>The strip_text() function is part of my data tools Bat-Utility Belt; it cleans text by removing leading/trailing spaces, CR/LF characters, tabs, and some other junk.</p>
<p>We now have some comma-separated data in this shape</p>
<pre class="brush: plain; title: ; notranslate">
score,death,device
2860,hitting a wall and tumbling to my death,iPhone
3427,hitting a wall and tumbling to my death,iPad
4496,hitting a wall and tumbling to my death,iPad
3635,missing another window,iPhone
2040,colliding with some enormous obstacle,iPhone
6017,somehow hitting the edge of a billboard,iPhone
8374,knocking a building down,iPhone
2939,hitting a wall and tumbling to my death,iPad
2021,turning into a fine mist,iPad
</pre>
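<p>To try the extraction without a MongoDB instance, here is a self-contained Python 3 sketch that applies the same score regular expression to plain strings and writes CSV in the shape shown above. strip_text here is a hypothetical stand-in for the Bat-Utility Belt function, which isn&#8217;t shown; this version trims the ends and collapses internal runs of whitespace, which may differ from the real one. parse_score and to_csv are illustrative names.</p>
<pre class="brush: python; title: ; notranslate">
import csv
import io
import re

# same pattern as in canabalt_tweets() above
SCORE_RE = re.compile(r'I ran (\d{3,7})m before (.*) on my ([^.]+)\.')


def strip_text(text):
    # hypothetical stand-in: trims the ends and collapses spaces, tabs, CR/LF
    return ' '.join(text.split())


def parse_score(text):
    # returns (score, death, device), or None when the tweet isn't a score tweet
    match = SCORE_RE.search(text)
    if match is None:
        return None
    score, death, device = match.groups()
    return int(score), strip_text(death), strip_text(device)


def to_csv(texts):
    # csv.writer quotes fields if a death description ever contains a comma
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['score', 'death', 'device'])
    for text in texts:
        row = parse_score(text)
        if row is not None:
            writer.writerow(row)
    return buf.getvalue()
</pre>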
<p>Now for some more fun &#8211; visualization and analysis. This is performed in R because, well, R is awesome. That, and I really need some more practice with R.</p>
<p>To date, I&#8217;ve collected just over 1200 Canabalt &#8216;events&#8217;. I will likely turn this into a web app if there&#8217;s enough interest.</p>
<p>A couple of summaries:</p>
<p>scores by device type:</p>
<pre class="brush: plain; title: ; notranslate">
      device count mean stddev median   max min range
      iPhone   735 4491   3882 3419.0 36332 102 36230
        iPad   284 4723   3884 4041.5 40630 104 40526
  iPod touch   189 3734   3644 2713.0 28024 102 27922
</pre>
<p>scores by type of death:</p>
<pre class="brush: plain; title: ; notranslate">
                                            death count mean stddev median   max  min range
          hitting a wall and tumbling to my death   684 4155   3481 3319.5 36332  102 36230
                           missing another window   243 5898   4981 4486.0 40630  409 40221
                         turning into a fine mist    86 3592   2698 2662.5 16441  614 15827
            colliding with some enormous obstacle    40 4768   4247 3256.5 16933  433 16500
                              falling to my death    37 4176   3160 3619.0 13573  567 13006
                       missing a crane completely    22 2950   1774 2923.5  7883  381  7502
                         knocking a building down    21 3399   2267 2849.0  8374  336  8038
                   not quite reaching a billboard    19 3098   1244 2980.0  5772  444  5328
              landing where a building used to be    17 4804   4970 3631.0 22685 1170 21515
          somehow hitting the edge of a billboard    14 5991   3827 5518.5 13547  566 12981
   just barely stumbling out of the first hallway    13  104      1  104.0   104  102     2
              somehow hitting the edge of a crane     7 5497   4835 4942.0 13275  510 12765
       riding a falling building all the way down     4 4278   2162 4195.5  6993 1727  5266
           completely  missing the entire hallway     1 1046     NA 1046.0  1046 1046     0
</pre>
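<p>The summaries above were produced in R with plyr. For comparison, here is a rough Python standard-library sketch that computes the same columns from a list of parsed scores; it is not the R code from the repository, and summarize is an illustrative name. stddev is the sample standard deviation (n - 1 denominator, like R&#8217;s sd()), left as None for single-event groups, where R reports NA.</p>
<pre class="brush: python; title: ; notranslate">
import statistics
from collections import defaultdict


def summarize(rows, group_field):
    # rows are dicts with an int 'score' and the field to group by,
    # e.g. read from the score CSV with csv.DictReader and int() applied
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row['score'])

    table = []
    for name, scores in groups.items():
        table.append({
            group_field: name,
            'count': len(scores),
            'mean': statistics.mean(scores),
            # sample standard deviation; undefined for a single value
            'stddev': None if len(scores) == 1 else statistics.stdev(scores),
            'median': statistics.median(scores),
            'max': max(scores),
            'min': min(scores),
            'range': max(scores) - min(scores),
        })
    # largest groups first, as in the tables above
    table.sort(key=lambda row: row['count'], reverse=True)
    return table
</pre>
<p>Calling summarize(rows, 'device') or summarize(rows, 'death') gives one dict per group with the same columns as the two tables.</p>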
<p>And now, in the spirit of killing the almighty data-ink ratio, here are some pictures:<br />
<img class="alignnone size-full wp-image-503" title="overall plot of scores" src="http://www.neilkodner.com/wp-content/uploads/2011/02/canabaltscores.png" alt="plot of scores" width="619" height="630" /></p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2011/02/bydeathfactedbytype11.png"><img class="alignnone size-large wp-image-505" title="by death faceted by device type" src="http://www.neilkodner.com/wp-content/uploads/2011/02/bydeathfactedbytype11-1024x779.png" alt="by death faceted by device type" width="717" height="545" /></a></p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2011/02/scores-by-device.png"><img class="alignnone size-full wp-image-507" title="scores by device" src="http://www.neilkodner.com/wp-content/uploads/2011/02/scores-by-device.png" alt="scores by device" width="534" height="539" /></a></p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2011/02/bydeathtype.png"><img class="alignnone size-large wp-image-509" title="bydeathtype" src="http://www.neilkodner.com/wp-content/uploads/2011/02/bydeathtype-1024x641.png" alt="by death type" width="614" height="385" /></a></p>
<p>What have we learned? So far, while my data set isn&#8217;t all that large (about 1,200 events), it might be enough to make some basic observations and assumptions (corrections welcome!). Going into this experiment, I thought iPad players would generally have higher scores, for two reasons: the larger screen, and iPad players probably aren&#8217;t playing &#8216;on-the-go&#8217; the way people do on an iPhone or iPod touch (I know I do). So far the iPad does have higher median and mean scores than the other devices (a median of 4041.5, versus 3419.0 for the iPhone and 2713.0 for the iPod touch). I&#8217;d like to revisit this as I collect more data.</p>
<p>The leading cause of Canabalt death, by far, is hitting a wall and tumbling to one&#8217;s death. This surprised me, as I thought it would be falling to my death; that&#8217;s how my Canabalt games seem to end.</p>
<p>I&#8217;d like to hear your comments, your suggestions for new analyses, and most of all, your corrections.  You know who you are, and this is how I learn. The data and Python/R source can be found on <a href="https://github.com/neilkod/canabalt">github</a>.</p>
<p>The stack: Twitter Streaming API, EC2, MongoDB, Python, Regular Expressions, R</p>
<p>Things I learned working on this: <a href="http://had.co.nz/plyr/">plyr</a> (group-by and aggregation in R), sorting data frames in R, and a couple of new <a href="http://had.co.nz/ggplot2/">ggplot2</a> tricks.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2011/02/visualizations-of-canabalt-scores-scraped-from-twitter/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Retrieving the US National Debt and Population using Python and BeautifulSoup</title>
		<link>http://www.neilkodner.com/2011/01/retrieving-the-us-national-debt-using-python-and-beautifulsoup/</link>
		<comments>http://www.neilkodner.com/2011/01/retrieving-the-us-national-debt-using-python-and-beautifulsoup/#comments</comments>
		<pubDate>Mon, 31 Jan 2011 02:41:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=462</guid>
		<description><![CDATA[Update 12-May-2011, I cleaned up the code, added logging of the data to a tab-delimited file, and published it to github. Happy Hacking! Someone suggested I create a bot that tweets the US National Debt.  Here&#8217;s how I&#8217;m retrieving the National Debt amount from the US Treasury site.  I then retrieve the US Population from [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Update 12-May-2011</strong>, I cleaned up the code, added logging of the data to a tab-delimited file, and published it to <a href="https://github.com/neilkod/national_debt">github</a>. Happy Hacking!</p>
<p>Someone suggested I create a bot that tweets the <a href="http://en.wikipedia.org/wiki/United_States_public_debt">US National Debt</a>.  Here&#8217;s how I&#8217;m retrieving the National Debt amount from the <a href="http://www.treasurydirect.gov">US Treasury</a> site.  I then retrieve the US Population from census.gov to figure out each person&#8217;s share of the national debt.</p>
<p>I present to you <a href="http://twitter.com/#!/usadebtlevel">@usadebtlevel</a>, my debt-tweeting bot.</p>
<pre class="brush: python; title: ; notranslate">
from BeautifulSoup import BeautifulSoup
import urllib2
debt_url = 'http://www.treasurydirect.gov/NP/BPDLogin?application=np'
page = urllib2.urlopen(debt_url)
soup = BeautifulSoup(page)
national_debt = soup.find('table',{'class':'data1'}).findAll('td')[3].text
as_of = soup.find('table',{'class':'data1'}).findAll('td')[0].text

population_url = 'http://www.census.gov/main/www/popclock.html'
population_page = urllib2.urlopen(population_url)
soup = BeautifulSoup(population_page)
population = soup.find('span',{'id':'usclocknum'}).text
debt_amount = float(national_debt.replace(',',''))
population_amount = int(population.replace(',',''))
per_person = debt_amount / population_amount
print &quot;US National debt as of %s is %s, or %.2f for each person(%s) in the US&quot; % (as_of, national_debt, per_person, population)
</pre>
<p>Output:</p>
<pre class="brush: plain; title: ; notranslate">US National debt as of 01/27/2011 is 14,059,409,159,678.42, or 45064.71 for each person(311,982,692) in the US</pre>
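<p>The per-person arithmetic is easy to check in isolation. Here is a small Python 3 sketch, separate from the scraping, that parses the two comma-formatted strings and computes each person&#8217;s share with the decimal module, so the cents round predictably rather than depending on float formatting. parse_amount, per_person_share, and debt_message are illustrative names, and the message format is only an example.</p>
<pre class="brush: python; title: ; notranslate">
from decimal import Decimal, ROUND_HALF_UP


def parse_amount(text):
    # strips a leading dollar sign and thousands separators, keeping exact cents
    return Decimal(text.strip().lstrip('$').replace(',', ''))


def per_person_share(debt_text, population_text):
    debt = parse_amount(debt_text)
    population = int(population_text.replace(',', ''))
    # round to whole cents, with halves rounding up
    return (debt / population).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)


def debt_message(as_of, debt_text, population_text):
    share = per_person_share(debt_text, population_text)
    return 'US National debt as of %s is $%s, or $%s for each person (%s) in the US' % (
        as_of, debt_text, format(share, ','), population_text)
</pre>
<p>With the figures above, per_person_share('14,059,409,159,678.42', '311,982,692') gives 45064.71, matching the script&#8217;s output.</p>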
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2011/01/retrieving-the-us-national-debt-using-python-and-beautifulsoup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>World Cup 2010: Analysis of tweets celebrating goals</title>
		<link>http://www.neilkodner.com/2011/01/world-cup-2010-analysis-of-95gb-of-tweets-containing-goal/</link>
		<comments>http://www.neilkodner.com/2011/01/world-cup-2010-analysis-of-95gb-of-tweets-containing-goal/#comments</comments>
		<pubDate>Mon, 10 Jan 2011 14:17:19 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=226</guid>
		<description><![CDATA[The 2010 World Cup proved to be one of the most tweeted about events of 2010.  Through the millions of tweets saved to my local Cloudera CDH3 Hadoop cluster, I wrote a quick pig script to discover the ways that people are celebrating(ok, spelling) goals.  Here are the top few variations of Goal/Gol.  The full [...]]]></description>
			<content:encoded><![CDATA[<p>The 2010 World Cup proved to be one of the most tweeted about events of 2010.  Through the millions of tweets saved to my local Cloudera CDH3 Hadoop cluster, I wrote a quick pig script to discover the ways that people are celebrating(ok, spelling) goals.  Here are the top few variations of Goal/Gol.  The full results can be found <a href="http://www.neilkodner.com/gooool.txt">here</a>.  I&#8217;m happy to share the Pig code if anyone is interested.</p>
<pre class="brush: plain; title: ; notranslate">
158636 gol
126669 goal
31722 Goal
24735 Gol
19610 GOL
14317 GOAL
4178 gool
2981 ggol
2219 goll
1771 goool
1641 GOOOL
1564 Gooool
1498 Goool
1279 GOOOOL
1188 Goooool
1158 GOOOOOOOL
1124 GOOOOOL
1116 gooool
1075 GOOL
</pre>
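<p>The Pig script isn&#8217;t published here, but the counting idea is simple enough to sketch on a single machine. This Python version guesses at a matching rule: one or more g&#8217;s, o&#8217;s, optional a&#8217;s, and l&#8217;s forming a whole word, matched case-insensitively but counted with the original casing so that gol, Gol, and GOL stay separate, as in the table above. It is not the rule the Pig script used, and goal_variants and shape are illustrative names.</p>
<pre class="brush: python; title: ; notranslate">
import re
from collections import Counter

# hypothetical matching rule: g+ o+ a* l+ as a whole word, any casing
GOAL_RE = re.compile(r'\b(g+o+a*l+)\b', re.IGNORECASE)


def goal_variants(tweets):
    # counts each spelling exactly as typed, so 'gol' and 'GOL' are separate keys
    counts = Counter()
    for text in tweets:
        counts.update(GOAL_RE.findall(text))
    return counts


def shape(variant):
    # (length, number of Os): the two axes of the first scatter plot below
    return len(variant), variant.lower().count('o')
</pre>
<p>Because of the word boundaries, longer words such as goalkeeper or golazo are not counted.</p>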
<p>Here are a few rough visualizations; they&#8217;re in need of an update. The first is a scatter plot of the length of each GOAL variant against the number of Os it contains. You&#8217;ll see that a lot of excited soccer fans like to use the full 140 characters in their celebrations, often with 138 Os.  Huge thanks to <a href="http://www.twitter.com/johnmyleswhite">@johnmyleswhite</a> for the inspiration and the <a href="http://gist.github.com/447830">R/ggplot2 help</a>.  You don&#8217;t even want to see my original version!</p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2010/06/exponentialFunctionOfWordLength.png"><img class="alignnone size-medium wp-image-230" title="exponentialFunctionOfWordLength" src="http://www.neilkodner.com/wp-content/uploads/2010/06/exponentialFunctionOfWordLength-300x255.png" alt="Number of Os in GOAL vs GOAL length" width="300" height="255" /></a></p>
<p>Inspired by <a href="http://www.twitter.com/peteskomoroch">@peteskomoroch</a>, who was inspired by the <a href="http://www.wired.com/geekdad/2009/01/khaaaaaan/">frequencies of the length of mentions of KHAAAAAAN!</a>, here are the frequencies of the length of GOAL! celebrations found on Twitter.</p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2010/06/length-of-goooooallll-tweets.png"><img class="alignnone size-medium wp-image-233" title="length of goooooallll tweets" src="http://www.neilkodner.com/wp-content/uploads/2010/06/length-of-goooooallll-tweets-300x289.png" alt="" width="300" height="289" /></a></p>
<p>A visualization of the hashtag mentions by country through all of the World Cup Tweets.  Click for full-sized version.</p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2010/06/countryHashtags.png"><img class="alignnone size-medium wp-image-245" title="countryHashtags" src="http://www.neilkodner.com/wp-content/uploads/2010/06/countryHashtags-300x149.png" alt="Counts of country hashtags" width="300" height="149" /></a></p>
<p>And my personal favorite visualization: the number of tweets mentioning vuvuzela per hour over the course of the World Cup.</p>
<p><a href="http://www.neilkodner.com/images/littlesnapper/vuvuzelatweets.png"><img class="alignnone" title="Vuvuzela tweets per hour" src="http://www.neilkodner.com/images/littlesnapper/vuvuzelatweets.png" alt="Vuvuzela tweets per hour" width="1146" height="533" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2011/01/world-cup-2010-analysis-of-95gb-of-tweets-containing-goal/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>My Twitter bots:  Tens of thousands of followers can&#8217;t be wrong</title>
		<link>http://www.neilkodner.com/2010/12/my-twitter-bots-tens-of-thousands-of-followers-cant-be-wrong/</link>
		<comments>http://www.neilkodner.com/2010/12/my-twitter-bots-tens-of-thousands-of-followers-cant-be-wrong/#comments</comments>
		<pubDate>Tue, 21 Dec 2010 12:20:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[seinfeld]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=412</guid>
		<description><![CDATA[edit: March 17, 2011 I need your help! If you have additional Seinfeld quotes to contribute, or for a list of all of the current Seinfeld quotes, please visit this post. My current army of twitter bots and the keyword that each one responds to: @HelloooooNewman (seinfeld) Klout score 74 @TheBotLebowski (lebowski) Klout score 70 [...]]]></description>
			<content:encoded><![CDATA[<p><strong>edit: March 17, 2011 I need your help! If you have additional Seinfeld quotes to contribute, or for a list of all of the current Seinfeld quotes, <a href="http://www.neilkodner.com/2011/03/looking-for-some-new-quotes-for-hellooooonewman/">please visit this post.</a></strong></p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2010/12/bot-followers.png"><img class="alignnone size-full wp-image-414" title="bot followers" src="http://www.neilkodner.com/wp-content/uploads/2010/12/bot-followers.png" alt="" width="839" height="182" /></a></p>
<p>My current army of twitter bots and the keyword that each one responds to:</p>
<ul>
<li><a href="http://www.twitter.com/#!/hellooooonewman">@HelloooooNewman</a> (seinfeld) <a href="http://klout.com/hellooooonewman">Klout score 74</a></li>
<li><a href="http://www.twitter.com/#!/thebotlebowski">@TheBotLebowski</a> (lebowski) <a href="http://klout.com/thebotlebowski">Klout score 70</a></li>
<li><a href="http://www.twitter.com/#!/acenterforants">@ACenterForAnts</a> (zoolander) <a href="http://klout.com/acenterforants">Klout score 70</a></li>
<li><a href="http://www.twitter.com/#!/iamjacksbot">@IAmJacksBot</a> (fight club) <a href="http://klout.com/iamjacksbot">Klout score 74</a></li>
<li><a href="http://www.twitter.com/#!/amaninamask">@AManInAMask</a> (V for Vendetta)</li>
<li><a href="http://www.twitter.com/#!/worldofshit">@WorldOfShit</a> (full metal jacket) <a href="http://klout.com/worldofshit">Klout score 65</a></li>
<li><a href="http://www.twitter.com/#!/somegrenades">@SomeGrenades</a> (serenity + firefly) <a href="http://klout.com/somegrenades">Klout score 56</a></li>
<li><a href="http://www.twitter.com/#!/gunshowtickets">@GunShowTickets</a> (Ron Burgundy) No Klout score yet</li>
<li><a href="http://www.twitter.com/#!/pleasebe18">@PleaseBe18</a> (Ricky Bobby)</li>
<li><a href="http://www.twitter.com/#!/abakingpowder">@ABakingPowder</a> (schwing) <a href="http://klout.com/abakingpowder">Klout score 54</a></li>
<li><a href="http://www.twitter.com/#!/which_is_nice">@Which_is_nice</a> (caddyshack) <a href="http://klout.com/which_is_nice">Klout score 56</a></li>
<li><a href="http://www.twitter.com/#!/mitchhedbot">@MitchHedbot</a> (mitch hedberg) No Klout score yet</li>
<li><a href="http://www.twitter.com/#!/dubbbya">@dubbbya</a> (gwb) &#8212; banned from twitter</li>
<li><a href="http://www.twitter.com/#!/dreidly">@dreidly</a> (dreidel) &#8212; retired</li>
</ul>
<p>I&#8217;ve also built a few programs that scrape air quality data from the State of Utah and tweet the results.</p>
<ul>
<li><a href="http://www.twitter.com/#!/utahairquality">@UtahAirQuality</a> serving Salt Lake and Davis Counties</li>
<li><a href="http://www.twitter.com/#!/webercountyair">@WeberCountyAir</a></li>
<li><a href="http://www.twitter.com/#!/cachecountyair">@CacheCountyAir</a></li>
<li><a href="http://www.twitter.com/#!/utahcountyair">@UtahCountyAir</a></li>
</ul>
<p>There&#8217;s also <a href="http://twitter.com/#!/usadebtlevel">@usadebtlevel</a>, which tweets the US National Debt and each US citizen&#8217;s share.</p>
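<p>The per-citizen figure comes down to a single division of the total debt by the population. Below is a minimal sketch. Both numbers are passed in by the caller because the post doesn&#8217;t say where @usadebtlevel gets them, and the wording of the message is hypothetical.</p>

```python
def debt_tweet(total_debt, population):
    """Format a status with the total debt and each citizen's share.

    Both inputs come from the caller; the real bot's data sources are not
    described in the post, and this message format is illustrative only.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    share = total_debt / population
    return f"US National Debt: ${total_debt:,.0f}. Each citizen's share: ${share:,.2f}"


if __name__ == "__main__":
    # Toy numbers, not real debt figures.
    print(debt_tweet(1_000_000, 4))
    # US National Debt: $1,000,000. Each citizen's share: $250,000.00
```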
<p>And here&#8217;s a sneak preview: @SarahEffinPalin was conceived after a friend, Willie Morris (<a href="http://www.twitter.com/#!/morewillie">@morewillie</a>), suggested a bot that, let&#8217;s say, repurposes <a href="http://www.twitter.com/#!/sarahpalinusa">Sarah Palin&#8217;s</a> tweets.  I think <a href="http://www.twitter.com/#!/saraheffinpalin">@SarahEffinPalin</a> is going to be a hit.</p>
<p>We all know The Big Lebowski is a cult classic and one of the most quotable movies of all time.  I don&#8217;t exactly remember how this started, but a long time ago I thought that people who mentioned &#8220;Lebowski&#8221; in a tweet would appreciate receiving a quote from the movie.  So with nothing but the Twitter API docs and a little bit of python, I built <a href="http://www.twitter.com/#!/thebotlebowski">@thebotlebowski</a>, my first auto-responder.  The idea was simple &#8211; using urllib2, perform a search for &#8220;lebowski&#8221; and iterate through the results.  For each result, retrieve a random entry from a quotes database and tweet it as a reply to the original tweet.</p>
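<p>Here is a rough sketch of that reply loop in modern Python. It is a hypothetical illustration and not the bot&#8217;s actual code. It targets Python 3 instead of the urllib2-era Python 2 described above, and it skips the HTTP search call entirely. It assumes each search result is a dict with <code>id</code> and <code>from_user</code> keys (assumed field names). It also posts nothing: it returns the replies that a separate, hypothetical posting step would send, keyed by the <code>in_reply_to_status_id</code> parameter that Twitter&#8217;s status-update call accepted.</p>

```python
import random

MAX_TWEET_LENGTH = 140  # Twitter's per-tweet limit when these bots were written


def build_replies(results, quotes, rng=random):
    """Pair each search result with a random quote, formatted as a reply.

    `results` is assumed to be a list of dicts with hypothetical keys
    "id" and "from_user"; nothing is posted here.
    """
    replies = []
    for tweet in results:
        prefix = f"@{tweet['from_user']} "
        # Only consider quotes that still fit once the @mention is prepended.
        candidates = [q for q in quotes if len(prefix) + len(q) <= MAX_TWEET_LENGTH]
        if not candidates:
            continue  # no quote fits; skip this tweet rather than truncate
        replies.append({
            "in_reply_to_status_id": tweet["id"],
            "status": prefix + rng.choice(candidates),
        })
    return replies


if __name__ == "__main__":
    quotes = ["The Dude abides.", "That rug really tied the room together."]
    results = [{"id": 1, "from_user": "someone", "text": "watching lebowski tonight"}]
    for reply in build_replies(results, quotes, random.Random(42)):
        print(reply["in_reply_to_status_id"], reply["status"])
```

<p>You can pass in a seeded <code>random.Random</code> so the quote choice is reproducible while testing.</p>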
<p>After a ton of retweets, #ff mentions, replies, and followers, it became pretty obvious that people liked it.  I needed a follow-up &#8211; another infinitely quotable movie.  Zoolander!  Thus, <a href="http://www.twitter.com/#!/acenterforants">@ACenterForAnts</a> was born.  Again, my research showed that all mentions of Zoolander on twitter were either references to the movie or to Derek Zoolander himself.</p>
<p>Another follow-up was in order.  A <a href="http://www.twitter.com/#!/abstractdata">friend</a> suggested a Seinfeld one.  Done.  Welcome, <a href="http://www.twitter.com/#!/hellooooonewman">@HelloooooNewman</a>.  And then <a href="http://www.twitter.com/#!/iamjacksbot">@IAmJacksBot</a>, and then the others.  The key was to create bots whose search terms are not vague &#8211; if someone tweets &#8220;Full Metal Jacket&#8221;, they&#8217;re obviously talking about the movie.  Same with &#8220;Fight Club.&#8221;</p>
<p>One of the lessons learned was that not everyone who tweets about &#8220;GWB&#8221; is referencing the president.  A <a href="http://www.twitter.com/#!/dtseiler">friend</a> suggested a bot that replies to mentions of GWB with one of George Bush&#8217;s self-butchered quotes.  People loved it, except for people in New York &#8211; hey, I didn&#8217;t realize that so many people tweet about the George Washington Bridge in abbreviated form!  This included several NYC twitter accounts that automatically post traffic conditions.  The complaints came in faster than I could add people to the ignore list.  Eventually, the well-loved but polarizing <a href="http://www.twitter.com/#!/dubbbya">@dubbbya</a> was banned from twitter.  May his <a href="http://www.neilkodner.com/georgewbushquotes.txt">quotes</a> live on in infamy.</p>
<p>While the bots have been very well-received, not everyone likes them.  When there were only a handful of bots, I used to monitor their responses.  Now that there are so many, I&#8217;ve added an ignore list for just this reason.  To add yourself to the ignore list, either contact me, <a href="http://www.twitter.com/#!/@neilkod">tweet me</a>, or visit <a href="http://neilkodsbots.appspot.com">http://neilkodsbots.appspot.com</a>.</p>
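<p>An ignore list works best as a filter that runs before any reply is built. The sketch below is hypothetical and does not reflect how the appspot service stores its data. It assumes ignored screen names are plain strings and compares them case-insensitively, because Twitter screen names are not case-sensitive. It also shows an optional list of blocked phrases, which is one way to skip tweets that match the search term in an unwanted sense, such as GWB traffic reports. The account name in the example is made up.</p>

```python
def filter_results(results, ignored_users, blocked_phrases=()):
    """Drop tweets from ignored users or containing unwanted phrases.

    Screen names are compared case-insensitively, with any leading "@" removed.
    """
    ignored = {name.lower().lstrip("@") for name in ignored_users}
    blocked = [phrase.lower() for phrase in blocked_phrases]
    kept = []
    for tweet in results:
        if tweet["from_user"].lower() in ignored:
            continue
        text = tweet["text"].lower()
        if any(phrase in text for phrase in blocked):
            continue
        kept.append(tweet)
    return kept


if __name__ == "__main__":
    # "NYCTrafficBot" is a made-up account name for illustration.
    results = [
        {"id": 1, "from_user": "NYCTrafficBot", "text": "GWB upper level: 20 min delay"},
        {"id": 2, "from_user": "fan", "text": "I miss the GWB quotes"},
    ]
    print([t["id"] for t in filter_results(results, ["@nyctrafficbot"])])  # [2]
```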
<p><strong>Frequently Asked Questions:</strong></p>
<p><strong>Have any celebrities found your bots?</strong></p>
<p>The bots tweet out to celebrities all of the time.  @ACenterForAnts has tweeted <a href="http://www.twitter.com/#!/redhourben">Ben Stiller</a> many, many times, but Ben has never replied.  Sometimes, the celebrities tweet back.  I don&#8217;t actively monitor the mentions and replies to the bots &#8211; there are just too many.  My favorite anecdote so far is when <a href="http://www.twitter.com/#!/adamsbaldwin">Adam Baldwin</a> discovered <a href="http://www.twitter.com/#!/worldofshit">@worldofshit</a>, my Full Metal Jacket bot, and immediately triggered it over and over to receive new quotes.  He then started tweeting about the bot to his followers, and it quickly picked up steam.  I was so thrilled that one of the stars of Full Metal Jacket was tweeting favorably about a program I wrote that I created <a href="http://www.twitter.com/#!/somegrenades">@somegrenades</a> in his honor.</p>
<p>If you notice a celebrity or otherwise notable person referencing one of my bots, please let me know.  <a href="http://www.delicious.com/neilkod/celebrity">The mentions that I know about</a> include Q-Tip, Talib Kweli, and Fred Durst.</p>
<p><strong>But you&#8217;re a data geek, not a twitter programmer!  Are you doing anything cool with the data?</strong></p>
<p>Yes!  Every tweet that I find, I log.  For example, since <a href="http://www.twitter.com/#!/hellooooonewman">@HelloooooNewman</a> has already sent out over 170,000 replies, I have at least that many incoming tweets mentioning Seinfeld in my logs.  I am able to tell who&#8217;s tweeting about Seinfeld, when people are talking about Seinfeld, what they&#8217;re saying, and so on and so forth.  I can even tell whether certain events, such as the release of a box set or a new event, have resulted in an increase of Seinfeld tweets.  For examples of some of the things I&#8217;ve done with the twitter data, check out this <a href="http://www.neilkodner.com/2010/04/hacking-seinfeld-tweets-with-apache-pig-a-work-in-progress/">analysis of Seinfeld Tweets</a> or this <a href="http://www.neilkodner.com/2010/11/what-do-23000-charlie-sheen-tweets-look-like/">word cloud generated from 23,000 tweets about Charlie Sheen</a>.  Please contact me if you&#8217;d like to hear more.</p>
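<p>To see whether an event brought on more Seinfeld tweets, one approach is to count the logged tweets for each day and look for days well above normal. This sketch assumes a hypothetical log format with one tweet per line, tab-separated as ISO timestamp, user, and text. The cutoff of three times the median daily count is an arbitrary choice for illustration and has not been statistically tuned.</p>

```python
from collections import Counter
from datetime import datetime
from statistics import median


def daily_counts(log_lines):
    """Count logged tweets per calendar day.

    Assumes a hypothetical log format: ISO timestamp, user, and text,
    separated by tabs, one tweet per line.
    """
    counts = Counter()
    for line in log_lines:
        timestamp, _user, _text = line.rstrip("\n").split("\t", 2)
        counts[datetime.fromisoformat(timestamp).date()] += 1
    return counts


def spike_days(counts, factor=3.0):
    """Return days whose tweet count exceeds `factor` times the median day."""
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(day for day, n in counts.items() if n > factor * typical)


if __name__ == "__main__":
    sample = [
        "2010-12-01T10:00:00\tuser1\tWatching Seinfeld",
        "2010-12-02T11:30:00\tuser2\tSeinfeld rerun",
        "2010-12-03T09:15:00\tuser3\tSeinfeld box set!",
        "2010-12-03T09:16:00\tuser4\tSeinfeld box set on sale",
        "2010-12-03T09:20:00\tuser5\tBuying the Seinfeld box set",
        "2010-12-03T12:00:00\tuser6\tSeinfeld",
    ]
    print(spike_days(daily_counts(sample)))  # [datetime.date(2010, 12, 3)]
```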
<p><strong>Would you create a bot for me/my company/my promotion?</strong></p>
<p>I get asked this all of the time.  The answer is:  It depends.  Let&#8217;s talk.  Before we go about doing this, we&#8217;d need to establish a few ground rules.  I&#8217;ve worked very hard to keep the bots entertaining and not spammy.</p>
<p><strong>The tweets don&#8217;t include urls or advertisements &#8211; Are you making any money off of them?</strong></p>
<p>While the bots don&#8217;t generate income directly, they have led to other opportunities and benefits.  For starters, I&#8217;ve picked up a ton of quality followers and contacts that I would never have met otherwise.  Additionally, through this experience, I&#8217;ve learned a great deal about twitter, the twitter API, and numerous features of Python that I wouldn&#8217;t normally have dived into.  To answer the question directly, a few companies and web sites have licensed the technology, and I&#8217;ve created custom bots and twitter searches for them.  I&#8217;ve elected not to mention them directly in this post.</p>
<p><strong>I don&#8217;t want the bots to reply to my tweets.  Can they ignore me?</strong></p>
<p>Sure, the easiest way to be ignored is to visit <a href="http://neilkodsbots.appspot.com">http://neilkodsbots.appspot.com</a> and add yourself to the ignore list.  Honor system please!  I didn&#8217;t feel it was necessary to ask users to authenticate via twitter just so my application could ignore them.</p>
<p><strong>I&#8217;m selling Seinfeld/Zoolander/Lebowski products &#8211; will you tweet this link to all of your bots&#8217; followers?</strong></p>
<p>I also get asked this all of the time.  <a href="http://www.twitter.com/#!/hellooooonewman">@HelloooooNewman</a> has over ten thousand followers.  <a href="http://www.twitter.com/#!/ACenterForAnts">@ACenterForAnts</a> and <a href="http://www.twitter.com/#!/TheBotLebowski">@TheBotLebowski</a> combine for another ten thousand.  While I won&#8217;t send a mention of your product/URL/promotion to their followers, I do have other methods of driving traffic and building awareness among a targeted group of followers.  Let&#8217;s talk.</p>
<p><strong>May I have the source code?</strong></p>
<p>Since I&#8217;ve been using this program for a few unnamed commercial purposes, I&#8217;m not interested in sharing the secret sauce.  I will, however, say that it was pretty straightforward to do.  Anyone with minimal programming skill should be able to do this.</p>
<p><strong>Will there be more bots?</strong></p>
<p>Always.  I&#8217;m always on the lookout for new ideas.  Let me know if you have any.  The next one on my plate will be one for Eastbound and Down.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2010/12/my-twitter-bots-tens-of-thousands-of-followers-cant-be-wrong/feed/</wfw:commentRss>
		<slash:comments>36</slash:comments>
		</item>
		<item>
		<title>Word Cloud from 6,500 tweets mentioning Kanye West, from this morning</title>
		<link>http://www.neilkodner.com/2010/12/word-cloud-from-6500-tweets-mentioning-kayne-west-from-this-morning/</link>
		<comments>http://www.neilkodner.com/2010/12/word-cloud-from-6500-tweets-mentioning-kayne-west-from-this-morning/#comments</comments>
		<pubDate>Tue, 14 Dec 2010 22:25:32 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[kanye]]></category>
		<category><![CDATA[kanyewest]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=403</guid>
		<description><![CDATA[After removing a few stopwords and then clearing out a few other words (nowplaying, lastfm, and the like), here&#8217;s what&#8217;s left.  The data represents a half-day&#8217;s worth of tweets.   I&#8217;m sitting on about 90,000 tweets about Kanye and am looking forward to taking the time for some more in-depth analysis.  Huge thanks to @jrlevine and [...]]]></description>
			<content:encoded><![CDATA[<p>After removing a few <a href="http://www.neilkodner.com/stopwords.txt">stopwords</a> and then clearing out a few other words (nowplaying, lastfm, and the like), here&#8217;s what&#8217;s left.  The <a href="http://www.neilkodner.com/kanyetoday.txt">data</a> represents a half-day&#8217;s worth of tweets.   I&#8217;m sitting on about 90,000 tweets about Kanye and am looking forward to taking the time for some more in-depth analysis.  Huge thanks to <a href="http://www.twitter.com/#!/jrlevine">@jrlevine</a> and <a href="http://www.twitter.com/#!/alexmr">@alexmr</a> from <a href="http://www.twordsie.com">twordsie.com</a> for curating the awesome stopwords list, which I found in their <a href="https://github.com/jakelevine/twordsie">github project</a>.</p>
<p><a href="http://www.neilkodner.com/wp-content/uploads/2010/12/kanye-word-cloud.png"><img class="alignnone size-large wp-image-404" title="kanye word cloud" src="http://www.neilkodner.com/wp-content/uploads/2010/12/kanye-word-cloud-1024x447.png" alt="kanye west tweets word cloud" width="1024" height="447" /></a></p>
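<p>A word cloud like this one depends on a counting step, which can be sketched as shown below. This is a hypothetical reconstruction and not the code that produced the image. It assumes each tweet is a plain string and the stopwords list has one word per line, so the lines of a stopwords file can be passed in unchanged. The tokenizer is deliberately simple: it lowercases the text and matches runs of letters and digits with an optional apostrophe suffix.</p>

```python
import re
from collections import Counter

# Lowercase words, optionally with an apostrophe suffix such as "kanye's".
# "#" and "@" are not part of a match, so "#nowplaying" counts as "nowplaying".
WORD = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def word_frequencies(tweets, stopwords, extra_excluded=()):
    """Count words across tweets, skipping stopwords and any extra exclusions."""
    excluded = {w.strip().lower() for w in stopwords} | {w.lower() for w in extra_excluded}
    counts = Counter()
    for tweet in tweets:
        for word in WORD.findall(tweet.lower()):
            if word not in excluded:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    tweets = ["#nowplaying Kanye West - Runaway", "Kanye West is on the radio"]
    freqs = word_frequencies(tweets, ["is", "on", "the"], ["nowplaying"])
    print(freqs.most_common(2))  # [('kanye', 2), ('west', 2)]
```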
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2010/12/word-cloud-from-6500-tweets-mentioning-kayne-west-from-this-morning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualization of top 150 (now 300!) tweeters at pubcon</title>
		<link>http://www.neilkodner.com/2010/11/visualization-of-top-150-tweeters-at-pubcon/</link>
		<comments>http://www.neilkodner.com/2010/11/visualization-of-top-150-tweeters-at-pubcon/#comments</comments>
		<pubDate>Tue, 09 Nov 2010 23:42:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=394</guid>
		<description><![CDATA[I built http://www.pubcontweets.com to track the tweets about pubcon, a social media conference that&#8217;s being held in Las Vegas.  A little python and the twitter API, sprinkled with some Gephi, yields the intra-network connections of the 150 users with the most #pubcon posts.  The visualization is in svg format &#8211; make sure you use your [...]]]></description>
			<content:encoded><![CDATA[<p>I built <a href="http://www.pubcontweets.com">http://www.pubcontweets.com</a> to track the tweets about pubcon, a social media conference that&#8217;s being held in Las Vegas.  A little python and the twitter API, sprinkled with some Gephi, yields the intra-network connections of the 150 users with the most #pubcon posts.  The visualization is in svg format &#8211; make sure you use your browser&#8217;s zoom feature to see the smaller nodes.</p>
<div class="wp-caption alignnone" style="width: 605px"><img title="visualization of the network top 150 pubcon-tweeters" src="http://www.neilkodner.com/pubcontop150.svg" alt="" width="595" height="600" /><p class="wp-caption-text">Visualization of the twitter networks of the top 150 pubcon tweeters</p></div>
<p>The size of each node indicates its <a href="http://en.wikipedia.org/wiki/Indegree#Indegree_and_outdegree">in-degree</a>, which represents the number of pubcon-tweeters that follow a given user.  The color, from light to dark, represents the <a href="http://en.wikipedia.org/wiki/Indegree#Indegree_and_outdegree">out-degree</a> &#8211; the number of users within the community that they themselves follow.</p>
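<p>Both degrees count only follow relationships inside the group, so any edge that leads outside the group has to be discarded before counting. Here is a minimal sketch. It assumes a hypothetical <code>following</code> mapping from each user in the group to the set of accounts that user follows, which is the kind of data the twitter API queries would return. It also writes a <code>Source,Target</code> CSV edge list, a layout that Gephi&#8217;s spreadsheet import can read as an edge table. This is an illustration only and not the pipeline behind these images.</p>

```python
import csv
import io
from collections import Counter


def group_edges(following):
    """Keep only follow edges (follower, followed) with both ends in the group."""
    group = set(following)
    return sorted((src, dst) for src, friends in following.items()
                  for dst in friends if dst in group and dst != src)


def degrees(edges):
    """In-degree: followers within the group; out-degree: follows within it."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


def edges_to_csv(edges):
    """Serialize edges as a Source,Target CSV edge table."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical users; "outsider" is not in the group and is dropped.
    following = {"a": {"b", "c", "outsider"}, "b": {"a"}, "c": {"a"}}
    edges = group_edges(following)
    in_deg, out_deg = degrees(edges)
    print(dict(in_deg), dict(out_deg))
    print(edges_to_csv(edges))
```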
<p>Note that the degree does not represent the total number of followers/friends pertaining to each node &#8211; rather, it&#8217;s the connectedness within the group I extracted.  I&#8217;d like to run the visualization on a larger group of users, perhaps 300 or so, but due to rate limiting, I have to wait ten seconds between each twitter api query.</p>
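<p>The simplest way to stay under a rate limit is to pause between API calls. The sketch below assumes a hypothetical <code>fetch_friends(user)</code> function that makes exactly one API query per user and returns that user&#8217;s friend list. The <code>sleep</code> argument can be swapped out so the loop can be tested without actually waiting. With one query per user at ten seconds each, 150 users take about 25 minutes and 300 users about 50.</p>

```python
import time


def fetch_all(users, fetch_friends, delay_seconds=10.0, sleep=time.sleep):
    """Call the (hypothetical) fetch_friends once per user, pausing between calls."""
    following = {}
    for i, user in enumerate(users):
        if i:  # no need to wait before the very first request
            sleep(delay_seconds)
        following[user] = set(fetch_friends(user))
    return following


if __name__ == "__main__":
    # Stand-in fetcher and a no-op sleep so the example runs instantly.
    fake_api = {"a": ["b"], "b": ["a", "c"], "c": []}
    print(fetch_all(["a", "b", "c"], fake_api.get, sleep=lambda s: None))
```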
<p>None of this would have been possible without the help and inspiration of <a href="http://twitter.com/#!/psychemedia">@psychemedia</a> and his excellent blog, <a href="http://blog.ouseful.info/">OUseful.info</a>.    His Gephi tutorials have been tremendously helpful &#8211; I encourage all of you to follow him on twitter and subscribe to his blog.</p>
<p><strong>Edit:</strong> here are the top 300 twitter-ers at pubcon 2010.  I&#8217;m going to try and run some numbers on an even larger piece of the pubcon network.  Make sure you use your browser&#8217;s zoom feature to view the image in more detail.</p>
<div class="wp-caption alignnone" style="width: 605px"><a href="http://www.neilkodner.com/pubcontop300.svg"><img class=" " title="top 300 pubcon twitter-ers" src="http://www.neilkodner.com/pubcontop300.svg" alt="" width="595" height="705" /></a><p class="wp-caption-text">The connected-ness of the top 300 pubcon twitter-ers</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2010/11/visualization-of-top-150-tweeters-at-pubcon/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

