Two weeks have passed since Apple’s Co-Founder/CEO Steve Jobs passed away. Upon his passing, Apple encouraged people to share their memories, thoughts, and feelings by emailing rememberingsteve@apple.com. Earlier this week, Apple posted a site (http://www.apple.com/stevejobs) in tribute to Steve Jobs. According to the site, over a million people have submitted messages. The site cycles through the submitted messages.
I decided to take a closer look at what people are saying about Steve Jobs, as a whole. Looking at how the site updates, it appears to use Ajax to retrieve and display new messages. Using Chrome’s developer tools, I monitored the requests it was making to get the new messages.

Once I found the location of the individual messages, it was trivial to download all of them. The message endpoint URLs are in the format
http://www.apple.com/stevejobs/messages/3679.json?28106802
and a sample message looks like
{
mainText: "This is equivalent to my mom's generation of Elvis dying for me. I am very
sadden and emotionally moved at the moment. He was more influential on my
life than my parents and friends. While my parents loved me and friends
shared fun times. Steve influenced me, motivated me to become the innovated,
creative technologist I have become. I got into computer technology in 1980
and moved to Silicon Valley because of him. I have been one of his biggest
admirers and looked to him as a mentor to push the boundaries of my own
creative abilities to develop technology solutions which I hope made a
difference and impact to the industries I worked in. We've lost a
significant influence and icon in technology. We won't see another person of
his innovation and foresight within my life time. He was the Edison of
technology. He was and is one of my biggest inspirations.
I feel I have lost a close family member"
header: "What Steve Jobs meant to me"
author: "Skip"
location: ""
}
The site makes a request to http://www.apple.com/stevejobs/messages/main.json which returns
{
totalMessages: "10975"
timestamp: "28106802"
}
So it appears that the site cycles through 10975 messages. I didn’t dig through the JavaScript powering the site to confirm this; I just made an assumption. Querying values greater than 10975 returned a 404. I wrote a quick Python program to download the messages:
#!/usr/bin/python
import urllib2
import simplejson as json
import time
import codecs
# a page on apple's site shows the # of messages available
# start with 0 and retrieve up to message_range messages
metadata = json.loads(urllib2.urlopen('http://www.apple.com/stevejobs/messages/main.json').read())
# totalMessages comes back as a string ("10975"), so convert it before passing it to range()
message_range = int(metadata['totalMessages'])
# the url for each message. i learned of this url by inspecting
# the network calls to http://www.apple.com/stevejobs
# using chrome's developer tools
url="http://www.apple.com/stevejobs/messages/%d.json"
# create our destination file
# i'm using codecs because it does a better job at handling international characters
output_file = 'stevejobs_tribute.txt'
file_handle = codecs.open(output_file,'w','utf-8')
# helper function to remove tabs and linefeeds
def clean(txt):
    return txt.replace('\n','').replace('\t','')
# iterate from 0 to the max # of messages and download the message text
# for these purposes, I'm ignoring the other fields as they weren't always present
for i in range(0, message_range):
    req = url % i
    data = urllib2.urlopen(req).read()
    data = json.loads(data)
    file_handle.write(clean(data['mainText']) + '\n')
file_handle.close()
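The script above is Python 2 (urllib2 does not exist in Python 3). For readers on Python 3, here is a minimal sketch of the same loop, assuming the endpoints still behave as described. The names http_fetch and download_messages, and the fetch parameter, are hypothetical; they are only there so the loop can be tried without touching the network.

```python
import json
from urllib.request import urlopen

BASE = "http://www.apple.com/stevejobs/messages/"


def http_fetch(url):
    """Fetch a URL and decode its JSON body (assumes UTF-8 encoded JSON)."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def clean(txt):
    """Remove tabs and linefeeds so each message fits on one line."""
    return txt.replace("\n", "").replace("\t", "")


def download_messages(fetch=http_fetch):
    """Return the cleaned mainText of every message, in index order."""
    metadata = fetch(BASE + "main.json")
    # totalMessages arrives as a string ("10975"), so convert it first
    total = int(metadata["totalMessages"])
    return [clean(fetch(BASE + "%d.json" % i)["mainText"]) for i in range(total)]
```

Passing a dictionary lookup as fetch (for example fake.__getitem__, with fake mapping URLs to already-decoded responses) exercises the loop offline.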
So now, we have over ten thousand tribute messages saved to the file stevejobs_tribute.txt. What I was most interested in seeing was how many of these messages contain a reference to a certain Apple product.
I came up with a few search terms based on some legendary Apple product names including
- Newton
- Macintosh
- MacBook
- iBook
- Mac
- iPhone
- iPod
- iMac
- iPad
- Apple II family
- OSX
- iMovie
- Apple TV
- iTunes
- LaserWriter (yes, Laserwriter)
products = {'iPhone':{'regex':'iphones?','count':0},
'iMac':{'regex':'imacs?','count':0},
'iPad':{'regex':'ipads?','count':0},
'iTunes':{'regex':'itunes','count':0},
'iPod':{'regex':'ipods?','count':0},
'cube':{'regex':'cubes?','count':0},
'MacBook':{'regex':'macbooks?','count':0},
'iBook':{'regex':'ibooks?','count':0},
'Apple TV':{'regex':'apple ?tvs?','count':0},
'Apple II Family':{'regex':r'(apple )?(2|ii|\]\[|\/\/)([ce\+|]|gs|s)?[^0-9]', 'count':0},
'LaserWriter':{'regex':'laserwriter?','count':0},
'PowerBook':{'regex':'powerbook?','count':0},
'Newton':{'regex':'newton?','count':0},
'OSX':{'regex':'osx','count':0},
'iMovie':{'regex':'imovie','count':0},
'Macintosh':{'regex':'macintosh','count':0},
'Lisa':{'regex':'lisa','count':0},
'Mac':{'regex':'mac','count':0},
}
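As a sketch of how a dictionary like this gets used (Python 3 here, with only two of the patterns copied over), each pattern is searched for in the lower-cased message and a message counts at most once per product. The function name count_product_mentions and the sample messages are made up for illustration.

```python
import re

# Two of the patterns from the dictionary above, copied verbatim
products = {
    "iMac": {"regex": "imacs?", "count": 0},
    "Mac": {"regex": "mac", "count": 0},
}


def count_product_mentions(messages, products):
    """Increment each product's count once per matching message.

    Returns the number of messages that mention at least one product.
    """
    with_mention = 0
    for message in messages:
        text = message.lower()  # the patterns are all lower-case
        matched = False
        for info in products.values():
            if re.search(info["regex"], text):
                info["count"] += 1
                matched = True
        if matched:
            with_mention += 1
    return with_mention


sample = ["My first iMac changed everything.", "Thank you, Steve."]
total = count_product_mentions(sample, products)
print(total, products["iMac"]["count"], products["Mac"]["count"])
```

Because re.search looks for the pattern anywhere in the text, the bare mac pattern also fires on imac, macbook and macintosh, so the first sample message is counted under both iMac and Mac. A word-boundary pattern such as r'\bmacs?\b' would count only the standalone word.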
Here’s a screenshot of me testing the Apple II regular expression, using the excellent Regexr.
Overall, out of 10975 messages downloaded (as of now), 2,186, or just under 20%, mentioned an Apple product by name. Here’s the breakdown of the products mentioned:
- LaserWriter – 1
- iMovie – 3
- OSX – 9
- iBook – 22
- PowerBook – 22
- Lisa – 24
- Apple TV – 31
- Newton – 33
- iTunes – 52
- Macintosh – 163
- iMac – 235
- MacBook – 366
- Apple II Family – 481
- iPad – 574
- iPod – 575
- iPhone – 875
- Mac – 1315
More than one out of every ten messages included a reference to a Mac! Nearly one in ten mentioned an iPhone – not bad for a device that’s been out a fraction of the time the Mac has been available. I’m pleased to see so many references to the Apple II, including several mentions of the //c, which was my first Apple product.
It’s also interesting to note that out of 33 mentions of Newton, only a handful of those were about the actual Apple product – most were comparing Steve Jobs to Newton himself. Check out my earlier post on NLTK concordance for details on how I did this:
import nltk
import string
f = open('stevejobs_tribute.txt').read()
f = f.translate(string.maketrans("",""), string.punctuation)
foo=nltk.Text(f.split())
# concordance() prints its own output and returns None, so no print is needed
foo.concordance('newton')
result:
op If history misses men like Isaac Newton Graham Bell Galileu Thomas Edison a
mbered though his legacy Now he met Newton Einstein and other geniuses like hi
oday I was one of the few who had a Newton Today I have an iPhone 4 an iPad2 a
oduct that came thereafter from the Newton to the Cube to the iPhone 4S God Bl
with the likes of Edison Garcia and Newton for his impact and vision I wish hi
ntioned in the same breath as Isaac Newton Thomas Edison and Bill Gates The le
off a tree we are thinking of Adam Newton and Steve Jobs He open new dimensio
Jobs will be missed Da Vinci Mozart Newton Franklin Jobs Nobody is out of plac
ged my life starting with the Apple Newton followed by the iPod and then the i
sorely missed nbsp Da Vinci Mozart Newton Franklin Jobs Nobody is out of plac
ve dared to Einstein Freud Da Vinci Newton Galileo Darwin among others is prou
embered beside Einstein Pasteur and Newton The world is moving toward his crea
irst Apple Mac I remember the first Newton I willnbspremembernbspSteves creati
e to contact us againnbsp How Isaac Newton and Albert Einstein contributed gre
world One seduced Eve One awakened Newton and One was in the hands of Steve J
the way you have influenced mine If Newton discovered something as remarkable
rld One seduced Eve second awakened Newton the third one was in the hands of S
lent to Leonardo Da Vinci Sir Issac Newton Albert Einstein and the like He was
t of the caliber of that of DaVinci Newton Pythagorous etc The list can go on
hen people say names like ie Edison Newton and Einstein I guarantee that the n
Computers” The Apple II Lisa Mac Newton iPod iTunes store iPod Touch iPhone
ember Steve Jobs the way I remember Newton or Einstein I lived with Apple prod
set consultant who bought his first Newton MacBook 170 and all the dozens of o
br 3 Apples change the world Adán Newton Steve Jobs 19552011 Rest in Peace t
back to the Apple IIGS I also had a Newton Steve Jobs death hurts me personall
ed the world apple to adam apple to newton and apple to steve jobs Steve was a
dam and Eva Second one that wake up newton third one that Steve Jobs create St
Also interesting was the number of mentions of other historical figures in the Steve Jobs remembrance messages. According to the submitters, Steve Jobs is clearly in some elite company. I don’t know if I’d go so far as to group him with the men who brought automobiles and light bulbs to the masses but hey, we all have our priorities. All counts were determined through a simple grep command piped to wc -l. Here are a few examples:
- Einstein – 70
- Ford – 189
- Edison – 110
- DaVinci – 15
- Bill Gates – 8
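The exact grep invocations aren't shown in the post. As a sketch, counting the lines that contain a name (which is what grep piped to wc -l does) can be written in Python 3 like this. The helper name count_lines_containing, its ignore_case flag, and the sample lines are assumptions for illustration.

```python
def count_lines_containing(lines, term, ignore_case=False):
    """Count lines containing term as a substring, like `grep term | wc -l`."""
    if ignore_case:
        term = term.lower()
        return sum(1 for line in lines if term in line.lower())
    return sum(1 for line in lines if term in line)


messages = [
    "He was the Edison of technology.",
    "Einstein, Edison, Jobs.",
    "I could not afford a Mac, but I got one anyway.",
]
print(count_lines_containing(messages, "Edison"))
print(count_lines_containing(messages, "ford", ignore_case=True))
```

An open file works as the lines argument too, since iterating a file yields one line at a time. Note that this is plain substring matching, so a case-insensitive search for ford also counts a line containing "afford"; whether that affects the figures above depends on the grep options actually used, which the post doesn't state.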
Finally, I wanted to see how people were speaking about Steve Jobs and especially what terms were being used to describe him. There was no point in performing sentiment analysis on this text, as all of the messages were not only obviously positive but were also vetted by Apple for content. Using NLTK, I performed part-of-speech tagging on every word in each tribute message and then wrote some code to total the adjectives and adverbs used in the tribute messages.
The most commonly-used adjectives are
('great', 1961)
('steve', 1808)
('many', 1459)
('first', 917)
('sad', 862)
('better', 857)
('such', 727)
('best', 721)
('visionary', 645)
('new', 579)
('more', 556)
('true', 538)
('most', 476)
('creative', 471)
('apple', 435)
('other', 427)
('same', 415)
('good', 412)
('greatest', 376)
('wonderful', 373)
('sorry', 362)
('old', 325)
('brilliant', 283)
('able', 281)
('incredible', 267)
('big', 260)
Humorously, NLTK frequently considered “Steve” to be an adjective. This is likely because it is always followed by the proper noun “Jobs.” A tweet from NLTK expert Jacob Perkins reminded me that machines are dumb and proper nouns should be capitalized. In order to aggregate the counts, I normalized the text by converting to lowercase – I wasn’t interested in nouns, only adjectives so proper nouns didn’t matter to me.
The top adverbs, according to NLTK, were not as interesting, at least to me.
('so', 2220)
('never', 2111)
('not', 1897)
('always', 1798)
('just', 1402)
('now', 1028)
('truly', 989)
('only', 945)
('very', 919)
('much', 908)
('ever', 751)
('even', 743)
('really', 567)
('forever', 508)
('more', 486)
('still', 447)
('well', 398)
('most', 375)
('personally', 352)
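The adjective and adverb tallies above boil down to counting words by tag prefix. Here is a minimal Python 3 sketch, assuming the tokens have already been tagged with Penn Treebank tags (the kind nltk.pos_tag returns); the function name tally_adjectives_adverbs and the hand-tagged sample are made up so the example runs without NLTK's tagger models.

```python
from collections import Counter


def tally_adjectives_adverbs(tagged_tokens):
    """Tally words whose Penn Treebank tag starts with JJ or RB."""
    adjectives, adverbs = Counter(), Counter()
    for word, tag in tagged_tokens:
        if tag.startswith("JJ"):      # JJ, JJR, JJS
            adjectives[word] += 1
        elif tag.startswith("RB"):    # RB, RBR, RBS
            adverbs[word] += 1
    return adjectives, adverbs


# Hand-tagged stand-in for the (word, tag) pairs a tagger would produce
tagged = [("a", "DT"), ("truly", "RB"), ("great", "JJ"), ("man", "NN"),
          ("the", "DT"), ("greatest", "JJS"), ("never", "RB"), ("great", "JJ")]
adjectives, adverbs = tally_adjectives_adverbs(tagged)
print(adjectives.most_common(2))
print(adverbs.most_common(2))
```

Counter.most_common(n) returns exactly the n highest counts, which sidesteps any manual slicing of a sorted list.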
And finally, I ran tri-gram analysis, again using NLTK.
trigrams = defaultdict(int)
nltk_trigrams = nltk.trigrams(text)
for itm in nltk_trigrams:
    trigrams[itm] += 1
As one would expect, the leading trigram was ‘rest in peace’ with 1838 mentions, a figure equal to 16.7% of the 10975 messages. ‘thank you for’ was found in 1446 messages and ‘will be missed’ in 827. Other interesting trigrams are ‘thank you steve’ with 791 mentions and ‘changed the world’ with 551 mentions.
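For anyone who wants to try trigram counting without installing NLTK, here is a small Python 3 sketch using only the standard library. It slides a three-token window over a token list, which is the same windowing nltk.trigrams performs; the function name count_trigrams and the sample sentence are illustrative.

```python
from collections import Counter


def count_trigrams(tokens):
    """Count every run of three consecutive tokens."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


tokens = "rest in peace steve . rest in peace".split()
trigrams = count_trigrams(tokens)
print(trigrams[("rest", "in", "peace")])
```

Running it once per message, and adding the per-message Counters together, keeps trigrams from spanning two different tributes.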
The full Python code and resulting data can be found on GitHub.
#!/usr/bin/python
#nltk.help.upenn_tagset('RB')
from collections import defaultdict
from operator import itemgetter
import re
import urllib2
import string
import simplejson as json
import codecs
import nltk
OUTPUT_FILE = 'data/stevejobs_tribute.txt'
adverbs = defaultdict(int)
adjectives = defaultdict(int)
trigrams = defaultdict(int)
message_has_adjective = False
message_has_adverb = False
message_contains_product_mention = False
messages_with_adjective = 0
messages_with_adverb = 0
messages = 0
messages_with_product_mention = 0
exclude = set(string.punctuation)
products = {'iPhone':{'regex':'iphones?','count':0},
'iMac':{'regex':'imacs?','count':0},
'iPad':{'regex':'ipads?','count':0},
'iTunes':{'regex':'itunes','count':0},
'iPod':{'regex':'ipods?','count':0},
'cube':{'regex':'cubes?','count':0},
'MacBook':{'regex':'macbooks?','count':0},
'iBook':{'regex':'ibooks?','count':0},
'Apple TV':{'regex':'apple ?tvs?','count':0},
'Apple II Family':{'regex':r'(apple )?(2|ii|\]\[|\/\/)([ce\+|]|gs|s)?[^0-9]', 'count':0},
'LaserWriter':{'regex':'laserwriter?','count':0},
'PowerBook':{'regex':'powerbook?','count':0},
'Newton':{'regex':'newton?','count':0},
'OSX':{'regex':'osx','count':0},
'iMovie':{'regex':'imovie','count':0},
'Macintosh':{'regex':'macintosh','count':0},
'Lisa':{'regex':'lisa','count':0},
'Mac':{'regex':'mac','count':0},
}
def top_n(dct,n = 10):
    srtd=sorted(dct.iteritems(), key=itemgetter(1), reverse=True)
    # slice [0:n] so that exactly n items are printed
    for x in srtd[0:n]:
        print x
def nltk_concordance(term,text_file):
    f = open(text_file).read()
    # remove punctuation
    f = f.translate(string.maketrans("",""), string.punctuation)
    split_text=nltk.Text(f.split())
    split_text.concordance(term,lines=100)
    # >>> f = f.translate(string.maketrans("",""), string.punctuation)
    # >>> foo=nltk.Text(f.split())
    # >>> print foo.concordance('newton')
def unescape(s):
    """unescapes html codes"""
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    s = s.replace("&nbsp;", " ")
    # this has to be last:
    s = s.replace("&amp;", "&")
    return s
for line in open(OUTPUT_FILE):
    message_has_adjective = False
    message_has_adverb = False
    message_contains_product_mention = False
    # remove the trailing linefeed and convert to lower-case
    # and remove html control characters
    messages += 1
    data = line.strip()
    data = data.lower()
    data = unescape(data)
    # check for product mentions
    for k,v in products.iteritems():
        if re.search(v['regex'],data):
            products[k]['count'] += 1
            message_contains_product_mention = True
    # if the message contains a product mention
    # increment the product mention counter
    if message_contains_product_mention:
        messages_with_product_mention += 1
    # tokenize the sentences using nltk's wordpuncttokenizer
    text = nltk.WordPunctTokenizer().tokenize(data)
    # compute trigrams
    nltk_trigrams = nltk.trigrams(text)
    for itm in nltk_trigrams:
        trigrams[itm] += 1
    # pos-tag each token. we're interested in adjectives and adverbs
    parts_of_speech = nltk.pos_tag(text)
    # test for adjectives and adverbs, increment the counters
    # when we find one.
    for (word,pos) in parts_of_speech:
        if pos.startswith('JJ'):
            message_has_adjective = True
            adjectives[word] += 1
        if pos.startswith('RB'):
            message_has_adverb = True
            adverbs[word] += 1
    # if the message contains an adverb or an adjective, increment a counter
    if message_has_adjective:
        messages_with_adjective += 1
    if message_has_adverb:
        messages_with_adverb += 1
# output the 25 most frequently-used adjectives and adverbs
n = 25
print "top %s adverbs" % n
top_n(adverbs, n)
print
print "top %s adjectives" % n
top_n(adjectives, n)
print "messages with adjectives: %s" % messages_with_adjective
print "messages with adverbs: %s" % messages_with_adverb
print "total messages with product mentions: %s" % messages_with_product_mention
print "total messages: %s" % messages
# output the top 50 most-common trigrams
n = 50
print "top %s trigrams" % n
top_n(trigrams, n)
# sort the products by mention count, lowest first
srtd=sorted(products.iteritems(),key=lambda kv: kv[1]['count'])
for x,y in srtd:
    print "%s\t\t%s" % (x,y['count'])
print
print
# concordance for newton
print "concordance for newton:"
nltk_concordance('newton',OUTPUT_FILE)


23 Comments
Awesome stuff! Love it!
You know you can use NLTK to save you some coding work. I like to use Conditional Frequency Distributions to find most frequent postags.
Here’s a fun method to play with:
def findtags(tag_prefix, tagged_text):
    cfd = nltk.ConditionalFreqDist((tag, word) for (word, tag) in tagged_text
                                   if tag.startswith(tag_prefix))
    return dict((tag, cfd[tag].keys()[:5]) for tag in cfd.conditions())
tagged text here is raw text that’s been sentence and word tokenized and then pos tagged.
Awesome post, so fun to read.
Very cool!
Awesome! Liked the query tweaking part especially. Inspired me to try something similar myself.
Impressive hacking, but IMHO it does seem somewhat disrespectful to harvest data from Steve Jobs online memorial.
This is pretty awesome. It is definitely great to see the analysis of the tributes to Steve, and it was good to see the method of analysis side by side with the code that did it. It makes me interested in python.
Thanks!
Awesome work. Can you also grep for “changed”
Great post and work!
I’ve put most of the data into a sheet:
https://docs.google.com/spreadsheet/pub?hl=en_US&key=0Ao7Goz9waxjbdEI3ZHd4bXRiS1pFSW1oZXJ4SXJfN1E&hl=en_US&gid=0
Can you update your mac regex to “^mac\s” and re-run. I bet the iphone will win.
I love this! Way to use your coding brilliance for good
Very cool analysis! I took the liberty to run an apriori algorithm on the tribute messages, this is the result: http://www.guidovo.com/an-apriori-algorithm-analysis-of-steve-jobs-tribute-messages/
It’s worth pointing out that the “Chrome” developer tools you used to do this were developed by Apple as part of webkit before Google incorporated them in to Chrome.
One more thing that Jobs was partially responsible for
Just wanted to say this dude is brilliant and sick at data analysis, especially social data. He’s worth talking to if you meet him at a local SoFL meetup.
Very interesting. Could you please also do an analysis on countries? I noticed (in stevejobs_tribute.txt) that tributes came from all over the world. I searched “China”, “Japan”, “France”, “UK”, “Russia”, etc. and found many results. Some tributes were even written in Chinese, either entirely or partially. I think it would be an analysis that’s worth doing as well… Also try searching “Chinese”, “Japanese”, etc. as well as city names.
I’m also wondering how the messages were selected; because I also sent in a tribute message (on 10/6 at 3 AM), but it wasn’t among the 10975 messages. I’m sure way more than 10975 messages were received. Are these the first 10975 messages they received? What does the timestamp: “28106802″ mean?
Other keywords that I find interesting to search for are:
- 3GS, 4S and other models (to see which exact product/model is the most popular)
- commencement, speech, standford
- keynote, presentation
Thank you.
You have two “Bookmarklets” folders
.
Are the product names types or tokens (per document)?
Most of these analysis you might have gotten without any coding from corpus analysis tools. Bet yeah, coding is fun, I know.
Spotted an error in your regex list. The reason both the search terms ‘Lisa’ and ‘Macintosh’ resulted in 163 mentions is because of this code:
‘Macintosh’:{‘regex’:'macintosh’,'count’:0},
‘Lisa’:{‘regex’:'macintosh’,'count’:0},
Both their regexes are ‘macintosh’. I guess this might have been due to copy pasting to avoid retyping the whole line over and over. Anyway, you may want to that part again
Otherwise, this made for a great sunday morning read
Just for kicks, a quick word cloud based off the text of the tribute messages: http://www.wordle.net/show/wrdl/4286603/Steve_Jobs%3A_Tribute_Messages
Thanks for pointing this out. I’ve updated the post, the counts, and most importantly, the github repo
There should be no mention of Newton at all, because the only involvement Steve Jobs had to do with it, was to axe the project when he got back to Apple.
For the non-programmers, are you willing to post a text file, or Google doc, with the raw data? It would be great to get the 1,000,000 messages but 10,000+ is a good start.
I did! I mentioned the file stevejobs_tribute.txt in the post. It’s in the github repository located here.
As a researcher this was a great lesson in data mining. Thanks!