Two weeks have passed since Apple’s Co-Founder/CEO Steve Jobs passed away. Upon his passing, Apple encouraged people to share their memories, thoughts, and feelings by emailing rememberingsteve@apple.com. Earlier this week, Apple posted a site (http://www.apple.com/stevejobs) in tribute to Steve Jobs. According to the site, over a million people have submitted messages. The site cycles through the submitted messages.
I decided to take a closer look at what people are saying about Steve Jobs, as a whole. Looking at how the site updates, it appears to use Ajax to retrieve and display new messages. Using Chrome’s developer tools, I monitored the requests it was making to get the new messages.

Once I found the location of the individual messages, it was trivial to download all of them. The message endpoint URLs are in the format
http://www.apple.com/stevejobs/messages/3679.json?28106802
and a sample message looks like
{
mainText: "This is equivalent to my mom's generation of Elvis dying for me. I am very
sadden and emotionally moved at the moment. He was more influential on my
life than my parents and friends. While my parents loved me and friends
shared fun times. Steve influenced me, motivated me to become the innovated,
creative technologist I have become. I got into computer technology in 1980
and moved to Silicon Valley because of him. I have been one of his biggest
admirers and looked to him as a mentor to push the boundaries of my own
creative abilities to develop technology solutions which I hope made a
difference and impact to the industries I worked in. We've lost a
significant influence and icon in technology. We won't see another person of
his innovation and foresight within my life time. He was the Edison of
technology. He was and is one of my biggest inspirations.
I feel I have lost a close family member"
header: "What Steve Jobs meant to me"
author: "Skip"
location: ""
}
The site makes a request to http://www.apple.com/stevejobs/messages/main.json which returns
{
totalMessages: "10975"
timestamp: "28106802"
}
So it appears that it cycles through 10975 messages. I didn’t decompose the javascript powering the site to determine this, I just made an assumption. I tried querying values greater than 10975 and they returned 404. I wrote a quick python program to download the messages:
#!/usr/bin/python
import urllib2
import simplejson as json
import time
import codecs
# a page on apple's site shows the # of messages available
# start with 0 and retrieve up to message_range messages
metadata = json.loads(urllib2.urlopen('http://www.apple.com/stevejobs/messages/main.json').read())
message_range = metadata['totalMessages']
# the url for each message. i learned of this url by inspecting
# the network calls to http://www.apple.com/stevejobs
# using chrome's developer tools
url="http://www.apple.com/stevejobs/messages/%d.json"
# create our destination file
# i'm using codecs because it does a better job at handling international characters
output_file = 'stevejobs_tribute.txt'
file_handle = codecs.open(output_file,'w','utf-8')
# helper function to remove tabs and linefeeds
def clean(txt):
return txt.replace('\n','').replace('\t','')
# iterate from 0 to the max # of messages and download the message text
# for these purposes, I'm ignoring the other fields as they weren't always present
for i in range(0, message_range):
req = url % i
data = urllib2.urlopen(req).read()
data = json.loads(data)
file_handle.write(clean(data['mainText']) + '\n')
file_handle.close()
So now, we have over ten thousand tribute messages saved to the file stevejobs_tribute.txt. What I was most interested in seeing how many of these messages contain a reference to a certain Apple product.
I came up with a few search terms based on some legendary Apple product names including
- Newton
- Macintosh
- MacBook
- iBook
- Mac
- iPhone
- iPod
- iMac
- iPad
- Apple II family
- OSX
- iMovie
- Apple TV
- iTunes
- LaserWriter (yes, Laserwriter)
products = {'iPhone':{'regex':'iphones?','count':0},
'iMac':{'regex':'imacs?','count':0},
'iPad':{'regex':'ipads?','count':0},
'iTunes':{'regex':'itunes','count':0},
'iPod':{'regex':'ipods?','count':0},
'cube':{'regex':'cubes?','count':0},
'MacBook':{'regex':'macbooks?','count':0},
'iBook':{'regex':'ibooks?','count':0},
'Apple TV':{'regex':'apple ?tvs?','count':0},
'Apple II Family':{'regex':r'(apple )?(2|ii|\]\[|\/\/)([ce\+|]|gs|s)?[^0-9]', 'count':0},
'LaserWriter':{'regex':'laserwriter?','count':0},
'PowerBook':{'regex':'powerbook?','count':0},
'Newton':{'regex':'newton?','count':0},
'OSX':{'regex':'osx','count':0},
'iMovie':{'regex':'imovie','count':0},
'Macintosh':{'regex':'macintosh','count':0},
'Lisa':{'regex':'lisa','count':0},
'Mac':{'regex':'mac','count':0},
}
Here’s a screenshot of me testing the Apple II regular expression, using the excellent Regexr.
Overall, out of 10975 messages downloaded(as of now), 2,186, or just under 20% mentioned an apple product by name. Here’s the breakdown of the products mentioned:
LaserWriter 1 iMovie 3 OSX 9 iBook 22 PowerBook 22 Lisa 24 Apple TV 31 Newton 33 iTunes 52 Macintosh 163 iMac 235 MacBook 366 Apple II Family 481 iPad 574 iPod 575 iPhone 875 Mac 1315
More than one out of every ten messages included a reference to a Mac! Nearly one in ten mentioned an iPhone – not bad for a device that’s been out a fraction of the time the Mac has been available.I’m pleased to see so many references to the Apple II including several mentions of the//c, which was my first Apple product.
It’s also interesting to note that out of 33 mentions of Newton, only a handful of those were about the actual Apple product – most were comparing Steve Jobs to Newton himself. Check out my earlier post on NLTK concordance for details on how I did this:
import nltk
import string
f = open('stevejobs_tribute.txt').read()
f = f.translate(string.maketrans("",""), string.punctuation)
foo=nltk.Text(f.split())
print foo.concordance('newton')
result:
op If history misses men like Isaac Newton Graham Bell Galileu Thomas Edison a mbered though his legacy Now he met Newton Einstein and other geniuses like hi oday I was one of the few who had a Newton Today I have an iPhone 4 an iPad2 a oduct that came thereafter from the Newton to the Cube to the iPhone 4S God Bl with the likes of Edison Garcia and Newton for his impact and vision I wish hi ntioned in the same breath as Isaac Newton Thomas Edison and Bill Gates The le off a tree we are thinking of Adam Newton and Steve Jobs He open new dimensio Jobs will be missed Da Vinci Mozart Newton Franklin Jobs Nobody is out of plac ged my life starting with the Apple Newton followed by the iPod and then the i sorely missed nbsp Da Vinci Mozart Newton Franklin Jobs Nobody is out of plac ve dared to Einstein Freud Da Vinci Newton Galileo Darwin among others is prou embered beside Einstein Pasteur and Newton The world is moving toward his crea irst Apple Mac I remember the first Newton I willnbspremembernbspSteves creati e to contact us againnbsp How Isaac Newton and Albert Einstein contributed gre world One seduced Eve One awakened Newton and One was in the hands of Steve J the way you have influenced mine If Newton discovered something as remarkable rld One seduced Eve second awakened Newton the third one was in the hands of S lent to Leonardo Da Vinci Sir Issac Newton Albert Einstein and the like He was t of the caliber of that of DaVinci Newton Pythagorous etc The list can go on hen people say names like ie Edison Newton and Einstein I guarantee that the n Computers” The Apple II Lisa Mac Newton iPod iTunes store iPod Touch iPhone ember Steve Jobs the way I remember Newton or Einstein I lived with Apple prod set consultant who bought his first Newton MacBook 170 and all the dozens of o br 3 Apples change the world Adán Newton Steve Jobs 19552011 Rest in Peace t back to the Apple IIGS I also had a Newton Steve Jobs death hurts me personall ed the world apple to adam apple to newton and apple to steve jobs Steve was a dam and Eva Second one that wake up newton third one that Steve Jobs create St
Also interesting where the number of mentions to other historical figures in the Steve Job remembrance messages. According to the submitters, Steve Jobs is clearly in some elite company. I don’t know if I’d go so far as to group him with the man who brought automobiles and light bulbs to the masses but hey, we all have our priorities. All counts were determined through a simple grep command piped to wc -l.Here are a few examples:
- Einstein – 70
- Ford – 189
- Edison – 110
- DaVinci – 15
- Bill Gates – 8
Finally, I wanted to see what how people were speaking about Steve Jobs and especially what terms were being used to describe him. There was no point in performing sentiment analysis on this text as all of the texts were not only obviously positive but were also vetted by Apple for content. Using NLTK, I performed part-of-speech tagging on every word in each tribute message and then wrote some code to total the adjectives and adverbs used in the tribute messages.
The most commonly-used adjectives are
('great', 1961)
('steve', 1808)
('many', 1459)
('first', 917)
('sad', 862)
('better', 857)
('such', 727)
('best', 721)
('visionary', 645)
('new', 579)
('more', 556)
('true', 538)
('most', 476)
('creative', 471)
('apple', 435)
('other', 427)
('same', 415)
('good', 412)
('greatest', 376)
('wonderful', 373)
('sorry', 362)
('old', 325)
('brilliant', 283)
('able', 281)
('incredible', 267)
('big', 260)
Humorously, NLTK frequently considered “Steve” to be an adjective. This is likely because it is always followed by the proper noun “Jobs.” A tweet from NLTK expert Jacob Perkins reminded me that machines are dumb and proper nouns should be capitalized. In order to aggregate the counts, I normalized the text by converting to lowercase – I wasn’t interested in nouns, only adjectives so proper nouns didn’t matter to me.
The top adverbs, according to NLTK, were not as interesting, at least to me.
('so', 2220)
('never', 2111)
('not', 1897)
('always', 1798)
('just', 1402)
('now', 1028)
('truly', 989)
('only', 945)
('very', 919)
('much', 908)
('ever', 751)
('even', 743)
('really', 567)
('forever', 508)
('more', 486)
('still', 447)
('well', 398)
('most', 375)
('personally', 352)
And finally, I ran tri-gram analysis, again using NLTK.
trigrams = defaultdict(int) nltk_trigrams = nltk.trigrams(text) for itm in nltk_trigrams: trigrams[itm] += 1
As one would expect, the leading trigram was ‘rest in peace‘ with 1838 mentions, 16.7% of all mentions. ‘thank you for‘ was found in 1446 messages, ‘will be missed‘ was found in 827 messages. Other interesting trigrams are ‘thank you steve‘ with 791 mentions and ‘changed the world‘ with 551 mentions.
The full python code and resulting data can be found on github.
#!/usr/bin/python
#nltk.help.upenn_tagset('RB')
from collections import defaultdict
from operator import itemgetter
import re
import urllib2
import string
import simplejson as json
import codecs
import nltk
OUTPUT_FILE = 'data/stevejobs_tribute.txt'
adverbs = defaultdict(int)
adjectives = defaultdict(int)
trigrams = defaultdict(int)
message_has_adjective = False
message_has_adverb = False
message_contains_product_mention = False
messages_with_adjective = 0
messages_with_adverb = 0
messages = 0
messages_with_product_mention = 0
exclude = set(string.punctuation)
products = {'iPhone':{'regex':'iphones?','count':0},
'iMac':{'regex':'imacs?','count':0},
'iPad':{'regex':'ipads?','count':0},
'iTunes':{'regex':'itunes','count':0},
'iPod':{'regex':'ipods?','count':0},
'cube':{'regex':'cubes?','count':0},
'MacBook':{'regex':'macbooks?','count':0},
'iBook':{'regex':'ibooks?','count':0},
'Apple TV':{'regex':'apple ?tvs?','count':0},
'Apple II Family':{'regex':r'(apple )?(2|ii|\]\[|\/\/)([ce\+|]|gs|s)?[^0-9]', 'count':0},
'LaserWriter':{'regex':'laserwriter?','count':0},
'PowerBook':{'regex':'powerbook?','count':0},
'Newton':{'regex':'newton?','count':0},
'OSX':{'regex':'osx','count':0},
'iMovie':{'regex':'imovie','count':0},
'Macintosh':{'regex':'macintosh','count':0},
'Lisa':{'regex':'lisa','count':0},
'Mac':{'regex':'mac','count':0},
}
def top_n(dct,n = 10):
srtd=sorted(dct.iteritems(), key=itemgetter(1), reverse=True)
for x in srtd[0:n+1]:
print x
def nltk_concordance(term,text_file):
f = open(text_file).read()
# remove punctuation
f = f.translate(string.maketrans("",""), string.punctuation)
split_text=nltk.Text(f.split())
split_text.concordance(term,lines=100)
# >>> f = f.translate(string.maketrans("",""), string.punctuation)
# >>> foo=nltk.Text(f.split())
# >>> print foo.concordance('newton')
def unescape(s):
"""unescapes html codes"""
s = s.replace("<", " s = s.replace(" ", " ")
# this has to be last:
s = s.replace("&", "&")
return s
for line in open(OUTPUT_FILE):
message_has_adjective = False
message_has_adverb = False
message_contains_product_mention = False
# remove the trailing linefeed and convert to lower-case
# and remove html control characters
messages += 1
data = line.strip()
data = data.lower()
data = unescape(data)
# check for product mentions
for k,v in products.iteritems():
if re.search(v['regex'],data):
products[k]['count'] += 1
message_contains_product_mention = True
# if the message contains a product mention
# increment the product mention counter
if message_contains_product_mention:
messages_with_product_mention += 1
# tokenize the sentences using nltk's wordpuncttokenizer
text = nltk.WordPunctTokenizer().tokenize(data)
# compute trigrams
nltk_trigrams = nltk.trigrams(text)
for itm in nltk_trigrams:
trigrams[itm] += 1
# pos-tag each token. we're interested in adjectives and adverbs
parts_of_speech = nltk.pos_tag(text)
# test for adjectives and adverbs, increment the counters
# when we find one.
for (word,pos) in parts_of_speech:
if pos.startswith('JJ'):
message_has_adjective = True
adjectives[word] += 1
if pos.startswith('RB'):
message_has_adverb = True
adverbs[word] += 1
# if the message contains an adverb or an adjective, increment a counter
if message_has_adjective:
messages_with_adjective += 1
if message_has_adverb:
messages_with_adverb += 1
# output the 25 most frequently-used adjectives and adverbs
n = 25
print "top %s adverbs" % n
top_n(adverbs, n)
print
print "top %s adjectives" % n
top_n(adjectives, n)
print "messages with adjectives: %s" % messages_with_adjective
print "messages with adverbs: %s" % messages_with_adverb
print "total messages with product mentions: %s" % messages_with_product_mention
print "total messages: %s" % messages
# output the top 50 most-common trigrams
n = 50
print "top %s trigrams" % n
top_n(trigrams, n)
srtd=sorted(products.iteritems(),key=itemgetter(1))
for x,y in srtd:
print "%s\t\t%s" % (x,y['count'])
print
print
# concordance for newton
print "concordance for newton:"
nltk_concordance('newton',OUTPUT_FILE)












