
Chuck Norris doesn’t screen-scrape; the data runs scared to his hard drive.

Inspired by a tweet from Roger Ehrenberg, and by my 11-year-old son, who’s crazy about Chuck Norris facts, I screen-scraped the contents of http://www.chucknorrisfacts.com. Code and data can be found here.

The script, written in Python with BeautifulSoup, simply loops through all of the pages on http://www.chucknorrisfacts.com and reads the facts listed on each page.
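The heart of the scraper is picking the fact text out of each <li class="views-row"> element. To try that step without hitting the site, here is a stdlib-only sketch that swaps BeautifulSoup for Python 3's html.parser module. FactParser is a hypothetical helper, and it assumes the same li.views-row > div markup the script targets; the live site's layout may well differ.

```python
from html.parser import HTMLParser


class FactParser(HTMLParser):
    """Collect the (whitespace-stripped) text of the first <div> inside each
    <li class="views-row">.

    Hypothetical helper: assumes the li.views-row > div markup the scraper
    targets; the live site's layout may differ.
    """

    def __init__(self):
        # convert_charrefs=True decodes entities such as &#39; and &quot;
        super().__init__(convert_charrefs=True)
        self.facts = []
        self._in_row = False    # inside an <li class="views-row">?
        self._div_depth = 0     # <div> nesting depth within the current row
        self._captured = False  # has this row's first <div> been captured?
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "views-row" in classes:
            self._in_row, self._div_depth, self._captured, self._buf = True, 0, False, []
        elif tag == "div" and self._in_row:
            self._div_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self._in_row and self._div_depth:
            self._div_depth -= 1
            if self._div_depth == 0 and not self._captured:
                # closing the row's first <div>: keep its text, nested tags included
                self.facts.append("".join(self._buf).strip())
                self._captured = True
        elif tag == "li" and self._in_row:
            self._in_row = False

    def handle_data(self, data):
        if self._in_row and self._div_depth and not self._captured:
            self._buf.append(data)


# Hypothetical markup built from two facts in the output further down.
sample = (
    '<ul><li class="views-row odd"><div>iphone 4? chuck norris has iphone 8</div></li>'
    '<li class="views-row even"><div>if the mountain won&#39;t come to Muhammad, '
    'Chuck Norris will bring it.</div></li></ul>'
)
parser = FactParser()
parser.feed(sample)
parser.close()
print(parser.facts)
```

Like the script's entry.div, only the first div in each row counts; text from any nested tags inside it is kept.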


```python
#!/usr/bin/env python3
# Requires Python 3.7+ and BeautifulSoup 4 (pip install beautifulsoup4).
import sys
import time
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Write UTF-8 even when stdout is redirected to a file.
sys.stdout.reconfigure(encoding='utf-8')

# 674 pages last time I checked. Oddly enough, their pages seem zero-based. Additionally, if you
# request a page number outside the range of pages, you get data back instead
# of a 404. I'm not sure why they're doing this.
for page_num in range(674):  # pages 0 through 673
    url = 'http://www.chucknorrisfacts.com/all-chuck-norris-facts?page=%d' % page_num
    with urlopen(url) as response:
        soup = BeautifulSoup(response, 'html.parser')

    # each fact lives in an <li class="views-row"> element
    for entry in soup.find_all('li', 'views-row'):
        if entry.div is None:
            continue  # skip rows without the expected <div>

        # BeautifulSoup 4 already decodes HTML entities (&quot;, &#39;, ...),
        # so the text can be used as-is.
        the_quote = entry.div.get_text()

        # print it to stdout. I just redirect the program's output to a file.
        print(the_quote)

    # be a good citizen and wait a few seconds before visiting the next page
    time.sleep(6)
```

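Because out-of-range page numbers return data rather than a 404, the script has to hard-code the page count. One way around that is to keep going until a page turns up nothing new. This is a minimal sketch, assuming out-of-range pages either repeat facts already seen or return none. scrape_until_repeat, fetch_facts and max_pages are all hypothetical names; fetch_facts stands in for the download-and-parse step.

```python
import time


def scrape_until_repeat(fetch_facts, delay_seconds=6.0, max_pages=10_000):
    """Walk zero-based pages until one yields no facts we haven't seen yet.

    fetch_facts(page_num) -> list[str] is a hypothetical download-and-parse step.
    Assumes out-of-range pages return nothing or previously seen facts.
    Returns (facts in first-seen order, first page number with nothing new).
    """
    seen = set()
    facts = []
    for page_num in range(max_pages):
        new_facts = []
        for fact in fetch_facts(page_num):
            if fact not in seen:  # also drops duplicates within a single page
                seen.add(fact)
                new_facts.append(fact)
        if not new_facts:
            return facts, page_num
        facts.extend(new_facts)
        time.sleep(delay_seconds)  # stay polite between requests
    return facts, max_pages  # hypothetical safety cap reached
```

The trade-off: a genuine page whose facts all duplicate earlier ones would stop the crawl early, so when you know the page count, the hard-coded range in the script is the more predictable choice.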
The output looks like this:

if the mountain won't come to Muhammad, Chuck Norris will bring it.
if you watch the ring you die in 7 days,if you look at Ckuck Norris you die instantly
in a real zombie apocalypse, Chuck Norris can roundhouse-kick 53,596 zombies dead.
in space no-one can hear you scream.....except chuck norris!
iphone 4? chuck norris has iphone 8
most kids pee their name into snow... Chuck Norris pisses his in concreate...
never say you can'thurt a fly to chuck norris because he will hurt you
new never-before-seen behind-the-scenes shots from Walker Texas Range shows Chuck Norris carrying his truck home after it broke down
no one has ever found where the smurfs live thats  becuase they live  in chuck norrises beard
...

Visit my GitHub for the full dataset (5,500 entries).
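Since the script prints each fact on its own line and the output is redirected to a file, loading the dataset back is straightforward. This is a minimal sketch that assumes a UTF-8 file with one fact per line; a fact containing a line break would be split. clean_lines, load_facts and summarize are hypothetical helpers, and the file name is whatever you redirected the output to.

```python
from collections import Counter


def clean_lines(lines):
    """Strip trailing newlines and drop blank lines."""
    return [line.rstrip("\n") for line in lines if line.strip()]


def load_facts(path):
    """Read the scraper's redirected stdout (assumed UTF-8, one fact per line)."""
    with open(path, encoding="utf-8") as fh:
        return clean_lines(fh)


def summarize(facts):
    """Return (unique facts in first-seen order, {fact: count} for repeated facts)."""
    counts = Counter(facts)
    unique = list(counts)  # Counter keeps first-insertion order (Python 3.7+)
    repeated = {fact: n for fact, n in counts.items() if n > 1}
    return unique, repeated
```

For example, unique, repeated = summarize(load_facts("facts.txt")) (a hypothetical file name) gives the de-duplicated list plus any facts that appeared more than once.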
