<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>neilkodner.com &#187; census</title>
	<atom:link href="http://www.neilkodner.com/tag/census/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.neilkodner.com</link>
	<description>Data Driven.  Since 1971.</description>
	<lastBuildDate>Sun, 23 Oct 2011 16:40:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Visualization of Frequently Occurring First Names and Surnames From the 1990 Census</title>
		<link>http://www.neilkodner.com/2010/01/visualization-of-frequently-occurring-first-names-and-surnames-from-the-1990-census/</link>
		<comments>http://www.neilkodner.com/2010/01/visualization-of-frequently-occurring-first-names-and-surnames-from-the-1990-census/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 13:48:28 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[census]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[wordcloud]]></category>

		<guid isPermaLink="false">http://www.neilkodner.com/?p=147</guid>
		<description><![CDATA[A really quick visualization I did while researching data for another project.  Census.gov has a link to the most frequently occurring first names and surnames from the 1990 census.  Surely more current data must exist; I found this dataset by accident. The original data is tab-delimited in the format: Name Frequency in percent Cumulative Frequency [...]]]></description>
			<content:encoded><![CDATA[<p>A really quick visualization I did while researching data for another project.  Census.gov has a <a title="census.gov first names and surnames" href="http://www.census.gov/genealogy/names/">link</a> to the most frequently occurring first names and surnames from the 1990 census.  Surely more current data must exist; I found this dataset by accident.</p>
<p>The original data is tab-delimited in the format:</p>
<ul>
<li>Name</li>
<li>Frequency in percent</li>
<li>Cumulative Frequency in percent</li>
<li>Rank</li>
</ul>
<p>The data was already sorted by rank, so it was easy to build lists of the top 500 names in each category (male first, female first, surname):</p>
<pre class="brush: bash; title: ; notranslate">head -n 500 dist.female.first | awk '{print $1&quot;:&quot;$2}'</pre>
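<p>For anyone who would rather stay in Python, here is a rough sketch of the same step. The function name <code>top_names</code> and the sample rows are made up for illustration; only the four-column layout comes from the census files. It splits on any whitespace, which is what awk does by default.</p>

```python
from itertools import islice


def top_names(lines, limit=500):
    """Return 'NAME:frequency' strings for the first `limit` rows.

    `lines` is any iterable of text lines in the layout described above:
    name, frequency, cumulative frequency, rank.
    """
    pairs = []
    for line in islice(lines, limit):
        fields = line.split()  # split on any whitespace, like awk's default
        if not fields:
            continue  # skip blank lines
        name, frequency = fields[0], fields[1]
        pairs.append(f"{name}:{frequency}")
    return pairs


# Made-up sample rows, not real census values.
sample = [
    "ALPHA 2.50 2.50 1",
    "BRAVO 1.25 3.75 2",
    "CHARLIE 0.75 4.50 3",
]
print("\n".join(top_names(sample, limit=2)))
```

<p>With a real file, passing the open file object works the same way, for example <code>top_names(open("dist.female.first"))</code>.</p>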
<p>The data was then loaded into Wordle for a quick visualization. Thumbnails are linked to full-size versions. My goal with this data is to build a corpus of first names and surnames so that I can develop a spelling corrector, along the lines of <a title="Peter Norvig's spelling corrector" href="http://norvig.com/spell-correct.html">Peter Norvig&#8217;s sublime spelling corrector</a>.  Think Google&#8217;s Did You Mean&#8230; rather than a spel checker.  I plan on a proof-of-concept in Python, followed by an Oracle PL/SQL version.  Another fun project would be to calculate the probability of a given first name + surname.  I plan on spending some time searching for more current data.</p>
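<p>To make the idea concrete, here is a minimal sketch of what a name corrector along those lines might look like. It is not Norvig's code: it only considers candidates one edit away (his version also looks two edits out), and the names <code>edits1</code>, <code>suggest</code> and <code>pair_probability</code>, as well as the frequencies, are invented for illustration. The pair probability assumes first names and surnames are independent, which is a simplification.</p>

```python
import string


def edits1(word):
    """All strings one edit away: delete, transpose, replace, insert."""
    letters = string.ascii_uppercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if right[1:]]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(name, frequencies):
    """Return the most frequent known name within one edit of `name`."""
    name = name.upper()
    if name in frequencies:
        return name  # already a known name
    candidates = [c for c in edits1(name) if c in frequencies]
    if not candidates:
        return name  # nothing close enough, leave it alone
    return max(candidates, key=frequencies.get)


def pair_probability(first, surname, first_freq, surname_freq):
    """Probability of a first name + surname pair.

    Assumes the two are independent and that frequencies are percentages.
    """
    return (first_freq[first] / 100.0) * (surname_freq[surname] / 100.0)


# Made-up frequencies in percent, not real census values.
first_freq = {"MARY": 2.0, "MARIE": 0.5, "MARK": 1.0}
surname_freq = {"SMITH": 1.0}

print(suggest("mray", first_freq))
print(pair_probability("MARY", "SMITH", first_freq, surname_freq))
```

<p>Picking the most frequent candidate is the simplest possible ranking; a fuller version would also weigh how likely each kind of typo is.</p>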
<div id="attachment_154" class="wp-caption alignnone" style="width: 310px"><a href="http://www.neilkodner.com/wp-content/uploads/2010/01/last_names.jpg"><img class="size-medium wp-image-154 " title="last_names" src="http://www.neilkodner.com/wp-content/uploads/2010/01/last_names-300x173.jpg" alt="500 most popular surnames" width="300" height="173" /></a><p class="wp-caption-text">top 500 surnames</p></div>
<div id="attachment_153" class="wp-caption alignnone" style="width: 310px"><a href="http://www.neilkodner.com/wp-content/uploads/2010/01/male_first.jpg"><img class="size-medium wp-image-153 " title="male_first" src="http://www.neilkodner.com/wp-content/uploads/2010/01/male_first-300x169.jpg" alt="male first names" width="300" height="169" /></a><p class="wp-caption-text">top male first names</p></div>
<div id="attachment_152" class="wp-caption alignnone" style="width: 310px"><a href="http://www.neilkodner.com/wp-content/uploads/2010/01/female_first.jpg"><img class="size-medium wp-image-152 " title="female_first" src="http://www.neilkodner.com/wp-content/uploads/2010/01/female_first-300x172.jpg" alt="female first names" width="300" height="172" /></a><p class="wp-caption-text">top female first names</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.neilkodner.com/2010/01/visualization-of-frequently-occurring-first-names-and-surnames-from-the-1990-census/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

